Press · June 10, 2026 · 9 min read

Atlassian Will Train Its AI on Your Confluence Data. What Enterprise Leaders Must Decide Before August 17.

Atlassian turns on Confluence and Jira AI training by default on August 17. The right first reflex for a CIO is not finding where to click; it is auditing the state of the corpus.

On April 16, 2026, Atlassian began rolling out a data collection policy that most enterprise IT organizations had not anticipated. Starting August 17, 2026, Confluence and Jira data from Atlassian Cloud customers will be used to train Rovo, Atlassian’s AI assistant, and the broader suite of Atlassian AI products, with defaults and opt-out rights that depend on the subscription tier. The policy affects approximately 300,000 organizations (Atlassian Trust Center, April 2026).

Most coverage since the announcement has focused on the mechanics: where to find the toggle and what each subscription tier permits. That guidance is necessary and readily available. But there is an upstream question that I rarely hear raised in the enterprise IT conversations I take part in, and it strikes me as the more consequential one: what is the actual state of your Confluence corpus? Whether you allow Atlassian to use it or opt out, that question deserves an answer before you treat this setting as a box to check.


What Atlassian Collects — and for How Long

The policy distinguishes between two categories of data, with different opt-out rules per subscription tier (Groundy, April 19, 2026; Atlassian Support).

Metadata includes readability scores, task classifications, semantic similarity scores, story points, sprint end dates, and SLA values. These are signals derived from your content, not the content itself. The fact that metadata is “de-identified” does not make it strategically neutral: semantic similarity scores aggregated across hundreds of Jira tickets could be enough for a well-resourced observer to reconstruct a product portfolio structure or infer a delivery roadmap.

In-app data includes Confluence page titles and body text, Jira work item titles, descriptions, and comments, along with custom status names and workflow configurations. This is your operational documentation, your internal knowledge base, your formalized processes.

Maximum retention is seven years. On opt-out, in-app data is removed within thirty days and metadata within ninety days — but model weights already trained on your content before the opt-out are not retroactively erased.
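The removal windows reduce to simple date arithmetic. The Python sketch below computes the latest expected removal date for each category, assuming the windows are counted in calendar days from the day the opt-out is applied; the function name `removal_deadlines` is illustrative, not part of any Atlassian tooling.

```python
from datetime import date, timedelta

# Removal windows as stated in the policy summary above:
# in-app data within 30 days of opt-out, metadata within 90 days.
IN_APP_WINDOW_DAYS = 30
METADATA_WINDOW_DAYS = 90


def removal_deadlines(opt_out: date) -> dict[str, date]:
    """Latest expected removal date for each data category.

    Assumes calendar days counted from the day the opt-out is applied.
    Model weights trained before that day are out of scope: the policy
    does not erase them retroactively.
    """
    return {
        "in_app": opt_out + timedelta(days=IN_APP_WINDOW_DAYS),
        "metadata": opt_out + timedelta(days=METADATA_WINDOW_DAYS),
    }


if __name__ == "__main__":
    for category, deadline in removal_deadlines(date(2026, 8, 17)).items():
        print(f"{category}: removed by {deadline.isoformat()}")
```

For an opt-out applied on August 17, 2026, this gives September 16, 2026 for in-app data and November 15, 2026 for metadata. Neither date changes anything for models already trained on that content.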


The Tier Matrix: What Your Plan Actually Permits

Your Atlassian subscription tier determines what you can control (Atlassian Support):

Plan | Metadata opt-out | In-app data default | In-app opt-out
Free | Not available | On | Available
Standard | Not available | On | Available
Premium | Not available | Off | Available
Enterprise | Available | Off | Available

Free and Standard customers face the most exposure: metadata collection is mandatory with no opt-out, and in-app data is enabled by default. The only lever available is disabling in-app data collection via admin.atlassian.com > Security > Data contribution — a step worth taking now, regardless of how you ultimately decide on the broader policy.

Enterprise customers have the most complete control: both categories are disabled by default, and full opt-out of both is accessible. This is the only tier where your document governance posture remains entirely your own.

One frequently misunderstood point: this is an organization-level control. Individual users cannot opt themselves out. The setting must be applied by an Atlassian administrator from the admin console, and repeated for each organization you manage.


The Question Most IT Leaders Haven’t Had Time to Ask

Here is what most of the media coverage has left unaddressed: if you allow Atlassian to train its models on your Confluence corpus, do you know what state that corpus is actually in?

This is not a rhetorical question. A study published in June 2026 by Sinequa, drawing on responses from 740 executives at companies with $1B to $20B in revenue, found that 38.4% of organizations report that the data in their AI production pipelines is not being updated (Sinequa, “Beyond the Hype: The Reality of Enterprise Agentic AI in 2026,” June 2, 2026). In the document corpus assessments we conduct at K-AI on client knowledge repositories, we consistently encounter the same kinds of anomalies: unarchived obsolete pages, duplicate process documentation with divergent versions, and contradictory instructions on operationally critical subjects.
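Divergent duplicates are among the easier anomalies to surface mechanically. A minimal Python sketch, assuming you already have page titles and body text from an export (the record fields, sample pages and both thresholds are hypothetical), flags pairs whose titles are nearly identical while their bodies differ, using the standard library's `difflib.SequenceMatcher`:

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample pages; the field names are not an Atlassian export format.
PAGES = [
    {"id": "A1", "title": "Expense approval process",
     "body": "Managers approve expenses under 500 EUR. Finance approves above."},
    {"id": "B7", "title": "Expense Approval Process",
     "body": "Managers approve expenses under 1000 EUR. Directors approve above."},
    {"id": "C3", "title": "VPN setup guide",
     "body": "Install the client and sign in with SSO."},
]

TITLE_MIN = 0.9   # assumed: titles at least this similar cover the same subject
BODY_MAX = 0.95   # assumed: bodies below this similarity have diverged


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def divergent_duplicates(pages, title_min=TITLE_MIN, body_max=BODY_MAX):
    """Return (id, id) pairs whose titles match but whose bodies differ."""
    pairs = []
    for p, q in combinations(pages, 2):  # every unordered pair, O(n^2)
        same_subject = similarity(p["title"], q["title"]) >= title_min
        diverged = similarity(p["body"], q["body"]) < body_max
        if same_subject and diverged:
            pairs.append((p["id"], q["id"]))
    return pairs


if __name__ == "__main__":
    print(divergent_duplicates(PAGES))
```

On the sample, it reports the two expense-approval pages, which state different approval limits. Comparing every pair is quadratic, so on a real corpus you would likely narrow the candidates first (by space or title prefix, for example) and tune both thresholds against pages a reviewer has already checked.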

Sharing such a corpus for model training means contributing its flaws alongside its value. A model trained on documents that contradict one another does not learn to resolve the ambiguity; it learns to reproduce it at scale. The larger the corpus, the more often those conflicting signals recur, and the more firmly they are encoded in the model’s parameters.

The decision CIOs and DPOs face before August 17 is therefore not merely a configuration decision. It is a document governance decision, and it warrants a prior assessment of the asset about to be shared.


Three Decisions to Make Before August 17

1. Check the toggle within the next few days

Regardless of your final decision, start at admin.atlassian.com > Security > Data contribution. The settings UI finished rolling out across all organizations by May 19. If you are on Free or Standard, disable in-app data collection as a precautionary measure while you complete your assessment. On Premium, confirm the in-app data toggle is in its default (disabled) state. On Enterprise, verify both.
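Because the setting is per organization, teams managing several organizations may find it useful to record what they see in each admin console and derive the actions above from it. A sketch in Python, assuming a hypothetical inventory record `OrgSettings` filled in by hand (it is not an Atlassian API object) and the plan rules from the tier matrix:

```python
from dataclasses import dataclass

# Metadata opt-out availability per plan, transcribed from the tier matrix.
METADATA_OPT_OUT = {
    "free": False,
    "standard": False,
    "premium": False,
    "enterprise": True,
}


@dataclass
class OrgSettings:
    """Hypothetical inventory record, filled in by hand from
    admin.atlassian.com > Security > Data contribution."""

    name: str
    plan: str          # "free", "standard", "premium" or "enterprise"
    in_app_on: bool    # in-app data contribution currently enabled
    metadata_on: bool  # metadata contribution currently enabled


def recommended_actions(org: OrgSettings) -> list[str]:
    """Return the precautionary actions for one organization."""
    plan = org.plan.lower()
    if plan not in METADATA_OPT_OUT:
        raise ValueError(f"unknown plan: {org.plan!r}")
    actions = []
    # In-app data can be turned off on every plan.
    if org.in_app_on:
        actions.append("disable in-app data collection")
    # Metadata can only be turned off where the plan allows it.
    if org.metadata_on:
        if METADATA_OPT_OUT[plan]:
            actions.append("disable metadata collection")
        else:
            actions.append("note: metadata collection cannot be disabled on this plan")
    return actions


if __name__ == "__main__":
    orgs = [
        OrgSettings("example-eu", "standard", in_app_on=True, metadata_on=True),
        OrgSettings("example-corp", "enterprise", in_app_on=False, metadata_on=False),
    ]
    # The setting is per organization, so each one is checked separately.
    for org in orgs:
        print(org.name, recommended_actions(org) or ["no change needed"])
```

The organization names are placeholders. Keeping the observed state as data, rather than as a one-off click, also gives you a record to re-check after August 17.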

2. Audit your corpus before deciding

The Atlassian policy creates an urgency that most organizations have not had before: a reason, with a fixed deadline, to evaluate the actual state of their internal documentation. Which Confluence spaces are actively maintained? Which contain outdated or conflicting information? Which repositories are already feeding AI assistants, Copilot pilots, or RAG systems? Without that mapping, you can neither assess the risk of sharing nor make a sound governance decision.
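The first of those questions, which spaces are actively maintained, can be approximated from last-modified dates alone. A minimal Python sketch, assuming a hypothetical export of (space key, title, last-modified date) rows and an assumed one-year staleness threshold:

```python
from datetime import date

# Hypothetical export rows: (space key, page title, last modified date).
# The format is an assumption; adapt it to whatever export you actually use.
PAGES = [
    ("HR", "Onboarding checklist", date(2026, 5, 2)),
    ("HR", "Leave policy", date(2023, 11, 20)),
    ("OPS", "Incident runbook", date(2024, 1, 15)),
    ("OPS", "On-call rota", date(2024, 3, 8)),
]

STALE_AFTER_DAYS = 365  # assumed threshold for "not actively maintained"


def last_update_by_space(pages):
    """Most recent modification date seen in each space."""
    latest = {}
    for space, _title, modified in pages:
        if space not in latest or modified > latest[space]:
            latest[space] = modified
    return latest


def stale_spaces(pages, today, max_age_days=STALE_AFTER_DAYS):
    """Spaces with no page edited within max_age_days of today, sorted by key."""
    return sorted(
        space
        for space, modified in last_update_by_space(pages).items()
        if (today - modified).days > max_age_days
    )


if __name__ == "__main__":
    print(stale_spaces(PAGES, today=date(2026, 6, 10)))
```

With a reference date of June 10, 2026, only OPS is reported. A space-level view hides old pages inside active spaces, such as the HR leave policy last edited in 2023; a page-level pass with the same threshold would catch those. Last-modified dates say nothing about whether content is correct, so treat the result as a triage signal, not a verdict.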

3. Distinguish between a KM tool and a document governance layer

The Atlassian policy is not an isolated incident. It reflects a structural trend: KM tool vendors are embedding generative AI natively and feeding their models with customer data. GitHub made an analogous move with Copilot (GitLab, “Atlassian will train on your data: Opt out with GitLab,” May 4, 2026). The question for enterprise IT leadership is no longer only “which KM tool are we using?” It is: “what is our document governance strategy in an environment where our data may train models we do not control?”

This is the distinction between a KM tool and a Document Knowledge Platform. A DKP does not train on your data. It audits, cleans, and monitors your corpus on your behalf — without ever sharing it with third-party training pipelines. At K-AI, client data is never used to train external models. “Start Clean, Stay Clean” applies to data sovereignty as much as it does to document quality.


Rovo, Document Quality, and What KM Tools Cannot Do for You

Rovo performs only as well as the Confluence corpus it queries. This is true of every tool in this category. Glean, which surpassed $300M ARR in May 2026 on the argument that a well-governed enterprise context graph reduces AI token costs (TechCrunch, May 28, 2026), faces the same constraint: an uncleaned corpus is more expensive to query, produces more factual errors, and degrades the performance of any AI assistant built on top of it. Enterprise search and KM tools index. They do not clean.

This is not a critique. It is a description of scope — and it is precisely this scope that the Atlassian data contribution policy makes visible, however inadvertently.

For a detailed comparison of KM tools, document management systems, and Document Knowledge Platforms, see our DKP definition and selection guide published earlier this month.


Frequently Asked Questions

What exactly will Atlassian collect from Confluence and Jira starting August 17, 2026?

Atlassian collects two categories. Metadata includes readability scores, task classifications, story points, sprint end dates, SLA values, and semantic similarity scores from the Teamwork Graph. In-app data includes Confluence page titles and body text, Jira work item titles, descriptions, and comments, plus custom status names and workflow names. Data is de-identified and aggregated before model training. Maximum retention is seven years; in-app data is removed within 30 days of opt-out, metadata within 90 days. Primary source: Atlassian Trust Center.

How do I evaluate whether Confluence AI (Rovo) meets my Knowledge Management needs or whether I need a Document Knowledge Platform?

KM tools like Rovo operate on the corpus as it exists: they index, search, and generate responses. A Document Knowledge Platform operates upstream: it audits corpus quality, detects anomalies (duplicates, contradictions, obsolescence, coverage gaps), remediates, and monitors continuously. If your need is to query existing documentation, a KM tool may be sufficient. If your need is to ensure AI assistant responses are reliable, traceable, and compliant, you need a DKP layer upstream. The two are complementary, not substitutes.

Can an enterprise CIO prevent Atlassian from using Confluence data to train its AI?

Partially, depending on tier. On Free and Standard, metadata collection is mandatory and cannot be disabled; only in-app data can be turned off. On Premium, in-app data is disabled by default and metadata cannot be refused. On Enterprise, both categories can be refused and are disabled by default. The setting is at admin.atlassian.com > Security > Data contribution, accessible to organization administrators only.

Are Confluence data practices still GDPR-compliant under the new Atlassian training policy?

Atlassian maintains that its GDPR obligations are unchanged under this policy. However, for EU-regulated organizations, the purpose limitation principle (GDPR Article 5(1)(b)) may mean that using customer data for AI training constitutes a new processing purpose requiring a compatibility analysis or updated privacy notices. Organizations with existing Data Processing Agreements with Atlassian should have legal counsel or their DPO assess whether the policy change triggers re-notification obligations before August 17. This is an obligation to analyze, not a predetermined conclusion.

What is the difference between a Knowledge Management tool like Confluence and a Document Knowledge Platform (DKP)?

A KM tool such as Confluence, Notion, or SharePoint enables teams to create, organize, and share documentation. A Document Knowledge Platform audits the quality of that documentation for AI: it detects conflicts between documents, divergent duplicates, outdated content, and coverage gaps, then monitors the corpus continuously to keep it AI-ready. A DKP is an infrastructure layer upstream of KM tools — it does not replace them; it conditions their reliability for AI use cases. For a full definition, see our DKP guide.


Further Reading

The August 17 deadline creates a short decision window. If you would like to assess the actual state of your document corpus before making your decision, our team offers an initial diagnostic. Reach us at contact@k-ai.ai.

K-AI works with CMA CGM, Veolia, PwC, BNP Paribas, TotalEnergies, and CEVA Logistics. Partners: AWS, Snowflake, Microsoft, Wavestone, Devoteam.


Sources Cited

  1. Atlassian, “Data practices built for responsible AI” (Trust Center), April 2026 — https://www.atlassian.com/trust/ai/data-contribution
  2. Atlassian Support, “Data contribution settings,” May 2026 — https://support.atlassian.com/security-and-access-policies/docs/data-contribution-settings/
  3. Groundy Editorial, “Atlassian Turned On AI Training Data Collection by Default: Here’s What to Disable,” April 19, 2026 — https://groundy.com/articles/atlassian-turned-on-ai-training-data-collection-by-default-heres-what-to-disable/
  4. GitLab, “Atlassian will train on your data: Opt out with GitLab,” May 4, 2026 — https://about.gitlab.com/blog/atlassian-will-train-on-your-data-opt-out-with-gitlab/
  5. Sinequa, “Beyond the Hype: The Reality of Enterprise Agentic AI in 2026,” June 2, 2026 — https://www.sinequa.com/resources/blog/beyond-the-hype-the-reality-of-enterprise-agentic-ai-in-2026/
  6. TechCrunch, “Glean’s top line crosses $300M as AI budget cutting becomes its major selling point,” May 28, 2026 — https://techcrunch.com/2026/05/28/gleans-top-line-crosses-300m-as-ai-budget-cutting-becomes-its-major-selling-point/
