Your Data Might Be Training AI Without Your Knowledge – Here’s What to Do
Every time you use a generative AI tool like ChatGPT, you might be sharing more than you think. These models are trained on enormous datasets scraped from the internet, and that data sometimes includes personal information you never agreed to share. Regulators in several U.S. states and other countries are now asking where that data came from and whether companies have a legal right to use it.
What happened?
Generative AI models require vast amounts of text, images, and other content to learn patterns. Much of that material comes from public websites, social media platforms, forums, and even private databases that were leaked or reused without authorization. A growing number of lawsuits and regulatory actions are challenging this practice.
Recent developments include:
- Illinois postponed proposed regulations on AI in employment, but the state’s existing Biometric Information Privacy Act already requires consent before biometric data such as face scans is collected, a requirement that also reaches data gathered to train AI systems.
- Colorado, Connecticut, and California are all developing comprehensive AI regulatory frameworks that address transparency in training data sources.
- Mayer Brown’s Global Privacy Watchlist lists AI data governance among the top privacy concerns for 2026.
- In the United Kingdom, White & Case’s “AI Watch: Global Regulatory Tracker” follows the country’s evolving approach to AI oversight, including questions about how training data is collected and used.
These actions reflect a growing consensus: the source of training data is not just a technical issue but a legal and ethical one.
Why it matters for you
If you’ve ever posted a comment, uploaded a photo, or written a review online, your words and images could be part of an AI training set. Most companies do not ask for permission before using public data, and privacy policies are often vague about whether your data will be fed into a model.
The practical consequences:
- Your personal information (name, location, occupation) may appear in AI-generated outputs, sometimes inaccurately.
- Companies may train models on data that includes private conversations from message boards or password-protected communities, if that data was improperly obtained.
- Without clear legal requirements, you have limited recourse if your data is used in ways you didn’t anticipate.
The uncertainty here is real: many AI companies argue that publicly available data is fair game, while privacy advocates and regulators increasingly disagree. The outcome of these debates will shape what protections consumers actually have.
What readers can do
You don’t have to wait for regulations to take effect. Here are concrete steps to reduce your risk:
Check privacy policies before signing up for AI tools. Look for language about how training data is collected and whether your inputs are used to improve the model. Some services let you opt out.
Use opt-out options where available. For example, ChatGPT and Claude allow users to disable training on their conversations in the settings menu. GitHub and some other platforms also offer data-use controls.
Limit what you share publicly. Be cautious about posting personal details on public forums, social media, or review sites if you wouldn’t want them used in a training dataset.
Choose AI tools that are transparent. Services that publish clear data-use practices and commit to not training on user input without consent are generally safer from a privacy standpoint.
Consider using privacy-focused alternatives like local AI models (e.g., running Llama or Mistral on your own device), which don’t send your data to a third party.
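For readers comfortable with a little scripting, the first step above (checking a privacy policy for training-data language) can be partly automated. The Python sketch below is a minimal illustration, not a legal review tool: the function name `find_training_clauses` and the phrase list are assumptions made up for this example, and real policies use wording this simple pattern will miss. It takes policy text you have pasted in and returns the sentences that mention training or opting out, so you know where to read closely.

```python
import re

# Hypothetical phrase list for illustration; real policies vary widely,
# so treat a clean result as "nothing obvious found", not as an all-clear.
TRAINING_PATTERN = re.compile(
    r"\b(?:train\w*|machine learning|opt[- ]out|"
    r"improve (?:our|the) (?:models?|services?))\b",
    re.IGNORECASE,
)


def find_training_clauses(policy_text):
    """Return the sentences in policy_text that match TRAINING_PATTERN."""
    # Naive sentence split: break after ., !, or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    return [s for s in sentences if TRAINING_PATTERN.search(s)]


if __name__ == "__main__":
    # Made-up sample text, not taken from any real service's policy.
    sample = (
        "We collect account information to provide the service. "
        "We may use your content to train and improve our models. "
        "You can opt out of this use in your account settings. "
        "We do not sell your personal information."
    )
    for clause in find_training_clauses(sample):
        print("-", clause)
```

A scan like this only narrows down where to look. Whether a flagged sentence actually permits training on your data still requires reading it in context.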
Looking ahead
The legal landscape is shifting quickly. If new regulations require companies to disclose their data sources and obtain consent, consumers will gain more control. But until then, the burden is on individual users to protect themselves.
A good rule of thumb: assume anything you post online could end up in an AI training set. Adjust your behavior accordingly, and push for clearer rules by supporting consumer privacy advocacy groups.
Sources
- National Law Review: “Illinois Postpones Proposed Regulations on AI in Employment” (June 2026)
- Kelley Drye & Warren LLP: “AI Regulatory Roundup: Recent Developments in Colorado, Connecticut, and California” (May 2026)
- Mayer Brown: “Global Privacy Watchlist” (January 2026)
- White & Case LLP: “AI Watch: Global Regulatory Tracker – United Kingdom” (November 2025)