How Your Data Gets Used to Train AI—and Why It’s Becoming a Legal Battleground
For years, privacy compliance largely meant making sure companies told you what data they collected and gave you a way to opt out of its sale. That world is changing. A growing number of legal actions and new state laws are shifting the focus from simple notice-and-choice to something broader: how companies acquire and use data to train artificial intelligence models. If you’ve used a chatbot, voice assistant, or image generator, your data may be part of that training, often without your explicit permission. Here’s what’s happening and what you can do about it.
What Happened
In the past year, lawmakers, regulators, and courts in several states have started treating the source of AI training data as a central legal risk. The Colorado AI Act, passed in 2024 and taking effect in 2026, requires companies to disclose when they use AI in ways that could lead to discriminatory outcomes and to assess the data used to train those systems. Connecticut’s data privacy law, meanwhile, has been interpreted to give consumers the right to delete data used for AI training, and California has introduced bills that would force companies to label AI-generated content and reveal what data was used to train the models that produce it.
At the same time, lawsuits are piling up. Artists, writers, and even voice actors have filed class actions alleging that their copyrighted work, or their voice and likeness, was scraped without consent to train commercial AI models. Healthcare transcription tools have drawn particular scrutiny because they handle sensitive patient data; a National Law Review article notes that in-house counsel are being warned to verify that patient data isn’t being used to train the underlying AI models without proper authorization.
The upshot is clear: companies can no longer assume that public data on the internet is free for the taking. The legal risks are real, and they extend well beyond traditional privacy compliance.
Why It Matters for You
If you’ve ever posted a photo online, written a public review, or spoken into a smart speaker, there’s a chance that data has been ingested into an AI training set. Most companies do not ask for explicit consent—they rely on broad terms of service or argue that the data is publicly available. The new legal landscape makes that argument harder to sustain.
For consumers, this means your personal data—images, text, voice recordings—could be used without your knowledge to create AI systems that companies then sell or use to automate decisions that affect you. For example, an AI trained on your voice might be used to impersonate you in a scam. An image of your face could end up in a facial recognition database you never agreed to.
On the positive side, the legal push also gives you more power. If companies must prove they have a lawful basis to use your data for training, you have more grounds to demand deletion or compensation.
What You Can Do Right Now
You don’t need to be a lawyer to take practical steps to protect your data:
- Check the privacy settings on the AI services you use. Many platforms (e.g., ChatGPT, Google, Meta) now have settings that let you opt out of training. Look for options labeled “Improve the model” or “Use my data for training.” Turn them off if you don’t want your conversations included.
- Review the terms of service when you sign up for a new AI tool. Look for phrases like “usage data,” “aggregate data,” or “improve our services.” If the terms say your data will be used for training, consider whether you’re comfortable with that.
- Limit what you share publicly. Assume that anything you post online (text, images, audio) could be scraped. Set social media accounts to private if possible, and watermark or blur images that you don’t want reused.
- Exercise your deletion rights under state privacy laws. If you live in California, Colorado, Connecticut, or another state with a broad privacy law, you can ask a company to delete the personal data it holds about you, including data used for AI training. Many companies now have data subject request portals on their websites.
- Stay informed about class actions. If you discover your copyrighted work or personal data was used without permission, look for related class-action settlements. You may be eligible for compensation even if you never joined the lawsuit yourself, though you typically need to file a claim before the settlement deadline.
What to Watch For
The field of AI governance is moving fast. More bills in Congress and additional state laws are likely. The trend is toward requiring companies to be transparent about their training data, conduct impact assessments, and obtain explicit consent for certain uses. For now, the best defense is a combination of awareness and action: know where your data goes and use the legal tools available to you.
Sources
- National Law Review, “From Privacy Compliance to AI Governance: Why the Source of Training Data Is Becoming a Central Legal Risk” (June 2026)
- Kelley Drye & Warren LLP, “AI Regulatory Roundup: Recent Developments in Colorado, Connecticut, and California” (May 2026)
- National Law Review, “AI Transcription Tools in Health Care: What In-House Counsel Needs to Get Right” (May 2026)