OpenAI’s New Privacy Filter Automatically Masks Your Personal Data in Documents
If you’ve ever pasted a document full of customer names, email addresses, or phone numbers into ChatGPT, you’ve probably wondered: Is this information safe? A new open-source tool from OpenAI aims to reduce that worry.
On April 22, 2026, OpenAI released the OpenAI Privacy Filter – a machine‑learning model that scans text for personally identifiable information (PII) and replaces it with placeholders. The tool is freely available on GitHub under an open‑source license, so anyone can use it or adapt it to their own workflow.
What happened
The Privacy Filter is a small, purpose‑built model that detects common types of PII: names, email addresses, phone numbers, physical addresses, and other identifiers. When you run a document through it, the filter outputs a version of the text where sensitive items are masked – for example, [NAME] instead of “Jane Doe” or [EMAIL] instead of “jane.doe@example.com”.
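To make the input/output shape concrete, here is a tiny self-contained sketch. The real filter uses a learned model, not regular expressions; this example only illustrates the placeholder-substitution format described above.

```python
import re

# Crude illustrative patterns -- NOT how the actual model detects PII.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def mask_pii(text: str) -> str:
    """Replace matched identifiers with bracketed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

print(mask_pii("Contact jane.doe@example.com or 555-123-4567."))
# -> Contact [EMAIL] or [PHONE].
```

The point of the placeholder format is that the masked text stays readable to a downstream LLM ("contact [EMAIL]") while the identifying value never leaves your machine.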
The model was trained by OpenAI and is designed to work as a preprocessing step before sending content to large language models (LLMs) or storing it in logs. Because it’s open source, you can inspect the code, run it locally, and even fine‑tune it on your own data.
Coverage from MSN and GIGAZINE (among others) confirmed that the filter handles the most common categories of PII, though like any automated system it isn’t perfect. Any such filter trades off recall (catching every identifier) against precision (not flagging everyday words as PII).
Why it matters
Everyday users of AI tools and professionals who handle sensitive documents have a growing need to protect personal data. When you paste a customer list or a medical record into an AI chatbot, that data may travel to remote servers, be stored, or be used for training. Even if you trust the provider’s privacy policy, mistakes happen.
The Privacy Filter gives you a simple, free layer of defence. You can strip out identifiable information before the document ever leaves your computer. That’s valuable for:
- Privacy‑conscious individuals who want to experiment with AI without exposing their own or their family’s details.
- Small business owners and freelancers who handle client data and can’t afford a full enterprise privacy solution.
- Developers building AI‑powered apps that need to anonymise user‑submitted text.
It also sets an interesting precedent: a major AI company releasing a privacy tool as open source. This makes it easier for other developers to build on, audit, and trust the tool’s behaviour.
What readers can do
You can download the OpenAI Privacy Filter from its GitHub repository (search for “openai‑privacy‑filter”). The repository includes the model weights, usage instructions, and a Python API. Basic usage looks like:
- Install the package via pip.
- Load the model with a few lines of code.
- Pass your text or document to the model – it returns the masked version.
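The article doesn’t spell out the repository’s actual import path or method names, so the snippet below is only a sketch of the preprocessing pattern the three steps describe. The names `anonymise`, `StubFilter`, and `mask` are stand-ins, not the real API – check the GitHub README for the genuine calls.

```python
def anonymise(text: str, filter_model) -> str:
    """Mask PII locally before the text leaves your machine."""
    return filter_model.mask(text)  # hypothetical method name

# Stand-in for the real model object, so this sketch runs on its own.
class StubFilter:
    def mask(self, text: str) -> str:
        return text.replace("Jane Doe", "[NAME]")

masked = anonymise("Invoice for Jane Doe, due Friday.", StubFilter())
print(masked)  # Invoice for [NAME], due Friday.
```

Whatever the real API looks like, the workflow is the same: load the model once, run every document through it, and only ever send the masked output to a remote LLM or log.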
If you’re not a developer, you can still benefit indirectly. Some third‑party apps and browser extensions are beginning to integrate the filter. Keep an eye out for tools that advertise “PII masking before sending to LLMs” – they may be using this very model.
However, keep a few limitations in mind:
- No tool is perfect. The filter may miss less common identifiers (e.g., passport numbers in certain formats) or incorrectly flag everyday words as PII. Always check the output, especially for critical use cases.
- It doesn’t encrypt or delete anything. It only masks text. If you need stronger guarantees, combine it with other measures like encryption at rest and data minimisation.
- It’s designed for text, not images. Photos or scanned documents containing names and numbers need a different approach (OCR + filtering).
For comparison, Microsoft’s open‑source Presidio offers similar capabilities with a wider set of pre‑built recognisers, though it is heavier to set up, while Microsoft Purview is a cloud service that requires a subscription. The OpenAI Privacy Filter is lighter and free. Choose based on your technical comfort and the scale of your data.
Sources
- OpenAI’s official announcement: “Introducing OpenAI Privacy Filter” (April 22, 2026)
- MSN: “OpenAI introduces privacy filter model” (April 26, 2026)
- GIGAZINE: “OpenAI has released ‘OpenAI Privacy Filter’ as open source…” (April 23, 2026)
Always check the GitHub repository for the latest documentation and model updates.