OpenAI’s New Privacy Filter: How to Use It to Protect Your Personal Data

If you’ve ever pasted a draft email, a report, or even a note into ChatGPT and later worried about what personal details might be sitting on OpenAI’s servers, you’re not alone. Many people are cautious about sharing names, email addresses, or phone numbers with AI tools, even when they want to use them for work or personal projects.

In late April 2026, OpenAI released an open-source privacy filter that addresses this exact concern. It’s a small, free tool that scans documents and masks personally identifiable information (PII) before you send them to any AI model—including ChatGPT. Here’s what it does, how it works, and how you can use it today.


What Happened

On April 22, 2026, OpenAI published a blog post announcing the OpenAI Privacy Filter. The tool is available on GitHub under an MIT license, meaning anyone can download, use, and modify it with few restrictions.

The filter is a Python-based script that detects common types of PII: full names, email addresses, phone numbers, physical addresses, and a few other patterns. When you run it on a text file, it replaces each piece of detected personal data with a placeholder such as [PERSON] or [EMAIL_ADDRESS]. The cleaned version can then be used as input to any AI service.
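The masking step itself is easy to picture. Here is a deliberately simplified sketch in plain Python: it uses only two regular expressions (for email addresses and phone numbers), whereas the actual filter also uses NER to catch names and street addresses. Treat it as an illustration of the idea, not a substitute for the tool, and note that the patterns here are this article's own simplifications.

```python
import re

# Map of regex patterns to placeholder tags. These two patterns are a
# simplified illustration; the real filter also relies on NER models
# to detect names and physical addresses.
PATTERNS = {
    r"[\w.+-]+@[\w-]+\.[\w.]+": "[EMAIL_ADDRESS]",
    r"\+?\d[\d\s().-]{7,}\d": "[PHONE_NUMBER]",
}

def mask_pii(text: str) -> str:
    """Replace every pattern match with its placeholder tag."""
    for pattern, tag in PATTERNS.items():
        text = re.sub(pattern, tag, text)
    return text

print(mask_pii("Reach me at jane@example.com or +1 (555) 010-4477."))
# → Reach me at [EMAIL_ADDRESS] or [PHONE_NUMBER].
```

The real script works the same way at a high level: detect spans of PII, then substitute a typed placeholder so the surrounding text stays readable.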

OpenAI stated that the filter is meant to be a “first line of defense” for individuals and organizations who want to reduce the risk of exposing private information during AI interactions. It is not integrated into ChatGPT by default—you run it locally on your own machine before you submit text.


Why It Matters

The privacy filter matters because it gives users more control. Even though OpenAI and other providers have policies about data handling, the safest approach is to never send sensitive information in the first place. A tool that runs on your own computer means no data leaves your device until you are satisfied it has been redacted.

For everyday users, this is particularly useful in a few common situations:

  • Pasting customer support transcripts into ChatGPT to summarize or analyze—without including customer names or contact details.
  • Sharing draft documents or resumes with an AI writing assistant while keeping your personal information private.
  • Using AI for legal or medical notes where you want to anonymize patient or client data.

Because the filter is open source, security researchers and developers can inspect exactly how it works; there are no black boxes.


What You Can Do

Using the filter requires some technical comfort, but the steps are straightforward. Here’s a practical guide.

1. Install Python and the required libraries
The filter runs on Python 3.8 or newer. You’ll also need the presidio_analyzer and presidio_anonymizer components from Microsoft’s Presidio project, plus a language model for Named Entity Recognition (NER). OpenAI’s script uses a small, fast model called en_core_web_sm from spaCy.

Open a terminal and run:

pip install presidio_analyzer presidio_anonymizer spacy
python -m spacy download en_core_web_sm

2. Download the filter script
Go to the official GitHub repository (linked in OpenAI’s announcement) and clone or download privacy_filter.py. The file is about 300 lines.

3. Prepare your document
Save the text you want to clean as a plain .txt file. For example, mydoc.txt.

4. Run the filter
In the terminal, navigate to the folder with the script and your document. Run:

python privacy_filter.py mydoc.txt

The script outputs the redacted version to your terminal, or you can redirect it to a new file:

python privacy_filter.py mydoc.txt > cleaned.txt

5. Use the cleaned text with any AI tool
Copy the output from cleaned.txt and paste it into ChatGPT, Claude, or any other AI service.

What it can redact (by default)

  • Full names (e.g., “John Smith” → [PERSON])
  • Email addresses (e.g., “name@example.com” → [EMAIL_ADDRESS])
  • Phone numbers (including international formats) → [PHONE_NUMBER]
  • Physical addresses (street, city, zip) → [LOCATION]
  • Credit card numbers (Amex, Visa, etc.) → [CREDIT_CARD]

You can customize the pattern list in the script if needed.
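As one illustration of what a custom pattern might look like, the sketch below adds a regex for IPv4 addresses, a quasi-identifier the default list does not cover. The name EXTRA_PATTERNS and the structure shown are this article's invention for illustration, not the script's actual API; check the script itself for where its pattern list lives.

```python
import re

# Hypothetical extension: additional regex patterns and their
# placeholder tags. An IPv4 address is one kind of identifier the
# default entity list does not cover.
EXTRA_PATTERNS = {
    r"\b(?:\d{1,3}\.){3}\d{1,3}\b": "[IP_ADDRESS]",
}

def mask_extra(text: str) -> str:
    """Apply the extra patterns on top of whatever the filter already did."""
    for pattern, tag in EXTRA_PATTERNS.items():
        text = re.sub(pattern, tag, text)
    return text

print(mask_extra("Login from 192.168.0.12 at 09:14"))
# → Login from [IP_ADDRESS] at 09:14
```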

Limitations to keep in mind
No tool is perfect. The filter relies on pattern matching and NER, so:

  • It might miss unusual names or misspellings.
  • It can produce false positives on non-personal text (for example, flagging “Washington” as a location when it’s a surname).
  • It does not understand context deeply—if you write “my boss’s email is name@example.com”, it will catch the address but not the phrase “my boss”, which could also be revealing.
  • It only works on plain text input, not on PDFs or images (though you can convert those to text first).

The filter is a useful layer, not a guarantee. For sensitive data, combine it with other practices: use disposable or alias email accounts, avoid uploading full datasets, and review the output manually.
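Part of that manual review can itself be automated: scan the cleaned output for things that look like leftovers, such as stray @ signs or long digit runs, and flag those lines for a human to check. The heuristics below are this article's own suggestions, not part of the released tool.

```python
import re

# Heuristic leftover detectors: patterns that often survive redaction.
SUSPICIOUS = [
    re.compile(r"@"),       # possible unredacted email address
    re.compile(r"\d{6,}"),  # long digit run: phone, card, or ID number
]

def flag_suspicious_lines(text: str) -> list[str]:
    """Return lines that may still contain PII and deserve a manual look."""
    return [
        line for line in text.splitlines()
        if any(rx.search(line) for rx in SUSPICIOUS)
    ]

cleaned = "Hello [PERSON],\nyour ref is 20260422881\ncontact: ops@example.com"
for line in flag_suspicious_lines(cleaned):
    print("CHECK:", line)
```

Anything this flags is worth reading before you paste the text into an AI service; anything it misses is exactly why the final human read-through still matters.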

How it compares to other tools

  • Microsoft Presidio (the underlying engine) is more powerful and configurable, but requires more setup. OpenAI’s filter wraps it in a simple script.
  • AWS Comprehend can detect PII but runs in the cloud—you’d have to send your data to AWS. OpenAI’s filter is entirely local.
  • Dedicated redaction tools like Conceal or Dedoose offer more features but are either paid products or not focused on AI workflows.

For most individuals, OpenAI’s filter is the easiest entry point.


Sources

  • OpenAI official announcement: Introducing OpenAI Privacy Filter (April 22, 2026)
  • OpenAI GitHub repository (link in announcement)
  • Microsoft Presidio documentation for background on the anonymization engine
  • spaCy documentation for the NER model