OpenAI Releases Privacy Filter as Open-Weight Apache 2.0 Model, Offering Local PII Detection With 97% F1 Accuracy

Overview

OpenAI on April 23, 2026 published a model it calls Privacy Filter under the Apache 2.0 license — an open-weight, locally runnable system for detecting and masking personally identifiable information in text. The release is notable on two counts: it is one of the few open-weight models OpenAI has published under a permissive license, and it is engineered from the ground up to run entirely on-device, meaning sensitive documents never leave the user’s hardware.

The company framed the release as infrastructure rather than product, stating it supports “a more resilient software ecosystem by providing developers with practical infrastructure for building with AI safely,” as noted by Help Net Security.

What We Know

Architecture and Parameters

Privacy Filter is a bidirectional token classification model built on a pre-norm transformer encoder-style stack with eight repeated transformer blocks. According to its Hugging Face model card, the architecture uses grouped-query attention with rotary positional embeddings — 14 query heads and 2 key-value heads — and a sparse mixture-of-experts feed-forward layer with 128 experts and top-4 routing per token. The output head produces 33 classification labels covering BIOES boundary tags for eight PII span classes.

Total parameter count is 1.5 billion, but only approximately 50 million are active at inference time due to the sparse MoE routing, as reported by Gigazine. That design allows the model to run on consumer hardware including laptops and, via Transformers.js with WebGPU quantization, directly in web browsers.

The model supports a context window of 128,000 tokens, enabling processing of long documents without chunking. Weights are available on Hugging Face at openai/privacy-filter and the codebase is published on GitHub.

PII Categories

Privacy Filter identifies eight span classes: private persons (names), private addresses, private email addresses, private phone numbers, private URLs, private dates, account numbers, and secrets — the last category covering passwords, API keys, and similar authentication credentials. The model predicts a probability distribution over these categories per token and uses constrained Viterbi decoding to refine span boundaries.

Benchmark Performance

On the PII-Masking-300k benchmark, the model achieves an F1 score of 97.43% (96.79% precision, 98.08% recall) on a corrected version of the dataset that accounts for annotation errors in the original, according to Gigazine. The uncorrected benchmark yields 96% F1 (94.04% precision, 98.04% recall), as detailed by Help Net Security.

On CredData, a benchmark focused on detecting authentication credentials, the model scores 84.4% F1. On the SPY dataset — designed around legal and medical text simulations — F1 improved from 54.5% to 96.2% with minimal domain-specific fine-tuning data. Japanese-language performance reached 88.1% F1, according to the Hugging Face model card.

Licensing

The Apache 2.0 license permits free commercial use, modification, and redistribution without royalties. This stands in contrast to OpenAI’s typical practice of keeping model weights proprietary. The open-weight release is paired with a Hugging Face demo space for interactive testing.

Motivation

OpenAI cited a specific behavioral pattern as motivation: users routinely paste documents containing personal information — contracts, medical records, internal communications — into AI assistants without considering where that data goes. Help Net Security reported that by releasing a model capable of local pre-screening, OpenAI is providing developers a tool to sanitize inputs before they reach any external endpoint, including OpenAI’s own APIs.

What We Don’t Know

Several questions remain unanswered at the time of publication. OpenAI has not disclosed training data composition in detail beyond describing it as a mix of public and synthetic corpora, which limits independent assessment of whether specific domains or languages are underrepresented.

The model’s label taxonomy is fixed by design: policy changes — for instance, adding a new PII category — require fine-tuning, and OpenAI has not released the fine-tuning recipe or training infrastructure. Whether the model can be reliably adapted for non-English-primary workflows at production scale is unclear, given that the published benchmarks skew heavily toward English text.

OpenAI explicitly cautions that Privacy Filter is not a legal compliance tool and should not be treated as a guarantee of anonymization, as noted in the Hugging Face model card. What specific audit or certification processes, if any, organizations in regulated industries — healthcare, finance, legal — would need to supplement local deployment with is not addressed in the release documentation.

Finally, OpenAI has not indicated whether Privacy Filter represents a shift toward broader open-weight releases or is a one-off publication for infrastructure-class utilities. The company continues to keep its frontier generative models proprietary.