Merged
25 commits
3803af2
Add opt-in local PII redaction before prompts leave the proxy
claude Jun 12, 2026
90f2553
PII redaction: use Presidio as the single engine, drop handrolled regex
claude Jun 12, 2026
ac2f6f6
PII redaction: drop date/DOB scrubbing and the --pii-all-dates option
claude Jun 12, 2026
9d5def8
PII redaction: make scrubbing configurable per request
claude Jun 12, 2026
79bf83b
Revert "PII redaction: make scrubbing configurable per request"
claude Jun 12, 2026
421d2b5
Potential fix for pull request finding
adambalogh Jun 12, 2026
dfd191c
Potential fix for pull request finding
adambalogh Jun 12, 2026
1ae5b4b
Potential fix for pull request finding
adambalogh Jun 12, 2026
76fe0d7
Potential fix for pull request finding
adambalogh Jun 12, 2026
6da6ada
Potential fix for pull request finding
adambalogh Jun 12, 2026
90bde6d
Potential fix for pull request finding
adambalogh Jun 12, 2026
672988b
Add scripts/pii_repl.py for interactive PII-redaction testing
claude Jun 12, 2026
8677036
Fix install-pii Makefile target dropped by autofix
claude Jun 12, 2026
c7f269a
PII redaction: prioritize identity de-identification (names, contact,…
claude Jun 12, 2026
55a11f8
PII redaction: redact only hard identifiers, keep names and free-form…
claude Jun 12, 2026
f23c4b3
PII redaction: drop the spaCy model, make install a single pip install
claude Jun 13, 2026
684fd9a
README: tighten the PII redaction section
claude Jun 13, 2026
81d0e0f
Potential fix for pull request finding
adambalogh Jun 13, 2026
e94c675
Potential fix for pull request finding
adambalogh Jun 13, 2026
6137618
Potential fix for pull request finding
adambalogh Jun 13, 2026
94dc16e
Potential fix for pull request finding
adambalogh Jun 13, 2026
7e41eb2
Potential fix for pull request finding
adambalogh Jun 13, 2026
bea3617
Update server.py
adambalogh Jun 13, 2026
ece103d
README: install the PII extra with uv to match the quickstart
claude Jun 13, 2026
9fd6a83
Reference uv for the PII extra install in error hint, docstring, and …
claude Jun 13, 2026
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -21,8 +21,8 @@ jobs:
       - name: Set up Python
         run: uv python install 3.11

-      - name: Install dependencies
-        run: uv sync --all-groups
+      - name: Install dependencies (incl. PII extra)
+        run: uv sync --all-groups --extra pii

       - name: Test
         run: make test
6 changes: 5 additions & 1 deletion Makefile
@@ -1,8 +1,12 @@
-.PHONY: install build publish check test serve
+.PHONY: install install-pii build publish check test serve

 install:
 	uv sync --all-groups

+# Dev install including the optional PII-redaction extra (Presidio). No model download.
+install-pii:
+	uv sync --all-groups --extra pii
+
 build:
 	uv build

23 changes: 23 additions & 0 deletions README.md
@@ -173,6 +173,29 @@ Session + prefs live in `~/.opengradient/local/` (override with `OG_VEIL_HOME`).
| `OG_VEIL_TEE_ID` | `--tee-id` | — | Pin a specific registry TEE. |
| `OG_VEIL_EXPECTED_PCR_HASH` | `--expected-pcr` | — | Refuse any TEE whose `pcrHash` differs. |
| `OG_VEIL_APP_URL` | `--app-url` | `https://chat.opengradient.ai` | Chat app origin for login. |
| `OG_VEIL_PII_SCRUB` | `--pii-scrub` | off | Redact high-impact PII from prompts locally before they leave the machine. |

### Local PII redaction (opt-in)

OHTTP unlinks *who you are* from *what you ask* — but only if the prompt itself
doesn't name you. With `--pii-scrub` on, concrete identifiers are replaced with
`[REDACTED_*]` tags locally *before* the prompt is encrypted, so they never leave
your machine. Install the optional extra (one step — no model download) and turn
it on:

```sh
uv tool install 'opengradient-veil[pii]' # or: pipx install 'opengradient-veil[pii]'
og-veil --pii-scrub # or: export OG_VEIL_PII_SCRUB=1
```

Redacts **email, phone, US SSN, credit cards, IBANs, US bank numbers, and street
addresses** via [Microsoft Presidio](https://github.com/microsoft/presidio)'s
pattern/checksum recognizers (street lines use a custom deterministic recognizer).
**Names, cities/countries, and dates are left in**: detecting them needs
statistical NER, which over-redacts the third-party names real prompts are full of
and mislabels uncommon ones. Treat this as a backstop for hard identifiers, not a
substitute for your own discretion. Redaction is irreversible (the TEE's signed
`output_hash` covers exactly what it ran). If the extra isn't installed, the server
refuses to start rather than send prompts unredacted.

## Notes & limitations

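In broad terms, a "pattern/checksum recognizer" finds candidates with a regex and then rejects those that fail a validity check. The sketch below shows three checks of that kind in plain Python: the Luhn mod-10 check for card numbers, the ISO 13616 mod-97 check for IBANs, and the US SSA structural rules for SSNs. It is illustrative only and not Presidio's code. The function names and the `_SAMPLE_SSNS` blacklist are hypothetical; the two blacklisted numbers are the textbook fakes mentioned in `tests/test_pii.py`.

```python
from __future__ import annotations

import re

# Hypothetical blacklist: the two "textbook" SSNs named in tests/test_pii.py.
_SAMPLE_SSNS = {"123456789", "078051120"}


def luhn_valid(number: str) -> bool:
    """Luhn (mod-10) check, as used for payment card numbers."""
    digits = re.sub(r"[ -]", "", number)
    if not re.fullmatch(r"[0-9]{12,19}", digits):
        return False
    total = 0
    # Walk right to left, doubling every second digit (subtract 9 if it exceeds 9).
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 13616 mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and require a remainder of 1."""
    s = iban.replace(" ", "").upper()
    # Country code, two check digits, then 11-30 alphanumerics (15-34 total).
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", s):
        return False
    rearranged = s[4:] + s[:4]
    as_digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(as_digits) % 97 == 1


def ssn_plausible(ssn: str) -> bool:
    """US SSA structural rules plus the hypothetical sample blacklist above."""
    digits = re.sub(r"[ -]", "", ssn)
    if not re.fullmatch(r"[0-9]{9}", digits):
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in {"000", "666"} or area.startswith("9"):
        return False  # area numbers that are never issued as SSNs
    if group == "00" or serial == "0000":
        return False
    return digits not in _SAMPLE_SSNS
```

Under these rules `4111 1111 1111 1111` and `GB82 WEST 1234 5698 7654 32` pass. `123-45-6789` is structurally valid and fails only because of the sample list, which is why the SSN test uses `457-55-5462` instead.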
14 changes: 14 additions & 0 deletions pyproject.toml
@@ -39,6 +39,20 @@ dependencies = [
    "requests>=2.32.0",
]

[project.optional-dependencies]
# Enables local PII redaction. Detection uses Microsoft Presidio's pattern/
# checksum recognizers only (no NER), so there's no spaCy model to download;
# `uv tool install 'opengradient-veil[pii]'` is the whole install.
pii = [
    "presidio-analyzer>=2.2.0",
    "presidio-anonymizer>=2.2.0",
    # presidio-analyzer pulls in spaCy transitively. Keep it on stable 3.x (>=3.8,
    # <4.0): the repo allows prereleases (for an SDK dependency), which would
    # otherwise resolve a spaCy 4.0 dev build whose compiled extensions break
    # against numpy 2.x. (No spaCy *model* is needed; we never load one.)
    "spacy>=3.8.0,<4.0.0",
]

[project.urls]
Homepage = "https://opengradient.ai"
Repository = "https://github.com/OpenGradient/veil"
39 changes: 39 additions & 0 deletions scripts/pii_repl.py
@@ -0,0 +1,39 @@
"""Scratch helper: interactively try the PII redactor on your own text.

uv run python scripts/pii_repl.py # interactive: type a line, see it scrubbed
echo "email me at a@b.com" | uv run python scripts/pii_repl.py # or pipe input

Local-only — no login, no TEE, no network. Just the same Redactor the proxy uses.
Needs the extra: make install-pii (or: uv sync --extra pii)
"""

from __future__ import annotations

import sys

from veil.pii import PiiSetupError, build_redactor


def main() -> None:
    try:
        redactor = build_redactor(enabled=True)
    except PiiSetupError as exc:
        sys.exit(f"PII redaction unavailable: {exc}")
    assert redactor is not None

    if not sys.stdin.isatty():  # piped input: scrub each line and exit
        for line in sys.stdin:
            print(redactor.scrub_text(line.rstrip("\n")))
        return

    print("PII redactor — type text and press Enter (Ctrl-D / Ctrl-C to quit).\n")
    try:
        while True:
            text = input("> ")
            print(redactor.scrub_text(text))
    except (EOFError, KeyboardInterrupt):
        print()


if __name__ == "__main__":
    main()
131 changes: 131 additions & 0 deletions tests/test_pii.py
@@ -0,0 +1,131 @@
"""PII redaction tests.

Detection is delegated to Presidio (the optional [pii] extra; it pulls in spaCy as a dependency),
so these tests skip when the extra isn't installed. ``build_redactor(enabled=False)``
is checked unconditionally.
"""

from __future__ import annotations

import pytest

from veil.pii import (
    ADDRESS_TAG,
    BANK_TAG,
    EMAIL_TAG,
    PHONE_TAG,
    SSN_TAG,
    PiiSetupError,
    build_redactor,
)


def test_build_redactor_disabled_returns_none():
    # Works without the extra: disabled means no engine is constructed at all.
    assert build_redactor(enabled=False) is None


def test_tags_are_distinct():
    assert len({EMAIL_TAG, PHONE_TAG, SSN_TAG, BANK_TAG, ADDRESS_TAG}) == 5


# --- everything below needs the [pii] extra -------------------------------


def _redactor(**kw):
    # Per-test skip: a module-level importorskip would also skip the two tests above.
    pytest.importorskip("presidio_analyzer", reason="requires the [pii] extra")
    try:
        return build_redactor(enabled=True, **kw)
    except PiiSetupError as exc:  # presidio importable but setup still failed
        pytest.skip(str(exc))


@pytest.fixture(scope="module")
def R():
    return _redactor()


def test_names_are_not_redacted(R):
    # Names are deliberately left in: detecting them over-redacts third parties,
    # and spaCy mislabels uncommon names. User discretion covers names.
    out = R.scrub_text("Reply to Advait about our contractor Julia Smith.")
    assert "Advait" in out and "Julia Smith" in out


def test_free_form_location_not_redacted(R):
    # Cities/countries are not redacted (only deterministic street lines are).
    out = R.scrub_text("I live in San Francisco")
    assert "San Francisco" in out


def test_phone_redacted(R):
    out = R.scrub_text("call me at +1 (415) 555-0142 tomorrow")
    assert "555-0142" not in out and PHONE_TAG in out


def test_street_address_redacted(R):
    # Street lines via the custom deterministic recognizer (no NER).
    out = R.scrub_text("ship it to 25 Park Lane South, Jersey City")
    assert "25 Park Lane South" not in out and ADDRESS_TAG in out


def test_email_redacted(R):
    out = R.scrub_text("ping me at jane.doe+x@example.co.uk please")
    assert "jane.doe" not in out and EMAIL_TAG in out


def test_ssn_redacted(R):
    # A plausible SSN: Presidio deliberately rejects textbook fakes like
    # 123-45-6789 / 078-05-1120 via its invalidate_result blacklist.
    out = R.scrub_text("my SSN is 457-55-5462")
    assert "457-55-5462" not in out and SSN_TAG in out


def test_credit_card_redacted(R):
    # Luhn-valid canonical test number.
    out = R.scrub_text("card 4111 1111 1111 1111 on file")
    assert "4111" not in out and BANK_TAG in out


def test_iban_redacted(R):
    out = R.scrub_text("send to GB82 WEST 1234 5698 7654 32 today")
    assert "WEST" not in out and BANK_TAG in out


def test_dates_are_not_redacted(R):
    # Dates are deliberately left intact.
    out = R.scrub_text("DOB: 04/12/1990. The invoice is dated 06/01/2026.")
    assert "04/12/1990" in out and "06/01/2026" in out


def test_scrub_request_string_content(R):
    body = {
        "model": "gpt-4.1",
        "messages": [
            {"role": "system", "content": "be helpful"},
            {"role": "user", "content": "email me at a@b.com"},
        ],
    }
    out = R.scrub_request(body)
    assert EMAIL_TAG in out["messages"][1]["content"]
    # Original body is not mutated.
    assert body["messages"][1]["content"] == "email me at a@b.com"


def test_scrub_request_multimodal_parts(R):
    body = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "reach me at a@b.com"},
                    {"type": "image_url", "image_url": {"url": "http://x/y.png"}},
                ],
            }
        ]
    }
    out = R.scrub_request(body)
    parts = out["messages"][0]["content"]
    assert EMAIL_TAG in parts[0]["text"]
    assert parts[1] == {"type": "image_url", "image_url": {"url": "http://x/y.png"}}
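The last two tests define how `scrub_request` treats an OpenAI-style chat body. String `content` is scrubbed. In a multimodal `content` list, only the `text` parts are scrubbed. The input dict is left untouched. The following is a minimal sketch of that walk. It assumes a pluggable `scrub_text` callable in place of the real `Redactor`. The free function and its signature are hypothetical, and the actual method may handle other fields or roles differently.

```python
from __future__ import annotations

import copy
from typing import Any, Callable


def scrub_request(body: dict[str, Any], scrub_text: Callable[[str], str]) -> dict[str, Any]:
    """Return a scrubbed deep copy of an OpenAI-style chat body (hypothetical helper)."""
    out = copy.deepcopy(body)  # the caller's body is never mutated
    for message in out.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):
            # Plain string content: scrub it on the copy.
            message["content"] = scrub_text(content)
        elif isinstance(content, list):
            # Multimodal content: scrub only text parts; images etc. pass through.
            for part in content:
                if (
                    isinstance(part, dict)
                    and part.get("type") == "text"
                    and isinstance(part.get("text"), str)
                ):
                    part["text"] = scrub_text(part["text"])
    return out
```

Deep-copying up front is the simplest way to satisfy the no-mutation assertion in `test_scrub_request_string_content`. A copy-on-write walk would avoid copying large bodies, at the cost of more code.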