Add opt-in local PII redaction before TEE encryption #12
Merged
25 commits
3803af2 Add opt-in local PII redaction before prompts leave the proxy (claude)
90f2553 PII redaction: use Presidio as the single engine, drop handrolled regex (claude)
ac2f6f6 PII redaction: drop date/DOB scrubbing and the --pii-all-dates option (claude)
9d5def8 PII redaction: make scrubbing configurable per request (claude)
79bf83b Revert "PII redaction: make scrubbing configurable per request" (claude)
421d2b5 Potential fix for pull request finding (adambalogh)
dfd191c Potential fix for pull request finding (adambalogh)
1ae5b4b Potential fix for pull request finding (adambalogh)
76fe0d7 Potential fix for pull request finding (adambalogh)
6da6ada Potential fix for pull request finding (adambalogh)
90bde6d Potential fix for pull request finding (adambalogh)
672988b Add scripts/pii_repl.py for interactive PII-redaction testing (claude)
8677036 Fix install-pii Makefile target dropped by autofix (claude)
c7f269a PII redaction: prioritize identity de-identification (names, contact,… (claude)
55a11f8 PII redaction: redact only hard identifiers, keep names and free-form… (claude)
f23c4b3 PII redaction: drop the spaCy model, make install a single pip install (claude)
684fd9a README: tighten the PII redaction section (claude)
81d0e0f Potential fix for pull request finding (adambalogh)
e94c675 Potential fix for pull request finding (adambalogh)
6137618 Potential fix for pull request finding (adambalogh)
94dc16e Potential fix for pull request finding (adambalogh)
7e41eb2 Potential fix for pull request finding (adambalogh)
bea3617 Update server.py (adambalogh)
ece103d README: install the PII extra with uv to match the quickstart (claude)
9fd6a83 Reference uv for the PII extra install in error hint, docstring, and … (claude)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| """Scratch helper: interactively try the PII redactor on your own text. | ||
| | ||
| uv run python scripts/pii_repl.py # interactive: type a line, see it scrubbed | ||
| echo "email me at a@b.com" | uv run python scripts/pii_repl.py # or pipe input | ||
| | ||
| Local-only — no login, no TEE, no network. Just the same Redactor the proxy uses. | ||
| Needs the extra: make install-pii (or: uv sync --extra pii) | ||
| """ | ||
| | ||
| from __future__ import annotations | ||
| | ||
| import sys | ||
| | ||
| from veil.pii import PiiSetupError, build_redactor | ||
| | ||
| | ||
| def main() -> None: | ||
|     try: | ||
|         redactor = build_redactor(enabled=True) | ||
|     except PiiSetupError as exc: | ||
|         sys.exit(f"PII redaction unavailable: {exc}") | ||
|     assert redactor is not None | ||
| | ||
|     if not sys.stdin.isatty():  # piped input: scrub each line and exit | ||
|         for line in sys.stdin: | ||
|             print(redactor.scrub_text(line.rstrip("\n"))) | ||
|         return | ||
| | ||
|     print("PII redactor — type text and press Enter (Ctrl-D / Ctrl-C to quit).\n") | ||
|     try: | ||
|         while True: | ||
|             text = input("> ") | ||
|             print(redactor.scrub_text(text)) | ||
|     except (EOFError, KeyboardInterrupt): | ||
|         print() | ||
| | ||
| | ||
| if __name__ == "__main__": | ||
|     main() | ||
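
The piped-input branch of the script can be tried without installing the `[pii]` extra by swapping in a stand-in redactor. The sketch below is illustrative only: `StubRedactor`, `scrub_lines`, the `[EMAIL]` tag value, and the email regex are all hypothetical and are not part of `veil.pii`, which delegates detection to Presidio rather than to a hand-written pattern.

```python
import re
from typing import Iterable, Iterator

# Hypothetical tag value; the real EMAIL_TAG string in veil.pii is not shown in this diff.
EMAIL_TAG = "[EMAIL]"


class StubRedactor:
    """Stand-in for the object returned by build_redactor(enabled=True).

    Assumption: only the scrub_text(str) -> str method is mimicked here.
    """

    _EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

    def scrub_text(self, text: str) -> str:
        # Replace anything that looks like an email address with the tag.
        return self._EMAIL.sub(EMAIL_TAG, text)


def scrub_lines(redactor, lines: Iterable[str]) -> Iterator[str]:
    """Mirror the piped branch: drop the trailing newline, scrub, yield."""
    for line in lines:
        yield redactor.scrub_text(line.rstrip("\n"))


demo = list(scrub_lines(StubRedactor(), ["email me at a@b.com\n", "no identifiers here\n"]))
for scrubbed in demo:
    print(scrubbed)
```

Passing any iterable of lines (a list here, `sys.stdin` in the script) keeps the scrubbing loop testable without a terminal.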
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,131 @@ | ||
| """PII redaction tests. | ||
| | ||
| Detection is delegated to Presidio (the optional [pii] extra; it pulls in spaCy as a dependency), | ||
| so these tests skip when the extra isn't installed. ``build_redactor(enabled=False)`` | ||
| is checked unconditionally. | ||
| """ | ||
| | ||
| from __future__ import annotations | ||
| | ||
| import pytest | ||
| | ||
| from veil.pii import ( | ||
|     ADDRESS_TAG, | ||
|     BANK_TAG, | ||
|     EMAIL_TAG, | ||
|     PHONE_TAG, | ||
|     SSN_TAG, | ||
|     PiiSetupError, | ||
|     build_redactor, | ||
| ) | ||
| | ||
| | ||
| def test_build_redactor_disabled_returns_none(): | ||
|     # Works without the extra: disabled means no engine is constructed at all. | ||
|     assert build_redactor(enabled=False) is None | ||
| | ||
| | ||
| def test_tags_are_distinct(): | ||
|     assert len({EMAIL_TAG, PHONE_TAG, SSN_TAG, BANK_TAG, ADDRESS_TAG}) == 5 | ||
| | ||
| | ||
| # --- everything below needs the [pii] extra ------------------------------- | ||
| | ||
| | ||
| def _redactor(**kw): | ||
|     # Skip here rather than at module level, which would also skip the two tests above. | ||
|     pytest.importorskip("presidio_analyzer", reason="requires the [pii] extra") | ||
|     try: | ||
|         return build_redactor(enabled=True, **kw) | ||
|     except PiiSetupError as exc:  # presidio present but model missing | ||
|         pytest.skip(str(exc)) | ||
| | ||
| | ||
| @pytest.fixture(scope="module") | ||
| def R(): | ||
|     return _redactor() | ||
| | ||
| | ||
| def test_names_are_not_redacted(R): | ||
|     # Names are deliberately left in — they over-redact third parties and spaCy | ||
|     # mislabels uncommon names. User discretion covers names. | ||
|     out = R.scrub_text("Reply to Advait about our contractor Julia Smith.") | ||
|     assert "Advait" in out and "Julia Smith" in out | ||
| | ||
| | ||
| def test_free_form_location_not_redacted(R): | ||
|     # Cities/countries are not redacted (only deterministic street lines are). | ||
|     out = R.scrub_text("I live in San Francisco") | ||
|     assert "San Francisco" in out | ||
| | ||
| | ||
| def test_phone_redacted(R): | ||
|     out = R.scrub_text("call me at +1 (415) 555-0142 tomorrow") | ||
|     assert "555-0142" not in out and PHONE_TAG in out | ||
| | ||
| | ||
| def test_street_address_redacted(R): | ||
|     # Street lines via the custom deterministic recognizer (no NER). | ||
|     out = R.scrub_text("ship it to 25 Park Lane South, Jersey City") | ||
|     assert "25 Park Lane South" not in out and ADDRESS_TAG in out | ||
| | ||
| | ||
| def test_email_redacted(R): | ||
|     out = R.scrub_text("ping me at jane.doe+x@example.co.uk please") | ||
|     assert "jane.doe" not in out and EMAIL_TAG in out | ||
| | ||
| | ||
| def test_ssn_redacted(R): | ||
|     # A plausible SSN — Presidio deliberately rejects textbook fakes like | ||
|     # 123-45-6789 / 078-05-1120 via its invalidate_result blacklist. | ||
|     out = R.scrub_text("my SSN is 457-55-5462") | ||
|     assert "457-55-5462" not in out and SSN_TAG in out | ||
| | ||
| | ||
| def test_credit_card_redacted(R): | ||
|     # Luhn-valid canonical test number. | ||
|     out = R.scrub_text("card 4111 1111 1111 1111 on file") | ||
|     assert "4111" not in out and BANK_TAG in out | ||
| | ||
| | ||
| def test_iban_redacted(R): | ||
|     out = R.scrub_text("send to GB82 WEST 1234 5698 7654 32 today") | ||
|     assert "WEST" not in out and BANK_TAG in out | ||
| | ||
| | ||
| def test_dates_are_not_redacted(R): | ||
|     # Dates are deliberately left intact. | ||
|     out = R.scrub_text("DOB: 04/12/1990. The invoice is dated 06/01/2026.") | ||
|     assert "04/12/1990" in out and "06/01/2026" in out | ||
| | ||
| | ||
| def test_scrub_request_string_content(R): | ||
|     body = { | ||
|         "model": "gpt-4.1", | ||
|         "messages": [ | ||
|             {"role": "system", "content": "be helpful"}, | ||
|             {"role": "user", "content": "email me at a@b.com"}, | ||
|         ], | ||
|     } | ||
|     out = R.scrub_request(body) | ||
|     assert EMAIL_TAG in out["messages"][1]["content"] | ||
|     # Original body is not mutated. | ||
|     assert body["messages"][1]["content"] == "email me at a@b.com" | ||
| | ||
| | ||
| def test_scrub_request_multimodal_parts(R): | ||
|     body = { | ||
|         "messages": [ | ||
|             { | ||
|                 "role": "user", | ||
|                 "content": [ | ||
|                     {"type": "text", "text": "reach me at a@b.com"}, | ||
|                     {"type": "image_url", "image_url": {"url": "http://x/y.png"}}, | ||
|                 ], | ||
|             } | ||
|         ] | ||
|     } | ||
|     out = R.scrub_request(body) | ||
|     parts = out["messages"][0]["content"] | ||
|     assert EMAIL_TAG in parts[0]["text"] | ||
|     assert parts[1] == {"type": "image_url", "image_url": {"url": "http://x/y.png"}} | ||
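
The last two tests pin down three properties of `scrub_request`: string content is scrubbed, only `text` parts of multimodal content are rewritten, and the caller's body is left untouched. One way to get that behavior is sketched below. This is not the implementation in `veil.pii`, which this diff does not show. The regex scrubber, the `[EMAIL]` tag value, and the choice to scrub every message regardless of role are assumptions made for illustration.

```python
import copy
import re
from typing import Any, Callable, Dict

# Hypothetical tag value and pattern, used only as a stand-in for Presidio-based detection.
EMAIL_TAG = "[EMAIL]"
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_text(text: str) -> str:
    """Stand-in scrubber: replace email-like substrings with the tag."""
    return _EMAIL.sub(EMAIL_TAG, text)


def scrub_request(body: Dict[str, Any], scrub: Callable[[str], str] = scrub_text) -> Dict[str, Any]:
    """Return a scrubbed deep copy of a chat-completions style request body."""
    out = copy.deepcopy(body)  # work on a copy so the caller's dict is never mutated
    for message in out.get("messages", []):
        content = message.get("content")
        if isinstance(content, str):
            message["content"] = scrub(content)
        elif isinstance(content, list):
            for part in content:
                # Only text parts are rewritten; image_url and other part types pass through.
                if (
                    isinstance(part, dict)
                    and part.get("type") == "text"
                    and isinstance(part.get("text"), str)
                ):
                    part["text"] = scrub(part["text"])
    return out


body = {
    "messages": [
        {"role": "user", "content": "email me at a@b.com"},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "reach me at a@b.com"},
                {"type": "image_url", "image_url": {"url": "http://x/y.png"}},
            ],
        },
    ]
}
scrubbed = scrub_request(body)
print(scrubbed["messages"][0]["content"])
print(scrubbed["messages"][1]["content"][0]["text"])
```

Deep-copying up front costs one extra traversal of the body but makes the non-mutation guarantee hold without any per-field bookkeeping.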