Add security regression tests + pyNEID compat scaffolding by bjfultn · Pull Request #21 · Caltech-IPAC/nexsciTAP

bjfultn · 2026-05-20T16:24:27Z

Phase 3 + 4 of the CI plan. Stacked on #20 — review and merge that one first.

What this adds

`tests/test_security_e2e.py` — 12 active tests + 1 skipped, each named for the regression or security boundary it guards. Any change that re-opens one of these holes turns its test red at PR time.

Category	Coverage
Table whitelist	rejects `sqlite_master`, rejects unregistered tables, allows propfilter internal tables (`test_access`) reachable from subqueries
Table extractor	`ORDER BY` not misidentified as a table (`1404e15`), UNKNOWN-statement-skip closed for Oracle `ROWNUM` queries (`867278c`)
propflag	zero honored on L0, complex CASE WHEN with subquery FROM does not misfire the whitelist (the PR #16/#17 regression)
Statement validator	DML/DDL rejected, `utl_http`/`dbms_sql`/`sys_context` rejected, semicolon-chained rejected
Errors	400 for query errors, response bodies don't leak `Traceback`/internal paths/`Password=`

`tests/test_pyneid_compat.py` — module-level skip with a detailed reason. pyNEID's `query_adql` uses the async TAP endpoint, which needs per-request workspace state across requests; the test fixture handler doesn't model that yet (Apache normally provides it). The equivalent code paths are covered by the direct-HTTP security-e2e tests.

`tests/conftest.py` — the handler now routes both `/cgi-bin/TAP/nph-tap.py/` and `/TAP/` (the production short form pyNEID expects) to the same nph script.

Skip details

Test	Why skipped
`test_propflag_bypass_routes_l1_through_propfilter`	`configparam.py` only reads `PROPFILTER`/`ACCESS_TBL`/`USERS_TBL` inside its `if dbms == 'oracle'` branch (~lines 255-345), so on SQLite the propfilter config is silently dropped and the guard becomes a no-op against this fixture. End-to-end propflag coverage needs the deferred Oracle/Postgres CI matrix.
All of `test_pyneid_compat.py`	Async TAP flow needs per-request workspace handling that the fixture handler doesn't yet model.

Both have actionable follow-ups documented inline.

Pre-existing inconsistency surfaced

TAP returns 400 for DML / dangerous-function / semicolon-chained rejections (`validate_statement` raises plain `Exception`) but 403 for system-catalog rejections (raises `TableValidationError`). These are all security-policy rejections and should arguably all be 403. Tests assert current behavior; harmonizing the codes is a follow-up worth filing.

Local results

```
53 passed, 2 skipped in 10.81s
```

ruff clean.

🤖 Generated with Claude Code

For SQLite configs, connectInfo['tap_schema'] is the FILE PATH (passed to ATTACH DATABASE), not the schema name. The schema name used in queries is held in connectInfo['tap_schema_file'] (set as ATTACH AS in tapquery.py). TableValidator and dataDictionary both used connectInfo['tap_schema'] as the schema name when constructing SQL like: SELECT table_name FROM <schema>.tables which produced "near "/": syntax error" against SQLite because the generated SQL embedded the file path. Surfaced by the new CI SQLite fixture; would have prevented any SQLite- backed production deployment from passing table validation. Oracle/ Postgres/MySQL behavior is unchanged — those configs don't set tap_schema_file, so the existing tap_schema lookup remains the fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 1+2 of the CI plan. Establishes a GitHub Actions gate that runs on every PR and on push to develop/main, replacing the manual KOA/NEID/NEA testing loop with automated checks. What runs in CI: - ruff (E, F, W, I) — gated; codebase passes with the minimal ignore set documented in pyproject.toml - pytest tests/ — matrix over Python 3.9 / 3.10 / 3.11 / 3.12 on Ubuntu - python -m build --wheel — Ubuntu + macOS, verifies the C extension builds cross-platform and the wheel is installable Test infrastructure: - tests/conftest.py spins up an HTTP server on a free port backed by a custom nph-aware request handler (stdlib's CGIHTTPRequestHandler prepends its own status line, breaking nph- scripts that emit their own HTTP/1.1 status lines like nph-tap.py does) - tests/fixtures/build_test_db.py builds tap_schema.db (IVOA-shape) and test_data.db (3 public + 3 expired + 4 still-embargoed L1 rows). The still-embargoed rows are the key fixture: production test servers (vmneidtest, koa-test) have all-expired propints, which means propflag enforcement can't be observed there. CI now has the missing scenario. - tests/fixtures/stubs/spatial_index.py is a minimal stub for the four module-level constants TAP imports. The real spatial_index package on PyPI only ships wheels for Python ≤ 3.8, so modern-Python CI cannot install it. Tracking as upstream issue. - tests/test_http_endpoint.py is a smoke layer that exercises the full HTTP→CGI→TAP→SQLite→IPAC pipeline. Security/regression tests land in the follow-up PR (Phase 3+4). Cosmetic ruff autofixes touched 13 files (whitespace, f-strings, import order) — no semantic changes. Deferred to follow-up PRs: - F821 ignored in ruff for now; 6 pre-existing undefined-name bugs to address in TAP/taputil.py (SQLite/MySQL connection branches) and TAP/tap.py:2402 (debug-log typo `result` → `results`) - spatial_index Python 3.9+ wheel situation upstream - Postgres/MySQL/Oracle CI matrix (SQLite-only for first cut) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…index The wheel-build verification and the test-job install both failed on the first CI run because pip can't resolve spatial_index on Python 3.9+: the PyPI package only ships wheels for 2.7 / 3.6 / 3.7 / 3.8 and has no source distribution. Workaround in CI: - Test job: install nexsciTAP and ADQL with --no-deps. The test fixture stub in tests/fixtures/stubs/spatial_index.py supplies the four module-level constants TAP uses at import time. configobj (the only other actually-needed runtime dep) is in requirements-test.txt. - Wheel-platforms job: drop the post-build pip install; instead verify the .so artifact is present in the wheel. The wheel itself builds fine — the issue is only at install-resolve time. This is purely a CI workaround. The deeper fix is upstream in spatial_index (publish modern wheels or an sdist). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Surfaced by the second CI run: TAP/tablenames.py imports sqlparse, TAP/datadictionary.py imports astropy.io.ascii, and other TAP modules import bs4 and xmltodict. None of these are declared in setup.py's install_requires (a pre-existing bug — declaring them in install_requires is a separate cleanup since the CI install path uses --no-deps anyway). Add them to requirements-test.txt so CI installs the actual runtime surface needed to run the tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 3+4 of the CI plan. Each test in test_security_e2e.py is named after the specific regression or security boundary it guards, so any future change that re-opens one of those holes turns the corresponding test red at PR time. What's tested (12 active, 1 skipped): - Table whitelist: rejects sqlite_master, rejects unregistered tables, allows propfilter internal tables (test_access) in subqueries - Table extractor: ORDER BY not misidentified, UNKNOWN-statement-skip bypass closed (Oracle ROWNUM) - propflag: zero honored on L0, complex CASE WHEN with subquery FROM does not misfire the whitelist - Statement validator: DML/DDL rejected, dangerous Oracle functions rejected, semicolon-chained statements rejected - HTTP status codes: 400 for query errors, error bodies don't leak internal state (Traceback, paths, Password=) Skipped with detailed reasons in the test bodies: - test_propflag_bypass_routes_l1_through_propfilter — configparam.py only reads PROPFILTER/ACCESS_TBL/USERS_TBL inside its `if dbms == 'oracle'` branch, so on SQLite the propfilter config is silently dropped and the guard becomes a no-op. End-to-end propflag coverage needs the Oracle/Postgres CI matrix from the deferred plan work. pyNEID compat (test_pyneid_compat.py) — module-level skip. pyNEID's query_adql uses the async TAP endpoint (POST + status URL polling + results fetch); the test fixture handler doesn't yet model the per-request workspace lifecycle Apache provides in production. Direct-HTTP coverage of the same code paths is in test_security_e2e.py. Conftest: handler now routes both /cgi-bin/TAP/nph-tap.py/<endpoint> (direct CGI URL) and /TAP/<endpoint> (production short form) to the same nph-tap.py script — wires up the URL shape pyNEID expects, even though pyNEID itself remains skipped pending async work. Misalignment surfaced: TAP returns 400 for DML / dangerous-function / semicolon-chained rejections (validate_statement raises plain Exception), but 403 for system-catalog rejections (raises TableValidationError). These are all security-policy decisions and should arguably all be 403. Tests match current behavior; harmonizing is a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bjfultn and others added 5 commits May 20, 2026 09:59

bjfultn merged commit a0eb278 into develop May 20, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add security regression tests + pyNEID compat scaffolding#21

Add security regression tests + pyNEID compat scaffolding#21
bjfultn merged 5 commits into
developfrom
ci/security-and-pyneid-tests

bjfultn commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bjfultn commented May 20, 2026

What this adds

Skip details

Pre-existing inconsistency surfaced

Local results

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant