Skip to content

Add security regression tests + pyNEID compat scaffolding#21

Merged
bjfultn merged 5 commits into
developfrom
ci/security-and-pyneid-tests
May 20, 2026
Merged

Add security regression tests + pyNEID compat scaffolding#21
bjfultn merged 5 commits into
developfrom
ci/security-and-pyneid-tests

Conversation

@bjfultn
Copy link
Copy Markdown
Collaborator

@bjfultn bjfultn commented May 20, 2026

Phase 3 + 4 of the CI plan. Stacked on #20 — review and merge that one first.

What this adds

`tests/test_security_e2e.py` — 12 active tests + 1 skipped, each named for the regression or security boundary it guards. Any change that re-opens one of these holes turns its test red at PR time.

Category Coverage
Table whitelist rejects `sqlite_master`, rejects unregistered tables, allows propfilter internal tables (`test_access`) reachable from subqueries
Table extractor `ORDER BY` not misidentified as a table (1404e15), UNKNOWN-statement-skip closed for Oracle `ROWNUM` queries (867278c)
propflag zero honored on L0, complex CASE WHEN with subquery FROM does not misfire the whitelist (the PR #16/#17 regression)
Statement validator DML/DDL rejected, `utl_http`/`dbms_sql`/`sys_context` rejected, semicolon-chained rejected
Errors 400 for query errors, response bodies don't leak `Traceback`/internal paths/`Password=`

`tests/test_pyneid_compat.py` — module-level skip with a detailed reason. pyNEID's `query_adql` uses the async TAP endpoint, which needs per-request workspace state across requests; the test fixture handler doesn't model that yet (Apache normally provides it). The equivalent code paths are covered by the direct-HTTP security-e2e tests.

`tests/conftest.py` — the handler now routes both `/cgi-bin/TAP/nph-tap.py/` and `/TAP/` (the production short form pyNEID expects) to the same nph script.

Skip details

Test Why skipped
`test_propflag_bypass_routes_l1_through_propfilter` `configparam.py` only reads `PROPFILTER`/`ACCESS_TBL`/`USERS_TBL` inside its `if dbms == 'oracle'` branch (~lines 255-345), so on SQLite the propfilter config is silently dropped and the guard becomes a no-op against this fixture. End-to-end propflag coverage needs the deferred Oracle/Postgres CI matrix.
All of `test_pyneid_compat.py` Async TAP flow needs per-request workspace handling that the fixture handler doesn't yet model.

Both have actionable follow-ups documented inline.

Pre-existing inconsistency surfaced

TAP returns 400 for DML / dangerous-function / semicolon-chained rejections (`validate_statement` raises plain `Exception`) but 403 for system-catalog rejections (raises `TableValidationError`). These are all security-policy rejections and should arguably all be 403. Tests assert current behavior; harmonizing the codes is a follow-up worth filing.

Local results

```
53 passed, 2 skipped in 10.81s
```

ruff clean.

🤖 Generated with Claude Code

bjfultn and others added 5 commits May 20, 2026 09:59
For SQLite configs, connectInfo['tap_schema'] is the FILE PATH (passed to
ATTACH DATABASE), not the schema name. The schema name used in queries is
held in connectInfo['tap_schema_file'] (set as ATTACH AS in tapquery.py).

TableValidator and dataDictionary both used connectInfo['tap_schema'] as
the schema name when constructing SQL like:
  SELECT table_name FROM <schema>.tables
which produced "near "/": syntax error" against SQLite because the
generated SQL embedded the file path.

Surfaced by the new CI SQLite fixture; would have prevented any SQLite-
backed production deployment from passing table validation. Oracle/
Postgres/MySQL behavior is unchanged — those configs don't set
tap_schema_file, so the existing tap_schema lookup remains the
fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1+2 of the CI plan. Establishes a GitHub Actions gate that runs on
every PR and on push to develop/main, replacing the manual KOA/NEID/NEA
testing loop with automated checks.

What runs in CI:
- ruff (E, F, W, I) — gated; codebase passes with the minimal ignore set
  documented in pyproject.toml
- pytest tests/ — matrix over Python 3.9 / 3.10 / 3.11 / 3.12 on Ubuntu
- python -m build --wheel — Ubuntu + macOS, verifies the C extension
  builds cross-platform and the wheel is installable

Test infrastructure:
- tests/conftest.py spins up an HTTP server on a free port backed by a
  custom nph-aware request handler (stdlib's CGIHTTPRequestHandler
  prepends its own status line, breaking nph- scripts that emit their
  own HTTP/1.1 status lines like nph-tap.py does)
- tests/fixtures/build_test_db.py builds tap_schema.db (IVOA-shape) and
  test_data.db (3 public + 3 expired + 4 still-embargoed L1 rows). The
  still-embargoed rows are the key fixture: production test servers
  (vmneidtest, koa-test) have all-expired propints, which means propflag
  enforcement can't be observed there. CI now has the missing scenario.
- tests/fixtures/stubs/spatial_index.py is a minimal stub for the four
  module-level constants TAP imports. The real spatial_index package on
  PyPI only ships wheels for Python ≤ 3.8, so modern-Python CI cannot
  install it. Tracking as upstream issue.
- tests/test_http_endpoint.py is a smoke layer that exercises the full
  HTTP→CGI→TAP→SQLite→IPAC pipeline. Security/regression tests land in
  the follow-up PR (Phase 3+4).

Cosmetic ruff autofixes touched 13 files (whitespace, f-strings, import
order) — no semantic changes.

Deferred to follow-up PRs:
- F821 ignored in ruff for now; 6 pre-existing undefined-name bugs to
  address in TAP/taputil.py (SQLite/MySQL connection branches) and
  TAP/tap.py:2402 (debug-log typo `result` → `results`)
- spatial_index Python 3.9+ wheel situation upstream
- Postgres/MySQL/Oracle CI matrix (SQLite-only for first cut)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…index

The wheel-build verification and the test-job install both failed on the
first CI run because pip can't resolve spatial_index on Python 3.9+:
the PyPI package only ships wheels for 2.7 / 3.6 / 3.7 / 3.8 and has no
source distribution.

Workaround in CI:
- Test job: install nexsciTAP and ADQL with --no-deps. The test fixture
  stub in tests/fixtures/stubs/spatial_index.py supplies the four
  module-level constants TAP uses at import time. configobj (the only
  other actually-needed runtime dep) is in requirements-test.txt.
- Wheel-platforms job: drop the post-build pip install; instead verify
  the .so artifact is present in the wheel. The wheel itself builds
  fine — the issue is only at install-resolve time.

This is purely a CI workaround. The deeper fix is upstream in
spatial_index (publish modern wheels or an sdist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced by the second CI run: TAP/tablenames.py imports sqlparse,
TAP/datadictionary.py imports astropy.io.ascii, and other TAP modules
import bs4 and xmltodict. None of these are declared in setup.py's
install_requires (a pre-existing bug — declaring them in install_requires
is a separate cleanup since the CI install path uses --no-deps anyway).

Add them to requirements-test.txt so CI installs the actual runtime
surface needed to run the tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3+4 of the CI plan. Each test in test_security_e2e.py is named
after the specific regression or security boundary it guards, so any
future change that re-opens one of those holes turns the corresponding
test red at PR time.

What's tested (12 active, 1 skipped):
- Table whitelist: rejects sqlite_master, rejects unregistered tables,
  allows propfilter internal tables (test_access) in subqueries
- Table extractor: ORDER BY not misidentified, UNKNOWN-statement-skip
  bypass closed (Oracle ROWNUM)
- propflag: zero honored on L0, complex CASE WHEN with subquery FROM
  does not misfire the whitelist
- Statement validator: DML/DDL rejected, dangerous Oracle functions
  rejected, semicolon-chained statements rejected
- HTTP status codes: 400 for query errors, error bodies don't leak
  internal state (Traceback, paths, Password=)

Skipped with detailed reasons in the test bodies:
- test_propflag_bypass_routes_l1_through_propfilter — configparam.py
  only reads PROPFILTER/ACCESS_TBL/USERS_TBL inside its `if dbms ==
  'oracle'` branch, so on SQLite the propfilter config is silently
  dropped and the guard becomes a no-op. End-to-end propflag coverage
  needs the Oracle/Postgres CI matrix from the deferred plan work.

pyNEID compat (test_pyneid_compat.py) — module-level skip. pyNEID's
query_adql uses the async TAP endpoint (POST + status URL polling +
results fetch); the test fixture handler doesn't yet model the
per-request workspace lifecycle Apache provides in production.
Direct-HTTP coverage of the same code paths is in test_security_e2e.py.

Conftest: handler now routes both /cgi-bin/TAP/nph-tap.py/<endpoint>
(direct CGI URL) and /TAP/<endpoint> (production short form) to the
same nph-tap.py script — wires up the URL shape pyNEID expects, even
though pyNEID itself remains skipped pending async work.

Misalignment surfaced: TAP returns 400 for DML / dangerous-function /
semicolon-chained rejections (validate_statement raises plain
Exception), but 403 for system-catalog rejections (raises
TableValidationError). These are all security-policy decisions and
should arguably all be 403. Tests match current behavior; harmonizing
is a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bjfultn bjfultn merged commit a0eb278 into develop May 20, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant