feat(query): migrate to ClickHouse + web-Lapache backend endpoints #196

Open
jacobjurek wants to merge 1 commit into main from jake/query-clickhouse

@jacobjurek
Contributor

Summary

Migrates the query service from Postgres to ClickHouse, and brings the query work it builds on up to date on main. Scoped to query/ plus the query service's docker-compose block and the CLICKHOUSE_* entries in example.env — no other service is touched.

Note: the ClickHouse migration sits on top of the web-Lapache backend endpoints (clusters, date selector, decimation, signal-names), which were on the feature branch but not yet on main. Those files (e.g. service/cluster.py) don't exist on main, so the migration can't be merged in isolation — this PR includes them as one coherent unit.

ClickHouse migration

  • Drop sqlalchemy + psycopg2; add clickhouse-connect (HTTP interface, port 8123).
  • connection.py: one shared client; init_db creates the service's own metadata tables in ClickHouse — query_log (MergeTree, append-only), query_token + signal_definition (ReplacingMergeTree, read with FINAL).
  • Signal/cluster SQL ported to ClickHouse: DISTINCT ON → LIMIT BY, EXTRACT(EPOCH …) → integer bucketing on the timestamp micros column, AT TIME ZONE …::date → toDate(produced_at, tz). produced_at is a MATERIALIZED column (queryable, just hidden from SELECT *), so it's always named explicitly.
  • Models → plain dataclasses; metadata services use clickhouse-connect.
  • docker-compose query block + example.env: CLICKHOUSE_* over HTTP 8123 (gr26 keeps native 9000).
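
For reviewers less familiar with the engine split, here is a minimal sketch of what an init_db along these lines could look like. The table names and engines come from the description above; every column name, the `updated_at` version column, and the `find_token` helper are assumptions for illustration and may not match connection.py. The `client` argument is expected to be a clickhouse-connect client (for example from `clickhouse_connect.get_client(host=..., port=8123)`).

```python
"""Illustrative init_db sketch; column names are hypothetical, not the PR's schema."""

# Append-only audit log: plain MergeTree, rows are only ever inserted.
QUERY_LOG_DDL = """
CREATE TABLE IF NOT EXISTS query_log (
    logged_at DateTime64(6) DEFAULT now64(6),
    token_id  String,
    request   String
) ENGINE = MergeTree
ORDER BY logged_at
"""

# Mutable metadata: ReplacingMergeTree keeps the row with the highest
# `updated_at` per ORDER BY key, but only after background merges run.
QUERY_TOKEN_DDL = """
CREATE TABLE IF NOT EXISTS query_token (
    token_id   String,
    token_hash String,
    updated_at DateTime64(6) DEFAULT now64(6)
) ENGINE = ReplacingMergeTree(updated_at)
ORDER BY token_id
"""

SIGNAL_DEFINITION_DDL = """
CREATE TABLE IF NOT EXISTS signal_definition (
    signal_name String,
    unit        String,
    updated_at  DateTime64(6) DEFAULT now64(6)
) ENGINE = ReplacingMergeTree(updated_at)
ORDER BY signal_name
"""

METADATA_DDL = [QUERY_LOG_DDL, QUERY_TOKEN_DDL, SIGNAL_DEFINITION_DDL]

# FINAL collapses not-yet-merged duplicates at read time, so a re-issued
# token is visible immediately instead of after the next merge.
TOKEN_LOOKUP_SQL = """
SELECT token_id, token_hash
FROM query_token FINAL
WHERE token_id = {token_id:String}
"""


def init_db(client) -> None:
    """Create the metadata tables if they do not exist yet."""
    for ddl in METADATA_DDL:
        client.command(ddl)


def find_token(client, token_id: str):
    """Return the current (token_id, token_hash) row, or None if absent."""
    result = client.query(TOKEN_LOOKUP_SQL, parameters={"token_id": token_id})
    rows = result.result_rows
    return rows[0] if rows else None
```

Without FINAL, a read issued between an insert and the next merge could return both the old and the new row for the same key, which is why the ReplacingMergeTree tables are read that way.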

Supporting endpoints (pre-existing on the feature branch)

/signals/names, /clusters, /clusters/dates; raw-data decimation via max_points; cached highest-frequency anchor signal; dev reload watcher scoped to the app package.
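
As a rough illustration of how max_points decimation can combine integer bucketing with LIMIT BY, the sketch below derives a bucket width from the requested range and keeps one row per bucket. The table name `signal_data`, the columns `timestamp_us`, `signal_name`, and `value`, and both helper functions are hypothetical; the actual query in this PR may pick the bucket or the representative row differently.

```python
"""Decimation sketch: at most max_points rows by keeping one row per time bucket."""

# Hypothetical query text. LIMIT 1 BY keeps the first row (in ORDER BY order)
# for each distinct bucket index.
DECIMATED_SQL = """
SELECT timestamp_us, value
FROM signal_data
WHERE signal_name = {name:String}
  AND timestamp_us BETWEEN {start_us:Int64} AND {end_us:Int64}
ORDER BY timestamp_us
LIMIT 1 BY intDiv(timestamp_us - {start_us:Int64}, {width_us:Int64})
"""


def bucket_width_us(start_us: int, end_us: int, max_points: int) -> int:
    """Smallest integer width (microseconds) giving at most max_points buckets."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    span = max(end_us - start_us, 0) + 1  # inclusive range
    return max(1, -(-span // max_points))  # integer ceiling division


def decimate(rows, start_us: int, width_us: int):
    """Pure-Python mirror of the LIMIT 1 BY step, for rows with ts >= start_us."""
    seen = set()
    kept = []
    for ts, value in sorted(rows, key=lambda r: r[0]):
        bucket = (ts - start_us) // width_us
        if bucket not in seen:
            seen.add(bucket)
            kept.append((ts, value))
    return kept
```

Integer ceiling division is used instead of float math so the width stays exact for microsecond timestamps, which exceed the range where floats represent every integer.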

Verification

  • pytest → 12/12 pass.
  • Exercised end-to-end against ClickHouse: /signals (pivot + decimation via LIMIT BY), /signals/names, /clusters (gap split), /clusters/dates (tz), /token (write), token-auth on /signals (FINAL read), and query_log written on every /signals call.
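
The "gap split" exercised on /clusters can be pictured with the sketch below: consecutive samples belong to the same cluster until the gap between neighbours exceeds a threshold. The function name, the `max_gap_us` parameter, and the (start, end) tuple shape are assumptions for illustration; service/cluster.py may express this in SQL or use a different threshold rule.

```python
def split_on_gaps(timestamps_us, max_gap_us: int):
    """Group timestamps into (start, end) clusters, splitting on large gaps.

    A new cluster starts whenever the distance to the previous sample
    is strictly greater than max_gap_us.
    """
    clusters = []
    start = prev = None
    for ts in sorted(timestamps_us):
        if start is None:
            start = prev = ts          # first sample opens the first cluster
        elif ts - prev > max_gap_us:
            clusters.append((start, prev))  # close the current cluster
            start = prev = ts
        else:
            prev = ts                  # still inside the current cluster
    if start is not None:
        clusters.append((start, prev))
    return clusters
```

For example, `split_on_gaps([0, 1, 2, 10, 11, 30], 5)` yields three clusters because the jumps from 2 to 10 and from 11 to 30 both exceed the threshold.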

🤖 Generated with Claude Code

Brings the query service on main up to date: the ClickHouse migration plus
the web-Lapache backend endpoints it builds on (these were not yet on main).

ClickHouse migration:
- Swap sqlalchemy + psycopg2 for clickhouse-connect (HTTP, port 8123).
- connection.py: shared client; init_db creates the service's metadata tables
  in ClickHouse (query_log MergeTree; query_token + signal_definition
  ReplacingMergeTree, read with FINAL).
- Signal/cluster SQL rewritten for ClickHouse: DISTINCT ON -> LIMIT BY,
  EXTRACT(EPOCH ...) -> integer bucketing on the timestamp micros column,
  AT TIME ZONE ::date -> toDate(produced_at, tz). produced_at is MATERIALIZED
  (queryable, absent from SELECT *), so it is always named explicitly.
- Models become plain dataclasses; metadata services use clickhouse-connect.
- docker-compose query block + example.env: CLICKHOUSE_* over HTTP 8123
  (gr26 keeps native 9000).

Supporting endpoints (pre-existing on the feature branch):
- /signals/names, /clusters, /clusters/dates; raw-data decimation via
  max_points; cached highest-frequency anchor signal; dev reload watcher
  scoped to the app package.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>