Skip to content

Database(engine) rebuilds the engine from its URL, dropping connect listeners / pool config #45

Description

@davenquinn

macrostrat.database.utils.create_engine rebuilds a passed-in Engine from its URL in the common no-kwargs case, instead of reusing it. This silently discards anything attached to the engine before it was handed to Database: connect event listeners, the pool class (e.g. StaticPool), and connect_args. As a result Database(engine).engine is engine is False.

Reproduction

import sqlalchemy as sa
from macrostrat.database import Database

eng = sa.create_engine("sqlite+pysqlite://")
fired = []
sa.event.listen(eng, "connect", lambda *_: fired.append(1))

db = Database(eng)
assert db.engine is eng          # FAILS — engine was rebuilt
db.run_query("SELECT 1")
assert fired                     # FAILS — listener was lost

Real-world impact: loading a SQLite extension (mod_spatialite) via a connect listener, or using StaticPool for an in-memory DB, both break when the engine is passed to Database.

Root cause

In utils.create_engine, the recreate check is inverted relative to its own comments, and the Engine branch ignores kwargs:

recreate = False
if len(kwargs) > 0:
    recreate = True
...
if isinstance(db_conn, Engine):
    if recreate:
        return db_conn          # kwargs present → reuse (but kwargs ignored)
    else:
        db_conn = db_conn.url    # no kwargs → rebuild from URL → state lost

So Database(engine) (no kwargs) hits the else and rebuilds.

Expected behavior

A pre-built Engine should be reused as-is. Engine construction options can't be applied to an existing engine after the fact, so passing engine kwargs alongside an Engine should warn, not silently rebuild.

Proposed fix

Drop the recreate concept for the Engine case; only build from str/URL (postgres driver rewrite preserved, so string-based callers are unaffected):

def create_engine(_input, **kwargs):
    from .core import Database

    db_conn = _input
    if isinstance(_input, Database):
        db_conn = _input.engine
    elif isinstance(_input, str):
        db_conn = make_url(_input)

    if isinstance(db_conn, Engine):
        if kwargs:
            log.warning("create_engine: ignoring kwargs %s for a pre-built Engine", sorted(kwargs))
        if db_conn.driver == "psycopg2":
            log.warning("The psycopg2 driver is deprecated. Please use psycopg3 instead.")
        return db_conn

    if not isinstance(db_conn, URL):
        raise ValueError(f"Invalid input type: {_input}")
    url = db_conn
    if "postgres" in url.drivername:
        url = url.set(drivername="postgresql+psycopg")
    return base_create_engine(url, **kwargs)

Tests to add

  • Database(engine).engine is engine
  • a connect listener on a passed engine survives and fires
  • str/URL input still builds a fresh engine with the expected URL
  • postgresql://… → driver rewritten to postgresql+psycopg

Notes

Found while building a SpatiaLite/RTTopo backend that loads mod_spatialite via a connect listener. Interim workaround in the downstream repo: a small shim that installs exactly this corrected create_engine (patches both utils and core, since core binds the name at import) — to be removed once this lands. Affects 4.3.1.

Metadata

Metadata

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions