Skip to content

Latest commit

 

History

History
153 lines (121 loc) · 4.65 KB

File metadata and controls

153 lines (121 loc) · 4.65 KB

Python clients

pgRDF exposes its capabilities as SQL UDFs, so any standard Postgres client library works. This page covers the two most common — psycopg (sync) and asyncpg (async) — plus a sketch of using pgRDF as a backend for rdflib if your codebase is already invested in that ecosystem.

psycopg 3

pip install "psycopg[binary]>=3.2"
import psycopg

with psycopg.connect("postgresql://pgrdf:pgrdf@localhost:5432/pgrdf") as conn:
    with conn.cursor() as cur:
        # First-time setup (or use a migration tool).
        cur.execute("CREATE EXTENSION IF NOT EXISTS pgrdf")

        # Load a Turtle file from the server-side filesystem.
        cur.execute(
            "SELECT pgrdf.load_turtle(%s, %s)",
            ("/fixtures/ontologies/foaf.ttl", 1),
        )
        n_triples = cur.fetchone()[0]
        print(f"loaded {n_triples} triples")

        # Parse an in-memory Turtle string.
        cur.execute(
            "SELECT pgrdf.parse_turtle(%s, %s)",
            (
                "@prefix ex: <http://example.com/> . ex:a ex:p ex:b .",
                2,
            ),
        )

        # See structured ingest stats via the verbose variant.
        cur.execute(
            "SELECT pgrdf.load_turtle_verbose(%s, %s, %s)",
            (
                "/fixtures/ontologies/prov.ttl",
                100,
                "http://www.w3.org/ns/prov#",
            ),
        )
        (stats,) = cur.fetchone()    # → dict
        print(f"prov.ttl: {stats['triples']} triples in {stats['elapsed_ms']:.0f}ms")

    conn.commit()

Note: pgrdf.load_turtle_verbose returns JSONB; psycopg adapts that to a Python dict by default.

asyncpg

pip install "asyncpg>=0.30"
import asyncio
import asyncpg

async def main():
    conn = await asyncpg.connect("postgresql://pgrdf:pgrdf@localhost:5432/pgrdf")
    try:
        await conn.execute("CREATE EXTENSION IF NOT EXISTS pgrdf")

        n = await conn.fetchval(
            "SELECT pgrdf.load_turtle($1, $2)",
            "/fixtures/ontologies/foaf.ttl", 1,
        )
        print(f"loaded {n} triples")

        # JSONB stats come back as a dict.
        stats = await conn.fetchval(
            "SELECT pgrdf.load_turtle_verbose($1, $2, $3)",
            "/fixtures/ontologies/prov.ttl", 100, "http://www.w3.org/ns/prov#",
        )
        print(stats["triples"], "triples in", stats["elapsed_ms"], "ms")

        # Quad-count by graph
        n_in_g1 = await conn.fetchval("SELECT pgrdf.count_quads($1)", 1)
        print(f"graph 1 holds {n_in_g1} quads")
    finally:
        await conn.close()

asyncio.run(main())

SQLAlchemy

If you already use SQLAlchemy, you can wire pgRDF as plain text SQL through the regular session. Map the JSONB return from the *_verbose UDFs to sqlalchemy.dialects.postgresql.JSONB so it deserialises to a dict:

from sqlalchemy import create_engine, text
from sqlalchemy.dialects.postgresql import JSONB

engine = create_engine("postgresql+psycopg://pgrdf:pgrdf@localhost/pgrdf")

with engine.begin() as conn:
    n = conn.scalar(
        text("SELECT pgrdf.load_turtle(:path, :graph)"),
        {"path": "/fixtures/ontologies/foaf.ttl", "graph": 1},
    )
    print(f"loaded {n} triples")

rdflib bridge (sketch)

rdflib is the dominant Python RDF library. Today, its default Memory and BerkeleyDB stores keep triples client-side. A natural position for pgRDF is as an rdflib.store.Store implementation that delegates add / remove / triples to pgRDF UDFs over a regular psycopg connection.

The shape (pgrdf.sparql ships in v0.3; the rdflib adapter package itself is not yet released):

# Conceptual — pgrdf.sparql is live, but the adapter package below
# is a to-be-shipped sibling project.
from rdflib import Graph
from pgrdf_rdflib import PgRDFStore     # to-be-shipped sibling project

store = PgRDFStore(dsn="postgresql://pgrdf:pgrdf@localhost/pgrdf", graph_id=1)
g = Graph(store=store)

g.parse("foaf.ttl", format="turtle")    # delegates to pgrdf.parse_turtle
list(g.triples((None, RDF.type, FOAF.Person)))   # delegates to a server-side BGP query

Until the adapter ships, you can use rdflib client-side to parse + manipulate graphs and pgRDF server-side for storage + bulk ops — they don't collide.

Caveats

  • load_turtle reads the path from the postgres process. Your application's working directory is irrelevant.
  • pgRDF strictness: any Turtle that fails to load is genuinely off-spec. Don't paper over parse errors in client code — fix the TTL.
  • The extension's schema is pgrdf. Set search_path = pgrdf, public once per session if you want to drop the pgrdf. prefix on every call.