SDK / API feedback from a pipeline-integration use case #30

@abigailhaddad

tango-python SDK / Tango API — follow-up feedback from a different use case

Companion to the prior issue.
This round is from integrating Tango alongside USASpending and SAM.gov
as enrichment sources in a continuously-deployed pipeline. None of the
items below block the work — we have workarounds for everything — but
flagging them in case any are useful for documentation or backlog.


What's working great in this use case

  • rate_limit_info on the client is accurate and accessible. We rely
    on it for pacing decisions across multiple workflows that share a
    medium-tier 7,500/day quota.
  • Schema-stable dataclass responses make response handling
    pleasant compared to the loose-shape USASpending API.
  • Resumability via deterministic key values lets us write
    resumable fetchers cleanly. Each workflow's output CSV carries a
    lookup_status column; PIIDs already in found / not_found state
    are skipped on re-run; error rows are retried. The Tango key pattern
    is what makes this safe.
  • list_vehicles(search="…") plus list_vehicle_awardees(uuid=…)
    surfaces program-level vehicle families (CIO-SP3, OASIS+, Alliant 2,
    EIS, SEWP, …) that USASpending's public API doesn't name at all.
    This is the headline reason Tango is in our pipeline.
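The skip/retry rule from the resumability bullet above can be sketched roughly as follows. This is a minimal illustration, not our actual fetcher: the `piid` column name and the `piids_to_fetch` helper are assumptions made for the example; only the `lookup_status` column and its `found` / `not_found` / `error` values come from the description above.

```python
import csv
import io

# Statuses that count as finished; anything else (e.g. "error") is retried.
DONE_STATUSES = {"found", "not_found"}

def piids_to_fetch(all_piids, prior_csv):
    """Return the PIIDs that still need a lookup, preserving input order.

    `prior_csv` is any file-like object holding the previous run's output,
    assumed here to have `piid` and `lookup_status` columns.
    """
    done = set()
    for row in csv.DictReader(prior_csv):
        if row.get("lookup_status") in DONE_STATUSES:
            done.add(row["piid"])
    return [p for p in all_piids if p not in done]

# Example: A and B are finished, C errored last time, D is new.
prior = io.StringIO(
    "piid,lookup_status\n"
    "PIID-A,found\n"
    "PIID-B,not_found\n"
    "PIID-C,error\n"
)
todo = piids_to_fetch(["PIID-A", "PIID-B", "PIID-C", "PIID-D"], prior)
print(todo)  # ['PIID-C', 'PIID-D']
```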

Friction we hit (ranked by impact on this pipeline)

1. Vehicle ↔ IDV linkage requires two calls and a client-side exact-match check

Use case: for every parent IDV PIID in our dataset (~7,800), look up
the program-level Vehicle it belongs to.

The shortest path we found:

# Call 1: resolve PIID → solicitation_identifier
idv = client.list_idvs(piid=parent_piid, limit=1,
                       shape="piid,competition(*)")
sol = idv.results[0]["competition"]["solicitation_identifier"]

# Call 2: search Vehicles by that solicitation_identifier
veh = client.list_vehicles(search=sol, limit=5,
                           shape="uuid,program_acronym,vehicle_type,"
                                 "solicitation_identifier,idv_count,"
                                 "awardee_count,order_count,total_obligated")

# Client-side filter — `search` is keyword-style, not exact
match = next((v for v in veh.results
              if v["solicitation_identifier"] == sol), None)

That's 2 API calls per PIID plus a client-side exact-match check,
because (a) list_idvs doesn't have a vehicle(…) shape selector and
(b) list_vehicles accepts no exact solicitation_identifier or
piid filter. At our parent-PIID count this doubles our backfill cost
(~15,600 calls instead of ~7,800).

Possible fix: either expose
list_vehicles(solicitation_identifier=…, exact=True) /
list_vehicles(piid=…) as a server-side filter, or let list_idvs
accept a vehicle(uuid,program_acronym,…) shape that joins the
Vehicle in-server. Either would collapse this to 1 call/PIID and
eliminate the client-side exact-match step.
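Until a server-side filter exists, the two-call path in this item can be wrapped in one helper that also guards against empty results. The sketch below assumes the client methods behave as in the snippet above (a `results` list of dicts); `resolve_vehicle` and `StubClient` are hypothetical names, and the stub only stands in for the real client so the example runs without network access.

```python
from types import SimpleNamespace

def resolve_vehicle(client, parent_piid):
    """Return the Vehicle whose solicitation_identifier exactly matches the
    parent IDV's, or None when either lookup comes back empty."""
    idv = client.list_idvs(piid=parent_piid, limit=1,
                           shape="piid,competition(*)")
    if not idv.results:
        return None
    competition = idv.results[0].get("competition") or {}
    sol = competition.get("solicitation_identifier")
    if not sol:
        return None
    veh = client.list_vehicles(
        search=sol, limit=5,
        shape="uuid,program_acronym,vehicle_type,solicitation_identifier")
    # `search` is keyword-style, so keep only the exact match.
    return next((v for v in veh.results
                 if v.get("solicitation_identifier") == sol), None)

class StubClient:
    """Stand-in for the real client, for illustration only."""
    def list_idvs(self, **kwargs):
        return SimpleNamespace(results=[
            {"piid": kwargs["piid"],
             "competition": {"solicitation_identifier": "SOL-1"}}])

    def list_vehicles(self, **kwargs):
        return SimpleNamespace(results=[
            {"uuid": "u-1", "solicitation_identifier": "SOL-10"},  # keyword hit only
            {"uuid": "u-2", "solicitation_identifier": "SOL-1"},   # exact match
        ])

match = resolve_vehicle(StubClient(), "PIID-X")
print(match["uuid"])  # u-2
```

The helper still costs two calls per PIID; it only centralizes the guard and the exact-match step.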

2. Entity.immediate_owner and highest_owner lack uei

Use case: we maintain a manually-curated corporate-family override
file (141 rows / 37 canonical-parent groups — for example Lockheed
Martin has 15 distinct UEIs in USASpending bulk data that need to
roll up into one canonical parent_uei). We evaluated using Tango's
owner fields to auto-derive these from SAM.

Repro: request any entity with star-expansion on the owner fields in
shape, for example a known subsidiary such as General Dynamics Land
Systems (HAWKSQF848W7):

client.list_entities(uei="HAWKSQF848W7", limit=1,
                     shape="legal_business_name,"
                           "immediate_owner(*),highest_owner(*)")

Owner responses come back as:

{"cage_code": "8JFT1",
 "legal_business_name": "GENERAL DYNAMICS LAND SYSTEMS, GLOBAL LLC"}
{"cage_code": "95403",
 "legal_business_name": "GENERAL DYNAMICS CORP"}

The owner is identified by NAME and CAGE — uei is not in the
returned payload. Across the 5 known subsidiaries we probed (all
expected to roll up to General Dynamics), 4 came back with owner
records carrying only cage_code + legal_business_name; the 5th
had no owner record at all. Without a UEI on the owner side we can't
join subsidiary_uei → parent_uei.

Probable root cause: SAM's underlying registration data populates the
parent-registrant fields with name + CAGE but not the parent's own UEI,
even when that UEI is known. So this may need an upstream SAM API
enhancement rather than a Tango fix. But if Tango maintains its own
Entity → Entity index, it could backfill the UEI whenever the
legal_business_name resolves to a known Entity in the registry, and
that'd be the unlock.

Possible fix: when the owner's legal_business_name + cage_code
matches a registered Entity, surface that Entity's uei on the owner
record.
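To make the proposed matching rule concrete, here is a rough sketch of how the backfill could work, run against a local list of entities. Everything here is illustrative: `normalize_name`, `backfill_owner_uei`, and the registry layout are assumptions, the UEI value is a placeholder rather than real data, and this says nothing about how Tango stores entities internally.

```python
def normalize_name(name):
    """Uppercase and drop punctuation/extra spaces so names compare loosely."""
    kept = "".join(ch if ch.isalnum() else " " for ch in name.upper())
    return " ".join(kept.split())

def backfill_owner_uei(owner, registry):
    """Return a copy of `owner` with a `uei` key.

    The UEI is filled in only when cage_code + normalized name match exactly
    one registry entity; ambiguous or missing matches yield None.
    """
    key = (owner.get("cage_code"),
           normalize_name(owner.get("legal_business_name", "")))
    hits = [e["uei"] for e in registry
            if (e["cage_code"], normalize_name(e["legal_business_name"])) == key]
    return {**owner, "uei": hits[0] if len(hits) == 1 else None}

# Hypothetical registry row; the uei is a placeholder, not a real identifier.
registry = [
    {"cage_code": "95403",
     "legal_business_name": "General Dynamics Corp.",
     "uei": "PLACEHOLDER-UEI"},
]
owner = {"cage_code": "95403", "legal_business_name": "GENERAL DYNAMICS CORP"}
print(backfill_owner_uei(owner, registry)["uei"])  # PLACEHOLDER-UEI
```

Requiring both fields to match, and refusing to guess when more than one entity matches, keeps a wrong parent from leaking silently into a roll-up.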

3. Vehicle program_acronym is often null

In a small initial probe (5 parent PIIDs that matched a Vehicle), 2
came back with program_acronym=None — both were agency-specific
BPA/BOA-style vehicles. They still get vehicle_type and the rollup
counts (idv_count, order_count, etc.), which is enough for
analytical aggregation, but we lose the friendly display label for
those rows.

Not really a bug — the underlying SAM / FPDS data legitimately doesn't
have a name for these — but for consumers building a "By Vehicle"
view it'd help to know in advance which vehicle classes tend to lack
a program_acronym, so we can plan a fallback (we ended up rendering
(unnamed <type>) as the display label).
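The fallback we use amounts to something like the sketch below. The `vehicle_display_label` name and the generic "vehicle" default for a missing vehicle_type are assumptions for illustration, and "BPA" is only an example value, not a confirmed vehicle_type string from the API.

```python
def vehicle_display_label(vehicle):
    """Prefer program_acronym; otherwise fall back to '(unnamed <type>)'."""
    acronym = vehicle.get("program_acronym")
    if acronym:
        return acronym
    # Assumed default when vehicle_type is also missing.
    vtype = vehicle.get("vehicle_type") or "vehicle"
    return f"(unnamed {vtype})"

print(vehicle_display_label({"program_acronym": "OASIS+", "vehicle_type": "IDC"}))
# OASIS+
print(vehicle_display_label({"program_acronym": None, "vehicle_type": "BPA"}))
# (unnamed BPA)
```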

Possible fix on the docs side: a one-line note on
VEHICLE_SCHEMA.program_acronym listing the vehicle classes that
typically have it populated (GWAC, large multi-agency IDCs) vs
typically null (agency-specific BPAs, BOAs, single-award IDCs).

4. rate_limit_info exposes daily totals but not per-endpoint utilization

For a multi-workflow account where several refresh jobs share quota,
it'd be useful to know which endpoints are burning quota fastest
without instrumenting application-side counters. Tier-up territory
for us, but flagging in case the response is easy to extend.

Possible fix: add endpoint_calls_today: {endpoint_name: count, …}
or similar to the rate-limit response payload.
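For comparison, the application-side counting we would otherwise need looks roughly like this. It is a sketch under assumptions: `EndpointCounter` is a hypothetical wrapper, it counts SDK method calls rather than HTTP requests (so a paginated call counts once), and it resets on the local calendar day, which may not line up with the server's quota window.

```python
from collections import Counter
from datetime import date

class EndpointCounter:
    """Wrap a client and count calls per method name for the current day."""

    def __init__(self, client):
        self._client = client
        self._day = date.today()
        self.calls_today = Counter()

    def __getattr__(self, name):
        # Only reached for attributes not set in __init__, i.e. the
        # wrapped client's own methods and attributes.
        target = getattr(self._client, name)
        if not callable(target):
            return target

        def counted(*args, **kwargs):
            today = date.today()
            if today != self._day:  # new local day: start over
                self._day = today
                self.calls_today.clear()
            self.calls_today[name] += 1
            return target(*args, **kwargs)

        return counted

class StubClient:
    """Stand-in for the real client, for illustration only."""
    def list_idvs(self, **kwargs):
        return []

    def list_vehicles(self, **kwargs):
        return []

counted = EndpointCounter(StubClient())
counted.list_idvs(piid="PIID-X")
counted.list_idvs(piid="PIID-Y")
counted.list_vehicles(search="SOL-1")
print(dict(counted.calls_today))  # {'list_idvs': 2, 'list_vehicles': 1}
```

Each workflow has to wrap its own client, and the counts live in separate processes, which is why a server-side breakdown would be more useful for a shared quota.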


Wishlist (small things, no concrete bug)

  • A list_vehicles(piid=parent_idv_piid) or (solicitation_identifier=…, exact=True) filter — biggest single ergonomic win for the use case here (collapses item 1 from two calls to one).
  • A docs page or schema annotation flagging which fields tend to be null by vehicle/contract class (item 3).
  • UEI on immediate_owner / highest_owner when resolvable (item 2).

What Tango uniquely provided here

The program-level vehicle layer (program_acronym, vehicle_type,
and the per-vehicle rollup metrics under VEHICLE_SCHEMA) is the
piece with no public alternative we could find — USASpending exposes
parent IDV PIIDs but doesn't name the program-level family those IDVs
belong to. That's what motivated the integration.
