SDK / API feedback from a pipeline-integration use case #30

# `tango-python` SDK / Tango API — follow-up feedback from a different use case

Companion to the
[prior issue](https://github.com/abigailhaddad/gdit-socom-trace/blob/main/TANGO_ISSUE.md).
This round is from integrating Tango alongside USASpending and SAM.gov
as enrichment sources in a continuously-deployed pipeline. None of the
items below block the work — we have workarounds for everything — but
flagging them in case any are useful for documentation or backlog.

---

## What's working great in this use case

- **`rate_limit_info` on the client** is accurate and accessible. We rely
  on it for pacing decisions across multiple workflows that share a
  medium-tier 7,500/day quota.
- **Schema-stable `dataclass` responses** make response handling
  pleasant compared to the loose-shape USASpending API.
- **Deterministic `key` values** let us write resumable fetchers
  cleanly. Each workflow's output CSV carries a `lookup_status` column:
  PIIDs already in `found` / `not_found` state are skipped on re-run,
  and error rows are retried. The Tango key pattern is what makes this
  safe.
- **`list_vehicles(search="…")`** plus `list_vehicle_awardees(uuid=…)`
  surfaces program-level vehicle families (CIO-SP3, OASIS+, Alliant 2,
  EIS, SEWP, …) that USASpending's public API doesn't name at all.
  This is the headline reason Tango is in our pipeline.
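
For reference, the skip/retry rule behind the resumable fetchers can be sketched roughly as follows. This is a simplified illustration, not our production code: the `piid` column name and the sample PIIDs are placeholders, and only the `lookup_status` values named above are taken from the real pipeline.

```python
import csv
import io

DONE_STATES = {"found", "not_found"}  # terminal states; anything else is retried


def load_done(csv_text: str) -> set[str]:
    """Return PIIDs whose previous lookup reached a terminal state."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["piid"] for row in reader if row["lookup_status"] in DONE_STATES}


def pending(piids: list[str], csv_text: str) -> list[str]:
    """PIIDs that still need a call: never attempted, or errored last run."""
    done = load_done(csv_text)
    return [p for p in piids if p not in done]


# Placeholder output of a previous run (hypothetical `piid` column).
previous_run = "piid,lookup_status\nA1,found\nB2,not_found\nC3,error\n"
todo = pending(["A1", "B2", "C3", "D4"], previous_run)
print(todo)  # ['C3', 'D4']
```

Treating `not_found` as terminal is what keeps re-runs from spending quota on PIIDs that Tango has already answered for.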

---

## Friction we hit (ranked by impact on this pipeline)

### 1. Vehicle ↔ IDV linkage requires two calls and a client-side exact-match check

Use case: for every parent IDV PIID in our dataset (~7,800), look up
the program-level Vehicle it belongs to.

The shortest path we found:

```python
# Call 1: resolve PIID → solicitation_identifier
idv = client.list_idvs(piid=parent_piid, limit=1,
                       shape="piid,competition(*)")
sol = idv.results[0]["competition"]["solicitation_identifier"]

# Call 2: search Vehicles by that solicitation_identifier
veh = client.list_vehicles(search=sol, limit=5,
                           shape="uuid,program_acronym,vehicle_type,"
                                 "solicitation_identifier,idv_count,"
                                 "awardee_count,order_count,total_obligated")

# Client-side filter — `search` is keyword-style, not exact
match = next((v for v in veh.results
              if v["solicitation_identifier"] == sol), None)
```

That's 2 API calls per PIID plus a client-side exact-match check,
because (a) `list_idvs` doesn't have a `vehicle(…)` shape selector and
(b) `list_vehicles` accepts no exact `solicitation_identifier` or
`piid` filter. At our parent-PIID count this doubles our backfill cost
(~15,600 calls instead of ~7,800).
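
One partial mitigation, sketched below under stated assumptions: wrap the two calls in a helper and cache Call 2 by `solicitation_identifier`, so it runs once per distinct solicitation rather than once per PIID. This only helps if many parent IDVs share a solicitation, which we have not measured. `find_vehicle`, `FakeClient`, and the `SOL-…` / `PIID-…` values are hypothetical stand-ins so the example runs offline, and the code assumes shaped results are plain dicts, as the indexing in the snippet above suggests.

```python
from types import SimpleNamespace

VEHICLE_SHAPE = "uuid,program_acronym,vehicle_type,solicitation_identifier"


def find_vehicle(client, parent_piid, cache):
    """Resolve a parent IDV PIID to its Vehicle, or None if nothing matches exactly."""
    idv = client.list_idvs(piid=parent_piid, limit=1, shape="piid,competition(*)")
    if not idv.results:
        return None
    sol = (idv.results[0].get("competition") or {}).get("solicitation_identifier")
    if not sol:
        return None
    if sol not in cache:  # Call 2 runs only once per distinct solicitation
        veh = client.list_vehicles(search=sol, limit=5, shape=VEHICLE_SHAPE)
        # `search` is keyword-style, so keep only the exact match
        cache[sol] = next(
            (v for v in veh.results if v["solicitation_identifier"] == sol), None
        )
    return cache[sol]


class FakeClient:
    """Stand-in for the real client so the sketch runs offline."""

    def __init__(self):
        self.calls = 0

    def list_idvs(self, piid, limit, shape):
        self.calls += 1
        return SimpleNamespace(
            results=[{"piid": piid, "competition": {"solicitation_identifier": "SOL-1"}}]
        )

    def list_vehicles(self, search, limit, shape):
        self.calls += 1
        return SimpleNamespace(
            results=[
                {"uuid": "u-2", "solicitation_identifier": "SOL-10"},
                {"uuid": "u-1", "solicitation_identifier": "SOL-1"},
            ]
        )


client, cache = FakeClient(), {}
matches = [find_vehicle(client, piid, cache) for piid in ("PIID-A", "PIID-B")]
print(matches[0]["uuid"], client.calls)  # u-1 3
```

In the fake run, two PIIDs that share a solicitation cost 3 calls instead of 4. Call 1 still runs for every PIID, so a server-side filter would remain the real fix.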

**Possible fix**: either expose
`list_vehicles(solicitation_identifier=…, exact=True)` /
`list_vehicles(piid=…)` as a server-side filter, or let `list_idvs`
accept a `vehicle(uuid,program_acronym,…)` shape that joins the
Vehicle in-server. Either would collapse this to 1 call/PIID and
eliminate the client-side exact-match step.

### 2. `Entity.immediate_owner` and `highest_owner` lack `uei`

Use case: we maintain a manually-curated corporate-family override
file (141 rows / 37 canonical-parent groups — for example Lockheed
Martin has 15 distinct UEIs in USASpending bulk data that need to
roll up into one canonical parent_uei). We evaluated using Tango's
owner fields to auto-derive these from SAM.

Repro: shape any entity with star-expansion on the owner fields, for
example a known subsidiary such as General Dynamics Land Systems
(`HAWKSQF848W7`):

```python
client.list_entities(uei="HAWKSQF848W7", limit=1,
                     shape="legal_business_name,"
                           "immediate_owner(*),highest_owner(*)")
```

Owner responses come back as:
```
{"cage_code": "8JFT1",
 "legal_business_name": "GENERAL DYNAMICS LAND SYSTEMS, GLOBAL LLC"}
{"cage_code": "95403",
 "legal_business_name": "GENERAL DYNAMICS CORP"}
```

The owner is identified by NAME and CAGE — `uei` is not in the
returned payload. Across the 5 known subsidiaries we probed (all
expected to roll up to General Dynamics), 4 came back with owner
records carrying only `cage_code` + `legal_business_name`; the 5th
had no owner record at all. Without a UEI on the owner side we can't
join `subsidiary_uei → parent_uei`.

Probable root cause: SAM's underlying registration data populates
parent registrant fields with name + CAGE but not the parent's own
UEI, even when it is known. So this may need an upstream SAM API
enhancement rather than a Tango fix. That said, if Tango maintains its
own Entity → Entity index and could backfill the UEI when the
`legal_business_name` resolves to a known Entity in the registry,
that'd be the unlock.

**Possible fix**: when the owner's `legal_business_name` + `cage_code`
matches a registered Entity, surface that Entity's `uei` on the owner
record.
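
To make the proposed matching rule concrete, here is a rough client-side sketch of the same join. It assumes a local registry of entities with `uei`, `cage_code`, and `legal_business_name` is available from some other source; the helper names are hypothetical and the UEI in the sample row is a made-up placeholder, not real data.

```python
def normalize(name: str) -> str:
    """Collapse case, punctuation, and whitespace so name variants compare equal."""
    kept = "".join(ch if ch.isalnum() else " " for ch in name.upper())
    return " ".join(kept.split())


def build_index(entities):
    """Map (cage_code, normalized name) -> uei for every registered entity."""
    return {
        (e["cage_code"], normalize(e["legal_business_name"])): e["uei"]
        for e in entities
    }


def owner_uei(owner, index):
    """Return the owner's UEI when both CAGE and name match, else None."""
    if not owner:
        return None
    name = owner.get("legal_business_name") or ""
    return index.get((owner.get("cage_code"), normalize(name)))


# Placeholder registry row: the UEI below is invented for illustration.
registry = [
    {"uei": "PLACEHOLDER1", "cage_code": "95403",
     "legal_business_name": "General Dynamics Corp."},
]
index = build_index(registry)

owner = {"cage_code": "95403", "legal_business_name": "GENERAL DYNAMICS CORP"}
print(owner_uei(owner, index))  # PLACEHOLDER1
print(owner_uei(None, index))   # None
```

Requiring both CAGE and name to agree is deliberately conservative: a name-only match risks rolling a subsidiary up to the wrong parent.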

### 3. Vehicle `program_acronym` is often null

In a small initial probe (5 parent PIIDs that matched a Vehicle), 2
came back with `program_acronym=None` — both were agency-specific
BPA/BOA-style vehicles. They still get `vehicle_type` and the rollup
counts (`idv_count`, `order_count`, etc.), which is enough for
analytical aggregation, but we lose the friendly display label for
those rows.

Not really a bug — the underlying SAM / FPDS data legitimately doesn't
have a name for these — but for consumers building a "By Vehicle"
view it'd help to know in advance which vehicle classes tend to lack
a `program_acronym`, so we can plan a fallback (we ended up rendering
`(unnamed <type>)` as the display label).
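
The fallback amounts to something like the sketch below. `vehicle_label` is a hypothetical helper name, and the `vehicle_type` values shown are illustrative rather than the API's actual enum.

```python
def vehicle_label(vehicle: dict) -> str:
    """Prefer program_acronym; fall back to '(unnamed <type>)' when it is null."""
    acronym = vehicle.get("program_acronym")
    if acronym:
        return acronym
    vehicle_type = vehicle.get("vehicle_type") or "vehicle"
    return f"(unnamed {vehicle_type})"


print(vehicle_label({"program_acronym": "OASIS+", "vehicle_type": "IDC"}))  # OASIS+
print(vehicle_label({"program_acronym": None, "vehicle_type": "BPA"}))      # (unnamed BPA)
```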

**Possible fix on the docs side**: a one-line note on
`VEHICLE_SCHEMA.program_acronym` listing the vehicle classes that
typically have it populated (`GWAC`, large multi-agency IDCs) vs
typically null (agency-specific BPAs, BOAs, single-award IDCs).

### 4. `rate_limit_info` exposes daily totals but not per-endpoint utilization

For a multi-workflow account where several refresh jobs share quota,
it'd be useful to know which endpoints are burning quota fastest
without instrumenting application-side counters. This is probably
tier-upgrade territory for us, but flagging it in case the response is
easy to extend.

**Possible fix**: add `endpoint_calls_today: {endpoint_name: count, …}`
or similar to the rate-limit response payload.
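
For context, the application-side instrumentation this would replace looks roughly like the proxy below. It is a sketch under assumptions: `CountingClient` and `FakeClient` are hypothetical names, and the fake stands in for the real client so the example runs offline.

```python
from collections import Counter


class CountingClient:
    """Proxy that tallies calls per method name before delegating to the real client."""

    def __init__(self, client):
        self._client = client
        self.calls = Counter()

    def __getattr__(self, name):
        # Only reached for attributes not defined on the proxy itself.
        target = getattr(self._client, name)
        if not callable(target):
            return target

        def counted(*args, **kwargs):
            self.calls[name] += 1
            return target(*args, **kwargs)

        return counted


class FakeClient:
    """Stand-in for the real client so the sketch runs offline."""

    def list_idvs(self, **kwargs):
        return []

    def list_vehicles(self, **kwargs):
        return []


client = CountingClient(FakeClient())
client.list_idvs(piid="PIID-A")
client.list_idvs(piid="PIID-B")
client.list_vehicles(search="SOL-1")
print(dict(client.calls))  # {'list_idvs': 2, 'list_vehicles': 1}
```

This counts method invocations, not HTTP requests, so any internal pagination or retries would be undercounted, and the tallies are per process rather than per account. A server-side breakdown would avoid both gaps.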

---

## Wishlist (small things, no concrete bug)

- A `list_vehicles(piid=parent_idv_piid)` or `list_vehicles(solicitation_identifier=…, exact=True)` filter: the biggest single ergonomic win for this use case (collapses item 1 from two calls to one).
- A docs page or schema annotation flagging which fields tend to be null by vehicle/contract class (item 3).
- UEI on `immediate_owner` / `highest_owner` when resolvable (item 2).

---

## What Tango uniquely provided here

The program-level vehicle layer (`program_acronym`, `vehicle_type`,
and the per-vehicle rollup metrics under `VEHICLE_SCHEMA`) is the
piece with no public alternative we could find — USASpending exposes
parent IDV PIIDs but doesn't name the program-level family those IDVs
belong to. That's what motivated the integration.

