Skip to content

feat(cli): capture-video on-demand fetcher + capture pipeline robustness#1447

Merged
ukimsanov merged 1 commit into
mainfrom
feat/cli-capture-video
Jun 15, 2026
Merged

feat(cli): capture-video on-demand fetcher + capture pipeline robustness#1447
ukimsanov merged 1 commit into
mainfrom
feat/cli-capture-video

Conversation

@ukimsanov

@ukimsanov ukimsanov commented Jun 14, 2026

Copy link
Copy Markdown
Collaborator

What

  • New hyperframes capture video <project> subcommand — on-demand mp4 fetch from the capture manifest.
  • Capture pipeline robustness: structural logo signals, content-hash SVG slugs, SVG → PNG rasterization for Gemini Vision, regex-escape fix.
  • New lintMissingLocalAsset rule — flags <video> / <img> / <source> src pointing at missing local files.

Why

heygen-logo.svg on main today contains Google's wordmark — label-derived slugs picked the wrong DOM context. The new website-to-video skill inherits this and would ship off-brand videos. Same root for: class-substring isLogo catching 0/32 SVGs on heygen.com, Vision hallucinating wordmarks from raw SVG path text, and capture being 100% blocked by a \/ regex collapse in page.evaluate.

How

  • capture video: nested under capture per the team's auth/cloud/lambda subcommand pattern. safeFetch (SSRF + redirect-hop revalidation), 250 MB cap, content-type whitelist, flag: "wx" exclusive-create write, lookup by entry.index (not array offset — manifest can have gaps).
  • Content-hash SVG slugs (svg-<8-char-sha1>.svg) so filename can never drift from content.
  • sharp rasterizes SVGs to PNG before Vision; polarity detection flips white-glyph SVGs to a dark background.
  • lintMissingLocalAsset masks <!-- --> / <style> / <script> ranges before scanning so commented-out examples aren't false positives.

Test plan

  • bun run --filter @hyperframes/cli typecheck + test pass (17 capture/video cases + 51 lintProject cases)
  • Manual: captured 9 sites (heygen, vercel, stripe, linear, notion, huggingface, remotion, posthog, anthropic, airbnb) — all clean; brand mark findable via description grep on each
  • Manual: hyperframes capture video <dir> --list + --index 0 downloads mp4 from heygen + notion + webm from airbnb — all valid per file(1)

Comment thread packages/cli/src/commands/capture/video.ts Fixed
@ukimsanov ukimsanov force-pushed the feat/cli-capture-video branch 4 times, most recently from e4e8b44 to b56b7c5 Compare June 14, 2026 23:47
Comment thread packages/cli/src/cli.ts Outdated
Comment on lines +137 to +138
capture: () => import("./commands/capture.js").then((m) => m.default),
"capture-video": () => import("./commands/capture-video.js").then((m) => m.default),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if capture-video could be done via a subcommand of capture?

capture video

or capture --video

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah actually capture video should work

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me do that real quick

@ukimsanov ukimsanov force-pushed the feat/cli-capture-video branch from b56b7c5 to 8404517 Compare June 15, 2026 05:23
Comment thread packages/cli/src/commands/capture/video.ts Fixed
@ukimsanov ukimsanov force-pushed the feat/cli-capture-video branch 2 times, most recently from 51c9c6c to 343d5ac Compare June 15, 2026 05:43
@ukimsanov ukimsanov requested a review from jrusso1020 June 15, 2026 05:49

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified at HEAD 343d5ac — layering on @via's review at b56b7c5 and James's inline ask.

Verdict: approve-with-concerns. James's "make it a capture subcommand" landed cleanly; 2 of @via's 6 items resolved, 2 partial, 2 open. None blocking, but the buffering-vs-streaming gap is worth a follow-up since it shifts from "cosmetic duplication" to "actual OOM surface" on a hostile CDN.


Status table

James's askcapture-videocapture video subcommand

subCommands: { video: ... } wired in capture.ts; "capture-video" entry removed from cli.ts registry; file moved to commands/capture/video.ts; example updated to hyperframes capture video ./linear-video --index 0. Clean follow-through.

@via's 6 findings at HEAD

# Finding Status Notes
1 Duplicate download primitive (buffers full 250 MB via arrayBuffer() vs cloud/download.ts:downloadToFile's streaming) ❌ open Still Buffer.from(await r.arrayBuffer()) at video.ts:55. See concern below — this is now a real safety gap, not just a duplication.
2 run() untested (manifest discovery, --list, EEXIST short-circuit, exit codes) ❌ open video.test.ts still covers 4 helpers; integration surface uncovered. Acceptable for an author-time CLI but worth a follow-up.
3 maskNonScannableRanges masking has no regression test ✅ resolved New case at lintProject.test.ts: "does not flag <img>/<video> tokens inside <!-- -->, <style>, or <script>". Good.
4 SVG polarity light/dark fill regex asymmetric ⚠️ partial Light glyph regex broadened to #fff(fff)?|white|#[ef][ef][ef]|#[ef]{6} — now catches #eee, #efe, etc. Still misses CSS named colors (gainsboro, silver), rgb()/hsl(), and class-driven fills. Light/dark sides still asymmetric in shape, but the heygen-on-#eee failure mode @via flagged is now caught.
5 Sharp failure swallows to empty caption silently ⚠️ partial Now increments svgsSkipped and emits progress("design", "skipped rasterizing N SVG(s) — fell back to label-derived") after the loop. User learns something failed. Open: bare catch {} still drops the actual error, so when one user reports "Vision says X" you can't diagnose which SVG broke or why. Suggest logging err.message per skipped file at progress("design", ...) verbosity.
6 CodeQL network data written to file needs suppression w/ rationale ✅ resolved video.ts:223 now reads // lgtm[js/http-to-file-access] — manifest-vetted URL, content-type whitelist, 250MB cap, SSRF-safe fetch. Rationale explicit.

My additional findings (independent read)

Concerns

video.ts:55Buffering 250 MB before checking byteLength is a real OOM surface, not just cosmetic. MAX_VIDEO_BYTES content-length pre-check (line ~38) is advisory: a hostile or misconfigured CDN can send content-length: 100 and then stream 2 GB. The post-arrayBuffer() byteLength check fires AFTER node has buffered the whole response — process memory has already spiked. Streaming with a running byte cap (the shape cloud/download.ts:downloadToFile uses) would refuse mid-stream. This strengthens @via's #1 from "duplicate primitive" to "shared streaming primitive would actually be safer." Not blocking for v1 against trusted CDNs, but flag for a follow-up before this is used against arbitrary user-controlled manifests.

video.ts:217safeFilename is non-injective; collisions silently EEXIST. decodeURIComponent + [^A-Za-z0-9._-]+_ collapse means "hero 1.mp4" and "hero_1.mp4" both produce "hero_1.mp4". If a manifest has two such entries, the second writeFileSync({ flag: "wx" }) throws EEXIST and the code logs "already downloaded: ... (skipping)" — misleading because the bytes on disk are entry A's, not entry B's. The PR's own test covers "a b___c" → "a_b___c", which proves the collapse is lossy. Suggest prefixing the filename with entry.index (e.g. 0-hero_1.mp4) so the manifest's own gap-tolerant key disambiguates on disk too.

commands/capture.ts:56Manifest schema is duck-typed. JSON.parse then directly indexed via manifest.map((e) => e.index) etc. If capture's manifest shape ever changes (extra fields fine, but renamed indexentryIndex, or wrapped in { entries: [...] }), capture-video silently breaks or misreports. No schemaVersion on the manifest, no shape check on read. Author-time tool, so a runtime crash is acceptable — but a one-line "expected manifest version N" check on read would future-proof this cheaply.

Nits

commands/capture.ts:56if (url === "video") return; early-return relies on citty's "parent run fires after subcommand routing" behavior. Comment is helpful but consider: if a user ever captures https://video.example.com and the URL is normalized somewhere upstream, this could short-circuit incorrectly. Worth a if (args._?.[0] === "video") return;-style guard against subCommand name only (not URL value). (nit)

contentExtractor.ts:285 — bare catch {} (no (err)) means sharp's real failure cause is unrecoverable for diagnosis. Combined with the new svgsSkipped count, you get "I skipped 3 SVGs" but no breadcrumb to which SVGs or why. Pair with @via's #5 fix: at minimum catch (err) and emit a progress("design", ...) line per file. (nit, builds on Via #5)

video.ts:223 — synchronous writeFileSync of a 250 MB Buffer blocks the event loop for the duration of the disk write. Fine for a CLI, but if this primitive ever gets reused server-side, prefer the fs/promises writeFile async version. (nit, forward-looking)

Questions

video.ts:38application/octet-stream in the accept-list is broad. A lot of static-asset CDNs do send application/octet-stream for .mp4, so this is pragmatic — but it also means any 500-page HTML error served with that mimetype passes the check. Do you have a fallback "first 4 bytes are an mp4/webm/mov magic header" sniff, or is the URL trust model strong enough that magic-byte verification isn't worth the complexity?

• Cross-package coupling — @via flagged the download primitive duplication. Are there other ones emerging? sharp is now used in contentExtractor.ts for SVG rasterization; is it used anywhere else in the cli/core packages? If so, a packages/core/src/imageOps.ts-shaped landing zone might be worth carving before the third caller appears.


What I didn't verify

• I didn't run the new test suite locally — assumed the green CI claim in the PR body is accurate.
• I didn't manually exercise capture video against a real captured project; trusted the manual test plan in the PR body and the helper-level unit coverage.
• I didn't trace the new beats CLI surface that landed alongside this PR (visible in the b56b7c5..343d5ac compare diff) — it's adjacent feature work, not the capture-video review scope. Worth a separate look if it's in this PR's surface.

Review by Rames D Jusso

@ukimsanov ukimsanov force-pushed the feat/cli-capture-video branch from 343d5ac to eada860 Compare June 15, 2026 08:11
`video exceeded ${Math.round(MAX_VIDEO_BYTES / 1024 / 1024)}MB cap mid-stream for ${url}`,
);
}
if (!file.write(chunk)) {

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified at HEAD eada860 — R3 layering on @via's R1 (b56b7c5) and my R2 (343d5ac).

Verdict: approve. The latest push directly addressed the two strongest follow-ups from R2 (OOM-via-arrayBuffer and silent collision misreport). The remaining gaps are minor and non-blocking.


R2 follow-up status at HEAD

# R2 finding Status Notes
1 Buffers full 250 MB via arrayBuffer() — hostile-CDN OOM surface resolved video.ts:25-100 — full rewrite to streamToFile with createWriteStream({ flags: "wx" }), mid-stream bytes > MAX_VIDEO_BYTES abort, partial-file unlink on non-EEXIST error, backpressure via drain, single shared streamErrored promise to avoid MaxListeners. Solid implementation. Reuses safeFetch from assetDownloader.js. Code-org duplication with cloud/download.ts:downloadToFile is still technically present, but the safety goal is met — calling this resolved, not "resolved-differently," because the threat model is satisfied. A future de-dup pass can extract a streamHttpToFile(url, dest, {maxBytes, contentTypeRe, safeFetch}) shared helper.
2 run() integration untested (manifest discovery, --list, EEXIST short-circuit, exit codes) still open video.test.ts still covers 4 pure helpers + the new findFilenameCollision. No integration coverage for the run-body branches (W2H-vs-standalone layout discovery, malformed-JSON exit code, collision-refusal exit code, list output). Acceptable for an author-time CLI; flag for a follow-up.
4 SVG polarity light-fill regex still misses named colors / rgb() ⚠️ partial (unchanged) contentExtractor.ts:262 unchanged. Heygen-on-#eee failure mode caught; named-color / rgb() / class-driven fills still missed. Same severity as R2.
5 catch {} swallows sharp error context ⚠️ partial (unchanged) contentExtractor.ts:273 still bare catch {}. svgsSkipped counter still emits the aggregate count via progress(...). User learns something failed; per-file diagnostic still missing. Same severity as R2.
M2 safeFilename collisions silently misreport via EEXIST resolved New findFilenameCollision helper at video.ts:139-149 + check at video.ts:264-275 before download. Fails loudly with all colliding indices listed instead of silently EEXIST-skipping. Test coverage at video.test.ts:140-158 (collision detection + self-exclusion). Clean fix — better than my suggested entry.index-prefix because it surfaces the manifest authoring bug rather than papering over it.
M3 Manifest schema duck-typed, no version field still open commands/capture/video.ts:228-232 still does raw JSON.parse + direct field access. Author-time tool, runtime crash on schema drift is acceptable; flag for a follow-up if this surface grows.

New observations at HEAD

Nits

video.ts:80-87file.once("drain", ...) is registered inside a backpressure-loop iteration, which means each slow chunk attaches a fresh drain listener. Node clears once listeners on fire, so the leak is bounded, but on a heavily-throttled CDN you can briefly see N parked listeners. Consider hoisting to a shared waitForDrain() helper that re-uses one resolver. Non-blocking. (nit)

video.ts:53AbortSignal.timeout(120_000) is total request timeout, but streaming a 250 MB video at slow 3G could legitimately exceed 2 minutes. Consider replacing with an idle-timeout pattern (reset timer on each chunk) if real users report aborts. Forward-looking. (nit)

video.ts:48-51content-length > MAX_VIDEO_BYTES pre-check now serves only as an early-exit optimization (mid-stream cap is the real safety). Fine to keep, but the comment at line 48 could note "early-exit only; mid-stream cap below is authoritative" so a future reader doesn't try to harden it. (nit)

Questions

video.test.ts — net +0 lines vs R2 (additions: 0 in the R2→HEAD compare). The new collision helper added a test class, but the run()-body branches Via flagged are still uncovered. Was there a deliberate scope decision to defer integration tests, or is that a follow-up PR you'd take?

• The streamToFile implementation is good enough to be the shared primitive. Would you take an immediate follow-up that extracts both this and cloud/download.ts:downloadToFile into a single helper, or is the duplication acceptable until a third caller appears?


What I didn't verify

• Didn't re-run the new test suite locally — assumed CI green.
• Didn't trace whether file.destroy() interacts cleanly with the partial-file unlinkSync race on Windows (the EEXIST-vs-other-error branch at lines 96-103). Linux/macOS should be fine.
• Didn't audit the contentExtractor SVG-polarity changes against new failure cases beyond R2 — same code path.

Review by Rames D Jusso

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved on eada860. Clean R1→R2→R3 with the OOM-via-arrayBuffer + filename-collision concerns both resolved.

@ukimsanov ukimsanov force-pushed the feat/cli-capture-video branch from eada860 to dda77c3 Compare June 15, 2026 08:23

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified delta at HEAD dda77c3, R3 baseline was eada860.

R4 — narrow review of the 3 new structural logo signals in assetCataloger.ts (lines 28–33, 71–83, 119–122, 348–351) and their consumer in assetDownloader.ts:144–155.

Verdict: approve-with-concerns. Signals are well-scoped, deterministic, side-effect-free, and the blast radius is intentionally tiny — they OR into isLogo which only flips the SVG filename slug from svg-<hash> to logo-<hash>. Nothing rasterization-path or polarity-detection adjacent; this does NOT subsume Via's regex-based light-glyph detection (orthogonal axis: "is logo at all" vs. "light-on-dark polarity").

Signal-by-signal semantics:

  • inBanner (closest('header, nav, [role="banner"]')) — clean. FP: nav-bar avatars / hamburger icons. FN: legacy SPAs that don't use semantic header/nav. Acceptable for candidate semantics.
  • inHomeLink — covers /, #, ./, and origin-only via /^https?:\\/\\/[^/]+\\/?$/. Regex is correctly double-escaped for the template-string → page.evaluate boundary (verified — \\/ in the .ts source becomes \/ post-template-literal-processing, which is a valid escaped / in the eval'd regex literal). FN worth knowing: React-router <Link to="/"> that renders without an href attribute, and <button onclick>-based home navigation.
  • matchesTitleBrandasymmetric assumption: (document.title || '').split(/[-|—]/)[0] takes the FIRST segment as the brand. Works for "HeyGen | Product" (brand-first, the convention). Fails for the equally-common "Page Title | HeyGen" (brand-last). On a brand-last site, this signal silently flips to false-positive territory — alt-text containing the page-title phrase wrongly counts as brand-match. Length guard (> 1 && < 30) bounds the damage but doesn't eliminate it. Not a blocker since it only affects a filename slug, but worth a one-line "we're betting on brand-first titles" comment so future-you knows what to revisit if logo classification ever leaks into more consequential code paths.

Concerns (non-blocking):

  • No test coverage on the new signals. assetCataloger.test.ts exists but adds nothing for inBanner / inHomeLink / matchesTitleBrand. These are DOM-evaluated inside a page.evaluate(\...`)` template string, so unit testing means a jsdom harness or Playwright fixture — non-trivial. At minimum, one positive + one negative for each signal would lock the semantics. Given the contained blast radius (slug naming only) this is a concern, not a blocker.
  • Merge-step boolean OR is correct but worth flagging: across both add() (line 119) and deduplicateSrcsetVariants() (line 348), any positive sample sets the field. Means if the SAME asset URL appears once in a banner and once in a footer, it's flagged as logo from the banner instance. That's the intended behavior and the comment at line 114 documents it — good.

Determinism / side-effects / types: all clean. No Date.now/Math.random/fetch in the cataloger path. Fields are ?: boolean optional, backwards-compat with existing manifest consumers preserved.

Green-light from me for Rames Jusso's stamp.

Review by Rames D Jusso

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approved on dda77c3 after R4 delta verification.

For the hyperframes.dev website-to-video flow. Real-AI-test runs against
heygen.com, huly.io, and heygen-showcase surfaced two gaps: (1) capture's
logo / asset-captioning signals missed modern React/Tailwind builds; and
(2) there was no CLI surface to pull the videos the manifest references.

New command:

  • `hyperframes capture-video <project>` — on-demand downloader for
    entries in capture/extracted/video-manifest.json. Capture writes the
    manifest + preview PNGs but skips the mp4s; this pulls one entry by
    `--index N` (matched against the entry's `index` field, NOT array
    offset — gaps are possible when a preview screenshot fails). SSRF-safe
    via safeFetch, 250 MB cap, content-type whitelist, race-free
    exclusive-create write. Layout-aware (handles both standalone capture
    and W2H project layouts).

Capture pipeline fixes:

  • Structural logo signals (assetCataloger + tokenExtractor): inBanner /
    inHomeLink / matchesTitleBrand. Class-substring alone caught 0/32 SVGs
    on heygen.com — modern builds don't put 'logo' / 'brand' in any
    className.

  • Content-hash SVG slugs (assetDownloader): `svg-<8char-sha1>.svg` —
    label-derived slugs mis-attributed partner-logo carousels
    (heygen-logo.svg actually contained Google, hubspot-logo.svg contained
    Trivago, etc.). Content-hash names are invariant by construction.

  • SVG → PNG rasterization before Gemini Vision (contentExtractor): the
    raw-SVG-as-text path was hallucinating wordmarks (VIVIENNE for HubSpot,
    'wrestling' for Workday). Adds polarity detection so a white-glyph SVG
    flattened to a blank PNG gets inverted before captioning. LOGO tag in
    asset-descriptions.md when structural signals fire (independent of
    Gemini key presence).

  • Double-escape \/ inside the page.evaluate template literal in
    assetCataloger + tokenExtractor: the original `/^https?:\/\/.../`
    collapsed to `/` mid-template and threw `Unexpected token ^`. Capture
    was 100% blocked on this until the escape was fixed.

  • `asset-descriptions.md` header branches on Gemini-key presence with
    an explicit 'Vision OFF — catalog-derived descriptions' warning.

New lint rule:

  • `lintMissingLocalAsset` (cli/utils/lintProject): scans <video> / <img>
    / <source> src for local files that don't exist in the project.
    Empirically the most common sub-agent mistake across multi-URL runs
    (~5+ per run). Uses `resolveExistingLocalAsset` so the existence check
    matches the bundler's notion of 'resolves'. Masks comment / style /
    script ranges before scanning so a literal `<img src=missing.png>`
    inside a tutorial comment isn't reported.

Tests: 17 new for capture-video (safeFilename decoding/sanitization,
VIDEO_CONTENT_TYPE_RE accept/reject, pickManifestEntry index-field lookup
with gaps, URL-mismatch + bad-index rejection, --index over --url
priority); 70 cases under lintProject.test.ts covering the new rule and
existing rules.

Sibling PRs in this stack:
  • #PR_A1 — fix(producer): __dirname ESM banner shim
  • #PR_A2 — fix(core/lint): findRootTag masks comment/style/script
@ukimsanov ukimsanov force-pushed the feat/cli-capture-video branch from dda77c3 to 6a024a3 Compare June 15, 2026 08:41

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R5 delta review — verified at HEAD 6a024a3, R4 baseline was dda77c3.

Verdict: approve-delta with one minor caveat

R4 concern #1 — brand-first title heuristic — ✅ fixed

packages/cli/src/capture/assetCataloger.ts:79-90 — split widened to /[-|—:]/ and iterates all titleParts instead of [0]. Brand-last ("Ideas - HeyGen") and colon-separated ("Vercel: Build") titles now match. Length guard 1 < part.length < 30 filters out empty/runaway segments. Good fix.

Minor caveat (not a blocker): the desc.indexOf(part) substring check is still noisy — a non-brand segment like "Ideas" will match an alt containing "ideas". Since this is a boolean signal OR-merged across samples and consumed by a downstream classifier (not a hard gate), false positives are bounded. Worth a one-line comment that this is a recall-biased signal, not a precision one — or leave as-is and re-tune if logo-detection precision suffers in the wild.

R4 concern #2 — tests for new logo signals — ⚠️ not added

New tests landed (commands/capture/video.test.ts, utils/lintProject.test.ts) but cover the capture-video fetcher + lintProject, not the new inBanner / inHomeLink / matchesTitleBrand signals in assetCataloger.ts. The signal logic runs inside page.evaluate(), so unit-testing it requires a DOM harness that may not exist in this package — acceptable trade-off for a heuristic-tuning PR, but worth tracking as a follow-up (extract the signal computation into a pure helper testable against a fixture DOM).

New code from this push — no concerns

  • assetCataloger.ts:122-124 and :357-360 — boolean-signals merge with "any positive wins" semantics is correct (text fields keep first-occurrence, booleans union across srcset variants).
  • New capture/video.ts + tests are scope-additive to the PR's main capture-video feature, not the logo-detection workstream.

Green-light to stamp.

Review by Rames D Jusso

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approved on 6a024a3 after R5 delta verification.

@ukimsanov ukimsanov merged commit 3bfb25e into main Jun 15, 2026
36 checks passed
@ukimsanov ukimsanov deleted the feat/cli-capture-video branch June 15, 2026 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants