feat(scraper-run): multi-URL input via --urls and --input-file by anil-bd · Pull Request #8 · brightdata/cli

anil-bd · 2026-05-25T06:15:18Z

Summary

`bdata scraper run` accepted only one URL per call. The reference Scraper Studio SDKs (Node + Python) treat batch input as the default pattern: they default to a 3-URL `SAMPLE_URLS` array and ship `triggerWithUrls(urls)` / `trigger_with_urls(urls)` helpers that POST the whole array to `/dca/trigger` in one request. The CLI was the outlier.

This PR exposes that path. A list of URLs becomes one API call, one snapshot, one merged result array.

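Node reference (https://github.com/brightdata/bright-data-scraper-studio-nodejs-project) → `triggerWithUrls(urls)`
Python reference (https://github.com/brightdata/bright-data-scraper-studio-python-project) → `trigger_with_urls(urls)`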

New flags on `scraper run`

| Flag | Description |
|---|---|
| `--urls "u1,u2,..."` | Comma-separated list of URLs |
| `--input-file <path>` | File with URLs: one per line (`#` comments and blank lines skipped), OR a JSON array of strings, OR a JSON array of `{"url": "..."}` objects (auto-detected by first char) |
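A minimal sketch of the two parsing rules, assuming the file has already been read into a string. `parseUrlsArg` and `parseInputFileContent` are hypothetical names, not the PR's actual `parse_urls_arg` / `read_input_file`, and treating a leading `{` as JSON (so non-array JSON is rejected instead of being read as a URL line) is an assumption.

```typescript
// Hypothetical sketch: illustrative names, not the CLI's parse_urls_arg / read_input_file.

/** Split a --urls value on commas, trimming whitespace and dropping empty entries. */
function parseUrlsArg(value: string): string[] {
    return value
        .split(",")
        .map((s) => s.trim())
        .filter((s) => s.length > 0);
}

/**
 * Parse --input-file contents. A leading "[" or "{" (assumption) means JSON,
 * which must be an array of strings or {url} objects; anything else is read
 * as one URL per line, skipping blank lines and "#" comments.
 */
function parseInputFileContent(text: string): string[] {
    const trimmed = text.trim();
    if (trimmed.length === 0) throw new Error("input file is empty");

    if (trimmed[0] === "[" || trimmed[0] === "{") {
        let data: unknown;
        try {
            data = JSON.parse(trimmed);
        } catch {
            throw new Error("input file is not valid JSON");
        }
        if (!Array.isArray(data)) throw new Error("JSON input must be an array");
        return data.map((item: unknown, i: number) => {
            if (typeof item === "string") return item.trim();
            if (typeof item === "object" && item !== null && "url" in item) {
                const url = (item as { url: unknown }).url;
                if (typeof url === "string") return url.trim();
            }
            throw new Error(`JSON entry ${i} is neither a string nor a {url} object`);
        });
    }

    // Plain-text mode: one URL per line.
    return trimmed
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line.length > 0 && !line.startsWith("#"));
}
```

Detecting the format from the first character keeps the flag single-purpose: users never need a separate `--format` switch.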

The positional `<url>` is now optional but otherwise unchanged. Exactly one input source must be provided; combining sources fails with an "only one input source" error.
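The "exactly one input source" rule could look roughly like this. `RunInputs`, `resolveRunInputs` and `isValidUrl` are illustrative stand-ins for the PR's helpers, the error texts beyond the quoted fragments are made up, and accepting only `http:`/`https:` URLs is an assumption.

```typescript
// Hypothetical sketch of the "exactly one input source" rule; names are illustrative.
interface RunInputs {
    positional?: string; // optional <url> argument
    urls?: string[];     // already split from --urls
    fileUrls?: string[]; // already parsed from --input-file
}

// Assumption: only absolute http(s) URLs are accepted.
function isValidUrl(value: string): boolean {
    try {
        const u = new URL(value);
        return u.protocol === "http:" || u.protocol === "https:";
    } catch {
        return false;
    }
}

function resolveRunInputs(inputs: RunInputs): string[] {
    const provided = [inputs.positional, inputs.urls, inputs.fileUrls].filter((s) => s !== undefined);
    if (provided.length > 1) {
        throw new Error("only one input source: pass <url>, --urls or --input-file");
    }
    const list: string[] =
        inputs.positional !== undefined ? [inputs.positional] : inputs.urls ?? inputs.fileUrls ?? [];
    if (list.length === 0) throw new Error("requires one of: <url>, --urls, --input-file");
    for (const url of list) {
        // Name the offending entry so users can find it in a long list.
        if (!isValidUrl(url)) throw new Error(`invalid URL: ${url}`);
    }
    return list;
}
```

Validating every entry before any request is sent means a typo on line 40 of an input file fails fast instead of producing a partial batch.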

Routing

| Input | Path | Endpoint |
|---|---|---|
| 0 URLs | error (`requires one of: <url>, --urls, --input-file`) | none |
| 1 URL (positional, or `--urls` / `--input-file` with one entry) | existing single-URL flow | `/dca/trigger_immediate` → `/dca/get_result` (or `/dca/crawl` with `--sync`) |
| 2+ URLs (`--urls` / `--input-file`) | new multi-URL batch | single POST to `/dca/trigger` with array body → poll `/dca/dataset` |

`--sync` is rejected when combined with multi-URL input, because `/dca/crawl` accepts only one URL server-side. The error message is explicit: `--sync cannot be combined with --urls / --input-file`.
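A sketch of the routing decision as a pure function, assuming the URL list has already been resolved. `Route` and `planRoute` are hypothetical; the sketch only picks an endpoint and does not model polling or the page-limit fallback. It also assumes that `--sync` with a single URL supplied through `--urls` still goes to `/dca/crawl`, since the rejection applies to multi-URL input.

```typescript
// Hypothetical sketch of the routing table; Route and planRoute are illustrative names.
type Route =
    | { kind: "single"; url: string; endpoint: "/dca/trigger_immediate" | "/dca/crawl" }
    | { kind: "batch"; urls: string[]; endpoint: "/dca/trigger" };

function planRoute(urls: string[], sync: boolean): Route {
    if (urls.length === 0) {
        throw new Error("requires one of: <url>, --urls, --input-file");
    }
    if (urls.length === 1) {
        // Existing single-URL flow, whichever source the URL came from.
        return { kind: "single", url: urls[0], endpoint: sync ? "/dca/crawl" : "/dca/trigger_immediate" };
    }
    // /dca/crawl is single-URL only server-side, so reject before any API call.
    if (sync) {
        throw new Error("--sync cannot be combined with --urls / --input-file");
    }
    return { kind: "batch", urls, endpoint: "/dca/trigger" };
}
```

Keeping the decision free of I/O is what makes "returns the rejection error without making any API call" easy to assert in a unit test.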

The auto-fallback to /dca/trigger on realtime page-limit errors is unchanged.

Backward compatibility

Existing `bdata scraper run <id> <url>` calls behave identically.
All 45 pre-existing scraper tests pass unchanged.
The `run_batch` helper was generalized from `url: string` to `urls: string[]`; its only two callers (the sync and async fallback paths) wrap the single URL in an array, so the wire shape is the same as before (`[{"url": "..."}]`).
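To make the wire shape concrete, a sketch of the array body. `buildTriggerBody` is a hypothetical name; auth headers, base URL and any query parameters are deliberately left out.

```typescript
// Hypothetical sketch: buildTriggerBody is an illustrative name. Only the body is
// shown; auth headers, base URL and query parameters are intentionally omitted.
interface TriggerInput {
    url: string;
}

// One entry per URL; the whole array goes to POST /dca/trigger in a single request.
function buildTriggerBody(urls: string[]): TriggerInput[] {
    return urls.map((url) => ({ url }));
}

// The single-URL fallback callers pass a one-element array, so their payload
// keeps the pre-existing shape: [{"url":"https://example.com/a"}]
const legacyBody = JSON.stringify(buildTriggerBody(["https://example.com/a"]));
console.log(legacyBody);
```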

Tests

27 new cases (72 total in scraper.test.ts):

`is_valid_url`, `parse_urls_arg`: input parsing primitives
`read_input_file`: newline txt, JSON array of strings, JSON array of `{url}` objects, `#` comments, malformed JSON, non-array JSON, missing file, empty file
`resolve_run_inputs`: positional / `--urls` / `--input-file` happy paths, mutual exclusion, empty after parsing, invalid URL surfaced by name
`handle_run_scraper` multi-URL: correct endpoint (`/dca/trigger`), correct array body (`[{url}, {url}, {url}]`), `--sync` rejection with clear message, missing-input rejection, single URL via `--urls` still uses the legacy single path

Docs

README.md: `scraper run` section rewritten with the new flags, the routing table, and three new examples (`--urls`, `--input-file` txt, `--input-file` JSON).

Out of scope (suggested follow-up)

`bdata pipelines <type>` has the same gap: same underlying `/dca/trigger` endpoint, same single-URL CLI surface. Worth a parallel PR if there's appetite.

Test plan

tsc --noEmit clean
vitest run src/__tests__/commands/scraper.test.ts — 72/72 pass
Smoke test against a real collector: `bdata scraper run <id> --urls "u1,u2,u3" --pretty` returns 3 records in a single array
Smoke test `--input-file urls.txt` (txt) and `--input-file urls.json` (JSON array)
Smoke test: `--sync --urls "..."` returns the rejection error without making any API call

🤖 Generated with Claude Code

`bdata scraper run` accepted only one URL per invocation; for N URLs users had to spawn N processes (or N HTTP calls), each producing its own snapshot ID and its own poll loop. The underlying `POST /dca/trigger` endpoint natively accepts an array body, and the official Scraper Studio reference SDKs (Node + Python) ship this as their canonical helper:

- https://github.com/brightdata/bright-data-scraper-studio-nodejs-project → triggerWithUrls(urls)
- https://github.com/brightdata/bright-data-scraper-studio-python-project → trigger_with_urls(urls)

This change exposes that path in the CLI so a list of URLs becomes one API call, one snapshot, one merged result array.

New flags on `scraper run`:

- --urls "u1,u2,..." Comma-separated list of URLs.
- --input-file <path> File with URLs: one per line (# comments and blank lines skipped), OR a JSON array of URL strings, OR a JSON array of {url} objects (auto-detected by first char).

The positional `<url>` argument is now optional but otherwise unchanged. Exactly one input source must be provided; passing more than one errors with "only one input source".

Routing:

- 0 URLs → error, "requires one of: <url>, --urls, --input-file"
- 1 URL (any source) → existing single-URL path (trigger_immediate → poll get_result, or --sync /dca/crawl). Auto-fallback to /dca/trigger on realtime page-limit error stays in place.
- 2+ URLs (--urls / --input-file) → single POST /dca/trigger with array body, poll /dca/dataset for the merged array. `--sync` is rejected here with a clear error since /dca/crawl is single-URL only server-side.

Tests: 27 new cases covering parse_urls_arg, read_input_file (txt/JSON shapes + comments + malformed JSON + non-array JSON), resolve_run_inputs (mutual exclusion + empty + invalid URL), and the multi-URL handle_run_scraper flow (correct endpoint, correct array body, --sync rejection, single-URL via --urls still uses the legacy path).

Backward compatible: all 45 existing scraper tests pass unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

anil-bd mentioned this pull request May 25, 2026

docs(scraper-studio): document --urls and --input-file batch flag brightdata/skills#21 (open, 3 tasks)

anil-bd wants to merge 1 commit into brightdata:main from anil-bd:feat/scraper-run-multi-url
