
feat(cli): add --batch-size parameter to prevent context leaking #2055

Open
Hellnight2005 wants to merge 1555 commits into lingodotdev:main from Hellnight2005:fix/1733-batch-size

Conversation


@Hellnight2005 Hellnight2005 commented Mar 24, 2026

Summary

Implements the --batch-size parameter for CLI commands to control the number of translation strings sent to the LLM per request, preventing translator hints from leaking between unrelated keys.

Changes

  • Added --batch-size <number> flag to the run and i18n CLI commands.
  • Updated createBasicTranslator in src/cli/processor/basic.ts to split inputs into configurably sized chunks and process them sequentially.
  • Updated createExplicitLocalizer in src/cli/localizer/explicit.ts to dispatch batches correctly and merge the chunked responses.
  • Fixed a bug causing the pseudo-localizer to crash by implementing the missing validateSettings method.
  • Handled edge cases for JSON.parse exceptions gracefully using jsonrepair in the AI feedback loop.
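The graceful JSON.parse fallback can be sketched as follows. This is a minimal illustration: `repairJson` stands in for the jsonrepair package (which handles many more malformations), and `parsePayload` is a hypothetical helper, not the PR's actual code.

```typescript
// Stand-in for the jsonrepair package: here we only strip trailing commas,
// which is enough to demonstrate the shape of the fallback.
function repairJson(raw: string): string {
  return raw.replace(/,\s*([}\]])/g, "$1");
}

// Try strict parsing first; attempt a repair pass before giving up.
function parsePayload(raw: string): Record<string, unknown> {
  try {
    return JSON.parse(raw);
  } catch {
    return JSON.parse(repairJson(raw));
  }
}

console.log(parsePayload('{"greeting": "Hola",}')); // repaired and parsed
```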

Testing

Business logic tests added:

  • Verified the chunking logic correctly isolates inputs when batchSize is set to 1.
  • Verified that, by default, smaller files are placed into a single batch without introducing unnecessary chunk boundaries.
  • All tests pass locally (verified the 846 unit tests in packages/cli succeed).
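The two behaviors above can be illustrated with a simplified version of the chunking helper. This sketch only counts top-level entries; the real extractPayloadChunks discussed in this PR also weighs word counts.

```typescript
// Split a flat payload into chunks of at most `batchSize` top-level entries.
function extractPayloadChunks(
  payload: Record<string, string>,
  batchSize = 25,
): Record<string, string>[] {
  const chunks: Record<string, string>[] = [];
  let current: Record<string, string> = {};
  let count = 0;
  for (const [key, value] of Object.entries(payload)) {
    current[key] = value;
    if (++count === batchSize) {
      chunks.push(current);
      current = {};
      count = 0;
    }
  }
  if (count > 0) chunks.push(current);
  return chunks;
}

// batchSize of 1 must isolate every key into its own chunk.
const isolated = extractPayloadChunks({ save: "Save", cancel: "Cancel" }, 1);
console.log(isolated.length); // 2

// The default places a small payload into a single batch.
const single = extractPayloadChunks({ save: "Save", cancel: "Cancel" });
console.log(single.length); // 1
```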

Visuals

Required for UI/UX changes:

  • [N/A] Before/after screenshots attached (CLI backend behavior only)
  • [N/A] Video demo for interactions (< 30s)

Checklist

  • Changeset added (if version bump needed)
  • Tests cover business logic (not just happy path)
  • No breaking changes (or documented below)

Closes #1733

Summary by CodeRabbit

  • New Features

    • Added --batch-size CLI option to run and i18n commands, allowing users to customize translation batch processing with flexible batch sizes between 1 and 250
    • Implemented intelligent chunked processing during AI-driven localization workflows with real-time progress reporting for each batch being processed
  • Tests

    • Added comprehensive test coverage for batch processing behavior and correctness

github-actions bot and others added 30 commits November 25, 2025 17:07
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* feat: preserve formatting in yaml files

* feat: preserve mixed key quoting

* fix: yaml-room-key detection

* chore: add changeset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix: preserve formatting in YAML files

* chore: add changeset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix: preserve list format in YAML files

* chore: add changeset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* feat: localize logical blocks together

* fix: localize noscript, and text inside inner tags

* feat: allow localizing HTML blocks not just entire docs

* chore: add changeset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix: prevent HTML tag duplication in Android XML strings

* chore: add changeset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* feat: add mjml bucket

* fix(mjml): content extraction to prevent duplication

* chore: mjml examples

* chore: add changeset

* chore: add missing i18n.lock for mjml demo

- Add i18n.lock file for mjml demo bucket to pass tests

* chore: clean up i18n.json

* chore: clean up code

* chore: refactor code
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* chore: upd twig lockfile

* chore: empty changeset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix: mjml format issue

* chore: add changeset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* chore(deps): bump next to non-vulnerable versions

* chore: add changeset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix: upd react version

* chore: add changeset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix: upd npm publish workflows

* chore: add changeset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Add repository metadata to all published packages to fix trusted publishers
provenance verification. npm requires repository.url in package.json to match
the GitHub repository URL from the provenance attestation.

Fixes E422 error: "Error verifying sigstore provenance bundle: Failed to
validate repository information: package.json: \"repository.url\" is \"\",
expected to match \"https://github.com/lingodotdev/lingo.dev\" from provenance"
* fix: lock ignore keys with whitespaces

* chore: add changeset

* chore: upd tests
@vrcprl vrcprl requested a review from AndreyHirsa March 25, 2026 20:47
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/cli/src/cli/processor/basic.ts (1)

98-127: ⚠️ Potential issue | 🟠 Major

--batch-size still counts top-level objects, not logical translation entries.

This helper now explicitly accepts nested payloads, but it still batches on Object.entries(payload). A namespace like { common: { save: "...", cancel: "..." } } is treated as one item, so --batch-size 1 can still send multiple keys together and miss the per-key isolation this flag is meant to provide. Split on logical translation entries instead of only top-level properties.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/processor/basic.ts` around lines 98 - 127,
extractPayloadChunks currently counts only top-level keys so nested translation
entries (e.g., {common:{save:...,cancel:...}}) are treated as one item; change
extractPayloadChunks to iterate logical translation entries instead: first
flatten the payload into an array of entries representing each translation key
path and value (e.g., ["common.save", value] or path array), then build chunks
by adding one logical entry at a time and reconstructing the nested object shape
when pushing currentChunk; keep the same batching logic (use countWordsInRecord
to measure size and effectiveBatchSize for item counting) but increment
currentChunkItemCount per logical entry rather than per top-level property, and
add a small helper (or inline logic) to set nested values into currentChunk when
adding an entry so output chunks preserve the original nested structure.
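A flatten-then-rebuild approach along those lines might look like this. It is a sketch under the assumption that leaf values are strings; the names are illustrative, not the PR's code.

```typescript
type Tree = { [key: string]: Tree | string };

// Flatten nested translations into ["common.save", "Save"] leaf entries.
function flattenEntries(node: Tree | string, prefix = ""): [string, string][] {
  if (typeof node === "string") return [[prefix, node]];
  return Object.entries(node).flatMap(([key, child]) =>
    flattenEntries(child, prefix ? `${prefix}.${key}` : key),
  );
}

// Rebuild the nested shape when writing one leaf entry into a chunk.
function setNested(target: Tree, path: string, value: string): void {
  const parts = path.split(".");
  let node = target;
  for (const part of parts.slice(0, -1)) {
    node = (node[part] ??= {}) as Tree;
  }
  node[parts[parts.length - 1]] = value;
}
```

Building chunks one leaf entry at a time this way preserves the original nested structure while letting --batch-size 1 genuinely isolate sibling keys.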
🧹 Nitpick comments (1)
packages/cli/src/cli/processor/basic.ts (1)

5-8: Keep batchSize out of the AI SDK call settings.

batchSize is internal batching state, but the same settings object is still reused later for generateText(...). That leaks a CLI-only flag into provider request options and makes this path depend on adapters tolerating unknown fields. Split batchSize from the actual model settings before building the request.

Suggested change
 export function createBasicTranslator(
   model: LanguageModel,
   systemPrompt: string,
   settings: ModelSettings = {},
 ) {
+  const { batchSize, ...modelSettings } = settings;
+
   return async (input: LocalizerInput, onProgress: LocalizerProgressFn) => {
     const chunks = extractPayloadChunks(
       input.processableData,
-      settings.batchSize,
+      batchSize,
     );
@@
     const response = await generateText({
       model,
-      ...settings,
+      ...modelSettings,
       messages: [

Also applies to: 16-18

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/processor/basic.ts` around lines 5 - 8, The
ModelSettings type currently includes batchSize which is internal and is being
passed through into provider calls; update the code so you split out batchSize
from the settings object before building SDK request options: keep ModelSettings
as-is for typing if desired but when preparing the call (where you pass settings
into generateText / the provider request) destructure like `const { batchSize,
...modelSettings } = settings` and use modelSettings for the AI SDK/generateText
call and batchSize only for local batching logic; ensure any place that forwards
`settings` to the provider (e.g., the generateText invocation) uses the cleaned
modelSettings to avoid leaking the CLI-only flag.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/cli/src/cli/localizer/explicit.ts`:
- Around line 199-202: The current chunking leaks sibling hints because
extractPayloadChunks splits only on top-level entries so chunkHints carries
entire top-level subtrees; update the chunking logic (extractPayloadChunks and
the code that builds chunkHints) to operate at the leaf/key-path level: when
batching, enumerate leaf key paths from processableData and create chunks of
those full paths respecting params.batchSize, and when building chunkHints
prune/copy only the hint nodes that correspond to the exact leaf key paths
included in that chunk (not the entire top-level subtree). Ensure functions
referenced (extractPayloadChunks, any chunkHints builder code that consumes
processableData) use key-path traversal rather than top-level splitting so
--batch-size 1 isolates sibling keys and prevents hint bleed.
- Around line 278-324: The code currently only inspects result.data and drops
top-level parsed payloads; change the logic to first normalize the parsed output
into a single variable (e.g., rawData = result?.data ?? result) and then run the
existing object/string handling against rawData instead of result.data so plain
objects or top-level JSON strings aren't ignored; update all subsequent
references (the object branch, string branch, error messages and places using
result.data) to use rawData, keeping parseModelResponse, finalResult and
jsonrepair behavior intact and preserving the same error/snippet reporting that
includes params.id.
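One way to prune hints to the exact leaf key paths of a chunk, as suggested above, is sketched below. `Hints` and `pruneHints` are hypothetical names, and hint leaves are assumed to be string arrays.

```typescript
type Hints = { [key: string]: Hints | string[] };

// Copy only the hint subtrees matching the chunk's leaf key paths,
// so sibling hints don't bleed into unrelated batches.
function pruneHints(hints: Hints, paths: string[]): Hints {
  const result: Hints = {};
  for (const path of paths) {
    const parts = path.split(".");
    let src: Hints | string[] = hints;
    let dst: Hints = result;
    let reachable = true;
    for (let i = 0; i < parts.length - 1; i++) {
      const next = Array.isArray(src) ? undefined : src[parts[i]];
      if (next === undefined || Array.isArray(next)) {
        reachable = false;
        break;
      }
      src = next;
      dst = (dst[parts[i]] ??= {}) as Hints;
    }
    const leaf = parts[parts.length - 1];
    if (reachable && !Array.isArray(src) && src[leaf] !== undefined) {
      dst[leaf] = src[leaf];
    }
  }
  return result;
}
```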

---

Outside diff comments:
In `@packages/cli/src/cli/processor/basic.ts`:
- Around line 98-127: extractPayloadChunks currently counts only top-level keys
so nested translation entries (e.g., {common:{save:...,cancel:...}}) are treated
as one item; change extractPayloadChunks to iterate logical translation entries
instead: first flatten the payload into an array of entries representing each
translation key path and value (e.g., ["common.save", value] or path array),
then build chunks by adding one logical entry at a time and reconstructing the
nested object shape when pushing currentChunk; keep the same batching logic (use
countWordsInRecord to measure size and effectiveBatchSize for item counting) but
increment currentChunkItemCount per logical entry rather than per top-level
property, and add a small helper (or inline logic) to set nested values into
currentChunk when adding an entry so output chunks preserve the original nested
structure.

---

Nitpick comments:
In `@packages/cli/src/cli/processor/basic.ts`:
- Around line 5-8: The ModelSettings type currently includes batchSize which is
internal and is being passed through into provider calls; update the code so you
split out batchSize from the settings object before building SDK request
options: keep ModelSettings as-is for typing if desired but when preparing the
call (where you pass settings into generateText / the provider request)
destructure like `const { batchSize, ...modelSettings } = settings` and use
modelSettings for the AI SDK/generateText call and batchSize only for local
batching logic; ensure any place that forwards `settings` to the provider (e.g.,
the generateText invocation) uses the cleaned modelSettings to avoid leaking the
CLI-only flag.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0194db5f-69f6-420b-9b1d-0f02f2f07e99

📥 Commits

Reviewing files that changed from the base of the PR and between c916e41 and 7d2b6ef.

📒 Files selected for processing (2)
  • packages/cli/src/cli/localizer/explicit.ts
  • packages/cli/src/cli/processor/basic.ts

Comment thread packages/cli/src/cli/localizer/explicit.ts
Comment on lines +278 to +324
let result: any;
try {
result = parseModelResponse(response.text);
} catch (e2) {
const snippet =
response.text.length > 500
? `${response.text.slice(0, 500)}…`
: response.text;
console.error(
`Failed to parse response from ${params.id}. Response snippet: ${snippet}`,
);
throw new Error(
`Failed to parse response from ${params.id}: ${e2} (Snippet: ${snippet})`,
);
}
let finalResult: Record<string, any> = {};

// Handle both object and string responses
if (typeof result.data === "object" && result.data !== null) {
return result.data;
// Handle both object and string responses
if (typeof result?.data === "object" && result.data !== null) {
finalResult = result.data;
} else if (result?.data) {
// Handle string responses - extract and repair JSON
const index = result.data.indexOf("{");
const lastIndex = result.data.lastIndexOf("}");
if (index !== -1 && lastIndex !== -1) {
try {
const trimmed = result.data.slice(index, lastIndex + 1);
const repaired = jsonrepair(trimmed);
const parsed = JSON.parse(repaired);
finalResult = parsed.data || parsed || {};
} catch (e) {
console.error(
`Failed to parse nested JSON response. Snippet: ${result.data.slice(0, 100)}...`,
);
throw new Error(
`Failed to parse nested JSON response: ${e} (Snippet: ${result.data.slice(0, 100)}...)`,
);
}
} else {
console.error(
`Unexpected response format - no JSON object found. Snippet: ${String(result.data).slice(0, 100)}...`,
);
throw new Error(
`Unexpected response format from ${params.id} - no JSON object found in response`,
);
}
}
Contributor


⚠️ Potential issue | 🟠 Major

Handle top-level parsed payloads before assuming a data wrapper.

This branch only succeeds when the parsed response is stored under result.data. If parseModelResponse() returns a plain object or a top-level JSON string, finalResult stays {} and the whole chunk is silently dropped. Normalize result itself into rawData first, then run the existing object/string handling on that value.

Suggested change
-        let finalResult: Record<string, any> = {};
-
-        // Handle both object and string responses
-        if (typeof result?.data === "object" && result.data !== null) {
-          finalResult = result.data;
-        } else if (result?.data) {
+        let finalResult: Record<string, any> = {};
+        const rawData =
+          typeof result === "object" && result !== null && "data" in result
+            ? result.data
+            : result;
+
+        if (typeof rawData === "object" && rawData !== null) {
+          finalResult = rawData as Record<string, any>;
+        } else if (typeof rawData === "string" && rawData) {
           // Handle string responses - extract and repair JSON
-          const index = result.data.indexOf("{");
-          const lastIndex = result.data.lastIndexOf("}");
+          const index = rawData.indexOf("{");
+          const lastIndex = rawData.lastIndexOf("}");
           if (index !== -1 && lastIndex !== -1) {
             try {
-              const trimmed = result.data.slice(index, lastIndex + 1);
+              const trimmed = rawData.slice(index, lastIndex + 1);
               const repaired = jsonrepair(trimmed);
               const parsed = JSON.parse(repaired);
               finalResult = parsed.data || parsed || {};
             } catch (e) {
               console.error(
-                `Failed to parse nested JSON response. Snippet: ${result.data.slice(0, 100)}...`,
+                `Failed to parse nested JSON response. Snippet: ${rawData.slice(0, 100)}...`,
               );
               throw new Error(
-                `Failed to parse nested JSON response: ${e} (Snippet: ${result.data.slice(0, 100)}...)`,
+                `Failed to parse nested JSON response: ${e} (Snippet: ${rawData.slice(0, 100)}...)`,
               );
             }
           } else {
             console.error(
-              `Unexpected response format - no JSON object found. Snippet: ${String(result.data).slice(0, 100)}...`,
+              `Unexpected response format - no JSON object found. Snippet: ${String(rawData).slice(0, 100)}...`,
             );
             throw new Error(
               `Unexpected response format from ${params.id} - no JSON object found in response`,
             );
           }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/localizer/explicit.ts` around lines 278 - 324, The code
currently only inspects result.data and drops top-level parsed payloads; change
the logic to first normalize the parsed output into a single variable (e.g.,
rawData = result?.data ?? result) and then run the existing object/string
handling against rawData instead of result.data so plain objects or top-level
JSON strings aren't ignored; update all subsequent references (the object
branch, string branch, error messages and places using result.data) to use
rawData, keeping parseModelResponse, finalResult and jsonrepair behavior intact
and preserving the same error/snippet reporting that includes params.id.

AndreyHirsa and others added 6 commits March 26, 2026 13:47
* fix(cli): handle STYLE and REGION blocks unsupported by node-webvtt

* fix(cli): add changeset

* fix(cli): add test for REGION block preservation in push output

* fix(cli): split block extraction into focused functions and fix regex to match exact keywords
Co-authored-by: Max Prilutskiy <maks.prilutskiy@gmail.com>
Co-authored-by: Max Prilutskiy <maks.prilutskiy@gmail.com>
* feat: use database user ID as PostHog distinct_id instead of email

* fix: revert jsom demo changes

* fix: revert jsom demo changes

* fix: don't cache failed /users/me identity

* fix: prevent  override and ensure posthog shutdown on error
Co-authored-by: Max Prilutskiy <maks.prilutskiy@gmail.com>
Comment thread packages/cli/src/cli/processor/basic.ts Outdated
currentChunkItemCount++;

const currentChunkSize = countWordsInRecord(currentChunk);
const effectiveBatchSize =
Contributor


Now, when --batch-size isn’t provided, the effective batch size defaults to the total number of keys. I would keep 25 as the fallback instead of payloadEntries.length.

Comment thread packages/react/src/client/index.ts Outdated
export * from "./locale-switcher";
export * from "./attribute-component";
export * from "./locale";
export { getLocaleFromCookies, setLocaleInCookies } from "./utils";
Contributor


This doesn't seem related to batch size

Comment thread packages/cli/src/cli/cmd/run/_types.ts Outdated
debounce: z.number().positive().prefault(5000), // 5 seconds default
sound: z.boolean().optional(),
pseudo: z.boolean().optional(),
batchSize: z.number().min(1).optional(),
Contributor


Could you please add a maximum value of 250?

Comment thread packages/cli/src/cli/cmd/i18n.ts Outdated
file: Z.array(Z.string()).optional(),
interactive: Z.boolean().prefault(false),
debug: Z.boolean().prefault(false),
batchSize: Z.number().min(1).optional(),
Contributor


Same as run: use a maximum value of 250.
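The constraint requested here and for run, written as a plain validation function for illustration (the PR itself would express it in zod, roughly z.number().int().min(1).max(250).optional()):

```typescript
// Plain-TypeScript equivalent of the requested zod constraint:
// batchSize must be an integer in [1, 250] when provided.
function validateBatchSize(value: number | undefined): number | undefined {
  if (value === undefined) return undefined; // flag omitted: use defaults
  if (!Number.isInteger(value) || value < 1 || value > 250) {
    throw new Error("batch-size must be an integer between 1 and 250");
  }
  return value;
}
```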

const provider = isPseudo ? "pseudo" : ctx.config?.provider;
const engineId = ctx.config?.engineId;
ctx.localizer = createLocalizer(provider, engineId, ctx.flags.apiKey);
ctx.localizer = createLocalizer(
Contributor


Please use prettier to match the project formatting.

Author


I looked into this and tried reformatting it. I also cross-verified it (including with AI and Prettier), and it seems the line is being split due to the printWidth: 80 rule in the project config.

* @param batchSize - Max number of keys per chunk (default: 25)
* @returns An array of payload chunks
*/
function extractPayloadChunks(
Contributor


extractPayloadChunks and countWordsInRecord are now copy-pasted in two places - packages/cli/src/cli/processor/basic.ts and at the bottom of packages/cli/src/cli/localizer/explicit.ts. Please extract them into a shared utility so we don't maintain two copies.

Author


I’ve removed the duplicate implementations from processor/basic.ts and localizer/explicit.ts

I extracted them into a shared utility (packages/cli/src/cli/utils/chunk.ts) and imported them back into both places so we only maintain them in one place

// Handle both object and string responses
if (typeof result?.data === "object" && result.data !== null) {
finalResult = result.data;
} else if (result?.data) {
Contributor


What are the problems with the current repair logic? This looks overly complex for what it does.

Author


The original repair logic was added to handle inconsistencies across different LLM providers. In some cases, models return the data field as double-stringified JSON (e.g. {"data": "{\"message\": \"Hola\"}"}) instead of an object.

The manual extraction (indexOf('{') + jsonrepair) was intended to safely recover that payload and prevent crashes during batch localization.

That said, I agree the implementation was overly complex. I’ve simplified it by removing the manual slicing and reusing parseModelResponse() recursively, which handles the double-stringification more cleanly with less code.
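The recursive unwrap described here could look roughly like this (a sketch, not the PR's actual parseModelResponse):

```typescript
// Sketch: parse the response, and if `data` is itself a JSON string
// (double-stringified output from some providers), recurse to unwrap it.
function parseModelResponse(text: string): Record<string, unknown> {
  const parsed = JSON.parse(text);
  if (typeof parsed?.data === "string") {
    return parseModelResponse(parsed.data);
  }
  return parsed?.data ?? parsed;
}
```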

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
packages/cli/src/cli/cmd/run/index.ts (1)

126-130: Improve --batch-size parsing to reject malformed input at CLI level.

Currently, parseInt(val) on line 129 silently coerces malformed input: "2abc" → 2 and "1.9" → 1. While downstream schema validation (flagsSchema.parse()) catches invalid values, early CLI-level validation provides immediate feedback. Since interactive-commander supports throwing errors in option parser callbacks, reject non-integer values directly:

Proposed change
   .option(
     "--batch-size <number>",
     "Number of translations to process in a single batch (not applicable when using lingo.dev provider)",
-    (val: string) => parseInt(val),
+    (val: string) => {
+      const parsed = Number(val);
+      if (!Number.isInteger(parsed) || parsed < 1) {
+        throw new Error("batch-size must be a positive integer");
+      }
+      return parsed;
+    },
   )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/cmd/run/index.ts` around lines 126 - 130, The CLI option
parser for "--batch-size <number>" currently uses parseInt(val) which silently
accepts malformed input like "2abc" or "1.9"; update the option parser callback
for the "--batch-size" option to perform strict integer validation (e.g., verify
the string is all digits or that Number(val) is finite and Number.isInteger) and
throw an Error when the input is not a valid integer so the commander CLI
reports the problem immediately (this is the code path where parseInt(val) is
used; downstream validation still occurs in flagsSchema.parse()).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/cli/src/cli/cmd/run/index.ts`:
- Around line 126-130: The CLI option parser for "--batch-size <number>"
currently uses parseInt(val) which silently accepts malformed input like "2abc"
or "1.9"; update the option parser callback for the "--batch-size" option to
perform strict integer validation (e.g., verify the string is all digits or that
Number(val) is finite and Number.isInteger) and throw an Error when the input is
not a valid integer so the commander CLI reports the problem immediately (this
is the code path where parseInt(val) is used; downstream validation still occurs
in flagsSchema.parse()).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9e8b37e1-dff1-42c8-8c32-3218aa929879

📥 Commits

Reviewing files that changed from the base of the PR and between 7d2b6ef and e20377f.

📒 Files selected for processing (2)
  • packages/cli/src/cli/cmd/i18n.ts
  • packages/cli/src/cli/cmd/run/index.ts

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/cli/src/cli/cmd/i18n.ts (1)

444-449: ⚠️ Potential issue | 🟡 Minor

Avoid overriding provider settings.batchSize with undefined.

Line 448 always includes batchSize, even when not set. In packages/cli/src/cli/processor/index.ts:15-31, this can override provider settings with undefined.

🔧 Proposed fix
                 let processPayload = createProcessor(i18nConfig!.provider, {
                   apiKey: settings.auth.apiKey,
                   apiUrl: settings.auth.apiUrl,
                   engineId: i18nConfig!.engineId,
-                  batchSize: flags.batchSize,
+                  ...(flags.batchSize !== undefined
+                    ? { batchSize: flags.batchSize }
+                    : {}),
                 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/cmd/i18n.ts` around lines 444 - 449, The current call to
createProcessor(i18nConfig!.provider, { ... }) always sets batchSize:
flags.batchSize which can overwrite provider defaults with undefined; update the
call so the options object only includes batchSize when flags.batchSize is
defined (e.g., build the options object and conditionally add batchSize if
flags.batchSize !== undefined) so createProcessor receives no batchSize key when
the flag isn't set; look for createProcessor, flags.batchSize and
i18nConfig!.provider to modify this call.
🧹 Nitpick comments (1)
packages/cli/src/cli/utils/chunk.ts (1)

19-30: Split before overflow so the 250-word target is actually respected.

The current check runs after Line 19 adds the new entry, so the emitted chunk can exceed the ideal word limit. Pre-checking before append keeps chunk sizes closer to the intended threshold.

♻️ Refactor sketch
-    currentChunk[key] = value;
-    currentChunkItemCount++;
-
-    const currentChunkSize = countWordsInRecord(currentChunk);
+    const entrySize = countWordsInRecord({ [key]: value });
+    const currentChunkSize = countWordsInRecord(currentChunk);
     const effectiveBatchSize = batchSize ?? 25;
+    if (
+      currentChunkItemCount > 0 &&
+      (currentChunkSize + entrySize > idealBatchItemSize ||
+        currentChunkItemCount >= effectiveBatchSize)
+    ) {
+      result.push(currentChunk);
+      currentChunk = {};
+      currentChunkItemCount = 0;
+    }
+    currentChunk[key] = value;
+    currentChunkItemCount++;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/utils/chunk.ts` around lines 19 - 30, The chunking logic
currently appends the new entry to currentChunk before measuring size, which
allows emitted chunks to exceed idealBatchItemSize; modify the flow in the
function that builds result (use symbols currentChunk, countWordsInRecord,
idealBatchItemSize, batchSize, currentChunkItemCount, payloadEntries, i) to
check whether adding the next record would push currentChunk over
idealBatchItemSize or effectiveBatchSize and, if so, push the existing
currentChunk to result and reset it before adding the new entry; ensure you
still handle the case where currentChunk is empty but a single record itself
exceeds idealBatchItemSize by allowing that record into its own chunk and
incrementing counts accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/cli/src/cli/cmd/i18n.ts`:
- Around line 93-97: Replace the direct use of parseInt in the
.option("--batch-size <number>", ...) call with a dedicated wrapper like (v) =>
Number.parseInt(v, 10) to avoid Commander passing a previousValue as a radix;
also update the validation schema for batchSize to enforce an integer (add
.int() to the batchSize schema entry) so batchSize is validated as a
whole-number count.

In `@packages/cli/src/cli/utils/chunk.ts`:
- Around line 23-27: The code currently silently falls back to 25 for invalid
batchSize; instead validate batchSize when provided and reject invalid values.
In the chunking logic (where effectiveBatchSize, batchSize, currentChunkSize,
currentChunkItemCount, idealBatchItemSize, i, and payloadEntries are used) check
if batchSize is defined and if so enforce it is an integer within 1..250; if it
is out of range or not an integer throw an error (or return a rejected Promise)
so callers get immediate feedback; only default to 25 when batchSize is entirely
omitted/undefined. Ensure the validation happens before computing
effectiveBatchSize and before the loop that uses currentChunkItemCount and i.

---

Outside diff comments:
In `@packages/cli/src/cli/cmd/i18n.ts`:
- Around line 444-449: The current call to createProcessor(i18nConfig!.provider,
{ ... }) always sets batchSize: flags.batchSize which can overwrite provider
defaults with undefined; update the call so the options object only includes
batchSize when flags.batchSize is defined (e.g., build the options object and
conditionally add batchSize if flags.batchSize !== undefined) so createProcessor
receives no batchSize key when the flag isn't set; look for createProcessor,
flags.batchSize and i18nConfig!.provider to modify this call.
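
The conditional-inclusion pattern described above can be sketched as follows. `buildProcessorOptions` is a hypothetical helper, not the repository's code; it only illustrates the conditional-spread idiom the comment asks for:

```typescript
// Hypothetical sketch: include the batchSize key only when the flag was
// actually set, so an undefined value never overwrites provider defaults.
type ProcessorOptions = { batchSize?: number };

function buildProcessorOptions(flagBatchSize?: number): ProcessorOptions {
  return {
    // Conditional spread adds the key only when the flag is defined.
    ...(flagBatchSize !== undefined ? { batchSize: flagBatchSize } : {}),
  };
}
```

With this shape, a downstream `"batchSize" in options` check (or a `options.batchSize ?? providerDefault` fallback) behaves exactly as if the flag had never been passed.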

---

Nitpick comments:
In `@packages/cli/src/cli/utils/chunk.ts`:
- Around line 19-30: The chunking logic currently appends the new entry to
currentChunk before measuring size, which allows emitted chunks to exceed
idealBatchItemSize; modify the flow in the function that builds result (use
symbols currentChunk, countWordsInRecord, idealBatchItemSize, batchSize,
currentChunkItemCount, payloadEntries, i) to check whether adding the next
record would push currentChunk over idealBatchItemSize or effectiveBatchSize
and, if so, push the existing currentChunk to result and reset it before adding
the new entry; ensure you still handle the case where currentChunk is empty but
a single record itself exceeds idealBatchItemSize by allowing that record into
its own chunk and incrementing counts accordingly.
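
The check-before-append flow the nitpick describes can be sketched like this. The helper names mirror the symbols in the prompt (`currentChunk`, `countWordsInRecord`, `idealBatchItemSize`), but this is an illustrative sketch under assumed semantics, not the repository's implementation:

```typescript
// Sketch: flush the current chunk BEFORE adding an entry that would
// overflow it, so emitted chunks never exceed the limits. The one
// exception: an empty chunk accepts a single oversized record.
type Payload = Record<string, string>;

function countWordsInRecord(value: string): number {
  return value.split(/\s+/).filter(Boolean).length;
}

function chunkPayload(
  entries: [string, string][],
  idealBatchItemSize: number,
  effectiveBatchSize: number,
): Payload[] {
  const result: Payload[] = [];
  let currentChunk: Payload = {};
  let currentChunkSize = 0;
  let currentChunkItemCount = 0;

  for (const [key, value] of entries) {
    const entrySize = countWordsInRecord(value);
    const wouldOverflow =
      currentChunkSize + entrySize > idealBatchItemSize ||
      currentChunkItemCount + 1 > effectiveBatchSize;
    // Flush first; skip the flush when the chunk is empty so a single
    // oversized record still lands in its own chunk.
    if (wouldOverflow && currentChunkItemCount > 0) {
      result.push(currentChunk);
      currentChunk = {};
      currentChunkSize = 0;
      currentChunkItemCount = 0;
    }
    currentChunk[key] = value;
    currentChunkSize += entrySize;
    currentChunkItemCount += 1;
  }
  if (currentChunkItemCount > 0) result.push(currentChunk);
  return result;
}
```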
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: be6382fd-bf16-41b5-b60a-3fcf16fe838a

📥 Commits

Reviewing files that changed from the base of the PR and between e20377f and 6a251f2.

📒 Files selected for processing (12)
  • .changeset/add-batch-size.md
  • packages/cli/src/cli/cmd/i18n.ts
  • packages/cli/src/cli/cmd/run/_types.ts
  • packages/cli/src/cli/cmd/run/index.ts
  • packages/cli/src/cli/cmd/run/setup.ts
  • packages/cli/src/cli/localizer/explicit.ts
  • packages/cli/src/cli/localizer/index.ts
  • packages/cli/src/cli/localizer/pseudo.ts
  • packages/cli/src/cli/processor/basic.spec.ts
  • packages/cli/src/cli/processor/basic.ts
  • packages/cli/src/cli/processor/index.ts
  • packages/cli/src/cli/utils/chunk.ts
✅ Files skipped from review due to trivial changes (2)
  • .changeset/add-batch-size.md
  • packages/cli/src/cli/processor/basic.spec.ts
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/cli/src/cli/cmd/run/setup.ts
  • packages/cli/src/cli/localizer/pseudo.ts
  • packages/cli/src/cli/localizer/index.ts
  • packages/cli/src/cli/cmd/run/_types.ts
  • packages/cli/src/cli/processor/index.ts
  • packages/cli/src/cli/processor/basic.ts
  • packages/cli/src/cli/localizer/explicit.ts

Comment on lines +93 to +97
.option(
  "--batch-size <number>",
  "Number of translations to process in a single batch",
  parseInt,
)
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

interactive-commander option parser callback signature: does the parser receive (value, previousValue), and is passing JavaScript parseInt directly considered unsafe because of its second-argument radix behavior?

💡 Result:

No, the option parser (argParser) callback in interactive-commander receives only a single value parameter, not (value, previousValue). This is shown in official documentation examples where .argParser(Number) and .argParser((value) => { ... }) are used, with the lambda taking one argument. Yes, passing JavaScript parseInt directly is considered unsafe in Commander.js option processing (which interactive-commander extends and is compatible with). The parser receives (value, previousValue) for repeatable options, so parseInt(value, previousValue) treats previousValue as radix, often causing NaN (e.g., radix=1). Multiple GitHub issues confirm this: defaults become NaN (Issue #523), and wrappers like v => parseInt(v) are recommended (Issue #1007). MDN notes parseInt requires explicit radix (10) for safety.


Use a dedicated numeric parser and add integer validation for --batch-size.

Line 96 passes parseInt directly as the parser callback. Because interactive-commander extends Commander.js behavior, repeatable options invoke the callback with both value and previousValue, so parseInt(value, previousValue) treats the previous value as a radix and often produces NaN. Use an explicit wrapper: (value) => Number.parseInt(value, 10).

Additionally, since batchSize represents a count and must be a whole number, add .int() validation to the schema at line 671.

🔧 Proposed fix
   .option(
     "--batch-size <number>",
     "Number of translations to process in a single batch",
-    parseInt,
+    (value: string) => Number.parseInt(value, 10),
   )
-    batchSize: Z.number().min(1).max(250).optional(),
+    batchSize: Z.number().int().min(1).max(250).optional(),

Also applies to: 671-672

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/cmd/i18n.ts` around lines 93 - 97, Replace the direct
use of parseInt in the .option("--batch-size <number>", ...) call with a
dedicated wrapper like (v) => Number.parseInt(v, 10) to avoid Commander passing
a previousValue as a radix; also update the validation schema for batchSize to
enforce an integer (add .int() to the batchSize schema entry) so batchSize is
validated as a whole-number count.
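
The radix pitfall can be seen directly in plain JavaScript semantics, independent of the CLI code (a minimal illustration):

```typescript
// parseInt's second parameter is a radix. When a Commander-style
// repeatable option calls the parser as parser(value, previousValue),
// the previously parsed number silently becomes the radix.
const first = parseInt("50", 10); // explicit radix 10 -> 50
const second = parseInt("50", 50); // radix 50 is outside 2..36 -> NaN

// A wrapper discards the extra argument and pins the radix:
const safeParser = (value: string) => Number.parseInt(value, 10);
```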

Comment on lines +23 to +27
const effectiveBatchSize = batchSize && batchSize > 0 ? batchSize : 25;
if (
  currentChunkSize > idealBatchItemSize ||
  currentChunkItemCount >= effectiveBatchSize ||
  i === payloadEntries.length - 1
Contributor

⚠️ Potential issue | 🟠 Major

Reject invalid batchSize instead of silently falling back to 25.

Current logic masks invalid values and makes misconfiguration hard to detect. This utility should enforce the same contract as CLI input (1..250) when a value is provided.

🔧 Proposed fix
 export function extractPayloadChunks(
   payload: Record<string, any>,
   batchSize?: number,
 ): Record<string, any>[] {
   const idealBatchItemSize = 250;
+  if (
+    batchSize !== undefined &&
+    (!Number.isInteger(batchSize) || batchSize < 1 || batchSize > 250)
+  ) {
+    throw new RangeError("batchSize must be an integer between 1 and 250");
+  }
   const result: Record<string, any>[] = [];
   let currentChunk: Record<string, any> = {};
   let currentChunkItemCount = 0;
@@
-    const effectiveBatchSize = batchSize && batchSize > 0 ? batchSize : 25;
+    const effectiveBatchSize = batchSize ?? 25;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/utils/chunk.ts` around lines 23 - 27, The code currently
silently falls back to 25 for invalid batchSize; instead validate batchSize when
provided and reject invalid values. In the chunking logic (where
effectiveBatchSize, batchSize, currentChunkSize, currentChunkItemCount,
idealBatchItemSize, i, and payloadEntries are used) check if batchSize is
defined and if so enforce it is an integer within 1..250; if it is out of range
or not an integer throw an error (or return a rejected Promise) so callers get
immediate feedback; only default to 25 when batchSize is entirely
omitted/undefined. Ensure the validation happens before computing
effectiveBatchSize and before the loop that uses currentChunkItemCount and i.
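
The early-validation contract described above can be sketched as a small standalone helper. `resolveBatchSize` is a hypothetical name for illustration, not the repository's API:

```typescript
// Hypothetical helper: default only when the value is omitted entirely,
// and reject out-of-range or non-integer values instead of masking them.
function resolveBatchSize(batchSize?: number): number {
  if (batchSize === undefined) return 25; // default applies only when omitted
  if (!Number.isInteger(batchSize) || batchSize < 1 || batchSize > 250) {
    throw new RangeError("batchSize must be an integer between 1 and 250");
  }
  return batchSize;
}
```

Callers then get immediate feedback on a misconfigured `--batch-size` instead of silently running with 25.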

AndreyHirsa and others added 5 commits April 2, 2026 17:23
Co-authored-by: Max Prilutskiy <maks.prilutskiy@gmail.com>
* feat(ctx): add CTX AI context engine integration

* fix(demo/ctx): move tsx to dependencies for global install support

tsx was in devDependencies, which are not installed for global consumers
of the bin entry; moving it to dependencies ensures the shebang resolves.

Made-with: Cursor

* docs(demo/ctx): replace Bun references with Node.js/tsx

The project uses tsx/Node, not Bun. Updated the badge, install
requirements section, and the requirements list to reflect the
actual runtime.

Made-with: Cursor

* fix(demo/ctx): use os.homedir() instead of process.env.HOME! in state dir

Made-with: Cursor

* fix(demo/ctx): guard setRawMode behind isTTY check to avoid crash in non-TTY

Made-with: Cursor

* fix(demo/ctx): extract new path from rename entries in git status --porcelain

Made-with: Cursor

* fix(demo/ctx): scope JSONC comment replacement to // CTX: prefix to preserve user comments

Made-with: Cursor

* fix(demo/ctx): only apply path-shortening in toolCall when arg is a filesystem path

Made-with: Cursor

* fix(demo/ctx): merge into existing provider instead of overwriting to preserve custom fields

Made-with: Cursor

* fix(demo/ctx): add error handling for readFile, readFileSync, and JSON.parse in voices

Made-with: Cursor

* fix(demo/ctx): handle pause_turn stop reason and add path traversal guard in research agent

Made-with: Cursor

* fix(demo/ctx): add resolveSafe path traversal guard for all tool calls in agent loop

Made-with: Cursor

* fix(demo/ctx): include i18n.json in candidates and defer clearState until rewrite confirmed

Made-with: Cursor

* docs(demo/ctx): update install instructions to point to lingodotdev/lingo.dev monorepo

Made-with: Cursor

* chore(demo/ctx): rename package from lingo-agent to ctx

Made-with: Cursor

* revert(demo/ctx): remove manually created changeset file

* chore(demo/ctx): add changeset for ctx minor release

* chore: update pnpm-lock.yaml to include demo/ctx workspace package

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add batch size parameter to prevent context leaking of translator notes/hints