week_2: Module C (The Librarian) — C.0 input boundary: SectionValidator + ExplicitLinkResolver by PRAteek-singHWY · Pull Request #925 · OWASP/OpenCRE

PRAteek-singHWY · 2026-06-10T21:35:04Z

week_2: Module C (The Librarian) — C.0 deterministic input boundary: SectionValidator + ExplicitLinkResolver

Stacked on #922. This branch is based on gsocmodule_C_week_1; only the top commit is new. I will rebase onto main as soon as #922 merges, which will shrink the diff to the Week 2 files only.

Overview

This is the Week 2 deliverable for Module C (The Librarian): C.0, the deterministic input boundary. Before any retrieval or ML, every incoming chunk passes through two deterministic stages:

SectionValidator — is this input well-formed and usable? Validates the row, adapts it into the internal Section the pipeline consumes, and rejects bad input with typed errors.
ExplicitLinkResolver — does the text already cite a CRE id (ddd-ddd)? If exactly one known id is cited, link it directly with no ML. Unknown or conflicting references never auto-link — they route to human review (fail-safe).

A naming note for reviewers: the plan documents called this component "SectionNormalizer", but the RFC (#734) assigns text normalization to Module A (harvest + normalize + chunk), with Module B's sanitizer on top. By the time text reaches Module C it is contractually clean, and re-cleaning it here would silently drift C from what A hashed and B classified. This component validates and adapts — it never transforms text — so it is named SectionValidator.
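Here is a minimal stdlib sketch of the validate-and-adapt contract. It is not the code in `section_validator.py`. The row keys (`repo`, `sha`, `path`, `text`, `language`), the `Section` field set, and the function name `section_from_row` are all hypothetical. `NotKnowledgeError` is left out because what triggers it depends on Module B's row schema. The sketch shows three properties: typed rejections, identity synthesis using the `chk:{repo}@{sha}:{path}` / `art:{repo}:{path}` formats from this description, and text that passes through unmodified.

```python
from dataclasses import dataclass
from typing import Any, Mapping


class SectionValidationError(ValueError):
    """Base class: every rejection at the boundary is one of these."""


class MalformedKnowledgeItemError(SectionValidationError):
    """Required identity or text fields are missing."""


class EmptyTextError(SectionValidationError):
    """Text is missing or whitespace-only."""


class UnsupportedLanguageError(SectionValidationError):
    """Language tag is not an English variant."""


@dataclass(frozen=True)
class Section:
    """Internal, immutable view of one chunk (field set is hypothetical)."""

    chunk_id: str
    artifact_id: str
    text: str
    language: str


def section_from_row(row: Mapping[str, Any]) -> Section:
    # Hypothetical row keys: repo, sha, path, text, language.
    try:
        repo, sha, path = row["repo"], row["sha"], row["path"]
        text = row["text"]
    except KeyError as exc:
        raise MalformedKnowledgeItemError(f"missing field {exc}") from None
    if not isinstance(text, str) or not text.strip():
        raise EmptyTextError("text is empty")
    # Assumption: a missing language tag means English.
    language = str(row.get("language") or "en")
    if language.split("-")[0].lower() != "en":
        raise UnsupportedLanguageError(language)
    # Identity is synthesized; the text is passed through unchanged.
    # Keys not copied here (e.g. llm_reasoning) never reach downstream stages.
    return Section(
        chunk_id=f"chk:{repo}@{sha}:{path}",
        artifact_id=f"art:{repo}:{path}",
        text=text,
        language=language,
    )
```

Because the frozen `Section` only carries the fields it declares, volatile audit metadata is dropped by construction rather than by an explicit delete step.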

Scope: 4 new files + 1 updated (the eval harness). No frontend, no migrations, no DB access, no behaviour change to OpenCRE proper.

What changed

Area	Files	Description
C.0 validator	`section_validator.py`	Validates both upstream shapes — Module B's reduced `knowledge_queue` row and the full RFC `KnowledgeItem` envelope — into a frozen internal `Section`. Synthesizes the RFC identity fields from B's row (`chunk_id = chk:{repo}@{sha}:{path}`, `artifact_id = art:{repo}:{path}`, `source`, `locator`). Strips volatile audit metadata (`llm_reasoning`, filter stages, run timestamps) so downstream stages can never key decisions on it. Rejections are typed (`MalformedKnowledgeItemError`, `EmptyTextError`, `UnsupportedLanguageError`, `NotKnowledgeError`); raw Pydantic `ValidationError` never escapes the boundary.
C.0.5 fast path	`explicit_link_resolver.py`	Deterministic regex (`\b\d{3}-\d{3}\b`) extraction + resolution against an injected set of known CRE ids. Outcomes: `resolved` (single known id → auto-link), `no_reference` (continue to the semantic path, W3+), `unknown_reference` / `conflicting_references` (→ review, with the known ids preserved as suggestions). The known-id set is injected so this module stays dependency-free: the harness seeds it from the golden dataset today; the DB-backed `cre.external_id` registry arrives with the retriever (W3).
Eval harness	`evaluate_librarian.py`	Now runs every golden row through C.0: prints the validation pass rate per slice, and gates the explicit slice at 100% resolver correctness — the script exits non-zero on any regression, so CI can block on it. `predict()` makes its first real predictions (explicit path only; semantic path still stubbed).
Tests	`section_validator_test.py`, `explicit_link_resolver_test.py`	15 new tests, table-driven: every rejection class, identity synthesis, volatile-metadata stripping, language variants (`en-GB` accepted, `fr` rejected), and every resolver outcome including pattern boundaries (`027-5555` and `CVE-2024-1234-567` must not match). One test asserts the boundary never leaks a raw Pydantic error.
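The resolver's outcome table can be sketched with the quoted regex and an injected set of known ids. This is an illustration, not the module's API. `Resolution`, `resolve`, and the precedence between outcomes are assumptions: here any unknown id takes priority over a conflict, and the known subset is kept as suggestions.

```python
import re
from dataclasses import dataclass
from typing import AbstractSet, Optional, Tuple

# Pattern quoted in the PR description for the explicit fast path.
CRE_ID = re.compile(r"\b\d{3}-\d{3}\b")


@dataclass(frozen=True)
class Resolution:
    outcome: str  # resolved | no_reference | unknown_reference | conflicting_references
    cre_id: Optional[str] = None
    suggestions: Tuple[str, ...] = ()


def resolve(text: str, known_ids: AbstractSet[str]) -> Resolution:
    # Distinct cited ids, in first-seen order.
    cited = list(dict.fromkeys(CRE_ID.findall(text)))
    if not cited:
        return Resolution("no_reference")  # continue to the semantic path
    known = tuple(i for i in cited if i in known_ids)
    if len(known) < len(cited):
        # At least one id is not in the registry: never auto-link.
        return Resolution("unknown_reference", suggestions=known)
    if len(cited) > 1:
        # Several known ids: ambiguous, so route to review.
        return Resolution("conflicting_references", suggestions=known)
    return Resolution("resolved", cre_id=cited[0])
```

The `\b` anchors are what reject `027-5555` and `CVE-2024-1234-567`. In both strings, every three-digit run is either preceded or followed by another digit, so no word boundary exists where the pattern needs one.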

How the pieces connect

flowchart TB
    subgraph UPSTREAM["Upstream shapes (Week 1 contracts)"]
        row["KnowledgeQueueItem<br/>(Module B's reduced row)"]
        ki["KnowledgeItem<br/>(RFC envelope)"]
    end

    subgraph C0["C.0 — input boundary (this PR)"]
        val["section_validator.py<br/>validate · adapt · synthesize identity"]
        sec["Section<br/>(internal, frozen)"]
        res["explicit_link_resolver.py<br/>regex ddd-ddd, no ML"]
    end

    subgraph OUT["Outcomes"]
        link["resolved →<br/>deterministic link"]
        sem["no_reference →<br/>semantic path (W3+)"]
        rev["unknown / conflicting →<br/>human review"]
        err["typed errors:<br/>Malformed · EmptyText ·<br/>UnsupportedLanguage · NotKnowledge"]
    end

    subgraph HARNESS["Eval harness (updated)"]
        eval["evaluate_librarian.py<br/>pass rate per slice ·<br/>explicit gate 100% (exit 1 on fail)"]
    end

    row --> val
    ki --> val
    val --> sec
    val -. reject .-> err
    sec --> res
    res --> link
    res --> sem
    res --> rev
    sec --> eval
    res --> eval

Results

validation pass rate (C.0 boundary):
  ambiguous      5/5 (100%)
  explicit       5/5 (100%)
  hard_negative  12/12 (100%)
  positive       292/292 (100%)
  update         5/5 (100%)
explicit slice (C.0.5 resolver): 5/5 — gate 100%: PASS

66 tests passing (51 from Week 1 + 15 new)
mypy --strict clean on both new modules (repo flags)
black clean
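The gate turns the explicit-slice score into a CI signal, which amounts to a pass-rate aggregation plus an exit code. The helper names below (`pass_rates`, `gate_exit_code`, `report`) are hypothetical. Treating an empty slice as a failure is a design choice of this sketch, not something the PR states.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def pass_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, int]]:
    """Fold (slice, passed) pairs into {slice: (passed, total)}, sorted by slice."""
    counts: DefaultDict[str, List[int]] = defaultdict(lambda: [0, 0])
    for slice_name, ok in results:
        counts[slice_name][0] += int(ok)
        counts[slice_name][1] += 1
    return {name: (c[0], c[1]) for name, c in sorted(counts.items())}


def gate_exit_code(passed: int, total: int, required: float = 1.0) -> int:
    """0 if the slice meets the required pass rate, else 1 (CI-friendly)."""
    if total == 0:
        return 1  # an empty slice should not pass vacuously
    return 0 if passed / total >= required else 1


def report(rates: Dict[str, Tuple[int, int]]) -> List[str]:
    # Same layout as the harness output above: "name  passed/total (pct%)".
    return [f"  {name:<14} {p}/{t} ({100 * p // t}%)" for name, (p, t) in rates.items()]
```

A harness would then end with something like `sys.exit(gate_exit_code(*rates["explicit"]))`. Any regression then makes the process exit with status 1, and CI can block the merge on it.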

What is intentionally not here

Retriever, embeddings, cross-encoder, SafetyGuard, decision engine, CLI wiring, and DB models/migrations are later weeks (W3–W8 per the proposal). The semantic path in the harness still predicts nothing — Week 3 plugs the retriever into the same predict() seam.

How to verify locally

# 66 tests
python3 -m unittest discover -s application/tests/librarian -p '*_test.py' -t .

# harness end-to-end: per-slice pass rate + explicit gate
python3 scripts/evaluate_librarian.py --dataset application/tests/librarian/fixtures/golden_dataset.json

# explicit slice only
python3 scripts/evaluate_librarian.py --dataset application/tests/librarian/fixtures/golden_dataset.json --slice explicit

🤖 Generated with Claude Code

dataset Contracts + regression ruler before any pipeline code, per the OIE RFC's 'test before the code' directive. RFC OWASP#734 envelopes (KnowledgeItem in, LinkProposal/ReviewItem out) as Pydantic v2, drift-guarded against the vendored owasp-graph schemas; TRACT hub-firewall + multi-link scoring; 319-row golden dataset derived from standards_cache.sqlite with --check drift detection. One prod edit: pydantic>=2,<3 pin.

…nts, fail-fast build, edge cases

coderabbitai · 2026-06-10T21:35:18Z

Warning

Review limit reached

@PRAteek-singHWY, we couldn't start this review because you've reached your PR review rate limit.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 91c9b5fe-8d13-4712-874b-4b57cffc5f4b

📥 Commits

Reviewing files that changed from the base of the PR and between d99a64a and 73aa15e.

📒 Files selected for processing (1)

requirements.txt

Walkthrough

Adds Module C (Librarian): RFC JSON schemas and Pydantic models, env-config loading, explicit-link resolution, hub firewalling, scoring, C.0 input boundary conversion, golden dataset generation, evaluation, and unit tests.

Changes

Module C Librarian Implementation

Layer / File(s)	Summary
RFC contracts and Pydantic models `application/utils/librarian/__init__.py`, `application/utils/librarian/_rfc_schemas/*`, `application/utils/librarian/schemas.py`, `requirements.txt`	Vendored JSON Schemas and Pydantic v2 models implement RFC `#734` envelopes (`KnowledgeItem`, `LinkProposal`, `ReviewItem`) and supporting types; module docstring and `pydantic>=2.4.0,<3` pin.
Configuration loader and tests `application/utils/librarian/config_loader.py`, `application/tests/librarian/config_loader_test.py`	Frozen `LibrarianConfig` and `load_config()` parse `CRE_LIBRARIAN_*` env vars, cast and validate numeric bounds and ordering; tests cover defaults, overrides, and invalid inputs.
Core utilities `application/utils/librarian/explicit_link_resolver.py`, `application/utils/librarian/hub_firewall.py`, `application/utils/librarian/knowledge_source.py`, `application/utils/librarian/scoring.py`, `application/tests/librarian/explicit_link_resolver_test.py`, `application/tests/librarian/hub_firewall_test.py`, `application/tests/librarian/scoring_test.py`	Deterministic `ddd-ddd` extraction and resolution with explicit outcomes, hub firewall to remove hub echoes, fixture-backed `KnowledgeSource`, and Jaccard-based scoring with edge-case tests.
C.0 input boundary validation `application/utils/librarian/section_validator.py`, `application/tests/librarian/section_validator_test.py`	`section_from_queue_row` and `section_from_knowledge_item` convert upstream inputs into `Section` objects with synthesized IDs, language gating, and a controlled `SectionValidationError` hierarchy; tests validate happy and rejection paths.
Golden dataset construction and validation `scripts/build_golden_dataset.py`, `application/tests/librarian/fixtures/golden_dataset.schema.json`, `application/tests/librarian/fixtures/sample_knowledge_queue.jsonl`, `application/tests/librarian/dataset_test.py`	Deterministic builder produces a golden dataset across slices (positive, multilink, hard_negative, explicit, update, ambiguous); JSON Schema fixture and tests assert shape, provenance, multi-link rows, ID uniqueness, and determinism via `--check`.
Evaluation harness and schema tests `scripts/evaluate_librarian.py`, `application/tests/librarian/schemas_test.py`	Evaluation harness loads golden rows, synthesizes queue rows, applies deterministic explicit-only prediction, scores cases with optional firewalling, enforces the explicit-slice gate, and `schemas_test.py` validates canonical schema round-trips and Pydantic model constraints.
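For multi-link rows, a Jaccard score between the predicted and gold link sets, with an optional hub firewall applied first, might look like the following. It is only a sketch, and several parts are assumptions about what `scoring.py` and `hub_firewall.py` do: the signature, scoring two empty sets as 1.0 (a correct "no link" on a hard negative), and modelling the firewall as removal of hub ids from both sides.

```python
from typing import AbstractSet, FrozenSet


def jaccard(
    predicted: AbstractSet[str],
    gold: AbstractSet[str],
    hubs: AbstractSet[str] = frozenset(),
) -> float:
    """|P & G| / |P | G| after dropping hub ids from both sets."""
    p: FrozenSet[str] = frozenset(predicted) - frozenset(hubs)
    g: FrozenSet[str] = frozenset(gold) - frozenset(hubs)
    union = p | g
    if not union:
        return 1.0  # nothing predicted, nothing expected: treated as a match
    return len(p & g) / len(union)
```

Applying the firewall before the union is what stops a hub "echo" from inflating or deflating the score. In this sketch, a prediction that differs from gold only by a hub id scores 1.0.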

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

Pa04rth

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 10.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: C.0 input boundary work for SectionValidator and ExplicitLinkResolver.
Description check	✅ Passed	The description is directly related and accurately explains the same validator, resolver, harness, and test changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (6)

application/utils/librarian/schemas.py (2)
272-290: 💤 Low value

Consider making the extra field handling explicit for clarity.

The docstring (lines 276-277) states that this model "tolerates extra fields so B can extend the row without breaking C," relying on Pydantic v2's default of extra="ignore". This works correctly, but setting it explicitly would document the intent and avoid confusion if the default ever changes.
📝 Suggested improvement
 class KnowledgeQueueItem(BaseModel):
     """Read-side mirror of Module B's `knowledge_queue` Postgres row.
 
     Per master guide §1.2: C reads these rows and synthesizes the RFC
     `KnowledgeItem` envelope from them. Not a wire contract; tolerates extra
     fields so B can extend the row without breaking C.
     """
 
+    model_config = ConfigDict(extra="ignore")
     id: str
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/librarian/schemas.py` around lines 272 - 290, The
KnowledgeQueueItem Pydantic model currently relies on Pydantic v2's default of
ignoring extra fields; make this explicit by adding an explicit model config to
the KnowledgeQueueItem class (e.g., set model_config = {"extra": "ignore"} or
the equivalent Config/ModelConfig pattern used across the codebase) so BaseModel
subclass KnowledgeQueueItem clearly documents and enforces tolerant handling of
unknown fields from Module B.
99-107: 💤 Low value

Consider aligning the language field default with the JSON schema.

The JSON schema (knowledge-item.json Line 43) specifies "language": { "type": "string", "default": "en" }, but the Pydantic model has language: Optional[str] = None. While JSON Schema default values are not enforced during validation (they're hints for tooling), this creates a semantic mismatch: the schema documents that language should default to "en", but the model defaults to None.

If the intent is for language to always be "en" when unspecified, consider:
language: str = "en"
If None is intentionally allowed (meaning "language unknown"), the current model is correct but consider updating the JSON schema to clarify this.

Since round-trip tests are passing (per PR description), this may be intentional or the tests may not validate default value behavior strictly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/librarian/schemas.py` around lines 99 - 107, The
KnowledgeContent Pydantic model's language field currently allows None but the
JSON schema documents a default of "en"; to align them, change the field in the
KnowledgeContent class from Optional[str] = None to a non-optional string with
default "en" (i.e., make language: str = "en") so the model provides the
documented default; if instead None is intended, update the JSON schema to
remove or change the default—refer to the KnowledgeContent class and the
language field to implement the change.
requirements.txt (1)
1-119: ⚖️ Poor tradeoff

Consider deduplicating dependency entries.

The file contains duplicate entries that may cause confusion:

compliance-trestle (lines 3, 35)

setuptools (lines 12, 33)

SQLAlchemy (lines 34, 100)

psycopg2-binary (lines 27, 61)

playwright (lines 26, 54)

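scikit_learn/scikit-learn (lines 30, 94; naming inconsistency, though pip normalizes `_` and `-`, so both lines name the same project)

pip does not simply take the last occurrence. Its resolver combines every specifier given for the same project, so duplicate lines are at best redundant and at worst conflicting (different pins fail resolution). Either way, the earlier entries are misleading. Consider consolidating these into single entries.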
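One way to confirm the duplicate list is to group requirement lines by their PEP 503 normalized name. This also catches the `scikit_learn`/`scikit-learn` pair, which a plain text search misses. The sketch below is a rough line parser, not pip's: it strips comments, skips option lines (`-r`, `--index-url`) and direct URLs, and `find_duplicates` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List

# Leading project name of a requirement line (stops at extras/specifiers).
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def canonical(name: str) -> str:
    """PEP 503 normalization: runs of -, _ and . become '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_duplicates(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each normalized name that appears more than once to its line numbers."""
    seen: DefaultDict[str, List[int]] = defaultdict(list)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            continue  # blank/comment, pip option, or direct URL
        match = _NAME.match(line)
        if match:
            seen[canonical(match.group(1))].append(lineno)
    return {name: nums for name, nums in seen.items() if len(nums) > 1}
```

Running it over `open("requirements.txt").read().splitlines()` would list each duplicated project with the line numbers to merge.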
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` around lines 1 - 119, requirements.txt contains duplicate
and inconsistent package entries; remove redundant lines and consolidate into
single canonical entries for each package (e.g., keep one compliance-trestle,
one setuptools, one SQLAlchemy, one psycopg2-binary, one playwright) and
normalize scikit-learn to the correct package name (scikit-learn) so pip
behavior is deterministic; ensure any required version specifiers are preserved
when merging and remove exact duplicates (e.g., duplicate PyYAML/version lines)
so the final file lists each dependency only once.
scripts/build_golden_dataset.py (1)
261-261: ⚡ Quick win

Rename unused loop variable to follow convention.

The variable node_id is not used within the loop body. Per Python convention, prefix unused variables with underscore.
♻️ Proposed fix
-    for node_id, name, section_id, text, cre_concat in rows:
+    for _node_id, name, section_id, text, cre_concat in rows:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/build_golden_dataset.py` at line 261, The loop binding in the for
statement "for node_id, name, section_id, text, cre_concat in rows:" uses an
unused variable node_id; rename it to _node_id to follow Python convention for
unused variables and avoid linter warnings. Update the for header to "for
_node_id, name, section_id, text, cre_concat in rows:" and ensure there are no
references to node_id inside the loop (if any, replace them with the intended
variable or raise if needed).
application/utils/librarian/knowledge_source.py (2)
16-20: ⚡ Quick win

Simplify abstract method body.

The raise NotImplementedError in the abstract method body is redundant. The @abstractmethod decorator already enforces that subclasses must implement this method.
♻️ Proposed fix
 class KnowledgeSource(ABC):
     @abstractmethod
     def items(self) -> Iterator[KnowledgeQueueItem]:
         """Yield knowledge_queue rows awaiting classification."""
-        raise NotImplementedError
+        ...
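The `@abstractmethod` check runs when a subclass is instantiated, which is why the explicit raise adds nothing to enforcement. Below is a stdlib sketch. `Row` is a hypothetical stand-in for `KnowledgeQueueItem`, and `FixtureSource`/`Incomplete` are hypothetical subclasses.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]  # hypothetical stand-in for KnowledgeQueueItem


class KnowledgeSource(ABC):
    @abstractmethod
    def items(self) -> Iterator[Row]:
        """Yield knowledge_queue rows awaiting classification."""
        ...


class FixtureSource(KnowledgeSource):
    """Concrete source backed by an in-memory list of rows."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = list(rows)

    def items(self) -> Iterator[Row]:
        yield from self._rows


class Incomplete(KnowledgeSource):
    """Does not implement items(), so instantiating it raises TypeError."""
```

One trade-off is worth weighing before applying the nit. With a `...` body, a subclass that calls `super().items()` silently gets `None`, whereas `raise NotImplementedError` fails loudly. Enforcement at instantiation is identical either way.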
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/librarian/knowledge_source.py` around lines 16 - 20, Remove
the redundant raise in the abstract method: in the KnowledgeSource class remove
the "raise NotImplementedError" from the items method (keep the `@abstractmethod`
decorator and the docstring or replace the body with a simple pass) so
subclasses are still required to implement KnowledgeSource.items without the
unnecessary explicit exception.
9-9: ⚡ Quick win

Remove unused import.

The json module is imported but never used. Line 34 uses KnowledgeQueueItem.model_validate_json, which is a Pydantic method that handles JSON parsing internally.
♻️ Proposed fix
-import json
 from abc import ABC, abstractmethod
 from typing import Iterator
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/librarian/knowledge_source.py` at line 9, Remove the unused
import of the json module at the top of knowledge_source.py; since the code uses
KnowledgeQueueItem.model_validate_json (a Pydantic method) for JSON parsing,
delete the "import json" line to avoid an unused-import warning and keep imports
minimal.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@requirements.txt`:
- Line 66: Update the pydantic version range in requirements.txt to exclude
vulnerable 2.x releases by changing the spec for "pydantic" from
"pydantic>=2,<3" to "pydantic>=2.4.0,<3" so that installations will use the
first patched 2.4.0+ release; locate the "pydantic>=2,<3" entry and replace it
with "pydantic>=2.4.0,<3".
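The pin arithmetic can be checked mechanically. The advisory range quoted in the commit message (`>=2.0.0,<2.4.0`) overlaps the old `>=2,<3` pin but not the new `>=2.4.0,<3` one. This sketch compares numeric release tuples only and deliberately ignores pre-release and local-version suffixes. For real PEP 440 semantics, `packaging.specifiers.SpecifierSet` is the right tool. The helper names are hypothetical.

```python
import re
from typing import Tuple

Release = Tuple[int, int, int]


def release(version: str) -> Release:
    """Leading numeric release of a version string, padded to three parts."""
    m = re.match(r"\d+(?:\.\d+)*", version)
    if m is None:
        raise ValueError(f"not a version: {version!r}")
    nums = [int(part) for part in m.group(0).split(".")][:3]
    nums += [0] * (3 - len(nums))
    return (nums[0], nums[1], nums[2])


def in_range(version: str, low: str, high: str) -> bool:
    """True when low <= version < high (half-open, like >=low,<high)."""
    return release(low) <= release(version) < release(high)


def affected(version: str) -> bool:
    return in_range(version, "2.0.0", "2.4.0")


def old_pin(version: str) -> bool:
    return in_range(version, "2", "3")


def new_pin(version: str) -> bool:
    return in_range(version, "2.4.0", "3")
```

The padding step matters: without it, `(2, 4)` would compare less than `(2, 4, 0)`, and `>=2.4` would wrongly exclude 2.4.0.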


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: c707f462-c16f-4413-acca-0aa2f5c45f32

📥 Commits

Reviewing files that changed from the base of the PR and between d796ff5 and b69a735.

📒 Files selected for processing (28)

application/tests/librarian/__init__.py
application/tests/librarian/config_loader_test.py
application/tests/librarian/dataset_test.py
application/tests/librarian/explicit_link_resolver_test.py
application/tests/librarian/fixtures/golden_dataset.json
application/tests/librarian/fixtures/golden_dataset.schema.json
application/tests/librarian/fixtures/sample_knowledge_queue.jsonl
application/tests/librarian/hub_firewall_test.py
application/tests/librarian/schemas_test.py
application/tests/librarian/scoring_test.py
application/tests/librarian/section_validator_test.py
application/utils/librarian/__init__.py
application/utils/librarian/_rfc_schemas/knowledge-item.json
application/utils/librarian/_rfc_schemas/link-proposal.json
application/utils/librarian/_rfc_schemas/locator.json
application/utils/librarian/_rfc_schemas/proposed-link.json
application/utils/librarian/_rfc_schemas/review-item.json
application/utils/librarian/_rfc_schemas/source-ref.json
application/utils/librarian/config_loader.py
application/utils/librarian/explicit_link_resolver.py
application/utils/librarian/hub_firewall.py
application/utils/librarian/knowledge_source.py
application/utils/librarian/schemas.py
application/utils/librarian/scoring.py
application/utils/librarian/section_validator.py
requirements.txt
scripts/build_golden_dataset.py
scripts/evaluate_librarian.py

…inkResolver

The C.0 deterministic input boundary, per the proposal's W2/W3 'data preparation layer' (named validator, not normalizer: the RFC assigns text normalization to Module A; C validates and adapts, never transforms text).

- section_validator.py: typed-error validation of both upstream shapes (B's knowledge_queue row + RFC KnowledgeItem envelope) into an internal Section; synthesizes RFC identity fields (chunk_id/artifact_id/source/locator) from B's reduced row; strips volatile audit metadata.
- explicit_link_resolver.py: deterministic ddd-ddd fast path, no ML. Fail-safe: only a single known reference auto-links; unknown or conflicting references route to review.
- evaluate_librarian.py: harness now runs every golden row through C.0, prints per-slice validation pass rate, and gates the explicit slice at 100% resolver correctness (exit 1 on regression). Gate: PASS 5/5.
- Table-driven tests for every rejection class and resolver outcome.

mypy --strict clean, black clean, 66 tests green.

…ASP#925) GHSA pydantic ReDoS affects >=2.0.0,<2.4.0; first patched in 2.4.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ASP#925) GHSA pydantic ReDoS affects >=2.0.0,<2.4.0; first patched in 2.4.0.

PRAteek-singHWY added 2 commits June 9, 2026 17:17

week_1: address review — config validation, schema non-empty constrai…

e097016

…nts, fail-fast build, edge cases

coderabbitai Bot reviewed Jun 10, 2026


Comment thread requirements.txt Outdated

PRAteek-singHWY force-pushed the gsocmodule_C_week_2 branch from b69a735 to b4a6aa2 Compare June 10, 2026 21:56

Merge branch 'main' into gsocmodule_C_week_2

0232910

PRAteek-singHWY mentioned this pull request Jun 18, 2026

week_3: Module C (The Librarian) — C.1 candidate retriever (in-memory + pgvector) + pipeline switch #937

Open

Merge branch 'main' into gsocmodule_C_week_2

a01e404

week_2: pin pydantic>=2.4.0 to avoid v2 ReDoS advisory (CodeRabbit OW…

73aa15e

…ASP#925) GHSA pydantic ReDoS affects >=2.0.0,<2.4.0; first patched in 2.4.0.

PRAteek-singHWY force-pushed the gsocmodule_C_week_2 branch from d99a64a to 73aa15e Compare June 26, 2026 19:59

PRAteek-singHWY added a commit to PRAteek-singHWY/OpenCRE that referenced this pull request Jun 26, 2026

week_2: pin pydantic>=2.4.0 to avoid v2 ReDoS advisory (CodeRabbit OW…

695a016

…ASP#925) GHSA pydantic ReDoS affects >=2.0.0,<2.4.0; first patched in 2.4.0.

PRAteek-singHWY wants to merge 6 commits into OWASP:main from PRAteek-singHWY:gsocmodule_C_week_2
