Fixes for CWE-Bench-Java#78

Draft
IcebladeLabs wants to merge 11 commits into iris-sast:v2 from IcebladeLabs:v2

IcebladeLabs commented Jun 23, 2026

Collaborator

In the previous update to CWE-Bench-Java, CVEs were added to project_info.csv and build_info.csv. However, these projects were never appropriately added to fix_info.csv. This PR addresses that issue in the following ways:

  • Missing method-level information was programmatically generated and added to fix_info.csv.
  • The fix covers every project (identified by slug) that is present in another data file but missing from fix_info.csv.
  • The data was generated by evaluating commit diffs (from project_info.csv) with the Gemini API. The script used for this is available at data/scripts/fix_info_generator.py.
  • Added contributing_cwe_bench_java.md, which defines precisely how additions to CWE-Bench-Java should be formatted. It specifies each field and the rules for inclusion/exclusion. Note that the original data does not strictly follow this new format and has not been adjusted to fit it.
  • Added tests at data/tests to ensure future data quality. These tests check for blank values, slug formatting, missing fix information, and alignment with the source commits.
  • Made a minor update to the README noting this change.
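
As a rough illustration of the "present in another file but missing from fix_info.csv" check described above, the following sketch computes the set difference of slugs between two CSV files. The column name `project_slug`, the helper names, and the sample rows are assumptions made for illustration; they are not taken from the PR's scripts.

```python
import csv
import io


def read_slugs(csv_text, column="project_slug"):
    """Return the set of non-empty slugs found in one CSV file's text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    }


def slugs_missing_fix_info(project_csv, fix_csv):
    """Slugs listed in the project file that have no row in the fix file."""
    return sorted(read_slugs(project_csv) - read_slugs(fix_csv))


# In-memory stand-ins for project_info.csv and fix_info.csv (made-up rows).
project_info = (
    "project_slug,cve_id\n"
    "acme__lib_CVE-2020-0001_1.0,CVE-2020-0001\n"
    "acme__app_CVE-2021-0002_2.3,CVE-2021-0002\n"
)
fix_info = (
    "project_slug,method_name\n"
    "acme__lib_CVE-2020-0001_1.0,evaluate\n"
)

missing = slugs_missing_fix_info(project_info, fix_info)
print(missing)
```

With real data, the two strings would be replaced by the contents of the corresponding files, and the resulting list would name the projects that still need fix information.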

@IcebladeLabs IcebladeLabs requested a review from clairew June 23, 2026 20:17
Comment thread .github/workflows/CI_pipeline.yml
Comment thread README.md Outdated
class_name: List[str] = Field(description="Class the change is in.")
class_start: List[int] = Field(description="Starting line number of the class.")
class_end: List[int] = Field(description="Ending line number of the class.")
method_name: List[str] = Field(description="Method the change is in, e.g. evaluate")

Collaborator

Can you add tests (data/tests/*_test.py) to ensure we don't run into buggy rows in fix_info.csv?

Collaborator Author

Should these be static Python tests or CI tests? We still have methods from the original dataset that do not exactly fit the new guidelines, and there are still projects with no fix information (if they had no valid changes according to the new guidelines). Should I add some simple formatting checks?

Collaborator

Static Python tests that you can run on the newly added v2 of the dataset. We can add them to CI later.

Collaborator Author

Added tests at data/tests, checking for blank values, slug formatting, and missing fix information. PR description has been updated as well.
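
For readers who want a concrete picture, here is a minimal sketch of what static checks for blank values and slug formatting could look like. It is not the code in data/tests: the slug pattern (`<owner>__<repo>_CVE-<year>-<id>_<version>`), the column name `project_slug`, and the sample rows are all assumptions.

```python
import csv
import io
import re

# Assumed slug shape: <owner>__<repo>_CVE-<year>-<id>_<version>
SLUG_PATTERN = re.compile(r"^\S+__\S+_CVE-\d{4}-\d+_\S+$")


def find_problems(csv_text, slug_column="project_slug"):
    """Return (line_number, message) tuples for rows that look suspicious."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        line_no = reader.line_num  # line of the row that was just read
        # Flag every declared column whose value is empty or whitespace.
        for field in reader.fieldnames:
            if not (row.get(field) or "").strip():
                problems.append((line_no, f"blank value in '{field}'"))
        # Flag slugs that do not match the assumed pattern.
        slug = (row.get(slug_column) or "").strip()
        if slug and not SLUG_PATTERN.match(slug):
            problems.append((line_no, f"malformed slug '{slug}'"))
    return problems


# Made-up rows: one clean, one with a blank field, one with a bad slug.
sample = (
    "project_slug,class_name,method_name\n"
    "acme__lib_CVE-2020-0001_1.0,Parser,evaluate\n"
    "acme__lib_CVE-2020-0001_1.0,Parser,\n"
    "not-a-slug,Parser,run\n"
)

problems = find_problems(sample)
for line_no, message in problems:
    print(f"line {line_no}: {message}")
```

A test file could simply assert that `find_problems` returns an empty list for the real CSV contents.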

@IcebladeLabs IcebladeLabs requested a review from clairew June 30, 2026 16:45
@@ -0,0 +1,26 @@
# Contributing to CWE-Bench-Java

Projects in CWE-Bench-Java follow a strict framework in how they are recorded. All projects should be logged in `project_info.csv`, `build_info.csv`, and `fix_info.csv`. Details on how each should be formatted are below.

Collaborator

Nit: rename to contributing_cwe_bench_java.md

Collaborator Author

Renamed in all instances.

Comment thread README.md Outdated
⚠️ Code and data for the [ICLR 2025 Paper](https://arxiv.org/pdf/2405.17238) can be found in the v1 branch, license and citation below.

## 📰 News
* **[Jun. 23, 2026]**: CVEs added in the previous update to CWE-Bench-Java did not include information regarding specific methods that were altered. This missing fix information has been added. Additionally, guidelines for future contributions to the benchmark have been established at [```contributing.md```](data/contributing.md).

@clairew clairew Jun 30, 2026

Collaborator

have been established at [```contributing_cwe_bench_java.md```](data/contributing_cwe_bench_java.md)

Collaborator

Do link the commit where the extra CVEs were added. Make sure to say "Only the CVEs added in the previous update":

Only the CVEs added in the previous [update](link to update)

@clairew

clairew commented Jun 30, 2026

Collaborator

Commits with the same name (e.g. "Update README.md" or "Add files via upload") can be squashed into one commit.

@clairew

clairew commented Jun 30, 2026

Collaborator

Can you also add a comparison test? For each of the updated fix_info rows, retrieve the git commit diff and assert that the method info added in fix_info aligns with that diff. @IcebladeLabs
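
One possible shape for such an alignment check is sketched below: parse the hunk headers of a unified diff for a single file, collect the line numbers of added lines, and assert that at least one falls inside the method's recorded range. The names `method_start` and `method_end`, the choice to compare against post-change line numbers, and the sample diff are assumptions; the actual fix_info.csv columns and the side of the diff they refer to may differ.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,4 +10,5 @@ class Parser {"
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_line_numbers(diff_text):
    """Post-change line numbers of added lines in a single-file unified diff."""
    added = set()
    new_line = None  # current line number on the post-change side
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            new_line = int(match.group(1))
            continue
        if new_line is None:
            continue  # file headers ("---", "+++") precede the first hunk
        if line.startswith("+"):
            added.add(new_line)
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            continue  # removed lines and "\ No newline" markers are not in the new file
        else:
            new_line += 1  # context line
    return added


def method_touches_diff(method_start, method_end, diff_text):
    """True if any added line lies inside [method_start, method_end]."""
    return any(method_start <= n <= method_end for n in added_line_numbers(diff_text))


# Made-up diff for illustration only.
sample_diff = """\
--- a/src/Parser.java
+++ b/src/Parser.java
@@ -10,4 +10,5 @@ class Parser {
     Object evaluate(String expr) {
+        validate(expr);
         return run(expr);
     }
 }
"""

print(method_touches_diff(10, 13, sample_diff))
print(method_touches_diff(20, 30, sample_diff))
```

In a real test the diff text would come from the fix commit recorded in project_info.csv (for example via `git show` or `git diff` on a local clone), and a change that only deletes lines would need the pre-change side to be tracked as well.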

Update README.md
Add files via upload
2 participants