Fixes for CWE-Bench-Java#78
Add missing fix_info methods.
| class_name: List[str] = Field(description="Class the change is in.") | ||
| class_start: List[int] = Field(description="Starting line number of the class.") | ||
| class_end: List[int] = Field(description="Ending line number of the class.") | ||
| method_name: List[str] = Field(description="Method the change is in, e.g. evaluate") |
can you add tests (data/tests/*_test.py) for ensuring we don't run into buggy rows in fix_info.csv?
Should these be static Python tests or CI tests? The original dataset still has methods that do not exactly fit the new guidelines, and there are still projects with no fix information (those with no valid changes under the new guidelines). Should I add some simple formatting checks?
Static Python tests that you can run on the newly added v2 of the dataset. We can add them to CI later.
Added tests at data/tests, checking for blank values, slug formatting, and missing fix information. PR description has been updated as well.
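For readers following along, the three static checks mentioned here (blank values, slug formatting, missing fix information) could look roughly like the sketch below. This is not the code in `data/tests`. The column names (`project_slug`, `file`, `class_name`, `method_name`), the slug pattern, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
import re

# Assumed slug shape: <owner>__<repo>_CVE-<year>-<id>_<version>.
# The real rule lives in the contributing guide; adjust as needed.
SLUG_PATTERN = re.compile(r"^[^_\s]+__\S+_CVE-\d{4}-\d{4,}_\S+$")

# Hypothetical column names, chosen for this example only.
REQUIRED_COLUMNS = ("project_slug", "file", "class_name", "method_name")


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def find_blank_values(rows, required=REQUIRED_COLUMNS):
    """Return (row_index, column) pairs where a required cell is empty."""
    return [
        (index, column)
        for index, row in enumerate(rows)
        for column in required
        if not (row.get(column) or "").strip()
    ]


def find_bad_slugs(rows):
    """Return the sorted set of slugs that do not match SLUG_PATTERN."""
    slugs = {row.get("project_slug") or "" for row in rows}
    return sorted(slug for slug in slugs if not SLUG_PATTERN.match(slug))


def find_projects_without_fix_info(project_slugs, fix_rows):
    """Return project slugs that have no row at all in the fix info."""
    covered = {row.get("project_slug") for row in fix_rows}
    return sorted(set(project_slugs) - covered)


# Made-up sample data: one clean row, one blank method, one malformed slug.
SAMPLE_FIX_INFO = """project_slug,file,class_name,method_name
acme__widget_CVE-2024-0001_1.0.0,src/Foo.java,Foo,evaluate
acme__widget_CVE-2024-0001_1.0.0,src/Bar.java,Bar,
bad slug,src/Baz.java,Baz,run
"""

rows = load_rows(SAMPLE_FIX_INFO)
```

Each helper returns the offending entries rather than raising, so a test can assert on an empty list and print the full set of bad rows on failure. Projects that legitimately have no fix information would need an explicit allowlist.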
| @@ -0,0 +1,26 @@ | |||
| # Contributing to CWE-Bench-Java | |||
|
|
|||
| Projects in CWE-Bench-Java follow a strict framework in how they are recorded. All projects should be logged in `project_info.csv`, `build_info.csv`, and `fix_info.csv`. Details on how each should be formatted are below. | |||
nit - rename to contributing_cwe_bench_java.md
Renamed in all instances.
| ⚠️ Code and data for the [ICLR 2025 Paper](https://arxiv.org/pdf/2405.17238) can be found in the v1 branch, license and citation below. | ||
|
|
||
| ## 📰 News | ||
| * **[Jun. 23, 2026]**: CVEs added in the previous update to CWE-Bench-Java did not include information regarding specific methods that were altered. This missing fix information has been added. Additionally, guidelines for future contributions to the benchmark have been established at [```contributing.md```](data/contributing.md). |
have been established at [```contributing_cwe_bench_java.md```](data/contributing_cwe_bench_java.md)
Do link the commit where the extra CVEs were added. Make sure to say "Only the CVEs added in the previous update":
Only the CVEs added in the previous [update](link to update)
Commits with the same name, e.g. "Update README.md" or "Add files via upload", can be squashed into one commit.

Can you also add a test that, for each of the updated fix_info rows, retrieves the git commit diff and asserts that the method info added in fix_info aligns with that diff? @IcebladeLabs
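One way such an alignment check could be approached is sketched below: parse the hunk headers of a unified diff and check whether a recorded method line range overlaps any changed range. The sample diff, the function names, and the choice to compare against the pre-fix ("old") side are assumptions for illustration. In practice the diff text might come from something like `git show --unified=0 <commit> -- <file>`.

```python
import re

# Unified diff hunk header: @@ -old_start[,old_count] +new_start[,new_count] @@
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def changed_ranges(diff_text, side="old"):
    """Return inclusive (first_line, last_line) ranges touched by each hunk.

    side="old" uses pre-change line numbers, side="new" post-change ones.
    A count of 0 (pure insertion or deletion on that side) is treated as
    the single anchor line, which is an approximation.
    """
    ranges = []
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if not match:
            continue
        if side == "old":
            start_text, count_text = match.group(1), match.group(2)
        else:
            start_text, count_text = match.group(3), match.group(4)
        start = int(start_text)
        # An omitted count means exactly one line.
        count = 1 if count_text is None else int(count_text)
        ranges.append((start, start + max(count, 1) - 1))
    return ranges


def method_touched(method_start, method_end, ranges):
    """True if the method's line span overlaps at least one changed range."""
    return any(low <= method_end and high >= method_start for low, high in ranges)


# Made-up diff with zero context lines, for demonstration only.
SAMPLE_DIFF = """\
diff --git a/src/Foo.java b/src/Foo.java
--- a/src/Foo.java
+++ b/src/Foo.java
@@ -42,2 +42,3 @@ public Object evaluate(String expr) {
-        return parser.parse(expr);
-    }
+        String safe = sanitize(expr);
+        return parser.parse(safe);
+    }
@@ -90 +90,0 @@ void helper() {
-        log(expr);
"""

old_side = changed_ranges(SAMPLE_DIFF)
```

A test built on this would loop over the updated rows, fetch the diff for each row's commit and file, and assert `method_touched(method_start, method_end, ranges)`. Whether the recorded line numbers refer to the pre-fix or post-fix file decides which `side` to use.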
Update README.md
Add files via upload
In the previous update to CWE-Bench-Java, CVEs were added to `project_info.csv` and `build_info.csv`. However, these projects were never appropriately added to `fix_info.csv`. This PR addresses that issue in the following ways:

* […] `fix_info.csv`.
* […] `fix_info.csv`.
* […] `project_info.csv`) using the Gemini API. The script that was used for this purpose is available at `data/scripts/fix_info_generator.py`.
* […] `contributing_cwe_bench_java.md` precisely defining how additions to CWE-Bench-Java should be formatted. This includes specifications for each field and rules for inclusion/exclusion. Note that the original data does not strictly follow this new formatting and has not been adjusted to fit.
* […] `data/tests` to ensure future data quality. These tests check for blank values, slug formatting, missing fix information, and alignment with the source commits.