Commit e7a60cb

KyriosGN0 and claude authored
feat(github): add username filtering helper for bot exclusion (#8716)
* feat(github): add username filtering helper for bot exclusion

  Implements shouldSkipByUsername() function to filter bot accounts by username using GITHUB_PR_EXCLUDELIST environment variable.

  - Case-insensitive matching
  - Comma-separated list support
  - Whitespace trimming
  - Returns false for empty usernames or empty exclusion list

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
  Signed-off-by: AvivGuiser <avivguiser@gmail.com>

* feat(github): filter bot PRs in extractor

  Adds username filtering to PR extractor to skip bot-authored PRs when GITHUB_PR_EXCLUDELIST is set.

  - Checks author username before extraction
  - Logs debug message when PR is skipped
  - Includes unit test for bot filtering
  - Includes e2e test data for bot filtering

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat(github): filter bot reviews in extractor

  Adds username filtering to review extractor to skip bot reviews when GITHUB_PR_EXCLUDELIST is set.

  - Checks reviewer username before extraction
  - Logs debug message when review is skipped
  - Includes e2e test for bot filtering

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat(github): filter bot PR review comments in extractor

  Adds username filtering to PR review comment extractor to skip bot comments when GITHUB_PR_EXCLUDELIST is set.

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat(github): filter bot issue comments in extractor

  Adds username filtering to issue comment extractor to skip bot comments when GITHUB_PR_EXCLUDELIST is set.

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs(github): add bot filtering documentation

  Documents GITHUB_PR_EXCLUDELIST configuration and usage.

  Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(github): fix CI errors in bot filtering PR

  - Add Apache license header to README_FILTERING.md
  - Export ResetExcludedUsernamesForTest to fix unused lint error
  - Call ResetExcludedUsernamesForTest in e2e tests to reset sync.Once cache before setting GITHUB_PR_EXCLUDELIST env var

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Signed-off-by: AvivGuiser <avivguiser@gmail.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent 79b8f8a commit e7a60cb

11 files changed

Lines changed: 430 additions & 7 deletions
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# GitHub Plugin - Bot Filtering
+
+## Overview
+
+The GitHub plugin supports filtering bot-generated PRs, reviews, and comments from data collection to prevent them from skewing metrics like lead time for changes and PR pickup time.
+
+## Configuration
+
+Set the `GITHUB_PR_EXCLUDELIST` environment variable with a comma-separated list of bot usernames to exclude:
+
+```bash
+export GITHUB_PR_EXCLUDELIST="renovate[bot],dependabot[bot],github-actions[bot]"
+```
+
+## Common Bot Usernames
+
+- `renovate[bot]` - Renovate dependency updates
+- `dependabot[bot]` - GitHub Dependabot
+- `github-actions[bot]` - GitHub Actions automated PRs
+- `sonarcloud[bot]` - SonarCloud code analysis
+- `codecov[bot]` - Codecov coverage reports
+
+## What Gets Filtered
+
+When a username is in the exclusion list, the following entities are NOT collected:
+
+1. **Pull Requests** - PRs authored by bots
+2. **PR Reviews** - Reviews submitted by bots
+3. **PR Review Comments** - Comments on PR reviews by bots
+4. **Issue Comments** - Comments on issues by bots
+
+## How It Works
+
+- Filtering happens at the **extraction** layer
+- Raw API responses are still saved (in `_raw_github_api_*` tables)
+- Filtered entities never reach the tool layer tables
+- Metrics queries only see non-bot entities
+
+## Matching Rules
+
+- **Case-insensitive**: `renovate[bot]` matches `Renovate[bot]` and `RENOVATE[BOT]`
+- **Exact match**: Must match the full username
+- **Whitespace trimmed**: Extra spaces in the config are ignored
+
+## Examples
+
+### Docker Compose
+
+```yaml
+services:
+  devlake:
+    environment:
+      - GITHUB_PR_EXCLUDELIST=renovate[bot],dependabot[bot]
+```
+
+### Kubernetes
+
+```yaml
+env:
+  - name: GITHUB_PR_EXCLUDELIST
+    value: "renovate[bot],dependabot[bot],github-actions[bot]"
+```
+
+### Local Development
+
+```bash
+# .env file
+GITHUB_PR_EXCLUDELIST=renovate[bot],dependabot[bot]
+```
+
+## Updating the Exclusion List
+
+Changes to `GITHUB_PR_EXCLUDELIST` require a DevLake restart. After updating:
+
+1. Restart DevLake
+2. Trigger re-collection for affected repositories
+3. Previously collected bot data remains in the database
+4. New collections will respect the updated filter
+
+## Verification
+
+Check logs for filtering activity:
+
+```
+DEBUG: Skipping PR #123 from bot user: renovate[bot]
+DEBUG: Skipping review #456 from bot user: dependabot[bot]
+```
+
+## Troubleshooting
+
+**Bot PRs still appearing in metrics:**
+
+1. Verify `GITHUB_PR_EXCLUDELIST` is set correctly
+2. Check DevLake logs for "Skipping" messages
+3. Ensure username matches exactly (case-insensitive)
+4. Restart DevLake after config changes
+5. Re-run collection for the repository
+
+**How to find bot usernames:**
+
+Check GitHub PR/comment authors in the web UI - bot usernames typically end with `[bot]`.
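
The file that defines `shouldSkipByUsername()` is not shown in this view, so the following is only a minimal sketch of the matching rules the README and commit message describe (comma-separated list, whitespace trimming, case-insensitive exact match, `false` for an empty username or empty list). The names `parseExcludeList`, `shouldSkip`, and `shouldSkipFromEnv` are hypothetical and are not taken from the commit; unlike the real helper, this version does no caching.

```go
package main

import (
	"os"
	"strings"
)

// parseExcludeList turns a comma-separated value such as
// "renovate[bot], Dependabot[bot]" into a lowercase lookup set.
// Hypothetical helper: name and shape are assumptions for this sketch.
func parseExcludeList(raw string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, part := range strings.Split(raw, ",") {
		name := strings.ToLower(strings.TrimSpace(part))
		if name != "" { // ignore empty entries like "a,,b" or a trailing comma
			set[name] = struct{}{}
		}
	}
	return set
}

// shouldSkip reports whether username is in the exclusion set.
// It compares the full username case-insensitively and returns false
// for an empty username or an empty set.
func shouldSkip(username string, excluded map[string]struct{}) bool {
	if username == "" || len(excluded) == 0 {
		return false
	}
	_, found := excluded[strings.ToLower(username)]
	return found
}

// shouldSkipFromEnv re-reads GITHUB_PR_EXCLUDELIST on every call.
// This is an assumption made to keep the sketch simple.
func shouldSkipFromEnv(username string) bool {
	return shouldSkip(username, parseExcludeList(os.Getenv("GITHUB_PR_EXCLUDELIST")))
}
```

With a list of `" Renovate[bot] , dependabot[bot]"`, `shouldSkip("RENOVATE[BOT]", set)` is true, while `shouldSkip("renovate", set)` is false because only full usernames match.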

backend/plugins/github/e2e/pr_review_test.go

Lines changed: 40 additions & 0 deletions
@@ -18,8 +18,10 @@ limitations under the License.
 package e2e
 
 import (
+	"os"
 	"testing"
 
+	"github.com/apache/incubator-devlake/core/dal"
 	"github.com/apache/incubator-devlake/core/models/domainlayer/code"
 	"github.com/apache/incubator-devlake/helpers/e2ehelper"
 	"github.com/apache/incubator-devlake/plugins/github/impl"
@@ -99,3 +101,41 @@ func TestPrReviewDataFlow(t *testing.T) {
 		},
 	)
 }
+
+func TestPrReviewDataFlowWithBotFiltering(t *testing.T) {
+	var plugin impl.Github
+	dataflowTester := e2ehelper.NewDataFlowTester(t, "github", plugin)
+
+	// Set up bot filtering
+	tasks.ResetExcludedUsernamesForTest()
+	os.Setenv("GITHUB_PR_EXCLUDELIST", "renovate[bot]")
+	defer os.Unsetenv("GITHUB_PR_EXCLUDELIST")
+
+	taskData := &tasks.GithubTaskData{
+		Options: &tasks.GithubOptions{
+			ConnectionId: 1,
+			Name: "test/repo",
+			GithubId: 123,
+		},
+	}
+
+	// import raw data table with bot and human reviews
+	dataflowTester.ImportCsvIntoRawTable("./raw_tables/_raw_github_api_pr_reviews_bot_filter.csv", "_raw_github_api_pull_request_reviews")
+
+	// verify review extraction filters bot reviews
+	dataflowTester.FlushTabler(&models.GithubPrReview{})
+	dataflowTester.FlushTabler(&models.GithubReviewer{})
+	dataflowTester.FlushTabler(&models.GithubRepoAccount{})
+	dataflowTester.Subtask(tasks.ExtractApiPullRequestReviewsMeta, taskData)
+
+	// Verify only human review was extracted
+	var reviews []models.GithubPrReview
+	dataflowTester.Dal.All(&reviews, dal.Where("connection_id = ?", 1))
+
+	if len(reviews) != 1 {
+		t.Errorf("Expected 1 review (human), got %d", len(reviews))
+	}
+	if len(reviews) > 0 && reviews[0].GithubId != 5002 {
+		t.Errorf("Expected review #5002 (human), got #%d", reviews[0].GithubId)
+	}
+}

backend/plugins/github/e2e/pr_test.go

Lines changed: 40 additions & 0 deletions
@@ -18,8 +18,10 @@ limitations under the License.
 package e2e
 
 import (
+	"os"
 	"testing"
 
+	"github.com/apache/incubator-devlake/core/dal"
 	"github.com/apache/incubator-devlake/core/models/domainlayer/code"
 	"github.com/apache/incubator-devlake/helpers/e2ehelper"
 	"github.com/apache/incubator-devlake/plugins/github/impl"
@@ -171,3 +173,41 @@ func TestPrDataFlow(t *testing.T) {
 		},
 	)
 }
+
+func TestPrDataFlowWithBotFiltering(t *testing.T) {
+	var plugin impl.Github
+	dataflowTester := e2ehelper.NewDataFlowTester(t, "github", plugin)
+
+	// Set up bot filtering
+	tasks.ResetExcludedUsernamesForTest()
+	os.Setenv("GITHUB_PR_EXCLUDELIST", "renovate[bot]")
+	defer os.Unsetenv("GITHUB_PR_EXCLUDELIST")
+
+	taskData := &tasks.GithubTaskData{
+		Options: &tasks.GithubOptions{
+			ConnectionId: 1,
+			Name: "test/repo",
+			GithubId: 123,
+			ScopeConfig: &models.GithubScopeConfig{},
+		},
+	}
+
+	// import raw data table with bot and human PRs
+	dataflowTester.ImportCsvIntoRawTable("./raw_tables/_raw_github_api_pull_requests_bot_filter.csv", "_raw_github_api_pull_requests")
+
+	// verify pr extraction filters bot PRs
+	dataflowTester.FlushTabler(&models.GithubPullRequest{})
+	dataflowTester.FlushTabler(&models.GithubRepoAccount{})
+	dataflowTester.Subtask(tasks.ExtractApiPullRequestsMeta, taskData)
+
+	// Verify only human PR was extracted
+	var prs []models.GithubPullRequest
+	dataflowTester.Dal.All(&prs, dal.Where("connection_id = ?", 1))
+
+	if len(prs) != 1 {
+		t.Errorf("Expected 1 PR (human), got %d", len(prs))
+	}
+	if len(prs) > 0 && prs[0].Number != 1000 {
+		t.Errorf("Expected PR #1000 (human), got #%d", prs[0].Number)
+	}
+}
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+id,params,data,url,input,created_at
+1,"{""ConnectionId"":1,""Name"":""test/repo""}","{""id"":5001,""user"":{""login"":""renovate[bot]"",""id"":29139614},""body"":""LGTM"",""state"":""APPROVED"",""commit_id"":""abc123"",""submitted_at"":""2024-01-01T00:00:00Z""}",https://api.github.com/repos/test/repo/pulls/1/reviews,"{""GithubId"":1,""Number"":1}",2024-01-01 00:00:00
+2,"{""ConnectionId"":1,""Name"":""test/repo""}","{""id"":5002,""user"":{""login"":""human-reviewer"",""id"":12345},""body"":""Looks good"",""state"":""APPROVED"",""commit_id"":""abc123"",""submitted_at"":""2024-01-02T00:00:00Z""}",https://api.github.com/repos/test/repo/pulls/1/reviews,"{""GithubId"":1,""Number"":1}",2024-01-02 00:00:00
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+id,params,data,url,input,created_at
+1,"{""ConnectionId"":1,""Name"":""test/repo""}","{""id"":999,""number"":999,""state"":""closed"",""title"":""Dependency Update"",""user"":{""login"":""renovate[bot]"",""id"":29139614},""body"":""Updates dependencies"",""created_at"":""2024-01-01T00:00:00Z"",""updated_at"":""2024-01-02T00:00:00Z"",""closed_at"":""2024-01-02T00:00:00Z"",""merged_at"":""2024-01-02T00:00:00Z"",""merge_commit_sha"":""abc123"",""merged"":true,""additions"":10,""deletions"":5,""changed_files"":2,""comments"":0,""review_comments"":0,""commits"":1,""draft"":false,""labels"":[],""head"":{""ref"":""renovate/deps"",""sha"":""head123"",""repo"":{""id"":123,""name"":""repo""}},""base"":{""ref"":""main"",""sha"":""base123"",""repo"":{""id"":123,""name"":""repo""}},""html_url"":""https://github.com/test/repo/pull/999""}",https://api.github.com/repos/test/repo/pulls,null,2024-01-01 00:00:00
+2,"{""ConnectionId"":1,""Name"":""test/repo""}","{""id"":1000,""number"":1000,""state"":""open"",""title"":""Feature PR"",""user"":{""login"":""human-dev"",""id"":12345},""body"":""Adds feature"",""created_at"":""2024-01-03T00:00:00Z"",""updated_at"":""2024-01-03T00:00:00Z"",""closed_at"":null,""merged_at"":null,""merge_commit_sha"":"""",""merged"":false,""additions"":100,""deletions"":20,""changed_files"":5,""comments"":2,""review_comments"":3,""commits"":5,""draft"":false,""labels"":[],""head"":{""ref"":""feature/new"",""sha"":""head456"",""repo"":{""id"":123,""name"":""repo""}},""base"":{""ref"":""main"",""sha"":""base123"",""repo"":{""id"":123,""name"":""repo""}},""html_url"":""https://github.com/test/repo/pull/1000""}",https://api.github.com/repos/test/repo/pulls,null,2024-01-03 00:00:00

backend/plugins/github/tasks/comment_extractor.go

Lines changed: 22 additions & 7 deletions
@@ -107,8 +107,19 @@ func ExtractApiComments(taskCtx plugin.SubTaskContext) errors.Error {
 					Type: "NORMAL",
 				}
 				if apiComment.User != nil {
+					// Filter bot comments by username
+					if shouldSkipByUsername(apiComment.User.Login) {
+						taskCtx.GetLogger().Debug("Skipping PR comment #%d from bot user: %s", apiComment.GithubId, apiComment.User.Login)
+						return nil, nil
+					}
 					githubPrComment.AuthorUsername = apiComment.User.Login
 					githubPrComment.AuthorUserId = apiComment.User.Id
+
+					githubAccount, err := convertAccount(apiComment.User, data.Options.GithubId, data.Options.ConnectionId)
+					if err != nil {
+						return nil, err
+					}
+					results = append(results, githubAccount)
 				}
 				results = append(results, githubPrComment)
 			} else {
@@ -121,18 +132,22 @@ func ExtractApiComments(taskCtx plugin.SubTaskContext) errors.Error {
 					GithubUpdatedAt: apiComment.GithubUpdatedAt.ToTime(),
 				}
 				if apiComment.User != nil {
+					// Filter bot comments by username
+					if shouldSkipByUsername(apiComment.User.Login) {
+						taskCtx.GetLogger().Debug("Skipping issue comment #%d from bot user: %s", apiComment.GithubId, apiComment.User.Login)
+						return nil, nil
+					}
 					githubIssueComment.AuthorUsername = apiComment.User.Login
 					githubIssueComment.AuthorUserId = apiComment.User.Id
+
+					githubAccount, err := convertAccount(apiComment.User, data.Options.GithubId, data.Options.ConnectionId)
+					if err != nil {
+						return nil, err
+					}
+					results = append(results, githubAccount)
 				}
 				results = append(results, githubIssueComment)
 			}
-			if apiComment.User != nil {
-				githubAccount, err := convertAccount(apiComment.User, data.Options.GithubId, data.Options.ConnectionId)
-				if err != nil {
-					return nil, err
-				}
-				results = append(results, githubAccount)
-			}
 			return results, nil
 		},
 	})

backend/plugins/github/tasks/pr_extractor.go

Lines changed: 5 additions & 0 deletions
@@ -131,6 +131,11 @@ func ExtractApiPullRequests(taskCtx plugin.SubTaskContext) errors.Error {
 			if rawL.GithubId == 0 {
 				return nil, nil
 			}
+			// Filter bot PRs by username
+			if rawL.User != nil && shouldSkipByUsername(rawL.User.Login) {
+				taskCtx.GetLogger().Debug("Skipping PR #%d from bot user: %s", rawL.Number, rawL.User.Login)
+				return nil, nil
+			}
 			//If this is a pr, ignore
 			githubPr, err := convertGithubPullRequest(rawL, data.Options.ConnectionId, data.Options.GithubId)
 			if err != nil {

backend/plugins/github/tasks/pr_review_comment_extractor.go

Lines changed: 6 additions & 0 deletions
@@ -103,6 +103,12 @@ func ExtractApiPrReviewComments(taskCtx plugin.SubTaskContext) errors.Error {
 			}
 
 			if prReviewComment.User != nil {
+				// Filter bot comments by username
+				if shouldSkipByUsername(prReviewComment.User.Login) {
+					taskCtx.GetLogger().Debug("Skipping PR review comment #%d from bot user: %s", prReviewComment.GithubId, prReviewComment.User.Login)
+					return nil, nil
+				}
+
 				githubPrComment.AuthorUserId = prReviewComment.User.Id
 				githubPrComment.AuthorUsername = prReviewComment.User.Login
 

backend/plugins/github/tasks/pr_review_extractor.go

Lines changed: 5 additions & 0 deletions
@@ -84,6 +84,11 @@ func ExtractApiPullRequestReviews(taskCtx plugin.SubTaskContext) errors.Error {
 			if apiPullRequestReview.State == "PENDING" || apiPullRequestReview.User == nil {
 				return nil, nil
 			}
+			// Filter bot reviews by username
+			if shouldSkipByUsername(apiPullRequestReview.User.Login) {
+				taskCtx.GetLogger().Debug("Skipping review #%d from bot user: %s", apiPullRequestReview.GithubId, apiPullRequestReview.User.Login)
+				return nil, nil
+			}
 			pull := &SimplePr{}
 			err = errors.Convert(json.Unmarshal(row.Input, pull))
 			if err != nil {
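
The four extractor hunks share one pattern: check the author login first, and return `nil, nil` for that row so that neither the entity nor its account record is emitted. The sketch below shows that per-row skip in isolation. It does not use DevLake's extractor API; `rawReview` and `extractReviews` are hypothetical stand-ins, and the sample IDs mirror the e2e review fixture.

```go
package main

import "strings"

// rawReview is a hypothetical stand-in for one decoded raw API record.
type rawReview struct {
	ID    int
	Login string // empty when the API returned no user
}

// extractReviews keeps rows whose author is not in the excluded set and
// records the IDs of skipped rows. Skipping a row here plays the role of
// the "return nil, nil" in the extractor callbacks: nothing is emitted
// for that row, and processing continues with the next one.
func extractReviews(rows []rawReview, excluded map[string]bool) (kept []rawReview, skipped []int) {
	for _, row := range rows {
		if row.Login != "" && excluded[strings.ToLower(row.Login)] {
			skipped = append(skipped, row.ID)
			continue
		}
		kept = append(kept, row)
	}
	return kept, skipped
}
```

Called with reviews 5001 (`renovate[bot]`) and 5002 (`human-reviewer`) and an excluded set containing `renovate[bot]`, only review 5002 is kept, which matches what `TestPrReviewDataFlowWithBotFiltering` asserts.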
