process_pull_request_files doesn't handle hitting a rate limit very well #409

Description

@MoralCode

Given what I've seen of the GraphQL logic while working on #404, I don't think any of it is wired up to use keyman properly.

As a result, when collecting a large repo with many thousands of PRs (enough to exceed the rate limit and require multiple runs to finish), we just seem to... blow right past the limit.
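One possible direction, as a minimal sketch rather than the project's actual code: have the GraphQL layer tell rate-limit errors apart from other errors, so callers can react to them specifically (wait, or switch keys). The names `GraphQLRateLimitError` and `raise_for_graphql_errors` are hypothetical. The error shape is assumed to match the `RATE_LIMIT` entry in the stack trace in this issue.

```python
class GraphQLRateLimitError(Exception):
    """Hypothetical exception for a GraphQL response that reports RATE_LIMIT."""


def raise_for_graphql_errors(response_json: dict) -> None:
    """Raise a dedicated exception for rate-limit errors, a generic one otherwise."""
    errors = response_json.get("errors") or []
    if not errors:
        return

    # Assumed error shape, taken from the stack trace in this issue:
    # {'type': 'RATE_LIMIT', 'code': 'graphql_rate_limit', 'message': '...'}
    if any(err.get("type") == "RATE_LIMIT" for err in errors):
        raise GraphQLRateLimitError(f"Github Graphql rate limit hit: {errors}")

    # Anything else keeps the generic failure.
    raise Exception(f"Github Graphql Data Access Errors: {errors}")
```

With a distinct exception type, the retry logic could stop treating a rate limit like any other transient failure.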

Stack Trace
 Traceback: Traceback (most recent call last):
[core]     |   File "/collectoss/collectoss/tasks/github/util/github_graphql_data_access.py", line 109, in make_request_with_retries
[core]     |     return self.__make_request_with_retries(query, variables, timeout)
[core]     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[core]     |   File "/collectoss/.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 330, in wrapped_f
[core]     |     return self(f, *args, **kw)
[core]     |            ^^^^^^^^^^^^^^^^^^^^
[core]     |   File "/collectoss/.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 467, in __call__
[core]     |     do = self.iter(retry_state=retry_state)
[core]     |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[core]     |   File "/collectoss/.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 368, in iter
[core]     |     result = action(retry_state)
[core]     |              ^^^^^^^^^^^^^^^^^^^
[core]     |   File "/collectoss/.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 411, in exc_check
[core]     |     raise retry_exc from fut.exception()
[core]     | tenacity.RetryError: RetryError[<Future at 0x7fb862e41e10 state=finished raised Exception>]
[core]     | 
[core]     | During handling of the above exception, another exception occurred:
[core]     | 
[core]     | Traceback (most recent call last):
[core]     |   File "/collectoss/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 453, in trace_task
[core]     |     R = retval = fun(*args, **kwargs)
[core]     |                  ^^^^^^^^^^^^^^^^^^^^
[core]     |   File "/collectoss/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 736, in __protected_call__
[core]     |     return self.run(*args, **kwargs)
[core]     |            ^^^^^^^^^^^^^^^^^^^^^^^^^
[core]     |   File "/collectoss/collectoss/tasks/github/pull_requests/files_model/tasks.py", line 18, in process_pull_request_files
[core]     |     pull_request_files_model(repo.repo_id, logger, db_session, manifest.key_auth, full_collection)
[core]     |   File "/collectoss/collectoss/tasks/github/pull_requests/files_model/core.py", line 85, in pull_request_files_model
[core]     |     for pr_file in github_graphql_data_access.paginate_resource(query, params, values):
[core]     |   File "/collectoss/collectoss/tasks/github/util/github_graphql_data_access.py", line 53, in paginate_resource
[core]     |     result_json = self.make_request_with_retries(query, params).json()
[core]     |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[core]     |   File "/collectoss/collectoss/tasks/github/util/github_graphql_data_access.py", line 111, in make_request_with_retries
[core]     |     raise e.last_attempt.exception()
[core]     |   File "/collectoss/.venv/lib/python3.11/site-packages/tenacity/__init__.py", line 470, in __call__
[core]     |     result = fn(*args, **kwargs)
[core]     |              ^^^^^^^^^^^^^^^^^^^
[core]     |   File "/collectoss/collectoss/tasks/github/util/github_graphql_data_access.py", line 123, in __make_request_with_retries
[core]     |     return self.make_request(query, variables, timeout)
[core]     |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[core]     |   File "/collectoss/collectoss/tasks/github/util/github_graphql_data_access.py", line 98, in make_request
[core]     |     raise Exception(f"Github Graphql Data Access Errors: {errors}")
[core]     | Exception: Github Graphql Data Access Errors: [{'type': 'RATE_LIMIT', 'code': 'graphql_rate_limit', 'message': 'API rate limit already exceeded for user ID 17362949.'}]
[core]     | [2026-06-24 19:21:54,934: WARNING/ForkPoolWorker-5] 2026-06-24 19:21:54 39bd24393d6a secondary_task_failure[263] INFO Repo git: https://github.com/debezium/debezium
[core]     | [2026-06-24 19:21:54,972: ERROR/ForkPoolWorker-5] Task collectoss.tasks.github.pull_requests.files_model.tasks.process_pull_request_files[e4bef086-0677-4ead-b511-cf0f0a999381] raised unexpected: Exception("Github Graphql Data Access Errors: [{'type': 'RATE_LIMIT', 'code': 'graphql_rate_limit', 'message': 'API rate limit already exceeded for user ID 17362949.'}]")
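
The trace shows tenacity exhausting its retries and the last `RATE_LIMIT` exception being re-raised. A rough sketch of an alternative, assuming some pool of API keys is available: on a rate-limit error, move on to the next key instead of retrying with the same one. Everything here (`RateLimited`, `request_with_key_rotation`, the `send` callable) is hypothetical and is not keyman's real interface.

```python
from typing import Callable, Iterable


class RateLimited(Exception):
    """Hypothetical marker for a request rejected because of a rate limit."""


def request_with_key_rotation(send: Callable[[str], dict], keys: Iterable[str]) -> dict:
    """Try each key in turn, moving to the next one when a key is rate limited."""
    last_error = None
    for key in keys:
        try:
            return send(key)
        except RateLimited as exc:
            # This key is exhausted; remember why and try the next one.
            last_error = exc
    # Every key was rate limited (or no keys were given).
    raise RuntimeError("all keys are rate limited") from last_error
```

A real fix would also need to decide what happens once every key is exhausted, for example rescheduling the task rather than failing it.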

Status
Done
