Commit a1ae6d2

feat(bulk-import): add support for on-behalf-of user access (#2647)

* feat(bulk-import): add support for on-behalf-of user access
* feat(bulk-import): add tests for on behalf of
* feat(bulk-import): fix scm-hosts audit event id
* feat(bulk-import): ensure that tokens are not logged to audit logger
* feat(bulk-import): fix openapi header type mismatch
* feat(bulk-import): fix raw token in query key
* feat(bulk-import): fix sonarqube issues
* feat(bulk-import): fix sonarqube duplication issues
* feat(bulk-import): fix e2e tests
* feat(bulk-import): make auth providers a new requirement
* feat(bulk-import): consolidate getDeleteImportActionPath and getImportActionPath

Signed-off-by: Patrick Knight <pknight@redhat.com>
1 parent d1ae476 commit a1ae6d2

48 files changed

Lines changed: 2322 additions & 524 deletions

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+---
+'@red-hat-developer-hub/backstage-plugin-bulk-import-backend': minor
+'@red-hat-developer-hub/backstage-plugin-bulk-import': minor
+---
+
+## On Behalf of User Access
+
+This release introduces the ability for the Bulk Import plugin to fetch repository and organization listings **on behalf of the signed-in user**, using their OAuth credentials rather than relying solely on server-side integration credentials (GitHub App, PAT, or GitLab token).
+
+### What Changed
+
+**Backend (`bulk-import-backend`)**
+
+- Added a new `GET /api/bulk-import/scm-hosts` endpoint that returns the configured GitHub and GitLab integration host URLs as a `SCMHostList` object, enabling the frontend to discover which hosts to request OAuth tokens for.
+- The `GET /repositories` and `GET /organizations/{organizationName}/repositories` endpoints now **require** the `x-scm-tokens` request header, a JSON map of SCM host base URL to user OAuth token. Requests that omit this header, or supply an empty or oversized header, are rejected with HTTP 401. This ensures repository listings are always scoped to the signed-in user's access and never fall back to server-wide integration credentials.
+- The `x-scm-tokens` header is stripped from the request immediately upon receipt, before the permission check and before any audit event is created, so OAuth token values are never persisted in audit logs.
+- When user tokens are provided for GitHub, the Octokit response cache is intentionally disabled to prevent cross-user ETag cache leakage. Server-side credential paths are not affected.
+- Introduced a shared `GitApiService` interface and common SCM types (`SCMOrganization`, `SCMRepository`, `SCMFetchError`, etc.) to unify the GitHub and GitLab service implementations under a consistent contract.
+
+**Frontend (`bulk-import`)**
+
+- The plugin now has a **soft dependency** on `@backstage/integration-react`'s `ScmAuthApi`. If the API is registered in the application, the plugin automatically requests OAuth tokens for each configured SCM host and passes them to the backend to enable user-scoped repository listings.
+- Added `getSCMHosts()` to the `BulkImportAPI` interface with a corresponding `GET /api/bulk-import/scm-hosts` client call, used to discover host URLs before requesting user tokens.
+- User OAuth tokens are transmitted to the backend via the `X-SCM-Tokens` request header as a JSON-encoded map.
+- If the SCM OAuth integration is not configured or token collection fails for all hosts, the repository list query is **blocked** on the frontend and the hook surfaces a descriptive error. This prevents the frontend from firing a request that will always be rejected with 401.
+
+### Required Configuration
+
+The GitHub and/or GitLab OAuth provider must be configured in the Backstage application for repository listing to work. Deployments that previously relied on server-side credentials alone for the repository list view must add an SCM OAuth provider to continue using this feature.
+
+If `ScmAuthApi` is not registered or tokens cannot be obtained for any configured SCM host, users will see an error prompting them to configure the SCM OAuth integration.
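
The header validation described in the changeset above could look roughly like the sketch below. It is an illustration under stated assumptions, not the plugin's code: `parseScmTokens`, `ScmTokensError`, and `MAX_SCM_TOKENS_HEADER_LENGTH` are hypothetical names, and the 4 KB limit is taken from the API docs later in this commit. The size check counts characters as an approximation of bytes.

```typescript
// Hypothetical sketch of validating the x-scm-tokens header value.
// Names are illustrative; the real backend implementation is not shown in this diff.
const MAX_SCM_TOKENS_HEADER_LENGTH = 4 * 1024; // 4 KB, per the API docs in this commit

class ScmTokensError extends Error {
  // Every rejection maps to HTTP 401, as the changeset describes.
  readonly status = 401;
}

function parseScmTokens(raw: string | undefined): Record<string, string> {
  if (!raw) {
    throw new ScmTokensError('x-scm-tokens header is missing or empty');
  }
  if (raw.length > MAX_SCM_TOKENS_HEADER_LENGTH) {
    throw new ScmTokensError('x-scm-tokens header is too large');
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new ScmTokensError('x-scm-tokens header is not valid JSON');
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    throw new ScmTokensError('x-scm-tokens header must be a JSON object');
  }

  // Keep only non-empty string tokens, keyed by SCM host base URL.
  const tokens: Record<string, string> = {};
  for (const [host, token] of Object.entries(parsed as Record<string, unknown>)) {
    if (typeof token === 'string' && token !== '') {
      tokens[host] = token;
    }
  }
  if (Object.keys(tokens).length === 0) {
    throw new ScmTokensError('x-scm-tokens header contains no tokens');
  }
  return tokens;
}
```

Rejecting every malformed case with the same status keeps the contract simple for the frontend: any 401 on a listing endpoint means the user tokens need to be collected again.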

workspaces/bulk-import/e2e-tests/app.test.ts

Lines changed: 20 additions & 0 deletions
@@ -22,11 +22,13 @@ import {
   mockBulkImportDryRunResponse,
   mockBulkImportImportsResponse,
   mockBulkImportRepositoriesResponse,
+  mockBulkImportSCMHostsResponse,
   mockImportByRepoData,
   mockImportByRepoFrontendData,
   mockImportsData,
   mockImportsDryRunData,
   mockRepositoriesData,
+  mockSCMHostsData,
 } from './utils/apiUtils';
 import {
   getPreviewSidebarSnapshots,
@@ -51,6 +53,24 @@ test.describe('Bulk Import', () => {
     context = await browser.newContext();
     sharedPage = await context.newPage();

+    // The backend's GET /repositories and GET /organizations/{org}/repositories
+    // endpoints require the X-SCM-Tokens header (HTTP 401 otherwise). In a real
+    // deployment, the frontend obtains these tokens from the configured GitHub /
+    // GitLab OAuth provider via ScmAuthApi and sends them with every listing
+    // request. See plugins/bulk-import/README.md → "Required OAuth Configuration".
+    //
+    // In these e2e tests we bypass that requirement in two steps:
+    //   1. Mock GET /api/bulk-import/scm-hosts to return empty host arrays.
+    //      The useRepositories hook hits the `!urls?.length → return undefined`
+    //      early-return path, so tokenFetchError stays undefined and the query
+    //      fires without any X-SCM-Tokens header.
+    //   2. Mock GET /api/bulk-import/repositories* with a 200 response so
+    //      Playwright intercepts the token-free request before it ever reaches
+    //      the real backend's 401 guard.
+    //
+    // This lets us focus on UI behaviour without needing a real OAuth provider
+    // set up in the test environment.
+    await mockBulkImportSCMHostsResponse(sharedPage, mockSCMHostsData);
     await mockBulkImportRepositoriesResponse(sharedPage, mockRepositoriesData);
     await sharedPage.goto('/');
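
The gating behaviour that the test comment relies on can be sketched as a pure function. This is only an illustration: the real `useRepositories` hook is not part of the visible diff, and `gateRepositoryQuery`, `QueryGate`, and the error text are hypothetical.

```typescript
// Hypothetical sketch of the frontend gating rule; not the plugin's actual hook code.
interface QueryGate {
  enabled: boolean;
  error?: string;
}

function gateRepositoryQuery(
  urls: string[] | undefined,
  tokens: Record<string, string>,
): QueryGate {
  // No configured hosts: token collection is skipped and the query is allowed to fire.
  // This is the path the e2e tests exercise by mocking empty host arrays.
  if (!urls?.length) {
    return { enabled: true };
  }
  // With hosts configured, at least one of them must have produced a token;
  // otherwise the backend would reject the request with HTTP 401 anyway.
  const hasToken = urls.some(url => Boolean(tokens[url]));
  return hasToken
    ? { enabled: true }
    : {
        enabled: false,
        error: 'Configure the SCM OAuth integration to list repositories',
      };
}
```

With empty host arrays the function returns `{ enabled: true }` without consulting `tokens`, which is why the mocked repositories route is still needed to keep the token-free request away from the real backend.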

workspaces/bulk-import/e2e-tests/utils/apiUtils.ts

Lines changed: 26 additions & 2 deletions
@@ -19,13 +19,14 @@ import { Page } from '@playwright/test';
  * API route patterns for bulk import endpoints
  */
 export const ApiRoutes = {
+  scmHosts: '**/api/bulk-import/scm-hosts*',
   repositories: '**/api/bulk-import/repositories*',
   importsDryRun: '**/api/bulk-import/imports?dryRun=true*',
   imports: '**/api/bulk-import/imports',
   byRepoBackend:
-    '**/api/bulk-import/import/by-repo?repo=https://github.com/test-org/backend-service*',
+    '**/api/bulk-import/import/by-repo?repo=https%3A%2F%2Fgithub.com%2Ftest-org%2Fbackend-service*',
   byRepoFrontend:
-    '**/api/bulk-import/import/by-repo?repo=https://github.com/test-org/frontend-app*',
+    '**/api/bulk-import/import/by-repo?repo=https%3A%2F%2Fgithub.com%2Ftest-org%2Ffrontend-app*',
 } as const;

 type ApiRouteKey = keyof typeof ApiRoutes;
@@ -91,6 +92,12 @@ export const mockBulkImportByRepoFrontendResponse = (
   status = 200,
 ) => mockApiResponse(page, ApiRoutes.byRepoFrontend, responseData, status);

+export const mockBulkImportSCMHostsResponse = (
+  page: Page,
+  responseData: object,
+  status = 200,
+) => mockApiResponse(page, ApiRoutes.scmHosts, responseData, status);
+
 // Reusable repository definitions
 const repositories = {
   backendService: {
@@ -135,6 +142,23 @@ const repositories = {
   },
 } as const;

+/**
+ * Mock data for SCM hosts response.
+ * Returns empty host lists so the `useRepositories` hook hits the early-return
+ * path (`!urls?.length → return undefined`) and never attempts token collection.
+ * This means `tokenFetchError` stays `undefined`, the query is enabled, and the
+ * frontend fires a request without `X-SCM-Tokens`.
+ *
+ * In production the backend would reject such a request with HTTP 401, but the
+ * Playwright route mock for `ApiRoutes.repositories` intercepts the request
+ * before it reaches the backend, so the e2e tests still receive the mocked
+ * repository data regardless of the missing header.
+ */
+export const mockSCMHostsData = {
+  github: [],
+  gitlab: [],
+};
+
 /** Mock data for repositories list response */
 export const mockRepositoriesData = {
   errors: [],
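
The updated `byRepoBackend` and `byRepoFrontend` patterns match a percent-encoded `repo` query parameter. A small helper like the one below would produce the same strings; `byRepoRoutePattern` is a hypothetical name, and the assumption that the frontend now encodes the parameter with `encodeURIComponent` is inferred from the new patterns rather than shown in the visible diff.

```typescript
// Hypothetical helper that builds a by-repo route pattern with an encoded repo URL.
// encodeURIComponent escapes ':' as %3A and '/' as %2F, matching the new patterns.
function byRepoRoutePattern(repoUrl: string): string {
  return `**/api/bulk-import/import/by-repo?repo=${encodeURIComponent(repoUrl)}*`;
}
```

Deriving the pattern from the plain repository URL keeps the test data readable and avoids hand-writing escape sequences.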

workspaces/bulk-import/plugins/bulk-import-backend/README.md

Lines changed: 41 additions & 4 deletions
@@ -301,13 +301,15 @@ The Bulk Import Backend plugin emits audit events for various operations. Events

 - **`ping`**: tracks `GET` requests to the `/ping` endpoint, which allows to make sure the bulk import backend is up and running.

-- **`org-read`**: tracks `GET` requests to the `/organizations` endpoint, which returns the list of organizations accessible from all configured GitHub Integrations.
+- **`scm-hosts-read`**: tracks `GET` requests to the `/scm-hosts` endpoint, which returns the list of configured GitHub and GitLab integration host URLs.
+
+- **`org-read`**: tracks `GET` requests to the `/organizations` endpoint, which returns the list of organizations accessible from all configured SCM Integrations (GitHub and GitLab).

   Filter on `queryType`.
   - **`all`**: tracks fetching all organizations. (GET `/organizations`)
   - **`by-query`**: tracks fetching organization filtered by the query parameter 'search'. (GET `/organizations`)

-- **`repo-read`**: tracks `GET` requests to the endpoint, which returns the list of repositories accessible from all configured GitHub Integrations.
+- **`repo-read`**: tracks `GET` requests to the endpoint, which returns the list of repositories accessible from all configured SCM Integrations (GitHub and GitLab).

   Filter on `queryType`.
   - **`all`**: tracks fetching a list of all repositories accessible by Backstage Github Integrations. (GET `/repositories`)
@@ -343,8 +345,43 @@ Example:

 The bulk import backend plugin provides a REST API to bulk import catalog entities into the catalog. The API is available at the `/api/bulk-import` endpoint.

-As a prerequisite, you need to add at least one GitHub Integration (using either a GitHub token or a GitHub App or both) in your app-config YAML file (or a local `app-config.local.yaml` file).
-See https://backstage.io/docs/integrations/github/locations/#configuration and https://backstage.io/docs/integrations/github/github-apps/#including-in-integrations-config for more details.
+As a prerequisite, you need to add at least one SCM integration in your app-config YAML file (or a local `app-config.local.yaml` file):
+
+- **GitHub**: Configure a GitHub integration using a GitHub token or a GitHub App (or both). See the [GitHub Locations](https://backstage.io/docs/integrations/github/locations/#configuration) and [GitHub Apps](https://backstage.io/docs/integrations/github/github-apps/#including-in-integrations-config) documentation for details.
+- **GitLab** _(optional)_: Configure a GitLab integration if you want to import from GitLab repositories. See the [GitLab Locations](https://backstage.io/docs/integrations/gitlab/locations/) documentation for details.
+
+### On Behalf of User Access
+
+The plugin supports fetching repository and organization listings **on behalf of the signed-in user**, using their OAuth credentials rather than the server-wide integration credentials (GitHub App, PAT, or GitLab token).
+
+#### How It Works
+
+1. The frontend calls `GET /api/bulk-import/scm-hosts` to retrieve the list of configured SCM integration host URLs, grouped by provider (`github` and `gitlab`).
+2. For each host, the frontend requests an OAuth token from the Backstage `ScmAuthApi` (provided by `@backstage/integration-react`).
+3. The collected tokens are sent to the backend via the **required** `x-scm-tokens` request header, a JSON-encoded string whose value, when parsed, maps each integration base URL to the user's OAuth token (e.g. `{"https://github.com":"ghp_xxx"}`).
+4. The backend uses these user tokens to call the GitHub or GitLab APIs on behalf of the user, so the repository listings reflect what the signed-in user can personally access.
+
+#### Required OAuth Configuration
+
+The `x-scm-tokens` header is **required** for `GET /repositories` and `GET /organizations/{organizationName}/repositories`. Requests that omit the header, supply an empty token map, or send a header that exceeds the allowed size are rejected with **HTTP 401**.
+
+A GitHub and/or GitLab OAuth provider must therefore be configured in the Backstage application for these endpoints to work. Refer to the [Backstage GitHub auth docs](https://backstage.io/docs/auth/github/provider) and [GitLab auth docs](https://backstage.io/docs/auth/gitlab/provider) for setup instructions.
+
+> **Migration note:** Deployments that previously relied solely on server-side credentials (GitHub App, PAT, or GitLab token) for the repository list view must now also configure an SCM OAuth provider. The server-side credentials are still used for all other operations (import creation, status checks, etc.) and are unaffected by this change.
+
+#### Security Note
+
+When user tokens are provided for GitHub, the Octokit response cache is intentionally disabled to prevent cross-user ETag cache leakage. Server-side credential paths are not affected.
+
+The `x-scm-tokens` header is stripped from the request immediately upon receipt (before the permission check and before any audit event is created), so OAuth token values are never persisted in audit logs.
+
+#### New API Endpoint
+
+| Method | Path                         | Description                                                                                 |
+| ------ | ---------------------------- | ------------------------------------------------------------------------------------------- |
+| `GET`  | `/api/bulk-import/scm-hosts` | Returns configured GitHub and GitLab integration host base URLs as an `SCMHostList` object. |
+
+The existing `GET /repositories` and `GET /organizations/{organizationName}/repositories` endpoints now **require** the `x-scm-tokens` header. See the [API documentation](api-docs/README.md) for the full request/response specification.

 ## REST API
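
The "strip on receipt" behaviour from the Security Note can be illustrated with a short sketch. The names `HeaderBag` and `takeScmTokensHeader` are hypothetical and the plugin's real middleware is not shown in this diff. The sketch relies on Node.js exposing incoming header names in lower case.

```typescript
// Hypothetical sketch: read the token header once, then remove it so that any later
// snapshot of the headers (for example in an audit event) cannot contain the tokens.
type HeaderBag = Record<string, string | string[] | undefined>;

function takeScmTokensHeader(headers: HeaderBag): string | undefined {
  const value = headers['x-scm-tokens'];
  // Delete before anything else can serialize the headers.
  delete headers['x-scm-tokens'];
  // A repeated header arrives as an array; this sketch only uses the first value.
  return Array.isArray(value) ? value[0] : value;
}
```

Calling this at the very start of request handling means the permission check and the audit logger only ever see the header bag without `x-scm-tokens`, while the returned string is passed explicitly to the code that needs it.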

workspaces/bulk-import/plugins/bulk-import-backend/__fixtures__/handlers.ts

Lines changed: 4 additions & 0 deletions
@@ -122,6 +122,10 @@ export const DEFAULT_TEST_HANDLERS: RestHandler<
     );
   }),

+  rest.get(`${LOCAL_ADDR}/orgs/my-org-1/repos`, (_, res, ctx) => {
+    return res(ctx.status(200), ctx.json([]));
+  }),
+
   rest.get(`${LOCAL_ADDR}/orgs/my-ent-org-1/repos`, (_, res, ctx) => {
     return res(
       ctx.status(200),

workspaces/bulk-import/plugins/bulk-import-backend/api-docs/.openapi-generator/FILES

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ Models/PullRequest.md
 Models/Repository.md
 Models/RepositoryList.md
 Models/Repository_importStatus.md
+Models/SCMHostList.md
 Models/ScaffolderTask.md
 Models/Source.md
 Models/SourceImport.md

workspaces/bulk-import/plugins/bulk-import-backend/api-docs/Apis/ManagementApi.md

Lines changed: 23 additions & 0 deletions
@@ -4,9 +4,32 @@ All URIs are relative to *http://localhost:7007/api/bulk-import*

 | Method | HTTP request | Description |
 |------------- | ------------- | -------------|
+| [**findAllSCMHosts**](ManagementApi.md#findAllSCMHosts) | **GET** /scm-hosts | Retrieve the SCM Integration hosts |
 | [**ping**](ManagementApi.md#ping) | **GET** /ping | Check the health of the Bulk Import backend router |


+<a name="findAllSCMHosts"></a>
+# **findAllSCMHosts**
+> SCMHostList findAllSCMHosts()
+
+Retrieve the SCM Integration hosts
+
+### Parameters
+This endpoint does not need any parameter.
+
+### Return type
+
+[**SCMHostList**](../Models/SCMHostList.md)
+
+### Authorization
+
+[BearerAuth](../README.md#BearerAuth)
+
+### HTTP request headers
+
+- **Content-Type**: Not defined
+- **Accept**: application/json
+
 <a name="ping"></a>
 # **ping**
 > ping_200_response ping()

workspaces/bulk-import/plugins/bulk-import-backend/api-docs/Apis/OrganizationApi.md

Lines changed: 2 additions & 1 deletion
@@ -38,7 +38,7 @@ Fetch Organizations accessible by Backstage Github Integrations

 <a name="findRepositoriesByOrganization"></a>
 # **findRepositoriesByOrganization**
-> RepositoryList findRepositoriesByOrganization(organizationName, checkImportStatus, pagePerIntegration, sizePerIntegration, search, approvalTool)
+> RepositoryList findRepositoriesByOrganization(organizationName, checkImportStatus, pagePerIntegration, sizePerIntegration, search, approvalTool, x-scm-tokens)

 Fetch Repositories in the specified GitHub organization, provided it is accessible by any of the configured GitHub Integrations.

@@ -52,6 +52,7 @@ Fetch Repositories in the specified GitHub organization, provided it is accessib
 | **sizePerIntegration** | **Integer**| the number of items per Integration to return per page | [optional] [default to 20] |
 | **search** | **String**| returns only the items that match the search string | [optional] [default to null] |
 | **approvalTool** | **String**| the approvalTool to use | [optional] [default to GIT] |
+| **x-scm-tokens** | **String**| **Required.** JSON-encoded map of SCM host URL to user OAuth token. Used to fetch repositories on behalf of the signed-in user. The value must be a JSON object whose keys are SCM integration base URLs and whose values are OAuth bearer tokens (e.g. `{"https://github.com":"ghp_xxx"}`). Requests that omit this header, supply an empty object, or exceed 4 KB are rejected with HTTP 401. | [required] [default to null] |

 ### Return type

workspaces/bulk-import/plugins/bulk-import-backend/api-docs/Apis/RepositoryApi.md

Lines changed: 2 additions & 1 deletion
@@ -9,7 +9,7 @@ All URIs are relative to *http://localhost:7007/api/bulk-import*

 <a name="findAllRepositories"></a>
 # **findAllRepositories**
-> RepositoryList findAllRepositories(checkImportStatus, pagePerIntegration, sizePerIntegration, search, approvalTool)
+> RepositoryList findAllRepositories(checkImportStatus, pagePerIntegration, sizePerIntegration, search, approvalTool, x-scm-tokens)

 Fetch Organization Repositories accessible by Backstage Github Integrations

@@ -22,6 +22,7 @@ Fetch Organization Repositories accessible by Backstage Github Integrations
 | **sizePerIntegration** | **Integer**| the number of items per Integration to return per page | [optional] [default to 20] |
 | **search** | **String**| returns only the items that match the search string | [optional] [default to null] |
 | **approvalTool** | **String**| the approvalTool to use | [optional] [default to GIT] |
+| **x-scm-tokens** | **String**| **Required.** JSON-encoded map of SCM host URL to user OAuth token. Used to fetch repositories on behalf of the signed-in user. The value must be a JSON object whose keys are SCM integration base URLs and whose values are OAuth bearer tokens (e.g. `{"https://github.com":"ghp_xxx"}`). Requests that omit this header, supply an empty object, or exceed 4 KB are rejected with HTTP 401. | [required] [default to null] |

 ### Return type

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+# SCMHostList
+## Properties
+
+| Name | Type | Description | Notes |
+|------------ | ------------- | ------------- | -------------|
+| **github** | **List** | | [optional] [default to null] |
+| **gitlab** | **List** | | [optional] [default to null] |
+
+[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)
+
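
On the client side, the model above might be consumed roughly as follows. This is a sketch under assumptions: the generated docs give the element type only as `List`, so `string[]` (host base URLs, as the README describes) is assumed here, and `listHostUrls` is a hypothetical helper.

```typescript
// Hypothetical TypeScript shape for SCMHostList; both properties are optional per the model docs.
// The element type (string host base URLs) is an assumption based on the README text.
interface SCMHostList {
  github?: string[];
  gitlab?: string[];
}

// Flattens both provider lists into one de-duplicated list of host URLs,
// for example to request one OAuth token per host.
function listHostUrls(hosts: SCMHostList): string[] {
  const all = [...(hosts.github ?? []), ...(hosts.gitlab ?? [])];
  return all.filter((url, index) => all.indexOf(url) === index);
}
```

Treating missing properties as empty lists means a response such as `{}` or `{ github: [], gitlab: [] }` yields an empty array.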
