GitHub Discovery

GitHub Provider

The GitHub integration has a discovery provider for discovering catalog entities within a GitHub organization. The provider will crawl the GitHub organization and register entities matching the configured path. This can be useful as an alternative to static locations or manually adding things to the catalog. This is the preferred method for ingesting entities into the catalog.

Installation without Events Support

You will have to add the provider in the catalog initialization code of your backend. It is not installed by default, so you have to add a dependency on @backstage/plugin-catalog-backend-module-github to your backend package.

# From your Backstage root directory
yarn add --cwd packages/backend @backstage/plugin-catalog-backend-module-github

And then add the entity provider to your catalog builder:

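import { GithubEntityProvider } from '@backstage/plugin-catalog-backend-module-github';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  const builder = await CatalogBuilder.create(env);
  // Register the provider so the catalog ingests what it discovers.
  builder.addEntityProvider(
    GithubEntityProvider.fromConfig(env.config, {
      logger: env.logger,
      // The scheduler runs the provider on the schedule set in app-config.yaml.
      scheduler: env.scheduler,
    }),
  );

  // ..
}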

Installation with Events Support

Please follow the installation instructions at

Additionally, you need to decide how you want to receive events from external sources like

Set up your provider

import { CatalogBuilder } from '@backstage/plugin-catalog-backend';
import { GithubEntityProvider } from '@backstage/plugin-catalog-backend-module-github';
import { ScaffolderEntitiesProcessor } from '@backstage/plugin-scaffolder-backend';
import { Router } from 'express';
import { PluginEnvironment } from '../types';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  const builder = await CatalogBuilder.create(env);
  builder.addProcessor(new ScaffolderEntitiesProcessor());
  const githubProvider = GithubEntityProvider.fromConfig(env.config, {
    logger: env.logger,
    scheduler: env.scheduler,
  });
  // Assumes your PluginEnvironment exposes an eventBroker, as set up by the
  // events backend installation; the provider receives push events through it.
  env.eventBroker.subscribe(githubProvider);
  builder.addEntityProvider(githubProvider);
  const { processingEngine, router } = await builder.build();
  await processingEngine.start();
  return router;
}

You can check the official docs to configure your webhook and to secure your request. The webhook will need to be configured to forward push events.
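
As an illustration of what securing the webhook request involves, the sketch below checks the X-Hub-Signature-256 header that GitHub sends when a webhook secret is configured. The helper name isValidGithubSignature and its parameters are hypothetical and not part of Backstage; the events backend may already perform this check for you, so treat this as a minimal sketch of the mechanism rather than required setup.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: returns true when the X-Hub-Signature-256 header value
// matches the HMAC-SHA256 of the raw request body, keyed with the webhook secret.
function isValidGithubSignature(
  rawBody: string,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) {
    return false;
  }
  const digest = createHmac('sha256', secret).update(rawBody).digest('hex');
  const expected = Buffer.from(`sha256=${digest}`);
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The signature must be computed over the raw, unparsed request body; re-serializing parsed JSON can change the bytes and make a valid signature fail.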


To use the discovery provider, you'll need a GitHub integration set up with either a Personal Access Token or GitHub Apps.

Then you can add a github config to the catalog providers configuration:

catalog:
  providers:
    github:
      # the provider ID can be any camelCase string; the IDs below are placeholders
      providerId:
        organization: 'backstage' # string
        catalogPath: '/catalog-info.yaml' # string
        filters:
          branch: 'main' # string
          repository: '.*' # Regex
        schedule: # same options as in TaskScheduleDefinition
          # supports cron, ISO duration, "human duration" as used in code
          frequency: { minutes: 30 }
          # supports ISO duration, "human duration" as used in code
          timeout: { minutes: 3 }
      customProviderId:
        organization: 'new-org' # string
        catalogPath: '/custom/path/catalog-info.yaml' # string
        filters: # optional filters
          branch: 'develop' # optional string
          repository: '.*' # optional Regex
      wildcardProviderId:
        organization: 'new-org' # string
        catalogPath: '/groups/**/*.yaml' # this will search all folders for files that end in .yaml
        filters: # optional filters
          branch: 'develop' # optional string
          repository: '.*' # optional Regex
      topicProviderId:
        organization: 'backstage' # string
        catalogPath: '/catalog-info.yaml' # string
        filters:
          branch: 'main' # string
          repository: '.*' # Regex
          topic: 'backstage-exclude' # optional string
      topicFilterProviderId:
        organization: 'backstage' # string
        catalogPath: '/catalog-info.yaml' # string
        filters:
          branch: 'main' # string
          repository: '.*' # Regex
          topic:
            include: ['backstage-include'] # optional array of strings
            exclude: ['experiments'] # optional array of strings
      validateLocationsExistProviderId:
        organization: 'backstage' # string
        catalogPath: '/catalog-info.yaml' # string
        filters:
          branch: 'main' # string
          repository: '.*' # Regex
        validateLocationsExist: true # optional boolean
      visibilityProviderId:
        organization: 'backstage' # string
        catalogPath: '/catalog-info.yaml' # string
        filters:
          visibility:
            - public
            - internal
      anotherProviderId:
        organization: 'backstage' # string
        catalogPath: '/catalog-info.yaml' # string

This provider supports multiple organizations via unique provider IDs.

Note: It is possible but certainly not recommended to skip the provider ID level. If you do so, default will be used as provider ID.

  • catalogPath (optional): Default: /catalog-info.yaml. Path where to look for catalog-info.yaml files. You can use wildcards - * or ** - to search the path and/or the filename. Wildcards cannot be used if the validateLocationsExist option is set to true.
  • filters (optional):
    • branch (optional): String used to filter results based on the branch name.
    • repository (optional): Regular expression used to filter results based on the repository name.
    • topic (optional): Both of the filters below may be used at the same time but the exclusion filter has the highest priority. In the example above, a repository with the backstage-include topic would still be excluded if it were also carrying the experiments topic.
      • include (optional): An array of strings used to filter in results based on their associated GitHub topics. If configured, only repositories with one (or more) topic(s) present in the inclusion filter will be ingested.
      • exclude (optional): An array of strings used to filter out results based on their associated GitHub topics. If configured, all repositories except those with one (or more) topic(s) present in the exclusion filter will be ingested.
    • visibility (optional): An array of strings used to filter results based on their visibility. Available options are private, internal, public. If configured (non-empty), only repositories with a visibility present in the filter will be ingested.
  • host (optional): The hostname of your GitHub Enterprise instance. It must match a host defined in integrations.github.
  • organization: Name of your organization account/workspace. If you want to add multiple organizations, you need to add one provider config each.
  • validateLocationsExist (optional): Whether to validate that locations exist before emitting them. This option avoids generating locations for catalog info files that do not exist in the source repository. Defaults to false. Due to limitations in the GitHub API's ability to query for repository objects, this option cannot be used in conjunction with wildcards in the catalogPath.
  • schedule:
    • frequency: How often you want the task to run. The system does its best to avoid overlapping invocations.
    • timeout: The maximum amount of time that a single task invocation can take.
    • initialDelay (optional): The amount of time that should pass before the first invocation happens.
    • scope (optional): 'global' or 'local'. Sets the scope of concurrency control.
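
The topic filter precedence and the wildcard restriction described in the list above can be sketched as follows. This is an illustration of the documented rules only, not the provider's actual implementation; TopicFilter, passesTopicFilter and hasWildcardConflict are hypothetical names.

```typescript
// Hypothetical shape mirroring the filters.topic options described above.
interface TopicFilter {
  include?: string[];
  exclude?: string[];
}

// Returns true when a repository with the given topics would be ingested.
function passesTopicFilter(repoTopics: string[], filter: TopicFilter = {}): boolean {
  const { include = [], exclude = [] } = filter;
  // The exclusion filter has the highest priority.
  if (repoTopics.some(topic => exclude.includes(topic))) {
    return false;
  }
  // If an inclusion filter is configured, at least one topic must match it.
  if (include.length > 0) {
    return repoTopics.some(topic => include.includes(topic));
  }
  return true;
}

// Returns true for the unsupported combination of a wildcard catalogPath
// with validateLocationsExist set to true.
function hasWildcardConflict(catalogPath: string, validateLocationsExist: boolean): boolean {
  return validateLocationsExist && catalogPath.includes('*');
}
```

With include set to ['backstage-include'] and exclude set to ['experiments'], a repository carrying both topics is rejected, which matches the precedence described for the topic filter.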

GitHub API Rate Limits

GitHub rate limits API requests to 5,000 per hour (or more for Enterprise accounts). The snippet below refreshes the Backstage catalog data every 35 minutes, which issues an API request for each discovered location.

If your requests are too frequent then you may get throttled by rate limiting. You can change the refresh frequency of the catalog in your app-config.yaml file by controlling the schedule.

schedule:
  frequency: { minutes: 35 }
  timeout: { minutes: 3 }

More information about scheduling can be found on the TaskScheduleDefinition page.
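
To judge whether a given schedule is likely to stay under the limit, a rough estimate can help. The sketch below assumes, as stated above, one API request per discovered location per refresh; real usage will differ because other integrations share the same quota. The function name and the constant are hypothetical.

```typescript
// Hourly request limit quoted above; Enterprise accounts may have more.
const GITHUB_HOURLY_LIMIT = 5000;

// Rough estimate: one request per discovered location on every refresh.
function estimateRequestsPerHour(locations: number, frequencyMinutes: number): number {
  const refreshesPerHour = 60 / frequencyMinutes;
  return Math.ceil(locations * refreshesPerHour);
}

// Example: 1000 locations refreshed every 35 minutes.
const estimate = estimateRequestsPerHour(1000, 35);
console.log(estimate <= GITHUB_HOURLY_LIMIT ? 'within limit' : 'over limit');
```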

Alternatively, or additionally, you can configure github-apps authentication which carries a much higher rate limit at GitHub.

Rate limiting applies to any method of adding GitHub entities to the catalog, but the limit is especially easy to hit with automatic discovery.

GitHub Processor (To Be Deprecated)

The GitHub integration has a special discovery processor for discovering catalog entities within a GitHub organization. The processor will crawl the GitHub organization and register entities matching the configured path. This can be useful as an alternative to static locations or manually adding things to the catalog.


You will have to add the processors in the catalog initialization code of your backend. They are not installed by default, therefore you have to add a dependency on @backstage/plugin-catalog-backend-module-github to your backend package, plus @backstage/integration for the basic credentials management:

# From your Backstage root directory
yarn add --cwd packages/backend @backstage/integration @backstage/plugin-catalog-backend-module-github

And then add the processors to your catalog builder:

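import {
  GithubDiscoveryProcessor,
  GithubOrgReaderProcessor,
} from '@backstage/plugin-catalog-backend-module-github';
import {
  ScmIntegrations,
  DefaultGithubCredentialsProvider,
} from '@backstage/integration';

export default async function createPlugin(
  env: PluginEnvironment,
): Promise<Router> {
  const builder = await CatalogBuilder.create(env);
  const integrations = ScmIntegrations.fromConfig(env.config);
  // Resolves credentials from the configured GitHub integrations.
  const githubCredentialsProvider =
    DefaultGithubCredentialsProvider.fromIntegrations(integrations);
  builder.addProcessor(
    GithubDiscoveryProcessor.fromConfig(env.config, {
      logger: env.logger,
      githubCredentialsProvider,
    }),
    GithubOrgReaderProcessor.fromConfig(env.config, {
      logger: env.logger,
      githubCredentialsProvider,
    }),
  );

  // ..
}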


To use the discovery processor, you'll need a GitHub integration set up with either a Personal Access Token or GitHub Apps.

Then you can add a location target to the catalog configuration:

# (since 0.13.5) Scan all repositories for a catalog-info.yaml in the root of the default branch
- type: github-discovery
# Or use a custom pattern for a subset of all repositories with default repository
- type: github-discovery
# Or use a custom file format and location
- type: github-discovery
# Or use a specific branch-name
- type: github-discovery

Note the github-discovery type, as this is not a regular url processor.

When using a custom pattern, the target is composed of three parts:

  • The base organization URL, in this case
  • The repository glob to scan, which accepts * wildcard tokens. This can simply be * to scan all repositories in the organization. This example only looks for repositories prefixed with service-.
  • The path within each repository to find the catalog YAML file. This will usually be /blob/main/catalog-info.yaml, /blob/master/catalog-info.yaml or a similar variation for catalog files stored in the root directory of each repository. You could also use a dash (-) for referring to the default branch.
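
The three-part structure described above can be sketched as a small parser. This is an illustration only and not how the processor itself parses targets; the function names are hypothetical, and example-org in the usage line is a placeholder organization.

```typescript
// Hypothetical representation of the three parts of a custom-pattern target.
interface DiscoveryTarget {
  orgUrl: string; // base organization URL
  repoGlob: string; // repository pattern, may contain * wildcards
  catalogPath: string; // path within each repository, e.g. /blob/main/catalog-info.yaml
}

function splitDiscoveryTarget(target: string): DiscoveryTarget {
  const url = new URL(target);
  const segments = url.pathname.split('/').filter(Boolean);
  if (segments.length < 3) {
    throw new Error(`Expected organization, repository pattern and path in: ${target}`);
  }
  const [org, repoGlob, ...rest] = segments;
  return {
    orgUrl: `${url.origin}/${org}`,
    repoGlob,
    catalogPath: `/${rest.join('/')}`,
  };
}

// Tests a repository name against a pattern where * matches any characters.
function matchesRepoGlob(repoGlob: string, repoName: string): boolean {
  const escaped = repoGlob.replace(/[.+?^${}()|[\]\\]/g, '\\$&').replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`).test(repoName);
}

const parts = splitDiscoveryTarget(
  'https://github.com/example-org/service-*/blob/main/catalog-info.yaml',
);
console.log(parts.repoGlob, matchesRepoGlob(parts.repoGlob, 'service-payments'));
```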

GitHub API Rate Limits

GitHub rate limits API requests to 5,000 per hour (or more for Enterprise accounts). The default Backstage catalog backend refreshes data every 100 seconds, which issues an API request for each discovered location.

This means if you have more than ~140 catalog entities, you may get throttled by rate limiting. You can change the refresh rate of the catalog in your packages/backend/src/plugins/catalog.ts file:

const builder = await CatalogBuilder.create(env);

// For example, to refresh every 5 minutes (300 seconds).
builder.setProcessingIntervalSeconds(300);
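
The ~140 figure follows from dividing the hourly limit by the number of refreshes per hour. The sketch below turns that around and estimates the shortest refresh interval for a given number of entities, assuming one request per entity per refresh and no other consumers of the quota. The function name is hypothetical.

```typescript
// Shortest refresh interval (in seconds) that keeps the estimated request
// count at or under the hourly limit, assuming one request per entity per refresh.
function minRefreshIntervalSeconds(entities: number, hourlyLimit = 5000): number {
  return Math.ceil((3600 * entities) / hourlyLimit);
}

console.log(minRefreshIntervalSeconds(140)); // just over the 100 second default
console.log(minRefreshIntervalSeconds(500));
```

In practice you would pick a value comfortably above this estimate, since the token's quota is usually shared with other GitHub API usage.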

Alternatively, or additionally, you can configure github-apps authentication which carries a much higher rate limit at GitHub.

Rate limiting applies to any method of adding GitHub entities to the catalog, but the limit is especially easy to hit with automatic discovery.