Devlist Discussion: https://lists.apache.org/thread/7n4pklzcc4lxtxsy9g69ssffg9qb…dyvb
A static-site provider registry for discovering and browsing Airflow providers and their modules. Deployed at `airflow.apache.org/registry/` alongside the existing docs infrastructure (S3 + CloudFront).
Staging preview: https://airflow.staged.apache.org/registry/
## Acknowledgments
Many of you know the [Astronomer Registry](https://registry.astronomer.io), which has been the go-to for discovering providers for years. Big thanks to **Astronomer** and @josh-fell for building and maintaining it. This new registry is designed to be a community-owned successor on `airflow.apache.org`, with the eventual goal of redirecting `registry.astronomer.io` traffic here once it's stable. Thanks also to @ashb for suggesting and prototyping the Eleventy-based approach.
## What it does
The registry indexes all 99 official providers and 840 modules (operators, hooks, sensors, triggers, transfers, bundles, notifiers, secrets backends, log handlers, executors) from the existing `providers/*/provider.yaml` files and source code in this repo. The only external data source is PyPI (download stats and release dates).
**Pages:**
- **Homepage** — search bar (Cmd+K), stats counters, featured and new providers
- **Providers listing** — filterable by lifecycle stage (stable/incubation/deprecated), category, and sort order (downloads, name, recently updated)
- **Provider detail** — module counts by type, install command with extras/version selection, dependency info, connection builder, and a tabbed module browser with category sidebar and per-module search
- **Explore by Category** — providers grouped into Cloud, Databases, Data Warehouses, Messaging, AI/ML, Data Processing, etc.
- **Statistics** — module type distribution, lifecycle breakdown, top providers by downloads and module count
- **JSON API** — `/api/providers.json`, `/api/modules.json`, per-provider endpoints for modules, parameters, and connections
**Connection Builder:** pick a connection type (e.g. `aws`, `redshift`), fill in the form fields (shown with placeholders and sensitivity markers), and export the result as a URI, JSON, or an environment variable. Fields are extracted from `provider.yaml` connection metadata.
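
To make the three export formats concrete, here is a minimal Python sketch of what the Connection Builder produces. It is an illustration only, not the registry's client-side code: the function names and the `fields` dictionary are hypothetical, and the URI layout approximates Airflow's documented `conn_type://login:password@host:port/schema?extra` convention.

```python
import json
from urllib.parse import quote, urlencode


def to_uri(fields):
    """Build an Airflow-style connection URI from form field values."""
    uri = f"{fields['conn_type']}://"
    login = fields.get("login", "")
    password = fields.get("password", "")
    if login or password:
        # Percent-encode credentials so characters like "@" stay unambiguous.
        uri += quote(login, safe="")
        if password:
            uri += ":" + quote(password, safe="")
        uri += "@"
    uri += quote(fields.get("host", ""), safe="")
    if fields.get("port") is not None:
        uri += f":{fields['port']}"
    schema = fields.get("schema", "")
    if schema:
        uri += "/" + quote(schema, safe="")
    extra = fields.get("extra")
    if extra:
        # Extras become query parameters.
        uri += ("?" if schema else "/?") + urlencode(extra)
    return uri


def to_json(fields):
    """Serialize the same field values as a JSON document."""
    return json.dumps(fields, sort_keys=True)


def to_env(conn_id, fields):
    """Render an AIRFLOW_CONN_<ID> environment variable assignment."""
    return f"AIRFLOW_CONN_{conn_id.upper()}={to_uri(fields)}"


# Hypothetical form values for a `redshift` connection.
example = {
    "conn_type": "redshift",
    "host": "example.com",
    "login": "user",
    "password": "p@ss",
    "port": 5439,
    "schema": "dev",
}
print(to_uri(example))
print(to_env("my_redshift", example))
```

Fields marked as sensitive in the form (such as `password` here) end up in all three outputs in clear text, so exported values should be treated as secrets.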
## Screenshots
### Homepage
| Light | Dark |
|-------|------|
| <img width="640" alt="homepage" src="https://github.com/user-attachments/assets/33cea9e3-b906-4e4d-a26b-9acf2de38272" /> | <img width="640" alt="homepage-dark" src="https://github.com/user-attachments/assets/5043097f-4a15-4df1-9924-96c55ed24266" /> |
### Providers List
| Light | Dark |
|-------|------|
| <img width="640" alt="providers-list" src="https://github.com/user-attachments/assets/46395130-9ce9-4730-a949-97959165da14" /> | <img width="640" alt="providers-list-dark" src="https://github.com/user-attachments/assets/0e8dd3b7-aee1-4604-a97f-8d21429623d3" /> |
### Provider Detail (Amazon)
| Light | Dark |
|-------|------|
| <img width="640" alt="provider-detail-amazon" src="https://github.com/user-attachments/assets/0b9d9a0f-fbc2-4173-b96b-259b7cc8d2b4" /> | <img width="640" alt="provider-detail-amazon-dark" src="https://github.com/user-attachments/assets/c9beb13c-72de-4520-bcd3-1d30832edfcb" /> |
### Module Browser
| Light | Dark |
|-------|------|
| <img width="640" alt="module-browser" src="https://github.com/user-attachments/assets/60d78c57-3a86-4658-a697-06d81b880b5b" /> | <img width="640" alt="module-browser-dark" src="https://github.com/user-attachments/assets/3cbd41b0-dbf4-4456-b823-95ef32fc8a78" /> |
### Connection Builder
| Light | Dark |
|-------|------|
| <img width="640" alt="connection-builder" src="https://github.com/user-attachments/assets/39d15d12-624c-4cce-86a7-f7d3028a1230" /> | <img width="640" alt="connection-builder-dark" src="https://github.com/user-attachments/assets/7ac3eec0-ce73-483e-b92f-c4c058b48568" /> |
### Explore by Category
| Light | Dark |
|-------|------|
| <img width="640" alt="explore-categories" src="https://github.com/user-attachments/assets/3c8c10da-6741-41b5-9da3-9eb437ae27c9" /> | <img width="640" alt="explore-categories-dark" src="https://github.com/user-attachments/assets/04500e2d-dc65-4b5c-8869-fb351e5d1a91" /> |
### Statistics
| Light | Dark |
|-------|------|
| <img width="640" alt="stats" src="https://github.com/user-attachments/assets/068e5667-a121-4fb9-83e7-950c97d814a9" /> | <img width="640" alt="stats-dark" src="https://github.com/user-attachments/assets/a409f154-cac0-4520-9371-07be1deafe3c" /> |
## Motivation
With [AIP-95](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-95+Provider+lifecycle+update+proposal) approved, Airflow now has a formal provider lifecycle (incubation, production, mature, deprecated). That opens the door for accepting more community-built providers and giving them an official home, while setting clear expectations about maturity and support. But lifecycle stages only work if users can actually see them. Right now there's no place on `airflow.apache.org` where someone can browse providers, check their lifecycle stage, or discover what modules they ship.
This registry fills that gap:
1. **Surface governance visibly** — AIP-95 lifecycle stages are first-class citizens in the discovery experience (badges, filters, explore by stage)
2. **Stay in sync automatically** — generates directly from `provider.yaml` files in the repo, no separate data pipeline or manual curation
3. **Community-owned** — an official Apache project resource on `airflow.apache.org`
## Architecture
```
provider.yaml + source code (providers/*/)
│
▼
extract_metadata.py ← AST-parses Python files, fetches PyPI stats
│
▼
registry/src/_data/
├── providers.json ← 99 providers with metadata, quality scores
├── modules.json ← 840 modules with import paths, docstrings
└── search-index.json ← Pagefind custom records
│
▼
Eleventy build ← Generates 2,740 static HTML pages
│
▼
Pagefind postbuild ← Builds search index from custom records
│
▼
S3 sync + CloudFront ← registry-build.yml workflow
```
Four Python extraction scripts run at build time:
| Script | What it does | Runs in |
|--------|-------------|---------|
| `extract_metadata.py` | Parses provider.yaml, AST-parses source for class names/docstrings, fetches PyPI stats and release dates | CI (host Python) |
| `extract_versions.py` | Reads older provider versions from git tags | CI (host Python) |
| `extract_parameters.py` | Inspects constructor signatures via runtime import | Breeze (needs provider packages installed) |
| `extract_connections.py` | Extracts connection form fields from provider.yaml + hook classes | Breeze (needs provider packages installed) |
The site itself is vanilla HTML/CSS/JS built with [Eleventy](https://www.11ty.dev/), with no React and no bundler. Search uses Pagefind (client-side, loads lazily on first search interaction). Fonts are self-hosted (Plus Jakarta Sans, JetBrains Mono).
## Design decisions worth calling out
**Why AST parsing instead of runtime import?** `extract_metadata.py` runs on the CI host without installing any provider packages. It reads `.py` files and extracts class names, base classes, and docstrings from the AST, so it works with just `pyyaml` as a dependency. The trade-off: it can't resolve dynamic class definitions or runtime-computed attributes. For the 99 providers currently in the repo, AST parsing captures everything.
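
The AST approach can be sketched with the standard-library `ast` module. This is a simplified illustration, not the code in `extract_metadata.py`; the `extract_classes` helper and the sample source are hypothetical.

```python
import ast


def extract_classes(source):
    """Return name, base names, and docstring for each top-level class.

    The source is only parsed, never imported or executed, so none of the
    file's own imports need to be installed.
    """
    tree = ast.parse(source)
    classes = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            classes.append(
                {
                    "name": node.name,
                    # ast.unparse turns a base expression back into source text,
                    # e.g. "BaseOperator" or "models.BaseOperator".
                    "bases": [ast.unparse(base) for base in node.bases],
                    "docstring": ast.get_docstring(node),
                }
            )
    return classes


# Hypothetical provider source; BaseOperator is never imported or resolved.
SAMPLE = '''
class ExampleOperator(BaseOperator):
    """Run an example task."""

class _Helper:
    pass
'''

for cls in extract_classes(SAMPLE):
    print(cls)
```

Because base classes are read as plain text, this style of extraction sees only what is written in the file, which is the limitation around dynamic class definitions mentioned above.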
**Why four separate scripts?** `extract_parameters.py` and `extract_connections.py` need runtime access to provider classes (to inspect `__init__` signatures and call `get_connection_form_widgets()`). They run inside Breeze where all providers are installed. `extract_metadata.py` and `extract_versions.py` only need filesystem access and run on the host. Keeping them separate means the CI workflow can run the fast scripts (metadata) without spinning up Breeze, while parameter/connection extraction is a separate optional step.
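
For contrast with the AST path, the runtime inspection of `__init__` signatures can be sketched with `inspect.signature`. This is an illustration under assumptions, not the code in `extract_parameters.py`; the `constructor_parameters` helper, its output shape, and the `ExampleOperator` class are hypothetical.

```python
import inspect


def constructor_parameters(cls):
    """List the explicit constructor parameters of an already-imported class."""
    params = []
    for name, param in inspect.signature(cls.__init__).parameters.items():
        # Skip `self` and catch-all *args / **kwargs entries.
        if name == "self" or param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        required = param.default is inspect.Parameter.empty
        params.append(
            {
                "name": name,
                "required": required,
                "default": None if required else param.default,
            }
        )
    return params


# Stand-in for a provider class that would normally be imported inside Breeze.
class ExampleOperator:
    def __init__(self, task_id, retries=0, **kwargs):
        self.task_id = task_id
        self.retries = retries


for entry in constructor_parameters(ExampleOperator):
    print(entry)
```

This only works once the class object exists, which is why these scripts need the provider packages installed.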
**Why Eleventy?** Static site generators produce zero-JS pages by default. The registry works without JavaScript; filtering and search are layered on top progressively. Eleventy also has no opinion on frontend frameworks, which keeps the dependency surface small (the lockfile has ~30 packages total).
**Path prefix handling:** The site deploys at `/registry/` on airflow.apache.org but runs at `/` during local dev. Eleventy's `pathPrefix` config handles this via the `REGISTRY_PATH_PREFIX` env var. Templates use the `| url` filter, and client-side JS reads `window.__REGISTRY_BASE__` (injected in `base.njk`).
**Module filtering:** The extraction script filters classes based on type-specific suffix patterns (e.g. `Operator`, `Hook`, `Sensor` suffixes for their respective types) and base class inheritance. This avoids indexing helper classes, dataclasses, and exceptions that happen to live in operator/hook modules.
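
A rough sketch of this kind of filter is shown below. It is not the extraction script's actual logic: the suffix table covers only the three types named above, and both the rule that either signal (class-name suffix or base-class suffix) is enough and the exclusion of underscore-prefixed names are assumptions made for illustration.

```python
# Suffixes taken from the description above; the real script covers more types.
MODULE_SUFFIXES = {"operator": "Operator", "hook": "Hook", "sensor": "Sensor"}


def is_module_class(name, bases, module_type):
    """Decide whether a class found in a module file should be indexed.

    `bases` holds base-class names as written in the source,
    e.g. ["BaseOperator"] or ["models.BaseOperator"].
    """
    suffix = MODULE_SUFFIXES[module_type]
    if name.startswith("_"):
        # Assumption: private classes are never indexed.
        return False
    if name.endswith(suffix):
        return True
    # Fall back to inheritance: any base whose final name part carries the suffix.
    return any(base.split(".")[-1].endswith(suffix) for base in bases)


candidates = [
    ("S3CreateBucketOperator", ["BaseOperator"]),
    ("S3Config", []),
    ("OperatorError", ["Exception"]),
]
for name, bases in candidates:
    print(name, is_module_class(name, bases, "operator"))
```

Here the dataclass-like `S3Config` and the exception `OperatorError` are both rejected, since neither the class name nor any base ends in `Operator`.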
## What's NOT changed
- No modifications to any provider package code
- No changes to `provider.yaml` schema
- No changes to core Airflow code
- No new Python dependencies for Airflow itself
- No changes to existing documentation build process
## What's NOT included (future work)
- [ ] `apache/airflow-site` PR for `.htaccess` rewrite and nav link
- [x] `apache/airflow-site-archive` — update `s3-to-github.yml` and `github-to-s3.yml` workflows to sync the `registry/` S3 prefix (same pattern as `docs/`)
- [ ] Third-party provider support (Cosmos, Great Expectations, etc.) — will use `.airflow-registry.yaml` files in provider repos. ([Example Integration](https://github.com/astronomer/astronomer-cosmos/pull/2387))
- [ ] LLM-friendly exports (`llms.txt`) and "Copy for AI" buttons
- [ ] Redirect `registry.astronomer.io` traffic once the official registry is stable
- [ ] Explicit categories in `provider.yaml` (replacing keyword matching)
- [ ] Version changelog/diff on provider detail pages
- [ ] Example DAGs for providers
- [ ] Integration with pre-commit/prek checks
## How to test locally
```bash
# 1. Extract metadata
uv run python dev/registry/extract_metadata.py
# 2. Install Node dependencies
cd registry && pnpm install
# 3. Start dev server at http://localhost:8080
pnpm dev
```