Database Schema
Chantal uses SQLAlchemy ORM with support for PostgreSQL and SQLite.
Overview
The database stores:
Content item metadata (packages of any type)
Repository metadata/installer files
Repository state
Snapshots
Views (virtual repositories) and view snapshots
Sync history
All models are defined in src/chantal/db/models.py.
Design: Generic Content Items
Chantal does not have a type-specific Package model. Instead, all package
types (RPM, Helm, PyPI, APT/DEB, Alpine APK, etc.) are stored in a single generic
ContentItem model. Type-specific attributes are kept in a JSON column
(content_metadata) rather than as dedicated columns. This keeps the schema
stable when new content types are added: a new plugin only needs to agree on the
JSON layout for its content_type.
For RPM, attributes like epoch, release and arch live inside
content_metadata and are exposed through read-only @property accessors on the
model (including a derived nevra string).
Core Models
ContentItem
Generic, content-addressed model for all package types (table content_items).
class ContentItem(Base):
id: int # Primary key (auto-increment)
content_type: str # 'rpm', 'helm', 'pypi', 'apt', 'apk', ... (indexed)
name: str # Package name (e.g., "nginx"), indexed
version: str # Version, indexed
sha256: str # Content address - UNIQUE, indexed (NOT the PK)
size_bytes: int # File size in bytes
pool_path: str # Relative path in pool (e.g., "content/ab/cd/<sha>_file.rpm")
filename: str # Original filename
content_metadata: dict # JSON; type-specific fields (NOT called 'metadata'
# because that name is reserved by SQLAlchemy)
created_at: datetime
reference_count: int # For garbage collection
# Relationships (many-to-many)
repositories: list[Repository] # via repository_content_items
snapshots: list[Snapshot] # via snapshot_content_items
Key features:
Integer
idis the primary key;sha256is a separate unique indexed column.Deduplication is enforced by the unique
sha256constraint.Composite indexes:
(content_type, name)and(content_type, name, version).
RPM-specific properties (read JSON from content_metadata):
item.epoch # content_metadata["epoch"] (rpm only)
item.release # content_metadata["release"] (rpm only)
item.arch # content_metadata["arch"] (rpm/deb)
item.nevra # "name-epoch:version-release.arch" (rpm only)
Repository
Configured repositories from YAML (table repositories).
class Repository(Base):
id: int # Primary key (auto-increment)
repo_id: str # Stable config identifier - UNIQUE
name: str # Human-readable name
type: str # Repository type (rpm, apt, helm, apk)
feed: str # Upstream URL (Pulp terminology, NOT "feed_url")
enabled: bool
mode: RepositoryMode # Enum: mirror / filtered / hosted (default: filtered)
# Paths
latest_path: str | None
snapshots_path: str | None
# Timestamps
created_at: datetime
updated_at: datetime
# Sync state
last_sync_at: datetime | None
last_sync_status: str | None # success, failed, running
# Relationships
snapshots: list[Snapshot] # One-to-many
sync_history: list[SyncHistory] # One-to-many
content_items: list[ContentItem] # Many-to-many via repository_content_items
repository_files: list[RepositoryFile] # Many-to-many via repository_repository_files
RepositoryMode
class RepositoryMode(enum.StrEnum):
MIRROR = "mirror" # Full mirror, no filtering, metadata unchanged
FILTERED = "filtered" # Filtered packages with customized metadata (default)
HOSTED = "hosted" # Self-hosted / uploaded packages (no upstream sync)
The mode column was added in the head migration e190d159daac.
ContentItem vs. RepositoryFile
A repository is more than just its packages. Metadata files (updateinfo.xml,
filelists.xml, comps.xml, modules.yaml), signatures, and installer/kickstart
artifacts (vmlinuz, initrd.img, .treeinfo) are stored separately in the
RepositoryFile model. Both ContentItem and RepositoryFile use
content-addressed pool storage, but they live in different pool subdirectories
(pool/content/ vs. pool/files/).
RepositoryFile
Non-package repository files: metadata, signatures, installer/kickstart artifacts
(table repository_files). Added in migration 4ae7cc6b7243.
class RepositoryFile(Base):
id: int # Primary key (auto-increment)
file_category: str # "metadata", "signature", "kickstart", "debian-installer" (indexed)
file_type: str # "updateinfo", "filelists", "comps", "modules",
# "vmlinuz", "initrd", ".treeinfo", ... (indexed)
sha256: str # Content address (indexed, not unique)
pool_path: str # e.g. "files/ab/cd/<sha>_updateinfo.xml.gz"
size_bytes: int
original_path: str # Exact upstream path to preserve when publishing
# e.g. "repodata/<hash>-updateinfo.xml.gz", ".treeinfo"
file_metadata: dict | None # JSON; type-specific info (reserved-name workaround)
created_at: datetime
updated_at: datetime
# Relationships (many-to-many, like ContentItem)
repositories: list[Repository] # via repository_repository_files
snapshots: list[Snapshot] # via snapshot_repository_files
Snapshot
Immutable point-in-time repository state (table snapshots).
class Snapshot(Base):
id: int # Primary key (auto-increment)
repository_id: int # Foreign key to Repository.id
name: str # Snapshot name (e.g., "2025-01")
description: str | None
created_at: datetime
is_published: bool
published_path: str | None
# Cached statistics
package_count: int
total_size_bytes: int
# Relationships
repository: Repository
content_items: list[ContentItem] # Many-to-many via snapshot_content_items
repository_files: list[RepositoryFile] # Many-to-many via snapshot_repository_files
Unique constraint: (repository_id, name) (uq_snapshot_name).
View
Virtual repository combining multiple repositories (table views).
class View(Base):
id: int # Primary key (auto-increment)
name: str # UNIQUE, indexed
description: str | None
repo_type: str # All repos in a view must share this type (rpm, apt)
created_at: datetime
updated_at: datetime
is_published: bool
published_at: datetime | None
published_path: str | None
# Relationships
view_repositories: list[ViewRepository] # One-to-many (ordered membership)
view_snapshots: list[ViewSnapshot] # One-to-many
ViewRepository
Membership of repositories in a view, with ordering. This is a full ORM model
(table view_repositories), not a plain association table.
class ViewRepository(Base):
id: int # Primary key (auto-increment)
view_id: int # Foreign key to View.id
repository_id: int # Foreign key to Repository.id
order: int # Precedence for metadata merging (lower = higher priority)
added_at: datetime
# Relationships
view: View
repository: Repository
Unique constraint: (view_id, repository_id) (uq_view_repository).
ViewSnapshot
Atomic snapshot of all repositories in a view (table view_snapshots). It does
not use a junction table; instead it stores the included repository snapshot
IDs directly in a JSON array.
class ViewSnapshot(Base):
id: int # Primary key (auto-increment)
view_id: int # Foreign key to View.id
name: str # Snapshot name, indexed
description: str | None
created_at: datetime
snapshot_ids: list[int] # JSON array of Snapshot.id values (e.g. [12, 45, 67])
is_published: bool
published_at: datetime | None
published_path: str | None
# Cached statistics
package_count: int
total_size_bytes: int
# Relationships
view: View
Unique constraint: (view_id, name) (uq_view_snapshot_name).
SyncHistory
Tracks sync operations for audit and debugging (table sync_history).
class SyncHistory(Base):
id: int # Primary key (auto-increment)
repository_id: int # Foreign key to Repository.id
# Sync timing
started_at: datetime
completed_at: datetime | None
# Result
status: str # running, success, failed
error_message: str | None
# Statistics
packages_added: int
packages_removed: int
packages_updated: int
bytes_downloaded: int
# Snapshot created during this sync (optional)
snapshot_id: int | None # Foreign key to Snapshot.id
# Relationships
repository: Repository
snapshot: Snapshot | None
duration_seconds is a derived @property (completed_at - started_at).
Junction Tables
The following are plain SQLAlchemy association Tables (no ORM class):
repository_content_items
Many-to-many between repositories and content items (the “latest” state).
repository_content_items = Table(
"repository_content_items",
Column("repository_id", ForeignKey("repositories.id"), primary_key=True),
Column("content_item_id", ForeignKey("content_items.id"), primary_key=True),
Column("added_at", DateTime),
)
snapshot_content_items
Many-to-many between snapshots and content items.
snapshot_content_items = Table(
"snapshot_content_items",
Column("snapshot_id", ForeignKey("snapshots.id"), primary_key=True),
Column("content_item_id", ForeignKey("content_items.id"), primary_key=True),
)
repository_repository_files
Many-to-many between repositories and repository files (the “latest” state).
repository_repository_files = Table(
"repository_repository_files",
Column("repository_id", ForeignKey("repositories.id"), primary_key=True),
Column("repository_file_id", ForeignKey("repository_files.id"), primary_key=True),
Column("added_at", DateTime),
)
snapshot_repository_files
Many-to-many between snapshots and repository files.
snapshot_repository_files = Table(
"snapshot_repository_files",
Column("snapshot_id", ForeignKey("snapshots.id"), primary_key=True),
Column("repository_file_id", ForeignKey("repository_files.id"), primary_key=True),
)
Note: View-to-repository membership is the ORM model
ViewRepository(see above), and view snapshots reference repository snapshots through theViewSnapshot.snapshot_idsJSON array — there is noview_repositoriesplain table and noview_snapshot_snapshotsjunction table.
Indexes
Indexes declared on the models:
content_items:
- content_type (column index)
- name (column index)
- version (column index)
- sha256 (unique index)
- idx_content_type_name (content_type, name)
- idx_content_type_name_version (content_type, name, version)
repository_files:
- file_category, file_type, sha256 (column indexes)
- idx_repo_file_category (file_category)
- idx_repo_file_type (file_type)
repositories:
- repo_id (unique)
snapshots:
- name (column index)
views / view_snapshots:
- name (column index)
Database Migrations
Chantal uses Alembic for schema migrations. Migration scripts live in
alembic/versions/.
Migration Files
alembic/
├── versions/
│ ├── 20260110_2202_a4a922fdfc63_initial_schema_with_generic_content_.py
│ ├── 20260111_1138_4ae7cc6b7243_add_repository_files_table_and_.py
│ └── 20260111_1242_e190d159daac_add_repository_mode_field.py
└── env.py
Revision chain (head is e190d159daac):
a4a922fdfc63 (initial schema, generic ContentItem)
→ 4ae7cc6b7243 (add repository_files table + junctions)
→ e190d159daac (add Repository.mode field) ← head
Run Migrations
# Upgrade to latest
alembic upgrade head
# Downgrade one version
alembic downgrade -1
# Show current version
alembic current
Example Queries
Find all content items in a repository
repo = session.query(Repository).filter_by(repo_id="epel9-vim").first()
items = repo.content_items
Find all repositories containing a content item
item = session.query(ContentItem).filter_by(sha256="f256abc...").first()
repos = item.repositories
Create snapshot
snapshot = Snapshot(
repository_id=repo.id,
name="2025-01",
description="January 2025",
)
snapshot.content_items = list(repo.content_items) # Copy current items
snapshot.repository_files = list(repo.repository_files)
session.add(snapshot)
session.commit()
Find content items added between snapshots
snapshot1 = session.query(Snapshot).filter_by(name="2025-01").first()
snapshot2 = session.query(Snapshot).filter_by(name="2025-02").first()
added = set(snapshot2.content_items) - set(snapshot1.content_items)
removed = set(snapshot1.content_items) - set(snapshot2.content_items)
Database Backends
SQLite (Development)
Connection string:
database:
url: sqlite:///chantal.db
Pros:
No external service
Easy setup
Good for testing
Cons:
Limited concurrency
Not suitable for large-scale
PostgreSQL (Production)
Connection string:
database:
url: postgresql://chantal:password@localhost/chantal
Pros:
Better performance
Concurrent access
Better for large datasets
Cons:
Requires PostgreSQL installation
More complex setup
Maintenance
Vacuum (SQLite)
sqlite3 /var/lib/chantal/chantal.db "VACUUM;"
Analyze (PostgreSQL)
ANALYZE;
Cleanup Orphaned Content Items
# Find content items not in any repository
orphaned = session.query(ContentItem).filter(
~ContentItem.repositories.any()
).all()
Schema Diagram
┌──────────────┐ ┌───────────────────────────┐ ┌──────────────┐
│ Repository │◄───►│ repository_content_items │◄───►│ ContentItem │
│ │ │ │ │ │
│ - id (PK) │ │ - repository_id │ │ - id (PK) │
│ - repo_id (U)│ │ - content_item_id │ │ - sha256 (U) │
│ - type │ │ - added_at │ │ - content_type│
│ - feed │ └───────────────────────────┘ │ - name │
│ - mode │ │ - version │
└──┬───────┬───┘ ┌───────────────────────────┐ │ - content_ │
│ │◄───────►│ repository_repository_files│◄──►│ metadata │
│ │ └───────────────────────────┘ │ └──────────────┘
│ │ │
│ ▼ ▼
│ ┌──────────────┐ ┌──────────────────────┐ ┌──────────────┐
│ │ Snapshot │◄►│ snapshot_content_items│◄►│ RepositoryFile│
│ │ - id (PK) │ └──────────────────────┘ │ - id (PK) │
│ │ - repo_id FK │ ┌──────────────────────┐ │ - sha256 │
│ │ - name │◄►│snapshot_repository_files│◄│ - file_category│
│ └──────────────┘ └──────────────────────┘ │ - original_path│
│ └──────────────┘
▼
┌──────────────┐ ┌──────────────────┐
│ SyncHistory │ │ View │
│ - id (PK) │ │ - id (PK) │
│ - repo_id FK │ │ - name (U) │
│ - status │ │ - repo_type │
└──────────────┘ └───┬──────────┬───┘
│ │
┌────────────▼───┐ ┌───▼──────────────┐
│ ViewRepository │ │ ViewSnapshot │
│ - id (PK) │ │ - id (PK) │
│ - view_id FK │ │ - view_id FK │
│ - repo_id FK │ │ - snapshot_ids │ (JSON array of Snapshot.id)
│ - order │ └──────────────────┘
└────────────────┘