Database Schema
Chantal uses SQLAlchemy ORM with support for PostgreSQL and SQLite.
Overview
The database stores:
Package metadata
Repository state
Snapshots
Views (virtual repositories)
Sync history
Core Models
Package
Stores package metadata with content-addressed storage (SHA256).
class Package(Base):
sha256: str # Primary key (content address)
filename: str # Original filename
size: int # File size in bytes
# RPM metadata
name: str # Package name (e.g., "nginx")
version: str # Version
release: str # Release
epoch: int # Epoch (default: 0)
architecture: str # Architecture (e.g., "x86_64")
# Timestamps
created_at: datetime
# Relationships
repositories: List[Repository] # Many-to-many
snapshots: List[Snapshot] # Many-to-many
Key features:
SHA256 as primary key (content-addressed)
Deduplication via unique SHA256
Many-to-many relationships with repositories and snapshots
Repository
Configured repositories from YAML.
class Repository(Base):
id: str # Primary key (from config)
name: str # Human-readable name
type: str # Repository type (rpm, apt, pypi)
feed_url: str # Upstream URL
enabled: bool # Whether enabled
# Timestamps
created_at: datetime
updated_at: datetime
last_sync_at: datetime
# Relationships
packages: List[Package] # Many-to-many
snapshots: List[Snapshot] # One-to-many
sync_history: List[SyncHistory] # One-to-many
Snapshot
Immutable point-in-time repository state.
class Snapshot(Base):
id: int # Primary key (auto-increment)
repository_id: str # Foreign key to Repository
name: str # Snapshot name (e.g., "2025-01")
description: str # Optional description
# Timestamps
created_at: datetime
# Relationships
repository: Repository
packages: List[Package] # Many-to-many
Unique constraint: (repository_id, name)
View
Virtual repository combining multiple repositories.
class View(Base):
name: str # Primary key (from config)
description: str # Human-readable description
# Timestamps
created_at: datetime
updated_at: datetime
# Relationships
repositories: List[Repository] # Many-to-many via ViewRepository
snapshots: List[ViewSnapshot] # One-to-many
ViewSnapshot
Snapshot of a view (references multiple repository snapshots).
class ViewSnapshot(Base):
id: int # Primary key (auto-increment)
view_name: str # Foreign key to View
name: str # Snapshot name (e.g., "2025-01")
description: str # Optional description
# Timestamps
created_at: datetime
# Relationships
view: View
snapshots: List[Snapshot] # Many-to-many
SyncHistory
Tracks sync operations for audit and debugging.
class SyncHistory(Base):
id: int # Primary key (auto-increment)
repository_id: str # Foreign key to Repository
# Sync details
started_at: datetime
completed_at: datetime
status: str # success, failed, partial
error_message: str # Error details (if failed)
# Statistics
packages_total: int
packages_downloaded: int
packages_skipped: int
bytes_transferred: int
# Relationships
repository: Repository
Junction Tables
repository_packages
Many-to-many relationship between repositories and packages.
repository_packages = Table(
'repository_packages',
Column('repository_id', ForeignKey('repositories.id')),
Column('package_sha256', ForeignKey('packages.sha256')),
Column('added_at', DateTime),
PrimaryKey('repository_id', 'package_sha256')
)
snapshot_packages
Many-to-many relationship between snapshots and packages.
snapshot_packages = Table(
'snapshot_packages',
Column('snapshot_id', ForeignKey('snapshots.id')),
Column('package_sha256', ForeignKey('packages.sha256')),
PrimaryKey('snapshot_id', 'package_sha256')
)
view_repositories
Many-to-many relationship between views and repositories.
view_repositories = Table(
'view_repositories',
Column('view_name', ForeignKey('views.name')),
Column('repository_id', ForeignKey('repositories.id')),
Column('order', Integer), # Repository order in view
PrimaryKey('view_name', 'repository_id')
)
view_snapshot_snapshots
Many-to-many relationship between view snapshots and repository snapshots.
view_snapshot_snapshots = Table(
'view_snapshot_snapshots',
Column('view_snapshot_id', ForeignKey('view_snapshots.id')),
Column('snapshot_id', ForeignKey('snapshots.id')),
PrimaryKey('view_snapshot_id', 'snapshot_id')
)
Indexes
Critical indexes for performance:
-- Package lookups
CREATE INDEX idx_packages_name_arch ON packages(name, architecture);
CREATE INDEX idx_packages_name ON packages(name);
-- Repository queries
CREATE INDEX idx_repositories_enabled ON repositories(enabled);
-- Snapshot queries
CREATE INDEX idx_snapshots_repo_name ON snapshots(repository_id, name);
-- Sync history
CREATE INDEX idx_sync_history_repo_started ON sync_history(repository_id, started_at);
Database Migrations
Chantal uses Alembic for schema migrations.
Migration Files
migrations/
├── versions/
│ ├── 001_initial_schema.py
│ ├── 002_add_views.py
│ └── 003_add_sync_history.py
└── env.py
Run Migrations
# Upgrade to latest
alembic upgrade head
# Downgrade one version
alembic downgrade -1
# Show current version
alembic current
Example Queries
Find all packages in repository
repo = session.query(Repository).filter_by(id="epel9-vim").first()
packages = repo.packages
Find all repositories containing a package
package = session.query(Package).filter_by(sha256="f256abc...").first()
repos = package.repositories
Create snapshot
snapshot = Snapshot(
repository_id="epel9-vim",
name="2025-01",
description="January 2025"
)
snapshot.packages = repo.packages # Copy current packages
session.add(snapshot)
session.commit()
Find packages added between snapshots
snapshot1 = session.query(Snapshot).filter_by(name="2025-01").first()
snapshot2 = session.query(Snapshot).filter_by(name="2025-02").first()
added = set(snapshot2.packages) - set(snapshot1.packages)
removed = set(snapshot1.packages) - set(snapshot2.packages)
Database Backends
SQLite (Development)
Connection string:
database:
url: sqlite:///chantal.db
Pros:
No external service
Easy setup
Good for testing
Cons:
Limited concurrency
Not suitable for large-scale
PostgreSQL (Production)
Connection string:
database:
url: postgresql://chantal:password@localhost/chantal
Pros:
Better performance
Concurrent access
Better for large datasets
Cons:
Requires PostgreSQL installation
More complex setup
Database Size Estimates
Typical sizes:
1,000 packages: ~1 MB (SQLite) / ~500 KB (PostgreSQL)
10,000 packages: ~10 MB / ~5 MB
100,000 packages: ~100 MB / ~50 MB
1,000,000 packages: ~1 GB / ~500 MB
With snapshots:
10 snapshots × 10,000 packages: ~15 MB (snapshots are metadata only)
Maintenance
Vacuum (SQLite)
sqlite3 /var/lib/chantal/chantal.db "VACUUM;"
Analyze (PostgreSQL)
ANALYZE;
Cleanup Orphaned Packages
# Find packages not in any repository
orphaned = session.query(Package).filter(
~Package.repositories.any()
).all()
Schema Diagram
┌─────────────┐ ┌──────────────────┐ ┌─────────────┐
│ Repository │◄─────►│ repository_ │◄─────►│ Package │
│ │ │ packages │ │ │
│ - id (PK) │ │ │ │ - sha256(PK)│
│ - name │ │ - repository_id │ │ - name │
│ - type │ │ - package_sha256 │ │ - version │
│ - feed_url │ │ - added_at │ │ - arch │
└─────┬───────┘ └──────────────────┘ └─────┬───────┘
│ │
│ │
▼ ▼
┌─────────────┐ ┌──────────────────┐ ┌─────────────┐
│ Snapshot │◄─────►│ snapshot_ │◄──────┤ │
│ │ │ packages │ │ │
│ - id (PK) │ │ │ │ │
│ - repo_id │ │ - snapshot_id │ │ │
│ - name │ │ - package_sha256 │ │ │
└─────────────┘ └──────────────────┘ └─────────────┘
┌─────────────┐ ┌──────────────────┐
│ View │◄─────►│ view_ │
│ │ │ repositories │
│ - name (PK) │ │ │
│ - desc │ │ - view_name │
└─────┬───────┘ │ - repository_id │
│ │ - order │
│ └──────────────────┘
▼
┌─────────────┐ ┌──────────────────┐
│ ViewSnapshot│◄─────►│ view_snapshot_ │
│ │ │ snapshots │
│ - id (PK) │ │ │
│ - view_name │ │ - view_snap_id │
│ - name │ │ - snapshot_id │
└─────────────┘ └──────────────────┘