Architecture Overview
Chantal follows a clean, modular architecture designed for simplicity and extensibility.
Core Principles
Content-Addressed Storage - SHA256-based deduplication
Plugin Architecture - Extensible repository type support
Database-Backed Metadata - Fast lookups, no re-scanning
Hardlink-Based Publishing - Zero-copy, instant publishing
No Daemons - Simple CLI tool, no background services
System Architecture
┌─────────────────────────────────────────────────────────┐
│  CLI Layer                                              │
│  (Click commands: init, repo, snapshot, publish, etc.)  │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│  Core Layer                                             │
│  • Config Management (Pydantic models)                  │
│  • Storage Manager (content-addressed pool)             │
│  • Database Manager (SQLAlchemy ORM)                    │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│  Plugin Layer                                           │
│  • Sync Plugins (RPM, APT, Helm, APK)                   │
│  • Publisher Plugins (metadata generation)              │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│  External Systems                                       │
│  • Upstream Repositories (HTTP/HTTPS)                   │
│  • Database (PostgreSQL/SQLite)                         │
│  • Filesystem (pool, published)                         │
└─────────────────────────────────────────────────────────┘
Components
CLI Layer
Technologies: Click (Python CLI framework)
Responsibilities:
Parse command-line arguments
Load configuration
Initialize services
Execute commands
Format output
Key Files:
src/chantal/cli/main.py - CLI entry point and command registration
src/chantal/cli/repo_commands.py - Repository management commands
src/chantal/cli/snapshot_commands.py - Snapshot management commands
src/chantal/cli/publish_commands.py - Publishing commands
src/chantal/cli/view_commands.py - View management commands
src/chantal/cli/content_commands.py - Content search and listing
src/chantal/cli/db_commands.py - Database management commands
src/chantal/cli/pool_commands.py - Storage pool management commands
Core Layer
Configuration Management
Technologies: Pydantic (validation), PyYAML (parsing)
Responsibilities:
Load and parse YAML configuration
Validate configuration structure
Provide type-safe configuration access
Handle configuration includes
Key Files:
src/chantal/core/config.py - Configuration models: GlobalConfig, RepositoryConfig, FilterConfig
Storage Manager
Technologies: Python pathlib, SHA256 hashing
Responsibilities:
Content-addressed storage (2-level SHA256-based directories)
Deduplication (automatic via content addressing)
Hardlink creation for publishing
Pool statistics and verification
Key Files:
src/chantal/core/storage.py - StorageManager class
Storage Layout:
The pool is split into two subdirectories by pool type: content/ for packages
(ContentItem) and files/ for metadata/installer files (RepositoryFile).
pool/
├── content/ # ContentItem (packages)
│ ├── f2/
│ │ └── 56/
│ │ └── f256abc...def789_nginx-1.20.2-1.el9.x86_64.rpm
│ └── 95/
│ └── 05/
│ └── 9505484...c1264fde_nginx-module-njs-1.24.0.rpm
└── files/ # RepositoryFile (metadata/installer)
└── 56/
└── 78/
└── 5678abc..._updateinfo.xml.gz
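A minimal sketch of how a pool path can be derived from file contents, assuming the layout above. The first two hex characters of the SHA256 digest name the first directory level, the next two name the second, and the file is stored as <sha256>_<filename>. The function pool_path and its pool_type argument are hypothetical and are not the StorageManager API.

```python
import hashlib
from pathlib import Path


def pool_path(pool_root: Path, data: bytes, filename: str, pool_type: str = "content") -> Path:
    """Return the content-addressed location for `data` (hypothetical helper).

    pool_type is "content" for packages (ContentItem) or "files" for
    metadata/installer files (RepositoryFile), matching the layout above.
    """
    if pool_type not in ("content", "files"):
        raise ValueError(f"unknown pool type: {pool_type}")
    digest = hashlib.sha256(data).hexdigest()
    # Two directory levels of two hex characters each: 256 * 256 = 65,536 buckets.
    return pool_root / pool_type / digest[:2] / digest[2:4] / f"{digest}_{filename}"


# The same bytes under the same file name always resolve to the same path.
first = pool_path(Path("pool"), b"same bytes", "nginx-1.20.2-1.el9.x86_64.rpm")
second = pool_path(Path("pool"), b"same bytes", "nginx-1.20.2-1.el9.x86_64.rpm")
assert first == second
```

Because the location depends only on the contents and the file name, a package synced from two repositories resolves to a single pool entry. The sketch does not need a separate deduplication index to achieve this.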
Database Manager
Technologies: SQLAlchemy (ORM), Alembic (migrations)
Responsibilities:
Content (package) metadata storage
Repository metadata/installer file tracking
Repository state tracking
Snapshot management
Sync history
Junction tables for many-to-many relationships
Key Files:
src/chantal/db/models.py - SQLAlchemy models
src/chantal/db/connection.py - Database connection/session management
Database Models:
Repository - Configured repositories (with mode: mirror/filtered/hosted)
ContentItem - Content-addressed packages (generic, all types)
RepositoryFile - Content-addressed metadata/installer files
Snapshot - Immutable snapshots
View/ViewRepository/ViewSnapshot - Virtual repositories spanning multiple repositories
SyncHistory - Sync tracking
Repository Modes:
Each Repository has a mode (RepositoryMode enum):
mirror - Full mirror of the upstream; no filtering, metadata unchanged.
filtered - Filtered package set with customized/regenerated metadata (include/exclude rules, retention, etc.). This is the default.
hosted - Self-hosted packages with no upstream sync. Content items are uploaded into the repository directly rather than fetched from a feed.
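The three modes could be modelled as a string-valued enum so that the value in a YAML config parses directly into a member. This is a sketch, not the project's RepositoryMode definition. The member names and the syncs_from_upstream helper are assumptions.

```python
from enum import Enum


class RepositoryMode(str, Enum):
    """Sketch of the repository modes described above (member names assumed)."""

    MIRROR = "mirror"  # full upstream mirror, metadata unchanged
    FILTERED = "filtered"  # filtered package set, regenerated metadata
    HOSTED = "hosted"  # uploaded packages, no upstream sync

    @property
    def syncs_from_upstream(self) -> bool:
        # Only hosted repositories have no upstream feed to sync from.
        return self is not RepositoryMode.HOSTED


# "filtered" is the documented default mode.
DEFAULT_MODE = RepositoryMode.FILTERED
```

Because the enum subclasses str, RepositoryMode("hosted") turns a config string into a member and rejects unknown values with ValueError.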
Plugin Layer
Sync Plugins
Technologies: Plain Python classes (no shared base class), Requests (HTTP)
Responsibilities:
Fetch repository metadata
Parse package lists
Apply filters
Download packages
Verify checksums
Plugins:
RpmSyncPlugin - RPM/DNF/YUM repositories
AptSyncPlugin - Debian/Ubuntu APT repositories
HelmSyncer - Helm chart repositories (HTTP and OCI)
ApkSyncer - Alpine APK repositories
Key Files:
src/chantal/plugins/base.py - PublisherPlugin base class (publishers only; sync plugins are a convention, not a base class)
src/chantal/plugins/rpm/sync.py - RPM sync implementation
src/chantal/plugins/rpm/publisher.py - RPM publisher implementation
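Sync plugins share a convention rather than a base class, so the contract can be written down as a typing.Protocol. A structural type checker such as mypy (already in the development toolchain) can then verify it. This is a sketch, not code from the repository. SyncPlugin, DummySyncPlugin, and the fields of SyncResult are assumptions. Only the shape sync_repository(session, repository) -> SyncResult is taken from this document.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class SyncResult:
    # Hypothetical fields; the real SyncResult may carry different data.
    packages_added: int = 0
    packages_skipped: int = 0
    success: bool = True


@runtime_checkable
class SyncPlugin(Protocol):
    """Structural description of the sync convention: no inheritance required."""

    def sync_repository(self, session: Any, repository: Any) -> SyncResult: ...


class DummySyncPlugin:
    # Inherits from nothing; having a matching method is enough to satisfy SyncPlugin.
    def sync_repository(self, session: Any, repository: Any) -> SyncResult:
        return SyncResult()


assert isinstance(DummySyncPlugin(), SyncPlugin)
```

A runtime isinstance check against a Protocol only confirms that the method exists. The signature itself is checked statically by mypy.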
Publisher Plugins
Technologies: XML generation, compression (gzip, xz)
Responsibilities:
Generate repository metadata
Create hardlinks to pool
Compress metadata files
Sign repositories (future)
Key Files:
src/chantal/plugins/base.py - PublisherPlugin interface
src/chantal/plugins/rpm/publisher.py - RPM publisher (repomd.xml, primary.xml.gz)
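The metadata-generation and compression responsibilities can be illustrated with a standard-library sketch. It gzips a ready-made primary.xml and records the compressed and uncompressed SHA256 checksums and sizes. It then describes the result in a createrepo-style repomd.xml data entry. build_repomd is a hypothetical name, not the RpmPublisher API. A real publisher also writes the package entries and the other metadata types. The technology stack lists lxml, but this sketch uses ElementTree to stay self-contained.

```python
import gzip
import hashlib
import time
import xml.etree.ElementTree as ET
from typing import Optional

REPO_NS = "http://linux.duke.edu/metadata/repo"


def build_repomd(primary_xml: bytes, revision: Optional[int] = None) -> tuple[bytes, bytes, str]:
    """Compress primary.xml and describe it in a minimal repomd.xml (sketch).

    Returns (repomd_xml, primary_gz, primary_href).
    """
    timestamp = int(time.time())
    revision = timestamp if revision is None else revision
    # mtime=0 keeps the gzip output byte-for-byte reproducible.
    primary_gz = gzip.compress(primary_xml, mtime=0)
    gz_sha = hashlib.sha256(primary_gz).hexdigest()
    # Checksum-prefixed file name under repodata/, as createrepo produces.
    href = f"repodata/{gz_sha}-primary.xml.gz"

    ET.register_namespace("", REPO_NS)  # serialize the repo namespace as the default
    root = ET.Element(f"{{{REPO_NS}}}repomd")
    ET.SubElement(root, f"{{{REPO_NS}}}revision").text = str(revision)
    data = ET.SubElement(root, f"{{{REPO_NS}}}data", type="primary")
    ET.SubElement(data, f"{{{REPO_NS}}}checksum", type="sha256").text = gz_sha
    ET.SubElement(data, f"{{{REPO_NS}}}open-checksum", type="sha256").text = (
        hashlib.sha256(primary_xml).hexdigest()
    )
    ET.SubElement(data, f"{{{REPO_NS}}}location", href=href)
    ET.SubElement(data, f"{{{REPO_NS}}}timestamp").text = str(timestamp)
    ET.SubElement(data, f"{{{REPO_NS}}}size").text = str(len(primary_gz))
    ET.SubElement(data, f"{{{REPO_NS}}}open-size").text = str(len(primary_xml))
    repomd = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    return repomd, primary_gz, href
```

Clients such as DNF use the checksum and size to validate primary.xml.gz after downloading it. This is why the repomd.xml entry has to be written after the compressed file exists.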
Data Flow
Sync Workflow
1. User: chantal repo sync --repo-id example
│
2. CLI loads configuration
│
3. Identify repository type (RPM)
│
4. Load RpmSyncPlugin
│
5. Fetch repomd.xml from upstream
│
6. Parse primary.xml.gz (package list)
│
7. Apply filters (patterns, metadata, post-processing)
│
8. For each package:
│
├─> Look up SHA256 in upstream metadata
├─> Check if it already exists in the pool
├─> Download if missing and verify the checksum
└─> Store in pool (f2/56/f256abc...rpm)
│
9. Update database (packages, repository associations)
│
10. Done!
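The per-package loop in step 8 could look roughly like this. The sketch assumes the expected SHA256 comes from upstream metadata before any download, so an existing pool file can be skipped without fetching it. PackageRef, sync_packages, and the injected fetch callable (standing in for a Requests download) are hypothetical names. Writing to a temporary file and renaming it is a design choice of the sketch.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class PackageRef:
    # Hypothetical record for one package that survived filtering.
    filename: str
    sha256: str  # expected checksum from upstream metadata
    url: str


def sync_packages(pool: Path, packages: list[PackageRef], fetch: Callable[[str], bytes]) -> tuple[int, int]:
    """Skip packages already in the pool; download, verify, and store the rest.

    Returns (downloaded, skipped).
    """
    downloaded = skipped = 0
    for pkg in packages:
        h = pkg.sha256
        target = pool / "content" / h[:2] / h[2:4] / f"{h}_{pkg.filename}"
        if target.exists():
            skipped += 1  # stored by an earlier sync or by another repository
            continue
        data = fetch(pkg.url)
        actual = hashlib.sha256(data).hexdigest()
        if actual != h:
            raise ValueError(f"checksum mismatch for {pkg.filename}: {actual}")
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file in the same directory, then rename, so an interrupted
        # sync never leaves a partial file at the content-addressed path.
        fd, tmp = tempfile.mkstemp(dir=target.parent)
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp, target)
        downloaded += 1
    return downloaded, skipped
```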
Publish Workflow
1. User: chantal publish repo --repo-id example
│
2. Query database for repository packages
│
3. Load RpmPublisher plugin
│
4. Create target directory structure
│
5. Create hardlinks from pool to published/
│
6. Generate repomd.xml
│
7. Generate primary.xml.gz
│
8. Generate filelists.xml.gz (future)
│
9. Done! Repository ready to serve
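Step 5 relies on hardlinks. The sketch below uses os.link(), the call named in the technology stack, and publish_hardlinks is a hypothetical helper. Two constraints follow from how hardlinks work. First, a link can only be created within one filesystem, so the pool and the published tree must share a mount. Second, linking to a temporary name and then renaming with os.replace() swaps a published file atomically.

```python
import os
from pathlib import Path


def publish_hardlinks(pool_files: dict[str, Path], target_dir: Path) -> int:
    """Hardlink pool files into a published directory (hypothetical helper).

    pool_files maps a flat published file name to its pool path.
    Returns the number of links created or replaced.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    created = 0
    for name, src in pool_files.items():
        dst = target_dir / name
        if dst.exists() and os.path.samefile(src, dst):
            continue  # already published from this exact pool file
        tmp = target_dir / f".{name}.tmp"
        tmp.unlink(missing_ok=True)
        # os.link shares the inode: no data is copied and permissions are identical.
        # It raises OSError (errno EXDEV) if src and tmp are on different filesystems.
        os.link(src, tmp)
        # Rename over the old entry so readers see either the old or the new file.
        os.replace(tmp, dst)
        created += 1
    return created
```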
Technology Stack
Runtime
Python 3.12+ - Project runtime requirement (requires-python = ">=3.12"); hardlinks are created with os.link(), not Path.hardlink_to()
SQLAlchemy - Database ORM
Alembic - Database migrations
Click - CLI framework
Requests - HTTP client
lxml - XML parsing
Pydantic - Configuration validation
PyYAML - YAML parsing
Development
pytest - Testing framework
black - Code formatting
ruff - Linting
mypy - Type checking
Database
PostgreSQL (production) - Recommended for large deployments
SQLite (development) - Simple, embedded database
Design Decisions
Why Content-Addressed Storage?
Alternatives considered:
Flat directory with all packages
Mirror upstream directory structure
Hash-based subdirectories (chosen)
Chosen approach:
2-level SHA256-based directories (f2/56/f256…)
Automatic deduplication
Efficient filesystem performance (65,536 buckets)
Why Database for Metadata?
Alternatives considered:
Scan filesystem on every operation
JSON/YAML metadata files
Database (chosen)
Chosen approach:
PostgreSQL/SQLite for metadata
Fast queries
Relationship management
Transactional integrity
Why Hardlinks for Publishing?
Alternatives considered:
Copy files (wastes space)
Symlinks (may break permissions)
Hardlinks (chosen)
Chosen approach:
Zero-copy (no disk space wasted)
Instant publishing (milliseconds)
Atomic updates
Preserves permissions
Performance Characteristics
Sync Performance
First sync: Limited by network bandwidth
Subsequent syncs: Fast (skip existing packages via SHA256 check)
Filter overhead: Minimal (in-memory regex matching)
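The following sketch shows what in-memory regex matching looks like in practice. Patterns are compiled once per sync and then applied to every package name parsed from the metadata, with no disk or network access. PatternFilter and its semantics are assumptions for illustration, not FilterConfig's actual schema. In this sketch, an empty include list keeps everything and an exclude match always wins.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PatternFilter:
    """Hypothetical include/exclude name filter; FilterConfig's real fields may differ."""

    include: list[str] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Compile once so each package costs only a few in-memory regex searches.
        self._include = [re.compile(p) for p in self.include]
        self._exclude = [re.compile(p) for p in self.exclude]

    def accepts(self, name: str) -> bool:
        if self._include and not any(p.search(name) for p in self._include):
            return False
        return not any(p.search(name) for p in self._exclude)


f = PatternFilter(include=[r"^nginx"], exclude=[r"-debuginfo$"])
names = ["nginx", "nginx-module-njs", "nginx-debuginfo", "httpd"]
kept = [n for n in names if f.accepts(n)]  # ["nginx", "nginx-module-njs"]
```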
Storage Efficiency
Typical deduplication: 60-80% across RHEL variants
Snapshot overhead: Near-zero (metadata only)
Publishing overhead: Zero (hardlinks)
Database Performance
SQLite: Good for <100K packages
PostgreSQL: Excellent for millions of packages
Indexes on: sha256, (name, arch), repo_id
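The models in this document are SQLAlchemy classes. The sketch below uses plain sqlite3 DDL only to show the shape of the relationships and indexes listed above. It has a unique sha256, a composite (name, arch) index, a unique repo_id, and a junction table linking content to repositories. Table and column names are assumptions, not the project's Alembic schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE repositories (
    id      INTEGER PRIMARY KEY,
    repo_id TEXT NOT NULL UNIQUE            -- UNIQUE creates an index on repo_id
);
CREATE TABLE content_items (
    id       INTEGER PRIMARY KEY,
    sha256   TEXT NOT NULL UNIQUE,          -- one row per distinct file
    name     TEXT NOT NULL,
    arch     TEXT NOT NULL,
    filename TEXT NOT NULL
);
-- Junction table: a package can belong to many repositories and vice versa.
CREATE TABLE repository_content (
    repository_id   INTEGER NOT NULL REFERENCES repositories(id),
    content_item_id INTEGER NOT NULL REFERENCES content_items(id),
    PRIMARY KEY (repository_id, content_item_id)
);
CREATE INDEX ix_content_name_arch ON content_items (name, arch);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO repositories (repo_id) VALUES ('rhel9-baseos'), ('rhel9-appstream')")
conn.execute(
    "INSERT INTO content_items (sha256, name, arch, filename) VALUES (?, ?, ?, ?)",
    ("f256" + "0" * 60, "nginx", "x86_64", "nginx-1.20.2-1.el9.x86_64.rpm"),
)
# One content row shared by two repositories: only the association rows are duplicated.
conn.executemany("INSERT INTO repository_content VALUES (?, 1)", [(1,), (2,)])
rows = conn.execute(
    """SELECT r.repo_id FROM repositories r
       JOIN repository_content rc ON rc.repository_id = r.id
       JOIN content_items c ON c.id = rc.content_item_id
       WHERE c.name = ? AND c.arch = ?
       ORDER BY r.repo_id""",
    ("nginx", "x86_64"),
).fetchall()
```

In a schema like this, a snapshot can follow the same pattern: immutable association rows pointing at existing content rows. This is consistent with the near-zero snapshot overhead noted above.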
Extensibility Points
Adding Repository Types
Write a sync plugin class following the sync convention (sync_repository(session, repository) -> SyncResult; no base class)
Implement the PublisherPlugin interface
Add an elif dispatch branch in cli/repo_commands.py and cli/publish_commands.py (there is no plugin registry)
Add the new type to RepositoryConfig.type validation
See Plugin System for details.
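The following sketch shows what registry-free dispatch and type validation amount to. The class names match the sync plugins listed earlier, but here they are empty stand-ins. The type strings "rpm", "apt", "helm", and "apk" are assumptions about RepositoryConfig.type values. validate_repo_type and get_sync_plugin are hypothetical helpers.

```python
from typing import Any

# Hypothetical type strings; the real RepositoryConfig.type values may differ.
VALID_TYPES = ("rpm", "apt", "helm", "apk")


# Empty stand-ins for the real plugin classes.
class RpmSyncPlugin: ...


class AptSyncPlugin: ...


class HelmSyncer: ...


class ApkSyncer: ...


def validate_repo_type(value: str) -> str:
    # Corresponds to "add the new type to RepositoryConfig.type validation".
    if value not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}, got {value!r}")
    return value


def get_sync_plugin(repo_type: str) -> Any:
    # Explicit if/elif dispatch: with no registry, each new type adds one branch
    # here and a matching branch in the publish command.
    if repo_type == "rpm":
        return RpmSyncPlugin()
    elif repo_type == "apt":
        return AptSyncPlugin()
    elif repo_type == "helm":
        return HelmSyncer()
    elif repo_type == "apk":
        return ApkSyncer()
    raise ValueError(f"unsupported repository type: {repo_type!r}")
```

The trade-off is discoverability versus simplicity. Every supported type is visible in one place, but adding a type means touching several files.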
Adding Filter Types
Add to FilterConfig in config.py
Implement filter logic in plugin
Add tests
Adding Output Formats
Add format option to CLI command
Implement formatter (JSON, CSV, etc.)
Update command output logic
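A command could route its results through one formatter selected by an output-format option. Such an option is hypothetical here, so check the actual CLI before relying on it. The helper below is a standard-library sketch. format_rows and the format names "table", "json", and "csv" are assumptions.

```python
import csv
import io
import json


def format_rows(rows: list[dict[str, object]], fmt: str = "table") -> str:
    """Render command results as an aligned table, JSON, or CSV (hypothetical helper)."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        if rows:
            writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        return buf.getvalue()
    if fmt == "table":
        if not rows:
            return ""
        cols = list(rows[0])
        # Each column is as wide as its longest header or value.
        widths = {c: max(len(c), *(len(str(r[c])) for r in rows)) for c in cols}
        lines = ["  ".join(c.ljust(widths[c]) for c in cols).rstrip()]
        lines += ["  ".join(str(r[c]).ljust(widths[c]) for c in cols).rstrip() for r in rows]
        return "\n".join(lines)
    raise ValueError(f"unknown format: {fmt}")
```

Keeping formatting in one function means a new format touches a single place rather than every command.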
Security Considerations
Certificate validation: Always verify SSL certificates
Checksum verification: All packages verified via SHA256 (integrity)
Signature verification (authenticity): GPG/OpenPGP verification of upstream metadata (RPM repomd.xml.asc, APT InRelease/Release.gpg, RPM package header signatures) is implemented but OFF by default; enable it under the repository’s verify: block. Until it is enabled, a mirror is authenticated only by checksum, so a compromised or MITM’d upstream that serves a self-consistent metadata set is trusted. When verification is enabled, APT additionally rejects an expired Release (Valid-Until in the past).
SQL injection: prevented by SQLAlchemy, which passes values as bound parameters instead of interpolating them into SQL
Path traversal: All paths validated and normalized
Permissions: Follow principle of least privilege
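One common way to implement "all paths validated and normalized" is to resolve the joined path and confirm it still lies inside the intended root. This catches both ".." segments and absolute paths smuggled in through upstream metadata, such as an RPM location href. This is a generic sketch, not Chantal's code, and safe_join is a hypothetical name.

```python
from pathlib import Path


def safe_join(base: Path, *parts: str) -> Path:
    """Join untrusted path components under `base`, rejecting any escape (hypothetical helper)."""
    root = base.resolve()
    # resolve() collapses ".." and follows existing symlinks; an absolute part
    # replaces the root entirely and is therefore rejected below.
    candidate = root.joinpath(*parts).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes {root}: {'/'.join(parts)}")
    return candidate
```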
Future Enhancements
Parallel downloads: Download multiple packages concurrently
Compression: Compress pool storage
Web UI: Read-only web interface
REST API: HTTP API for automation
Delta RPMs (drpm/prestodelta) are not planned — see the RPM plugin limitations.