
System Design

Component-by-component breakdown of the stack.

Quantum Pipeline Module

The simulation module runs VQE using Qiskit Aer (or IBM Quantum hardware). It is structured around abstract base classes (Runner, Solver, Mapper) with concrete implementations:

  • VQERunner - orchestrates the full pipeline per molecule
  • VQESolver - ansatz construction, circuit preparation, optimization loop
  • JordanWignerMapper - fermionic-to-qubit operator mapping
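A minimal sketch of how such a class hierarchy might look, using Python's abc module. The method names (map, solve, run) and the placeholder body are illustrative assumptions, not the module's actual signatures:

```python
from abc import ABC, abstractmethod


class Mapper(ABC):
    """Maps a fermionic operator to a qubit operator."""

    @abstractmethod
    def map(self, fermionic_op):
        ...


class Solver(ABC):
    """Builds the ansatz and runs the optimization loop."""

    @abstractmethod
    def solve(self, qubit_op):
        ...


class Runner(ABC):
    """Orchestrates the full pipeline for one molecule."""

    @abstractmethod
    def run(self, molecule):
        ...


class JordanWignerMapper(Mapper):
    def map(self, fermionic_op):
        # Placeholder: the real implementation delegates to a
        # Jordan-Wigner transform (e.g. via Qiskit Nature).
        return f"JW({fermionic_op})"
```

The ABC layer lets alternative backends or mappings be swapped in without touching the orchestration code.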

Execution Flow

sequenceDiagram
    participant User
    participant VQERunner
    participant QiskitAer
    participant Monitor
    participant Prometheus
    participant Kafka

    User->>VQERunner: Start VQE Simulation
    VQERunner->>VQERunner: Load Molecule
    VQERunner->>VQERunner: Build Hamiltonian
    Note over VQERunner: Track hamiltonian_time

    VQERunner->>VQERunner: Map to Qubits
    Note over VQERunner: Track mapping_time

    VQERunner->>Monitor: Start Performance Monitoring
    Monitor->>Prometheus: Export System Metrics

    VQERunner->>QiskitAer: Execute VQE
    loop Each Optimizer Iteration
        QiskitAer->>QiskitAer: Evaluate Cost Function
        QiskitAer->>Monitor: Log Iteration Data
        Monitor->>Monitor: Store Parameters & Energy
    end
    Note over QiskitAer: Track vqe_time

    QiskitAer->>VQERunner: Return Optimization Result
    VQERunner->>VQERunner: _collect_accuracy_metrics()
    VQERunner->>VQERunner: _build_metrics_data()
    Note over VQERunner: Calculate total_time

    VQERunner->>Kafka: _stream_result() to experiment.vqe
    Monitor->>Prometheus: Export Final Metrics
    VQERunner->>User: Simulation Complete

VQERunner Methods

| Method | What it does |
| --- | --- |
| _process_molecule() | Full VQE execution for a single molecule |
| _collect_accuracy_metrics() | Compares the VQE energy against the HF reference |
| _build_metrics_data() | Constructs the metrics dict for Prometheus export |
| _stream_result() | Sends a decorated VQE result to Kafka |

Data Collected Per Simulation

  • Per iteration: parameters, energy, standard deviation, energy delta, parameter delta norm, cumulative minimum energy
  • Timing: hamiltonian construction, Jordan-Wigner mapping, VQE optimization, total wall time
  • Molecule info: atomic symbols, coordinates, charge, multiplicity, basis set
  • System metrics: CPU usage, memory consumption (exported to Prometheus)
  • Accuracy: HF reference energy comparison, error in millihartree, accuracy score
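The accuracy computation can be sketched as follows. This is a hypothetical illustration: the field names mirror the exported metric names, but the exact accuracy-score formula used by _collect_accuracy_metrics() is an assumption:

```python
def collect_accuracy_metrics(vqe_energy: float, hf_energy: float) -> dict:
    """Compare the VQE ground-state energy against the HF reference.

    Illustrative sketch: the accuracy_score formula is an assumption,
    not the pipeline's exact definition.
    """
    error_hartree = abs(vqe_energy - hf_energy)
    return {
        "reference_energy": hf_energy,
        "energy_error_hartree": error_hartree,
        "energy_error_millihartree": error_hartree * 1000.0,
        # One plausible score: 1.0 at zero error, decaying as error grows.
        "accuracy_score": 1.0 / (1.0 + error_hartree),
    }
```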

The result structure is documented in Serialization - Schema Structure.

Metrics Export

Each simulation container exports metrics to Prometheus PushGateway via PerformanceMonitor:

graph LR
    QC1[Container 1] -->|HTTP POST| PG[PushGateway]
    QC2[Container 2] -->|HTTP POST| PG
    QC3[Container 3] -->|HTTP POST| PG
    PG -->|Scrape| PROM[Prometheus]
    PROM -->|Query| GRAF[Grafana]

    style QC1 fill:#c5cae9,color:#1a237e
    style QC2 fill:#c5cae9,color:#1a237e
    style QC3 fill:#c5cae9,color:#1a237e
    style PG fill:#e8f5e9,color:#1b5e20
    style PROM fill:#e8f5e9,color:#1b5e20
    style GRAF fill:#e8f5e9,color:#1b5e20

VQE metrics: qp_vqe_total_time, qp_vqe_hamiltonian_time, qp_vqe_mapping_time, qp_vqe_vqe_time, qp_vqe_minimum_energy, qp_vqe_iterations_count, qp_vqe_optimal_parameters_count

Accuracy metrics: qp_vqe_reference_energy, qp_vqe_energy_error_hartree, qp_vqe_energy_error_millihartree, qp_vqe_accuracy_score

Derived: qp_vqe_iterations_per_second, qp_vqe_time_per_iteration, qp_vqe_overhead_ratio, qp_vqe_efficiency, qp_vqe_setup_ratio

System: qp_sys_cpu_percent, qp_sys_cpu_load_1m, qp_sys_memory_percent, qp_sys_memory_used_bytes, qp_sys_uptime_seconds
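The derived metrics are simple ratios over the raw timings. A sketch of plausible definitions, inferred from the metric names above (the exact formulas in the pipeline may differ):

```python
def derived_metrics(total_time: float, hamiltonian_time: float,
                    mapping_time: float, vqe_time: float,
                    iterations: int) -> dict:
    """Derive throughput and overhead ratios from raw VQE timings.

    The formulas are inferred from the metric names and are
    assumptions, not the pipeline's exact definitions.
    """
    setup_time = hamiltonian_time + mapping_time
    return {
        "qp_vqe_iterations_per_second": iterations / vqe_time,
        "qp_vqe_time_per_iteration": vqe_time / iterations,
        # Fraction of wall time spent outside the optimization loop.
        "qp_vqe_overhead_ratio": (total_time - vqe_time) / total_time,
        # Fraction of wall time spent on Hamiltonian build + mapping.
        "qp_vqe_setup_ratio": setup_time / total_time,
    }
```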

For dashboard configuration, see Monitoring.

Apache Kafka Integration

All VQE results are published to a single Kafka topic: experiment.vqe. The schema subject is experiment.vqe-value. For serialization details, see Serialization.

The Schema Registry supports running multiple versions of the simulation in parallel: each registers its own schema version, and consumers decode using the schema ID embedded in each message:

sequenceDiagram
    participant P1 as Producer v1
    participant P2 as Producer v2
    participant K as Kafka
    participant SR as Schema Registry

    P1->>SR: Register Schema v1
    SR-->>P1: Schema ID: 101

    P2->>SR: Register Schema v2
    SR-->>P2: Schema ID: 102

    par Parallel Production
        P1->>K: Publish to experiment.vqe
        P2->>K: Publish to experiment.vqe
    end

    Note over K: Consumer handles both versions via schema ID
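The embedded schema ID follows Confluent's wire format: one magic byte (0x00) followed by the schema ID as a 4-byte big-endian integer, then the Avro-encoded body. A minimal parser:

```python
import struct


def parse_confluent_header(message: bytes) -> tuple[int, bytes]:
    """Split a Confluent-framed Kafka message into (schema_id, payload).

    Wire format: 1 magic byte (0x00) + 4-byte big-endian schema ID,
    followed by the Avro-encoded body.
    """
    if len(message) < 5 or message[0] != 0:
        raise ValueError("not a Confluent-framed message")
    (schema_id,) = struct.unpack(">I", message[1:5])
    return schema_id, message[5:]
```

A consumer uses the returned schema ID to fetch the matching schema version from the registry before decoding the Avro body, which is what makes mixed-version production on one topic safe.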

Producer configuration is in ProducerConfig (servers, topic, retries, acks, timeout). Security options (SSL, SASL) are in SecurityConfig.

Garage Storage

Garage is a lightweight, Rust-based, S3-compatible object store that replaced MinIO (used in v1.x). Data flows from Kafka to Garage via Redpanda Connect.

graph LR
    K[Kafka Topic:<br/>experiment.vqe] -->|Stream| RC[Redpanda Connect]
    RC <-->|Decode| SR[Schema Registry]
    RC -->|S3 PutObject| GARAGE[Garage<br/>raw-results]

    style K fill:#ffe082,color:#000
    style RC fill:#ffe082,color:#000
    style SR fill:#ffe082,color:#000
    style GARAGE fill:#b39ddb,color:#311b92

Redpanda Connect Configuration

The Redpanda Connect config is at compose/redpanda-connect.yaml. It consumes from experiment.vqe, decodes Avro via schema_registry_decode, and writes JSON files to the raw-results bucket. For the Kafka Connect alternative (Avro at rest), see Serialization - Overview.

Directory Structure

Raw results written by Redpanda Connect:

s3://raw-results/
  experiments/
    experiment.vqe/
      year=2026/
        month=03/
          ...
      1-1774375957377835260.json
      2-1774887616244742408.json
      ...

File names follow {counter}-{timestamp_unix_nano}.json. Older runs land flat in the topic directory; newer runs may appear under time-partitioned subdirectories depending on the connector configuration.
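A small helper can recover the counter and UTC timestamp from an object name following this pattern (the function name is illustrative):

```python
import re
from datetime import datetime, timezone

FILENAME_RE = re.compile(r"^(?P<counter>\d+)-(?P<ts_ns>\d+)\.json$")


def parse_result_filename(name: str) -> tuple[int, datetime]:
    """Parse a {counter}-{timestamp_unix_nano}.json object name."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    counter = int(m.group("counter"))
    # Nanoseconds since the Unix epoch -> aware UTC datetime.
    ts = datetime.fromtimestamp(int(m.group("ts_ns")) / 1e9, tz=timezone.utc)
    return counter, ts
```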

Iceberg feature tables (written by Spark):

s3://features/
  warehouse/
    quantum_features/
      vqe_results/
        metadata/
        data/
      molecules/
        metadata/
        data/
      ...

Each table has Iceberg's metadata/ (snapshots, manifests) and data/ (Parquet files) subdirectories.

Tip

Garage is S3-compatible. Run aws configure --profile garage with the S3_ACCESS_KEY / S3_SECRET_KEY from .env, then browse files with aws s3 (pointing --endpoint-url at the Garage API endpoint).

Apache Airflow Orchestration

Airflow runs four DAGs on the CeleryExecutor with Redis as the broker. Shared configuration lives in docker/airflow/common/ (S3 paths, catalog names, default args, Spark session factory).

graph LR
    subgraph "Airflow"
        DAG1["quantum_feature_processing"]
        DAG2["quantum_ml_feature_processing"]
        DAG3["vqe_batch_generation"]
        DAG4["r2_sync"]
    end

    SPARK[Spark Cluster]
    GARAGE[(Garage)]

    DAG1 -->|SparkSubmitOperator| SPARK
    DAG2 -->|SparkSubmitOperator| SPARK
    SPARK -->|read/write| GARAGE

    style DAG1 fill:#90caf9,color:#0d47a1
    style DAG2 fill:#90caf9,color:#0d47a1
    style DAG3 fill:#90caf9,color:#0d47a1
    style DAG4 fill:#90caf9,color:#0d47a1
    style SPARK fill:#a5d6a7,color:#1b5e20
    style GARAGE fill:#ffe082,color:#e65100

DAGs

| DAG | Schedule | What it does | Source |
| --- | --- | --- | --- |
| quantum_feature_processing | Daily | Reads raw data from Garage, transforms it into 9 normalized Iceberg tables | quantum_processing_dag.py |
| quantum_ml_feature_processing | Daily | Joins normalized tables into 2 ML-ready feature tables; waits for upstream via ExternalTaskSensor | quantum_ml_feature_dag.py |
| vqe_batch_generation | Manual | Builds Docker images, runs the batch VQE generation script. Trigger conf: {"tier": N} | vqe_batch_generation_dag.py |
| r2_sync | Manual | Syncs ML feature Parquet from Garage to Cloudflare R2 via rclone | r2_sync_dag.py |

Execution Flow

sequenceDiagram
    participant S as Scheduler
    participant SSO as SparkSubmitOperator
    participant SM as Spark Master
    participant SW as Spark Workers
    participant G as Garage
    participant I as Iceberg

    Note over S: Daily at 00:00
    S->>SSO: quantum_feature_processing

    SSO->>SM: Submit PySpark job
    SM->>SW: Distribute tasks

    SW->>G: Read JSON files
    SW->>SW: Transform to feature tables
    SW->>I: Write to Iceberg
    I->>G: Persist Parquet + metadata

    SW->>SM: Done
    SSO->>S: Task success

    Note over S: quantum_ml_feature_processing
    S->>SSO: ExternalTaskSensor satisfied
    SSO->>SM: Submit ML feature job
    SM->>SW: Join normalized tables
    SW->>I: Write ml_iteration_features + ml_run_summary
    SSO->>S: Task success

Incremental Processing

Only new data is processed on each run. The Spark scripts use an anti-join on key columns to identify records not yet in the target Iceberg table, then append only those.

graph LR
    RAW[(Garage<br/>Raw JSON)]
    META[Iceberg Metadata]

    subgraph "Spark"
        FILTER[Identify new records]
        PROC[Compute features]
        FILTER -->|new only| PROC
    end

    FEAT[(Garage<br/>Feature tables)]

    RAW -->|list files| FILTER
    META -->|existing keys| FILTER
    PROC --> FEAT
    PROC -->|update snapshot| META

    style FILTER fill:#ffe082,color:#000
    style RAW fill:#90caf9,color:#0d47a1
    style FEAT fill:#a5d6a7,color:#1b5e20
    style META fill:#b39ddb,color:#311b92

The deduplication logic is in identify_new_records() and the write logic in process_incremental_data(). Each write is tagged with a version (v_{batch_id} or v_incr_{batch_id}), enabling Iceberg time-travel queries.
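The anti-join semantics of identify_new_records() can be illustrated in pure Python. In the pipeline this runs as a Spark left anti-join (incoming_df.join(existing_df, on=key_columns, how="left_anti")); the signature below is an illustrative emulation, not the actual Spark code:

```python
def identify_new_records(incoming: list[dict], existing_keys: set[tuple],
                         key_columns: tuple[str, ...]) -> list[dict]:
    """Pure-Python equivalent of a Spark left anti-join on key columns.

    Keeps only records whose key tuple is not already present in the
    target Iceberg table. Names and signature are illustrative.
    """
    return [
        rec for rec in incoming
        if tuple(rec[c] for c in key_columns) not in existing_keys
    ]
```

Because only the surviving records are appended, re-running the DAG over the same raw files is idempotent.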

Feature Tables

The first Spark job (quantum_incremental_processing.py) produces 9 normalized tables:

| Table | Key Columns | Partition | Purpose |
| --- | --- | --- | --- |
| molecules | experiment_id, molecule_id | processing_date | Geometry, symbols, masses, charge |
| ansatz_info | experiment_id, molecule_id | processing_date, basis_set | QASM circuit, repetitions |
| performance_metrics | experiment_id, molecule_id, basis_set | processing_date, basis_set | Timing breakdown |
| vqe_results | experiment_id, molecule_id, basis_set | processing_date, basis_set, backend | Energy, iterations, optimizer |
| initial_parameters | parameter_id | processing_date, basis_set | Starting parameter values (exploded) |
| optimal_parameters | parameter_id | processing_date, basis_set | Best parameter values (exploded) |
| vqe_iterations | iteration_id | processing_date, basis_set, backend | Per-step energy + std |
| iteration_parameters | parameter_id | processing_date, basis_set | Per-step parameter values (exploded) |
| hamiltonian_terms | term_id | processing_date, basis_set, backend | Pauli terms with coefficients |

The second Spark job (quantum_ml_feature_processing.py) joins these into 2 ML-ready tables:

| Table | Purpose |
| --- | --- |
| ml_iteration_features | Per-iteration feature vectors for convergence prediction |
| ml_run_summary | Per-run aggregate features for energy estimation |

All tables live under quantum_catalog.quantum_features (Hadoop catalog backed by Garage).

Monitoring Stack

| Source | Exporter | Prometheus Target |
| --- | --- | --- |
| VQE containers | PushGateway | pushgateway:9091 |
| Airflow | statsd-exporter | airflow-statsd-exporter:9102 |
| PostgreSQL | postgres-exporter | postgres-exporter:9187 |
| Redis | redis-exporter | redis-exporter:9121 |
| GPU | nvidia-gpu-exporter | nvidia-gpu-exporter:9835 |

Grafana dashboards at http://grafana:3000. For configuration details, see Monitoring.

ML Modules

The quantum_pipeline/ml/ package contains ML modules that are not yet integrated into the core VQE flow:

  • convergence_predictor - predicts whether a VQE run will converge based on early iteration features
  • energy_estimator - estimates final ground-state energy from partial optimization trajectories
  • preprocessing - feature extraction utilities for ML model training
  • tracking - MLflow experiment tracking integration

These are intended for the next phase of the project (ML predictive model PoC).