Skip to content

Architecture

This section illustrates how the components fit together architecturally. That includes the simulation model and its relation to other pieces of the pipeline.

graph TD
    QP[Quantum Pipeline] -->|Publish Results| KAFKA[Kafka]
    QP -->|Export Metrics| PROM[Prometheus + Grafana]
    KAFKA <-->|Validate| SR[Schema Registry]
    KAFKA -->|S3 Sink| RC[Redpanda Connect]
    RC -->|Write| GARAGE[Garage S3]
    AIRFLOW[Airflow] -->|Schedule| SPARK[Spark]
    SPARK -->|Read| GARAGE
    SPARK -->|Write| ICE[Iceberg Tables]

    style QP fill:#c5cae9,color:#1a237e
    style KAFKA fill:#ffe082,color:#000
    style SR fill:#ffe082,color:#000
    style RC fill:#ffe082,color:#000
    style AIRFLOW fill:#90caf9,color:#0d47a1
    style SPARK fill:#a5d6a7,color:#1b5e20
    style GARAGE fill:#b39ddb,color:#311b92
    style ICE fill:#b39ddb,color:#311b92
    style PROM fill:#e8f5e9,color:#1b5e20
Hold "Alt" / "Option" to enable pan & zoom

Project runs VQE simulations, streams results to Kafka, and processes them into Iceberg feature tables via Spark. Airflow orchestrates the batch jobs. Prometheus and Grafana provide observability.

Section Guide

  • System Design


    Component-by-component breakdown: simulation module, Kafka integration, Garage storage, Airflow DAGs, Spark processing, and the incremental feature engineering pipeline.

    System Design

  • Data Flow


    How data moves through the system, stage by stage - from molecule JSON input through VQE simulation, Kafka streaming, Spark batch processing, to Iceberg tables. Includes an end-to-end \(H_2\) example.

    Data Flow

  • Serialization


    Wire format between the producer and Kafka (Avro), schema registry integration, schema definitions, and the JSON vs Avro storage choice depending on which connector you use.

    Serialization