Data Platform

Processes and transforms VQE simulation results, then stores them as analytics-ready ML feature tables. For the full component-by-component architecture, see System Design.

  • Kafka Streaming


    Single-topic ingestion with Schema Registry validation. Redpanda Connect decodes Avro messages and writes JSON to Garage. Covers producer config, wire format, and security options.

  • Spark Processing


    Standalone Spark cluster transforms raw JSON into 9 base feature tables and 2 ML feature tables. Incremental processing via Iceberg metadata.

  • Airflow Orchestration


    Four DAGs handle feature processing, ML materialization, batch generation, and R2 cloud sync. CeleryExecutor with Redis broker and PostgreSQL metadata.

  • Iceberg Storage


    Iceberg table format on top of Garage (S3-compatible storage). ACID transactions, snapshot tagging, partition pruning, and time-travel queries.
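
To make the Kafka wire format mentioned above more concrete, here is a minimal sketch of how a consumer might split a Schema Registry framed message into its schema ID and Avro payload. It assumes the producers use the Confluent wire format (one zero magic byte, a 4-byte big-endian schema ID, then the Avro-encoded body); this page does not confirm that, so check the Kafka Streaming page before relying on it. The function name `split_confluent_frame` is hypothetical and not part of the platform.

```python
import struct
from typing import Tuple

# Assumption: Confluent Schema Registry framing (not confirmed by this page).
MAGIC_BYTE = 0
HEADER_SIZE = 5  # 1 magic byte + 4-byte schema ID


def split_confluent_frame(message: bytes) -> Tuple[int, bytes]:
    """Return (schema_id, avro_payload) from a framed Kafka message value."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message is shorter than the 5-byte header")
    # ">BI" = big-endian, unsigned byte followed by unsigned 32-bit integer.
    magic, schema_id = struct.unpack(">BI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    # The remaining bytes are the Avro-encoded record, left undecoded here.
    return schema_id, message[HEADER_SIZE:]


if __name__ == "__main__":
    # Hypothetical frame: schema ID 42 followed by placeholder payload bytes.
    frame = b"\x00" + (42).to_bytes(4, "big") + b"avro-bytes"
    schema_id, payload = split_confluent_frame(frame)
    print(schema_id, payload)
```

Decoding the payload itself requires the writer schema fetched from the registry by ID, which in this platform is handled by Redpanda Connect rather than by hand-written code.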