Data Platform
Processes and transforms VQE simulation results, then stores them as analytics-ready ML feature tables. For the full component-by-component architecture, see System Design.
-
Kafka Streaming
Single-topic ingestion with Schema Registry validation. Redpanda Connect decodes Avro messages and writes JSON to Garage. Covers producer config, wire format, and security options.
-
Spark Processing
Standalone Spark cluster transforms raw JSON into 9 base feature tables and 2 ML feature tables. Incremental processing via Iceberg metadata.
-
Airflow Orchestration
Four DAGs handle feature processing, ML materialization, batch generation, and R2 cloud sync. CeleryExecutor with Redis broker and PostgreSQL metadata.
-
Iceberg Storage
Iceberg table format on top of Garage (S3-compatible storage). ACID transactions, snapshot tagging, partition pruning, and time-travel queries.
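
To make the snapshot tagging and time-travel features above more concrete, here is a minimal sketch of the Spark SQL statements involved. It assumes the Iceberg SQL extensions are enabled in the Spark session. The table name `lake.features.vqe_runs`, the tag name, and the helper functions are hypothetical and are not taken from this project.

```python
import re

# Conservative patterns so that names can be safely interpolated into SQL.
_TABLE_RE = re.compile(r"[A-Za-z0-9_.]+")
_TAG_RE = re.compile(r"[A-Za-z0-9_.\-]+")


def _check(pattern: re.Pattern, value: str, kind: str) -> str:
    """Reject names containing characters outside the allowed set."""
    if not pattern.fullmatch(value):
        raise ValueError(f"invalid {kind}: {value!r}")
    return value


def tag_snapshot_sql(table: str, tag: str, snapshot_id: int) -> str:
    """Build the statement that attaches a named tag to one snapshot."""
    table = _check(_TABLE_RE, table, "table name")
    tag = _check(_TAG_RE, tag, "tag name")
    return f"ALTER TABLE {table} CREATE TAG `{tag}` AS OF VERSION {int(snapshot_id)}"


def read_at_tag_sql(table: str, tag: str) -> str:
    """Build a time-travel query that reads the table as of a tag."""
    table = _check(_TABLE_RE, table, "table name")
    tag = _check(_TAG_RE, tag, "tag name")
    return f"SELECT * FROM {table} VERSION AS OF '{tag}'"


if __name__ == "__main__":
    # Hypothetical table, tag, and snapshot id, for illustration only.
    print(tag_snapshot_sql("lake.features.vqe_runs", "training-v1", 123))
    print(read_at_tag_sql("lake.features.vqe_runs", "training-v1"))
```

With a live session, the resulting strings would be passed to `spark.sql(...)`. Tagging the snapshot an ML model was trained on is one way to keep a training set reproducible while the table continues to receive new data.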