Quantum Pipeline¶
Overview¶
The Quantum Pipeline project is an extensible framework for exploring quantum algorithms; currently it implements the Variational Quantum Eigensolver (VQE). It combines quantum and classical computing to estimate the ground-state energy of molecular systems, backed by a comprehensive data engineering pipeline.
The framework provides modules to handle algorithm orchestration, parametrization, monitoring, and data visualization. Data can be streamed via Apache Kafka for real-time processing, transformed into ML features using Apache Spark, and stored in Apache Iceberg tables for scalable analytics.
Key Features¶
Core Quantum Computing¶
- Molecule Loading - Load and validate molecular data from files
- Hamiltonian Preparation - Generate second-quantized Hamiltonians for molecular systems
- Quantum Circuit Construction - Create parameterized ansatz circuits with customizable repetitions
- VQE Execution - Solve Hamiltonians using the VQE algorithm with support for various optimizers
- Advanced Backend Options - Customize simulation parameters such as qubit count, shot count, and optimization levels
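The VQE loop behind these features, a parameterized trial state whose energy expectation is minimised by a classical optimizer, can be illustrated with a deliberately tiny stand-in: a 2×2 Hermitian matrix and a one-parameter state, in pure Python with no Qiskit. The matrix entries here are illustrative, not a real molecular Hamiltonian.

```python
import math

# Toy 2x2 Hamiltonian [[a, c], [c, b]] (illustrative values only)
a, b, c = -1.05, -0.45, 0.39

def expectation(theta):
    # <psi(theta)|H|psi(theta)> for the trial state |psi> = [cos(theta), sin(theta)]
    ct, st = math.cos(theta), math.sin(theta)
    return a * ct * ct + b * st * st + 2 * c * st * ct

# Classical optimisation step: grid search over the single ansatz parameter
best_theta = min((k * math.pi / 1000 for k in range(1000)), key=expectation)
vqe_energy = expectation(best_theta)

# Exact ground-state energy from diagonalising the 2x2 matrix, for comparison
exact = (a + b) / 2 - math.sqrt(((a - b) / 2) ** 2 + c ** 2)
print(f"VQE estimate: {vqe_energy:.4f}  exact: {exact:.4f}")
```

In the real framework the expectation value comes from executing a parameterized ansatz circuit on a Qiskit Aer backend, and the grid search is replaced by one of the supported optimizers.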
Data Engineering Pipeline¶
- Real-time Streaming - Stream simulation results to Apache Kafka with Avro serialization
- ML Feature Engineering - Transform quantum experiment data into ML features using Apache Spark
- Data Lake Storage - Store processed data in Apache Iceberg tables with versioning and time-travel
- Object Storage - Persist data using MinIO S3-compatible storage with automated backup
- Workflow Orchestration - Automate data processing workflows using Apache Airflow
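A streamed VQE result is essentially a small structured record. The sketch below shows one plausible record shape; the field names are assumptions for illustration, and the real pipeline Avro-encodes the payload against a Schema Registry schema rather than using JSON.

```python
import json
from dataclasses import dataclass, asdict

# Illustrative record shape for a streamed VQE result; the actual Avro
# schema and field names in the project may differ.
@dataclass
class VQEResult:
    molecule: str
    basis_set: str
    optimizer: str
    iterations: int
    ground_state_energy: float  # Hartree

result = VQEResult("H2", "sto-3g", "COBYLA", 120, -1.1373)

# JSON is used here only to show the structure; the pipeline publishes
# Avro-serialized messages to a Kafka topic.
payload = json.dumps(asdict(result)).encode("utf-8")
print(payload.decode())
```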
Analytics and Visualization¶
- Visualization Tools - Plot molecular structures, energy convergence, and operator coefficients
- Report Generation - Automatically generate detailed reports for each processed molecule
- Scientific Reference Validation - Compare VQE results against experimentally verified ground state energies
- Feature Tables - Access structured data through 9 specialized ML feature tables
- Processing Metadata - Track data lineage and processing history
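Reference validation boils down to checking a VQE estimate against a literature value within a tolerance. A minimal sketch, using the common chemical-accuracy threshold of about 1.6 mHa (~1 kcal/mol); the energies below are illustrative numbers, not project results:

```python
# Chemical accuracy threshold in Hartree (~1 kcal/mol)
CHEMICAL_ACCURACY = 1.6e-3

def within_chemical_accuracy(vqe_energy, reference_energy):
    # True when the VQE estimate deviates from the reference
    # by no more than the chemical-accuracy threshold
    return abs(vqe_energy - reference_energy) <= CHEMICAL_ACCURACY

# Illustrative comparison for an H2-like result
print(within_chemical_accuracy(-1.1368, -1.1373))  # 0.5 mHa deviation -> True
```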
Production Deployment¶
- Containerized Execution - Deploy as multi-service Docker containers with GPU support
- CI/CD Pipeline - Automated testing, building, and publishing of Docker images
- Scalable Architecture - Distributed processing with Spark clusters and horizontal scaling
- Security - Comprehensive secrets management and secure communication between services
Quick Links¶
- Getting Started - Install Quantum Pipeline and run your first VQE simulation in minutes
- Configuration - Learn about optimizers, simulation methods, and parameter tuning
- Architecture - Understand the system design, data flow, and Avro serialization
- Deployment - Deploy with Docker, enable GPU acceleration, and configure environments
System Architecture¶
Thesis Experiment Architecture¶
The following diagram presents the architecture as deployed for the engineering thesis experiments. Kafka Connect writes raw Avro files to MinIO, and Spark (triggered by Airflow) reads from MinIO to produce ML features, while utilising Iceberg for incremental processing.
```mermaid
graph TB
    subgraph "Quantum Simulation"
        QP[Quantum Pipeline<br/>VQE Runner]
    end
    subgraph "Streaming Layer"
        KAFKA[Apache Kafka<br/>Message Broker]
        SR[Schema Registry<br/>Avro Schemas]
        KC[Kafka Connect<br/>S3 Sink]
    end
    subgraph "Storage Layer"
        MINIO[MinIO<br/>Object Storage]
        ICEBERG[Apache Iceberg<br/>Feature Tables]
    end
    subgraph "Processing Layer"
        AIRFLOW[Apache Airflow<br/>Orchestration]
        SPARK[Apache Spark<br/>Feature Engineering]
    end
    QP -->|Publish Results| KAFKA
    KAFKA <-->|Schema Validation| SR
    KAFKA -->|Consume Topics| KC
    KC -->|Write Avro Files| MINIO
    AIRFLOW -->|Trigger| SPARK
    SPARK -->|Read Raw Data| MINIO
    SPARK -->|Write Features| ICEBERG
    ICEBERG -->|Store Parquet| MINIO
    style QP fill:#c5cae9,color:#1a237e
    style KAFKA fill:#ffe082,color:#000
    style SR fill:#ffe082,color:#000
    style KC fill:#ffe082,color:#000
    style SPARK fill:#a5d6a7,color:#1b5e20
    style AIRFLOW fill:#90caf9,color:#0d47a1
    style ICEBERG fill:#b39ddb,color:#311b92
    style MINIO fill:#b39ddb,color:#311b92
```
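The Spark side of this flow, reading raw Avro from MinIO and writing Iceberg feature tables, is wired up through Spark configuration. A hedged sketch of a `spark-submit` invocation follows; the bucket, catalog, and script names are placeholders, and the exact package versions and settings in the project may differ:

```shell
# Illustrative spark-submit configuration for the MinIO -> Iceberg job.
# Bucket (s3a://warehouse/), catalog name (lakehouse), endpoint, and the
# job script name are assumptions for this sketch.
spark-submit \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0,org.apache.spark:spark-avro_2.12:3.5.1 \
  --conf spark.sql.catalog.lakehouse=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.lakehouse.type=hadoop \
  --conf spark.sql.catalog.lakehouse.warehouse=s3a://warehouse/ \
  --conf spark.hadoop.fs.s3a.endpoint=http://minio:9000 \
  --conf spark.hadoop.fs.s3a.path.style.access=true \
  feature_engineering_job.py
```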
General Architecture¶
The project is configurable - Kafka can stream directly to Spark consumers, bypassing the MinIO intermediate storage. This is useful for real-time processing scenarios.
```mermaid
graph TB
    subgraph "Quantum Simulation"
        QP2[Quantum Pipeline<br/>VQE Runner]
    end
    subgraph "Streaming Layer"
        KAFKA2[Apache Kafka<br/>Message Broker]
        SR2[Schema Registry<br/>Avro Schemas]
    end
    subgraph "Processing Layer"
        SPARK2[Apache Spark<br/>Feature Engineering]
        AIRFLOW2[Apache Airflow<br/>Orchestration]
    end
    subgraph "Storage Layer"
        ICEBERG2[Apache Iceberg<br/>Data Lake]
        MINIO2[MinIO<br/>Object Storage]
    end
    QP2 -->|Stream Results| KAFKA2
    KAFKA2 <-->|Schema Validation| SR2
    KAFKA2 -->|Consume| SPARK2
    AIRFLOW2 -->|Schedule| SPARK2
    SPARK2 -->|Write Features| ICEBERG2
    ICEBERG2 -->|Store| MINIO2
    SPARK2 -->|Store Raw| MINIO2
    style QP2 fill:#c5cae9,color:#1a237e
    style KAFKA2 fill:#ffe082,color:#000
    style SR2 fill:#ffe082,color:#000
    style SPARK2 fill:#a5d6a7,color:#1b5e20
    style AIRFLOW2 fill:#90caf9,color:#0d47a1
    style ICEBERG2 fill:#b39ddb,color:#311b92
    style MINIO2 fill:#b39ddb,color:#311b92
```
Technology Stack¶
- Qiskit - IBM's quantum computing framework
- Qiskit Aer - Quantum circuit simulator
- PySCF - Quantum chemistry library for Python
- CUDA/cuQuantum - GPU acceleration for quantum simulations
- Apache Kafka - Distributed event streaming platform
- Apache Spark - Unified analytics engine for big data
- Apache Airflow - Workflow orchestration platform
- Apache Iceberg - Open table format for data lakes
- MinIO - S3-compatible object storage
- Docker - Container platform
- Prometheus - Monitoring and alerting toolkit
- Grafana - Metrics visualization and dashboards
- PostgreSQL - Relational database for metadata
Use Cases¶
Research & Development
- Explore VQE convergence behavior across different molecules
- Benchmark CPU vs GPU acceleration for quantum simulations
- Compare optimizer performance
Data Science & ML
- Analyze quantum experiment metadata at scale
- Create time-series predictions for molecular properties
Production Deployments
- Run automated quantum simulations
- Monitor system performance and scientific accuracy
- Scale processing with distributed Spark clusters
Next Steps¶
- Install Quantum Pipeline - Get up and running
- First Simulation - H₂ molecule example
- Configuration Options - Customize your setup
- Full Platform Deployment - Launch all services
Links related to the project¶
- GitHub: straightchlorine/quantum-pipeline
- Codeberg (mirror): piotrkrzysztof/quantum-pipeline
- Docker Hub: straightchlorine/quantum-pipeline
- PyPI: quantum-pipeline
- Issues: Report bugs or request features
Engineering Thesis Project
This project was developed as part of an engineering thesis at the DSW University of Lower Silesia, focusing on GPU-accelerated quantum simulations and production-grade data engineering for quantum computing workflows.