
Docker Compose Deployment

The full Quantum Pipeline platform runs as a set of interconnected Docker containers managed by Docker Compose. This deployment includes the quantum simulation engine, Apache Kafka for data streaming, Apache Spark for data processing, Apache Airflow for workflow orchestration, MinIO for object storage, and monitoring agents.

Overview

The platform follows a microservices architecture where each component runs in its own container, communicating over a shared Docker bridge network. Services can be scaled, replaced, or upgraded independently without affecting the rest of the system.

The Docker Compose configuration defines:

  • Simulation containers - Quantum Pipeline (single GPU instance in default config; CPU + multi-GPU in thesis config)
  • Streaming layer - Kafka broker, Schema Registry, Kafka Connect
  • Processing layer - Spark master and workers
  • Orchestration - Airflow webserver, scheduler, and triggerer with PostgreSQL
  • Storage - MinIO object storage with automatic bucket initialization
  • Monitoring - Dozzle and Portainer agents for container management (thesis config only)

Port mappings and dependencies are listed in the service tables below.

Quick Start

Prerequisites

  • Docker Engine 24.0 or later
  • Docker Compose v2.20 or later
  • NVIDIA Container Toolkit (for GPU containers)

Step 1: Clone the Repository

git clone https://github.com/straightchlorine/quantum-pipeline.git
cd quantum-pipeline

Step 2: Configure Environment

Copy the example environment file and customize it:

cp .env.thesis.example .env

Edit .env to set credentials, resource limits, and service ports. See the Environment Variables reference for all options.
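As a starting point, a minimal .env might look like the sketch below. Only the MINIO_* port and bucket variables are referenced elsewhere on this page; the credential variable names follow the MinIO image conventions and are assumptions here, so treat all values as placeholders and consult .env.thesis.example for the actual keys.

# MinIO credentials and ports (values are illustrative placeholders)
MINIO_ROOT_USER=admin
MINIO_ROOT_PASSWORD=change-me
MINIO_API_PORT=9000
MINIO_CONSOLE_PORT=9001
MINIO_BUCKET=quantum-data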

Step 3: Deploy

For the full thesis experiment setup (docker-compose.thesis.yaml):

docker compose -f docker-compose.thesis.yaml up -d

For a single-GPU deployment without monitoring agents (docker-compose.yaml):

docker compose up -d

Step 4: Verify

Check that all services are running:

docker compose -f docker-compose.thesis.yaml ps
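If a service reports as unhealthy or keeps restarting, its logs and health status are the first things to check, for example:

# Tail the logs of a single service
docker compose -f docker-compose.thesis.yaml logs -f kafka

# Inspect the health status of one container (names come from the thesis config)
docker inspect --format '{{.State.Health.Status}}' quantum-pipeline-cpu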

Example: Thesis Experiment Setup

The docker-compose.thesis.yaml file serves as an example of a multi-container deployment for hardware benchmarking. It defines a three-way comparison of CPU and GPU performance across different hardware configurations, running three Quantum Pipeline containers simultaneously alongside monitoring agents (Dozzle and Portainer). Use this as a reference for building your own multi-instance deployments.

Container Configuration

1. CPU Pipeline (quantum-pipeline-cpu)

  • Built from docker/Dockerfile.cpu
  • Resource limits: 2 CPUs, 10 GB RAM
  • Simulation method: statevector
  • Optimizer: L-BFGS-B with 300 max iterations
  • Convergence threshold: 1e-6
  • Publishes results to Kafka topic vqe_results_cpu

2. GPU Pipeline - GTX 1060 (quantum-pipeline-gpu1)

  • Built from docker/Dockerfile.gpu
  • Resource limits: 2 CPUs, 10 GB RAM + GTX 1060 6GB (device 0)
  • Same simulation parameters as CPU
  • Publishes results to Kafka topic vqe_results_gpu1
  • Environment: CUDA_VISIBLE_DEVICES=0

3. GPU Pipeline - GTX 1050 Ti (quantum-pipeline-gpu2)

  • Built from docker/Dockerfile.gpu
  • Resource limits: 2 CPUs, 10 GB RAM + GTX 1050 Ti 4GB (device 1)
  • Same simulation parameters as CPU
  • Publishes results to Kafka topic vqe_results_gpu2
  • Environment: CUDA_VISIBLE_DEVICES=0 (host device 1 is exposed as device 0 inside the container)

All three containers use the same molecule dataset (molecules.thesis.json), basis set (sto3g), and optimization parameters, ensuring a fair comparison.

GPU Resource Allocation

Each GPU container reserves a specific GPU using Docker's device reservation:

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0']  # Specific GPU device
          capabilities: [gpu]

This ensures each container has exclusive access to its assigned GPU, preventing resource contention during benchmarks.
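One way to confirm the isolation is to list the GPUs visible inside each running container; each should report exactly one device. The service names below are assumed to match the container names given above.

docker compose -f docker-compose.thesis.yaml exec quantum-pipeline-gpu1 nvidia-smi -L
docker compose -f docker-compose.thesis.yaml exec quantum-pipeline-gpu2 nvidia-smi -L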

Service Configuration

Streaming Services

Service | Image | Port(s) | Description
kafka | bitnami/kafka | 9092 (internal), 9094 (external) | Message broker with KRaft mode
schema-registry | confluentinc/cp-schema-registry | 8081 | Avro schema management
kafka-connect | confluentinc/cp-kafka-connect | 8083 | S3 Sink connector to MinIO (minio-sink-config.json)
kafka-connect-init | curlimages/curl | - | Registers the MinIO Sink connector

Kafka runs in KRaft mode, so no ZooKeeper instance is required. Schema Registry manages the Avro schemas used by producers and consumers, and Kafka Connect runs the S3 Sink connector that streams results to MinIO.
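The Kafka Connect REST API on port 8083 can be used to verify that the connector registered by kafka-connect-init is running. The connector name below is an assumption; check minio-sink-config.json for the actual one.

# List registered connectors
curl -s http://localhost:8083/connectors

# Check the status of the MinIO sink (replace the name if it differs)
curl -s http://localhost:8083/connectors/minio-sink/status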

Processing Services

Service | Image | Port(s) | Description
spark-master | Custom (Dockerfile.spark) | 8080 (UI), 7077 (RPC) | Spark cluster master
spark-worker | Custom (Dockerfile.spark) | - | Spark executor node — thesis config: 4 GB, 2 cores; default: 1 GB, 1 core

Orchestration Services

Service | Image | Port(s) | Description
airflow-webserver | Custom (Dockerfile.airflow) | 8084 | Airflow web interface
airflow-scheduler | Custom (Dockerfile.airflow) | - | DAG scheduling
airflow-triggerer | Custom (Dockerfile.airflow) | - | Deferred task execution
airflow-init | Custom (Dockerfile.airflow) | - | Database migration, user creation
postgres | postgres:13 | - (internal only) | Airflow metadata database

Airflow uses the LocalExecutor with PostgreSQL. The init container runs migrations, creates the admin user, and registers the Spark connection.
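A quick way to confirm that initialization succeeded is to run the Airflow CLI inside the webserver container (the service name is assumed to match the table above):

# List the DAGs shipped in ./docker/airflow/
docker compose -f docker-compose.thesis.yaml exec airflow-webserver airflow dags list

# Confirm the Spark connection registered by airflow-init
docker compose -f docker-compose.thesis.yaml exec airflow-webserver airflow connections list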

Storage and Monitoring Services

Service | Image | Port(s) | Description
minio | minio/minio | $MINIO_API_PORT (API), $MINIO_CONSOLE_PORT (Console) | S3-compatible object storage
mc-setup | minio/mc | - | Creates buckets and sets policies
dozzle | amir20/dozzle | 7007 | Real-time container log viewer (Dozzle) — thesis config only
portainer | portainer/agent | 9002 | Container management agent (Portainer) — thesis config only

The mc-setup container automatically creates required buckets on first run. The default bucket name is configurable via the MINIO_BUCKET environment variable (default: quantum-data).
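To confirm that bucket initialization succeeded, check the mc-setup logs and open the MinIO console on the host port configured in .env:

# Bucket creation output from the one-shot setup container
docker compose -f docker-compose.thesis.yaml logs mc-setup

# MinIO console (credentials come from .env)
#   http://localhost:${MINIO_CONSOLE_PORT}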

Networking

All services share a single Docker bridge network (quantum-pipeline-network). Services communicate using container names as hostnames (e.g., kafka:9092, minio:9000). The Kafka external listener on port 9094 allows access from outside the Docker network.
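The pattern in the compose files looks roughly like the sketch below; it is illustrative rather than a copy of the actual file, and the environment variable name used to point the pipeline at the broker is an assumption.

networks:
  quantum-pipeline-network:
    driver: bridge

services:
  quantum-pipeline-cpu:
    networks:
      - quantum-pipeline-network
    environment:
      # Other services are reachable by service name on the shared network
      KAFKA_BOOTSTRAP_SERVERS: kafka:9092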

Volumes

The deployment uses named Docker volumes for persistent data:

Volume | Service | Purpose
quantum-minio-data | MinIO | Object storage data
quantum-spark-warehouse | Spark | Spark SQL warehouse
quantum-airflow-postgres | PostgreSQL | Airflow metadata
quantum-airflow-logs | Airflow | Task execution logs
quantum-kafka-data | Kafka | Kafka log segments

Bind mounts provide access to project files: ./gen/ (simulation output), ./data/ (molecule definitions), ./docker/airflow/ (DAG definitions), and ./docker/connectors/ (Kafka Connect configs).
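Put together, a service combines both kinds of mounts. The sketch below is illustrative only; the in-container target paths are assumptions, not values taken from the repository.

services:
  quantum-pipeline-cpu:
    volumes:
      - ./gen:/app/gen      # bind mount: simulation output on the host
      - ./data:/app/data    # bind mount: molecule definitions

  minio:
    volumes:
      - quantum-minio-data:/data   # named volume: object storage data

volumes:
  quantum-minio-data: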

Health Checks

Services include health checks for proper startup ordering via depends_on conditions. The Quantum Pipeline containers wait for kafka-connect-init to complete before starting. See the docker-compose.thesis.yaml file for the full health check configuration.
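The general pattern, illustrated here rather than copied from the repository, combines a healthcheck on the upstream service with a depends_on condition on the consumer; the wait on kafka-connect-init uses the service_completed_successfully condition described above.

services:
  kafka:
    healthcheck:
      test: ["CMD-SHELL", "kafka-topics.sh --bootstrap-server localhost:9092 --list"]
      interval: 10s
      timeout: 10s
      retries: 10

  kafka-connect:
    depends_on:
      kafka:
        condition: service_healthy

  quantum-pipeline-cpu:
    depends_on:
      kafka-connect-init:
        condition: service_completed_successfully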

Scaling

Scale Spark workers with:

docker compose -f docker-compose.thesis.yaml up -d --scale spark-worker=3

For additional simulation instances, add new service definitions with a unique container_name, distinct --topic, appropriate GPU device assignment, and separate output volume mounts. Each Spark worker reserves 2 cores and 4 GB RAM; each GPU container requires exclusive GPU access.
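As a sketch (not taken from the repository), a hypothetical fourth instance might look like the following; the service name, topic, GPU index, mount targets, and any command flags other than --topic are placeholders.

  quantum-pipeline-gpu3:
    build:
      context: .
      dockerfile: docker/Dockerfile.gpu
    container_name: quantum-pipeline-gpu3
    command: ["--topic", "vqe_results_gpu3"]
    environment:
      CUDA_VISIBLE_DEVICES: "0"
    volumes:
      - ./gen/gpu3:/app/gen          # separate output mount (target path assumed)
    networks:
      - quantum-pipeline-network
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 10G
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['2']      # third host GPU
              capabilities: [gpu]
    depends_on:
      kafka-connect-init:
        condition: service_completed_successfully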

Stopping the Deployment

# Stop services, preserve data
docker compose -f docker-compose.thesis.yaml down

# Stop and remove all data volumes
docker compose -f docker-compose.thesis.yaml down -v