Docker Basics¶
Quantum Pipeline provides Docker images for CPU and GPU workloads, as well as images for Spark and Airflow services. This page covers the available images, building from source, and running containers.
Available Images¶
All images are published to Docker Hub under the
straightchlorine/quantum-pipeline
repository.
| Image Tag | Base Image | Purpose |
|---|---|---|
straightchlorine/quantum-pipeline:cpu |
python:3.12-slim-bookworm |
CPU-only simulations |
straightchlorine/quantum-pipeline:gpu |
nvidia/cuda:12.6.3-cudnn-devel-ubuntu24.04 |
GPU-accelerated simulations |
quantum-pipeline-spark |
apache/spark:4.0.2-python3 |
Spark master and worker nodes |
airflow (custom) |
apache/airflow:3.1.8 |
Airflow API server, scheduler, workers, triggerer |
CPU Image¶
The CPU image is a lightweight container based on Python 3.12 (Bookworm). It installs
the Quantum Pipeline package and its dependencies directly from the project source.
The data/ directory with sample molecule files is included in the image, so you
can run simulations immediately with --file data/molecules.json.
GPU Image¶
The GPU image is built on NVIDIA CUDA 12.6.3 with cuDNN. It includes a custom compilation of Qiskit Aer from source with CUDA Thrust backend support.
The straightchlorine/quantum-pipeline:gpu image on Docker Hub is compiled for
Ampere architecture (CUDA_ARCH=8.6). If your GPU uses a different
architecture, rebuild locally with the appropriate CUDA_ARCH:
CUDA_ARCH |
Architecture | Example GPUs |
|---|---|---|
6.1 |
Pascal | GTX 1060, GTX 1050 Ti |
7.5 |
Turing | RTX 2070, RTX 2080 |
8.6 |
Ampere (Docker Hub default) | RTX 3060, RTX 3080 |
8.9 |
Ada Lovelace | RTX 4070, RTX 4090 |
The GPU image is not built in CI/CD due to the long compilation time (Aer from source). It is built and pushed manually.
Spark Image¶
The Spark image extends the Apache Spark 4.0.2 base by replacing Python 3.10 with
3.12 so it matches the Airflow driver environment. The image
itself contains no application code or JARs - those are resolved at runtime via
spark.jars.packages in spark-defaults.conf and cached in a named Ivy volume.
Airflow Image¶
The Airflow image adds Java 17 (required for Spark job submission), Docker CE CLI
with buildx and compose plugins (for batch generation DAGs that spawn simulation
containers), rclone (for R2 sync), and Python dependencies (apache-airflow-providers-apache-spark,
pyspark) to the Apache Airflow 3.1.8 base image.
The DOCKER_GID build argument sets the GID of the docker group inside the
container so the Airflow user can access the host Docker socket. Check your host
GID with stat -c '%g' /var/run/docker.sock and pass it at build time if it
differs from the default (970).
Building from Source¶
Using the justfile (recommended)¶
The simplest way to build images:
# Build CPU image
just docker-build cpu
# Build GPU image (default CUDA_ARCH=8.6/Ampere)
just docker-build gpu
# Build GPU image for Pascal GPUs
CUDA_ARCH=6.1 just docker-build gpu
# Build both CPU and GPU images
just docker-build all
Manual builds¶
Each image can be built directly with Docker:
# CPU
docker build -t quantum-pipeline:cpu -f docker/Dockerfile.cpu .
# GPU (default Ampere)
docker build -t quantum-pipeline:gpu -f docker/Dockerfile.gpu .
# GPU targeting Pascal
docker build -t quantum-pipeline:gpu -f docker/Dockerfile.gpu \
--build-arg CUDA_ARCH="6.1" .
# Airflow (with custom Docker GID)
docker build -t quantum-pipeline-airflow -f docker/airflow/Dockerfile \
--build-arg DOCKER_GID="$(stat -c '%g' /var/run/docker.sock)" \
docker/airflow/
GPU Build Time
Building the GPU image compiles Qiskit Aer from source with CUDA Thrust backend
support. This takes a while. Make sure CUDA_ARCH matches your target GPU!
Running Containers¶
CPU Simulation¶
Run a basic CPU simulation with:
docker run --rm \
-v $(pwd)/data:/usr/src/quantum_pipeline/data \
-v $(pwd)/gen:/usr/src/quantum_pipeline/gen \
quantum-pipeline:cpu \
--file ./data/molecules.json \
--simulation-method statevector \
--max-iterations 150 \
--convergence
GPU Simulation¶
For GPU-accelerated simulation, pass the --gpus flag and add the --gpu argument:
docker run --rm --gpus all \
-v $(pwd)/data:/usr/src/quantum_pipeline/data \
-v $(pwd)/gen:/usr/src/quantum_pipeline/gen \
quantum-pipeline:gpu \
--file ./data/molecules.json \
--gpu \
--simulation-method statevector \
--max-iterations 150 \
--convergence
Connecting to Kafka¶
To send results to a Kafka broker, add the --kafka flag and set the
KAFKA_SERVERS environment variable:
docker run --rm --gpus all \
--network quantum-ml-network \
-e KAFKA_SERVERS=kafka:9092 \
-v $(pwd)/data:/usr/src/quantum_pipeline/data \
quantum-pipeline:gpu \
--file ./data/molecules.json \
--gpu \
--kafka \
--simulation-method statevector \
--max-iterations 150 \
--convergence
Common Options¶
| Flag | Description |
|---|---|
--file <path> |
Path to molecule definition file (JSON) |
--gpu |
Enable GPU acceleration |
--kafka |
Enable Kafka output |
--topic <name> |
Kafka topic name for results |
--simulation-method <method> |
Simulation method (statevector, automatic) |
--max-iterations <n> |
Maximum VQE optimizer iterations |
--convergence |
Enable convergence threshold |
--threshold <value> |
Convergence threshold value (default: 1e-6) |
--optimizer <name> |
Optimizer algorithm (e.g., L-BFGS-B, COBYLA) |
--basis <set> |
Basis set (e.g., sto3g, cc-pvdz) |
--ansatz <type> |
Ansatz type (EfficientSU2, RealAmplitudes, ExcitationPreserving) |
--seed <n> |
Random seed for reproducible parameter initialization |
--init-strategy <strategy> |
Parameter initialization (random, hf) |
--log-level <level> |
Log level (DEBUG, INFO, WARNING, ERROR) |
--report |
Generate a report after simulation |
--enable-performance-monitoring |
Enable resource monitoring |