
Programming models

Programming models are fundamental specifications that define how software is structured and executed. They provide a framework for developers to express algorithms and organize code, often abstracting away low-level details of the underlying hardware or execution environment. Different models are suited to different types of problems and hardware architectures, offering varying levels of abstraction and control.

In this lesson, we will review quantum and classical programming models and see how we can combine them to run algorithms in heterogeneous environments. Iskandar Sitdikov gives us an overview in the following video.


Programming model for QPUs

We will start with the programming model for quantum computers. The fundamental programming model familiar to nearly all quantum developers is the quantum circuit. We will not get into the details of the quantum circuit model here, as we already have a great lecture by John Watrous that explains it in detail. We will only mention that a circuit is built out of a set of lines (called wires) that represent qubits, gates that represent operations on quantum states, and a set of measurements.

A quantum circuit diagram showing qubits as horizontal lines and quantum gates as boxes or connections between qubits.

Another important programming model concept for quantum computing is what we call computational primitives. These primitives represent some of the most common tasks that users aim to accomplish with a quantum computer. Several primitives are available at the moment; in this course we will focus primarily on Sampler and Estimator. Sampler gives you the ability to sample a state prepared by your quantum circuit: it tells you which computational basis states make up the quantum state prepared by your circuit. Estimator allows you to estimate the expectation value of an observable for a system in the state prepared by your quantum circuit; a common use case is estimating the energy of a system in a specific state.

A model histogram of results from sampler. Some states are very likely to be measured, others are very unlikely.
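To make this concrete, here is a minimal sketch of the two primitives using Qiskit's local reference implementations. The class and package names (StatevectorSampler, StatevectorEstimator) assume a recent Qiskit 1.x installation and are not part of the lesson's code.

from qiskit import QuantumCircuit
from qiskit.quantum_info import SparsePauliOp
from qiskit.primitives import StatevectorSampler, StatevectorEstimator

# Prepare a Bell state.
qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)

# Sampler: which computational basis states make up the prepared state?
sampled = qc.copy()
sampled.measure_all()
sampler_result = StatevectorSampler().run([sampled], shots=1024).result()
print(sampler_result[0].data.meas.get_counts())   # e.g. {'00': ~512, '11': ~512}

# Estimator: expectation value of an observable in the prepared state.
observable = SparsePauliOp("ZZ")
estimator_result = StatevectorEstimator().run([(qc, observable)]).result()
print(estimator_result[0].data.evs)               # ~1.0 for the Bell state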

The last thing we are going to talk about in this section is transpilation. Transpilation is the process of rewriting a given input circuit to match the physical constraints and Instruction Set Architecture (ISA) of a specific quantum device. Similar to classical compilers, this means translating abstract unitary operations into the native gate set that the target device can execute. It also optimizes the circuit instructions for efficient execution on noisy quantum computers, with the routine gradually changing the circuit's structure by applying several optimization stages.

A diagram of transpilation showing how an abstract circuit is mapped into an instruction set architecture circuit. That is, the circuit is rewritten using the native gates and connectivity of the target hardware.
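As a rough illustration, the following sketch transpiles a small circuit with Qiskit's preset pass manager. The generic five-qubit backend used here (GenericBackendV2) is a stand-in assumption for a real device, not the lesson's setup.

from qiskit import QuantumCircuit
from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

backend = GenericBackendV2(num_qubits=5)   # stand-in for a real QPU

qc = QuantumCircuit(3)
qc.h(0)
qc.cx(0, 1)
qc.cx(1, 2)
qc.measure_all()

# optimization_level controls how aggressively the circuit is rewritten.
pm = generate_preset_pass_manager(optimization_level=2, backend=backend)
isa_circuit = pm.run(qc)
print(isa_circuit.count_ops())   # only gates in the backend's native set remain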

Check your understanding

How many qubits are in the circuit below? A circuit diagram with four horizontal lines and many gates.

Answer:

Four.

Check your understanding

Suppose you are modeling the electrons in a molecule. You want to approximate (a) the ground state energy of the molecule, and (b) which computational basis states are most dominant in the ground state of the molecule. In each case, would you use the Estimator or Sampler primitive?

Answer:

(a) Estimator (b) Sampler


Classical programming models

There are many programming models for classical computers, but for this section we will focus on two of the most popular: parallel programming and task workflows. Using these two models alongside quantum programming models, one can express hybrid quantum-classical workflows of almost any complexity.

Parallel programming

Parallel programming is a model that divides a program into sub-problems that can be executed simultaneously. There are two main paradigms of parallel programming:

  • Shared memory parallelism (Open Multiprocessing, or OpenMP): Used to exploit multiple cores within a single compute node. Threads of execution share a single memory space.

  • Distributed memory parallelism (Message Passing Interface, or MPI): Used for scaling across multiple separate compute nodes. Each process has its own isolated memory space.

Here, we'll focus on the distributed memory model because it is essential for multi-node supercomputing and coordinating large-scale heterogeneous quantum-classical jobs.

There are a few concepts we need to understand to operate in distributed memory parallel programming models:

  • Process - An independent instance of the program with its own memory space.
  • Rank - A unique integer identifier assigned to each process, used specifically to identify the sender and receiver during communication (not necessarily a "rank" in the sense of prioritizing).
  • Synchronization - A mechanism for coordination among different ranks and processes.
  • Single program, multiple data (SPMD) - An abstract computational model where a single source code instance runs simultaneously on multiple processes, each operating on a different subset of the total data.
  • Message passing - The communication paradigm used in distributed memory architectures that allows independent processes to exchange data and intermediate results. It relies on explicit 'send' and 'receive' operations to coordinate execution between different compute nodes.

The Message Passing Interface (MPI) is a standard that defines this message-passing paradigm for parallel architectures. MPI ties together all the concepts listed above, providing the specific library calls necessary to manage processes, assign ranks, synchronize execution, and pass messages under the SPMD model. Putting these concepts together, the execution of a parallel program happens in the following way:

  • A single compiled program (the same binary file) is copied to each node and started by a job launcher, creating multiple parallel processes across multiple nodes.
  • The main control flow of the program is dictated by the rank of the process. This is the SPMD principle in action: the program uses conditional logic (for example, if (rank == 0)) so that worker processes execute only the parallelized sections of the code, while a master process (often rank 0) handles initialization and final aggregation.
  • Communication between processes occurs through message passing (using MPI), which is called whenever a process needs to exchange data or intermediate results with another rank.

Visually, it will look something like this:

A diagram of a task being divided between nodes.

Let's try applying some of the concepts that we just learned to code.

First, we will try to run a simple "hello world" parallel program using Open MPI, an implementation of the MPI standard. We will use the mpi4py Python package, which provides Python bindings for MPI.

$ vim mpi-hello-world.py
from mpi4py import MPI
import sys
 
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
 
sys.stdout.write(f"[Rank {rank}] Hello from process {rank} of {size}!\n")
 
if rank == 0:
    data = {'answer': 42, 'pi': 3.14}
    sys.stdout.write(f"[Rank {rank}] Sending: {data}\n")
    comm.send(data, dest=1, tag=42)
elif rank == 1:
    data = comm.recv(source=0, tag=42)
    sys.stdout.write(f"[Rank {rank}] Received: {data}\n")
 

We will use two nodes to run this program, which we will specify in our submission script.

$ vim mpi-hello-world.sh
 
#!/bin/bash
#
#SBATCH --job-name=mpi-hello-world
#SBATCH --output=mpi-hello-world.out
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=normal
 
/usr/lib64/openmpi/bin/mpirun python /data/ch3/parallel/mpi-hello-world.py

Then run the shell script.

$ sbatch mpi-hello-world.sh

We can check the result logs of the job.

$ cat mpi-hello-world.out | grep Rank
 
[Rank 1] Hello from process 1 of 2!
[Rank 0] Hello from process 0 of 2!
[Rank 0] Sending: {'answer': 42, 'pi': 3.14}
[Rank 1] Received: {'answer': 42, 'pi': 3.14}

Here we used two nodes, and the process on each node is identified by its rank (Rank 0 and Rank 1), which is used to decide the program's control flow.


Task workflows

Now let's talk about the Task workflow programming model. A task workflow abstracts computation into a directed acyclic graph (DAG). In this graph, each node represents a particular task or job, and the edges (the arrows connecting the nodes) represent the dependencies (data and ordering) between them. A scheduler is the component that maps tasks to resources and orchestrates execution.
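As a toy illustration (not from the lesson), a workflow DAG can be represented as a mapping from each task to the set of tasks it depends on. Python's standard graphlib module can then produce a valid execution order; tasks with no mutual dependencies (B and C below) could run in parallel.

from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
workflow = {
    "B": {"A"},          # B depends on A
    "C": {"A"},          # C depends on A (B and C can run in parallel)
    "D": {"B", "C"},     # D depends on both B and C
}

ts = TopologicalSorter(workflow)
print(list(ts.static_order()))   # e.g. ['A', 'B', 'C', 'D']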

A concrete example of a task workflow model applied to quantum computing is the Qiskit patterns framework. A Qiskit pattern is a general framework designed to break down domain-specific problems into a sequence of stages, especially for quantum tasks. This allows for the seamless composability of new capabilities developed by IBM Quantum® researchers (and others) and enables a future in which quantum computing tasks are performed by powerful heterogeneous (CPU/GPU/QPU) computing infrastructure. The four steps of a Qiskit pattern are mapping, optimization, execution, and post-processing, where all tasks are executed one after another in a pipeline. With task workflows, however, we are not bound to a linear execution order and can execute tasks in parallel. Each task of a workflow can be an entire parallel job of its own, so you can mix and match these models to describe arbitrarily complex algorithms, and a workload manager like Slurm will orchestrate their execution.

A diagram of computing tasks organized into a workflow in which some processes are executed in parallel and others in sequence.

The image above illustrates the Qiskit pattern in action. The workflow has a graph structure with four stages, and this branching structure is orchestrated and executed by the scheduler. In the initial stage, the problem is mapped into a quantum-executable form (a quantum circuit). In the next stage, this quantum circuit is optimized for the specific quantum hardware; the image shows this as a parallel process, illustrating how multiple optimization strategies could be applied at the same time. The optimized quantum circuit is then executed on the actual quantum hardware - the third stage in the image, where the scheduler works with a quantum processing unit (shown in purple). Finally, the results are post-processed by classical resources.

Why both?

So why do we need both parallel programming and task workflows? For all the talk about quantum parallelism, it is worth clarifying that not everything is parallel in quantum computing.

The previous lesson on the SQD workflow mentioned some processes that cannot be parallelized. For example, we need the results of many quantum measurements in order to project our matrix into a subspace of tractable dimension. In turn, we need the diagonalized matrix and the associated state vectors to check self-consistency of the quantum measurements (using, for example, charge conservation). After all that, we need to decide whether the ground state energy has converged sufficiently for our purposes. These steps are necessarily sequential and require testing of convergence and self-consistency conditions before proceeding.

A schematic of the workflow specific to sample-based quantum diagonalization. The steps include a variational quantum circuit, using measurements to project the Hamiltonian into a subspace, then using a classical optimizer to update variational parameters in the circuit and repeating.

This workflow will be revisited in greater detail and implemented in the next section. The key takeaway from this section is that task workflows are necessary: some stages can run in parallel, while others must run in sequence.


Programming practice

The beauty of programming models is that you can mix and match them all together. Knowing quantum and classical programming models, you can describe a heterogeneous computation of arbitrary complexity and execute it on hardware. Let's practice this with a small example of a combined workflow, which implements the Qiskit pattern (map, optimize, execute, and post-process) within Slurm, which we learned about in the last chapter. Each of the four tasks will be a separate Slurm job, each with its own resources. The optimization task will use MPI to optimize circuits in parallel (purely as an example, as in the image above). The execution task will use quantum resources and quantum programming models (the circuit and a primitive). The last task - post-processing - will again use MPI in parallel with classical resources.

Mapping

The mapping.py program builds a PauliTwoDesign circuit, which is frequently used in the quantum machine learning and quantum benchmarking literature, together with a simple observable that measures the $(n-1)^\text{th}$ qubit in the $Z$ direction of an $n$-qubit system, and a set of random initial parameters. Each of these (the quantum circuit converted into a QASM file, the observable, and the parameters) is saved to a separate file under the data directory and is used as an input to the optimization stage.
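A minimal sketch of what mapping.py might contain is shown below. The file names, the data/ directory, and the use of QPY serialization (instead of the QASM file mentioned above) are illustrative assumptions rather than the course's actual script.

import numpy as np
from qiskit import qpy
from qiskit.circuit.library import PauliTwoDesign

n = 5  # number of qubits (assumed for the example)
circuit = PauliTwoDesign(n, reps=2, seed=42)

# Observable: Z on the (n-1)th qubit. Qiskit Pauli labels are little-endian,
# so the leftmost character acts on qubit n-1.
observable_label = "Z" + "I" * (n - 1)

# Random initial parameters for the variational circuit.
params = np.random.default_rng(42).uniform(0, 2 * np.pi, circuit.num_parameters)

# Persist everything for the optimization stage.
with open("data/circuit.qpy", "wb") as f:
    qpy.dump(circuit, f)
with open("data/observable.txt", "w") as f:
    f.write(observable_label)
np.save("data/parameters.npy", params)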

The shell script of this stage (mapping.sh) is

#!/bin/bash
#
#SBATCH --job-name=mapping
#SBATCH --output=mapping.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=normal
 
 
srun python /data/ch3/workflows/mapping.py

which defines its job name, output format, and the number of nodes/tasks/CPUs.

Optimization

The optimization.py program starts by loading the files produced by the mapping stage. Here you will use QRMI to access quantum resources from within the program.

qrmi = QRMI()
resources = qrmi.resources()
quantum_resource = resources[0]
...

It then performs a light optimization: it transpiles the quantum circuit with optimization_level=1, applies the circuit's layout to the observable, and saves both to the data folder.
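A minimal sketch of this step is below. Loading the backend through QRMI is elided; a generic fake backend and assumed file names stand in for the lesson's actual setup.

from qiskit import qpy
from qiskit.quantum_info import SparsePauliOp
from qiskit.providers.fake_provider import GenericBackendV2
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager

# Load the artifacts produced by the mapping stage (file names are assumptions).
with open("data/circuit.qpy", "rb") as f:
    circuit = qpy.load(f)[0]
with open("data/observable.txt") as f:
    observable = SparsePauliOp(f.read().strip())

# In the lesson the target device comes from QRMI; a generic backend stands in here.
backend = GenericBackendV2(num_qubits=circuit.num_qubits)
pm = generate_preset_pass_manager(optimization_level=1, backend=backend)
isa_circuit = pm.run(circuit)

# Map the observable onto the physical qubits chosen by the transpiler.
isa_observable = observable.apply_layout(isa_circuit.layout)

# Persist the results for the execution stage (a single-Pauli observable,
# so its label alone is enough to reconstruct it later).
with open("data/isa_circuit.qpy", "wb") as f:
    qpy.dump(isa_circuit, f)
with open("data/isa_observable.txt", "w") as f:
    f.write(isa_observable.paulis.to_labels()[0])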

The shell script of this stage (optimization.sh) is

#!/bin/bash
#SBATCH --job-name=optimization
#SBATCH --output=output/optimization.out
#SBATCH --ntasks=4
#SBATCH --partition=classical
 
srun python3 /tmp/optimization.py

Here --ntasks=4 requests four classical tasks from Slurm for a parallel process.

Execution

This is the core quantum stage, where the optimized quantum circuit from the previous step is run on the QPU by Estimator. To do this, we first load three files - the transpiled quantum circuit, the observable, and the initial parameters - and then pass them to Estimator, which yields the estimated value of the observable and prints it out.
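A minimal sketch of the execution step is shown below, using qiskit-ibm-runtime's EstimatorV2. In the lesson the quantum resource is obtained through QRMI and the Slurm quantum partition, so the service setup, backend selection, and file names here are simplifying assumptions.

import numpy as np
from qiskit import qpy
from qiskit.quantum_info import SparsePauliOp
from qiskit_ibm_runtime import QiskitRuntimeService, EstimatorV2 as Estimator

# Load the three inputs produced by the earlier stages (file names are assumptions).
with open("data/isa_circuit.qpy", "rb") as f:
    isa_circuit = qpy.load(f)[0]
with open("data/isa_observable.txt") as f:
    isa_observable = SparsePauliOp(f.read().strip())
params = np.load("data/parameters.npy")

# Note: the circuit must have been transpiled for the backend used here.
service = QiskitRuntimeService()
backend = service.least_busy(operational=True, simulator=False)
estimator = Estimator(mode=backend)

# One "pub": (circuit, observable, parameter values).
job = estimator.run([(isa_circuit, isa_observable, params)])
print(job.result()[0].data.evs)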

The execution.sh script leverages a Slurm plugin to use a quantum resource.

#!/bin/bash
#
#SBATCH --job-name=execution
#SBATCH --output=execution.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=quantum
#SBATCH --gres=qpu:1


srun python /data/ch3/workflows/execution.py

Post-processing

The post-processing step often involves classical diagonalization and self-consistency checks. It might also be iterative. It is most useful to consider the post-processing step in the next lesson, in which the physical context and the purpose of iterative steps are clear.

Combining it all together

We can chain all of these tasks into a workflow by using the dependency argument for the sbatch command:

$ MAPPING_JOB=$(sbatch --parsable mapping.sh)
$ OPTIMIZE_JOB=$(sbatch --parsable --dependency=afterok:$MAPPING_JOB optimization.sh)
$ EXECUTE_JOB=$(sbatch --parsable --dependency=afterok:$OPTIMIZE_JOB execution.sh)
 

And we can check our Slurm execution queue.

$ squeue
#             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
#                 3 classical  mapping    admin PD       0:00      1 (None)
#                 4 classical optimiza    admin PD       0:00      1 (Dependency)
#                 5   quantum  execute    admin PD       0:00      1 (Dependency)

This was a toy example to demonstrate the mixture of programming models. In the next chapter we will look at real-world algorithms and demonstrate programming models and resource management on useful workflows.


Summary

In this lesson, we have demonstrated how to combine multiple classical and quantum programming models to build, manage, and execute a complete four-stage workflow. We started with the fundamental concepts of quantum circuits and primitives, then explored classical models like parallel programming and task workflows. By combining all concepts, we constructed a Qiskit pattern — map, optimize, execute, and post-process — orchestrated by the Slurm workload manager with a simple quantum circuit and an observable.

In the next lesson, we will use this framework to run sample-based quantum algorithms, showing how this workflow can be applied to solve meaningful problems.

All the code and scripts used in this chapter are available to you within this GitHub repository.
