{/* cspell:ignore operatorname */}

# Convex combinations of density matrices

## Probabilistic selections of density matrices

A key aspect of density matrices is that *probabilistic selections* of quantum states are represented by *convex combinations* of their associated density matrices.

For example, if we have two density matrices, $\rho$ and $\sigma,$ representing quantum states of a system $\mathsf{X},$ and we prepare the system in the state $\rho$ with probability $p$ and in the state $\sigma$ with probability $1 - p,$ then the resulting quantum state is represented by the density matrix

$$
p \rho + (1 - p) \sigma.
$$

More generally, if we have $m$ quantum states represented by density matrices $\rho_0,\ldots,\rho_{m-1},$ and a system is prepared in the state $\rho_k$ with probability $p_k$ for some probability vector $(p_0,\ldots,p_{m-1}),$ the resulting state is represented by the density matrix

$$
\sum_{k = 0}^{m-1} p_k \rho_k.
$$

This is a *convex combination* of the density matrices $\rho_0,\ldots,\rho_{m-1}.$

It follows that if we have $m$ quantum state vectors $\vert\psi_0\rangle,\ldots,\vert\psi_{m-1}\rangle,$ and we prepare a system in the state $\vert\psi_k\rangle$ with probability $p_k$ for each $k\in\{0,\ldots,m-1\},$ the state we obtain is represented by the density matrix

$$
\sum_{k = 0}^{m-1} p_k \vert\psi_k\rangle\langle\psi_k\vert.
$$

For example, if a qubit is prepared in the state $\vert 0\rangle$ with probability $1/2$ and in the state $\vert + \rangle$ with probability $1/2,$ the density matrix representation of the state we obtain is given by

$$
\frac{1}{2} \vert 0\rangle\langle 0 \vert + \frac{1}{2} \vert +\rangle\langle + \vert
= \frac{1}{2} \begin{pmatrix} 1 & 0\\[1mm] 0 & 0 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} \frac{1}{2} & \frac{1}{2}\\[2mm] \frac{1}{2} & \frac{1}{2} \end{pmatrix}
= \begin{pmatrix} \frac{3}{4} & \frac{1}{4}\\[2mm]
\frac{1}{4} & \frac{1}{4}
\end{pmatrix}.
$$
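As a quick numerical check of this example, here is a minimal sketch using NumPy. The helper names `projector` and `mixture` are our own illustrative choices, not part of any library.

```python
import numpy as np

# The state vectors |0> and |+> as arrays.
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)

def projector(psi):
    """Return the density matrix |psi><psi| of a state vector."""
    return np.outer(psi, psi.conj())

def mixture(probabilities, states):
    """Return the convex combination sum_k p_k |psi_k><psi_k|."""
    return sum(p * projector(psi) for p, psi in zip(probabilities, states))

rho = mixture([0.5, 0.5], [ket0, ket_plus])
print(rho)  # entries are 3/4, 1/4, 1/4, 1/4 up to rounding
```

The same `mixture` helper works for any number of state vectors, provided the probabilities are nonnegative and sum to $1.$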

In the simplified formulation of quantum information, averaging quantum state vectors like this doesn't work.
For instance, the vector

$$
\frac{1}{2} \vert 0\rangle + \frac{1}{2} \vert + \rangle
= \frac{1}{2} \begin{pmatrix}1\\[1mm] 0\end{pmatrix} + \frac{1}{2} \begin{pmatrix}\frac{1}{\sqrt{2}}\\[2mm]\frac{1}{\sqrt{2}}\end{pmatrix}
= \begin{pmatrix}\frac{2 + \sqrt{2}}{4}\\[2mm]\frac{\sqrt{2}}{4}\end{pmatrix}
$$

is not a valid quantum state vector because its Euclidean norm is not equal to $1.$
A more extreme example makes the failure even clearer: fix any quantum state vector $\vert\psi\rangle$ that we wish, and take our state to be $\vert\psi\rangle$ with probability $1/2$ and $-\vert\psi\rangle$ with probability $1/2.$
These two vectors differ by a global phase, so they actually represent the same state, but averaging them gives us the zero vector, which is not a valid quantum state vector.
Density matrices handle this case correctly: both vectors have the same density matrix $\vert\psi\rangle\langle\psi\vert,$ so the convex combination is again $\vert\psi\rangle\langle\psi\vert.$
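Both failures of vector averaging can be seen numerically. The following is a small sketch, assuming NumPy and taking $\vert\psi\rangle = \vert +\rangle$ purely for illustration.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)

# Averaging the vectors |0> and |+> gives a vector whose norm is not 1.
averaged_vector = 0.5 * ket0 + 0.5 * ket_plus
norm = np.linalg.norm(averaged_vector)
print(norm)  # about 0.924

# Averaging |psi> and -|psi> gives the zero vector.
psi = ket_plus
zero_vector = 0.5 * psi + 0.5 * (-psi)
print(zero_vector)

# The density matrices of |psi> and -|psi> coincide, so mixing them
# simply returns |psi><psi|.
rho_psi = np.outer(psi, psi.conj())
rho_minus_psi = np.outer(-psi, (-psi).conj())
mixed = 0.5 * rho_psi + 0.5 * rho_minus_psi
print(np.allclose(mixed, rho_psi))
```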

## The completely mixed state

Suppose we set the state of a qubit to be $\vert 0\rangle$ or $\vert 1\rangle$ randomly, each with probability $1/2.$
The density matrix representing the resulting state is as follows.

$$
\frac{1}{2} \vert 0\rangle\langle 0\vert + \frac{1}{2} \vert 1\rangle\langle 1\vert
= \frac{1}{2}
\begin{pmatrix}
1 & 0\\[1mm]
0 & 0
\end{pmatrix} + \frac{1}{2}
\begin{pmatrix}
0 & 0\\[1mm]
0 & 1
\end{pmatrix}
= \begin{pmatrix}
\frac{1}{2} & 0\\[1mm]
0 & \frac{1}{2}
\end{pmatrix}
= \frac{1}{2} \mathbb{I}
$$

(In this equation the symbol $\mathbb{I}$ denotes the $2\times 2$ identity matrix.)
This is a special state known as the *completely mixed state*.
It represents complete uncertainty about the state of a qubit, similar to a uniform random bit in the probabilistic setting.

Now suppose that we change the procedure: in place of the states $\vert 0\rangle$ and $\vert 1\rangle$ we'll use the states $\vert + \rangle$ and $\vert - \rangle.$
We can compute the density matrix that describes the resulting state in a similar way.

$$
\frac{1}{2} \vert +\rangle\langle +\vert + \frac{1}{2} \vert -\rangle\langle -\vert
= \frac{1}{2}
\begin{pmatrix}
\frac{1}{2} & \frac{1}{2}\\[2mm]
\frac{1}{2} & \frac{1}{2}
\end{pmatrix}
+ \frac{1}{2}
\begin{pmatrix}
\frac{1}{2} & -\frac{1}{2}\\[2mm]
-\frac{1}{2} & \frac{1}{2}
\end{pmatrix}
=
\begin{pmatrix}
\frac{1}{2} & 0\\[2mm]
0 & \frac{1}{2}
\end{pmatrix}
= \frac{1}{2} \mathbb{I}
$$

It's the same density matrix as before, even though we changed the states.
In fact, we would again obtain the same result — the completely mixed state — by substituting *any* two orthogonal qubit state vectors for $\vert 0\rangle$ and $\vert 1\rangle.$

This is a feature, not a bug!
We do in fact obtain exactly the same state either way.
That is, there's no way to distinguish the two procedures by measuring the qubit they produce, even in a statistical sense.
Our two different procedures are simply different ways to prepare this state.
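To see the claim about arbitrary orthogonal pairs in action, here is a sketch that builds an orthonormal basis of qubit state vectors at random and mixes the two vectors uniformly. Using the QR decomposition of a random complex matrix to get an orthonormal basis is just one convenient choice we're assuming here, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# The columns of the Q factor of a random complex matrix form an
# orthonormal basis of the 2-dimensional complex vector space.
random_matrix = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
basis, _ = np.linalg.qr(random_matrix)
phi0, phi1 = basis[:, 0], basis[:, 1]

# Uniform mixture of the two orthogonal states.
rho = 0.5 * np.outer(phi0, phi0.conj()) + 0.5 * np.outer(phi1, phi1.conj())
print(np.allclose(rho, np.eye(2) / 2))
```

Changing the seed changes the basis but not the resulting density matrix, up to floating-point error.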

We can verify that this makes sense by thinking about what we could hope to learn given a random selection of a state from one of the two possible state sets $\{\vert 0\rangle,\vert 1\rangle\}$ and $\{\vert +\rangle,\vert -\rangle\}.$
To keep things simple, let's suppose that we perform a unitary operation $U$ on our qubit and then measure in the standard basis.

In the first scenario, the state of the qubit is chosen uniformly from the set $\{\vert 0\rangle,\vert 1\rangle\}.$
If the state is $\vert 0\rangle,$ we obtain the outcomes $0$ and $1$ with probabilities

$$
\vert \langle 0 \vert U \vert 0 \rangle \vert^2
\quad\text{and}\quad
\vert \langle 1 \vert U \vert 0 \rangle \vert^2
$$

respectively.
If the state is $\vert 1\rangle,$ we obtain the outcomes $0$ and $1$ with probabilities

$$
\vert \langle 0 \vert U \vert 1 \rangle \vert^2
\quad\text{and}\quad
\vert \langle 1 \vert U \vert 1 \rangle \vert^2.
$$

Because the two possibilities each happen with probability $1/2,$ we obtain the outcome $0$ with probability

$$
\frac{1}{2}\vert \langle 0 \vert U \vert 0 \rangle \vert^2
+ \frac{1}{2}\vert \langle 0 \vert U \vert 1 \rangle \vert^2
$$

and the outcome $1$ with probability

$$
\frac{1}{2}\vert \langle 1 \vert U \vert 0 \rangle \vert^2
+ \frac{1}{2}\vert \langle 1 \vert U \vert 1 \rangle \vert^2.
$$

Both of these expressions are equal to $1/2.$
One way to argue this is to use a fact from linear algebra that can be seen as a generalization of the Pythagorean theorem.

<Figure title="Theorem">
  Suppose $\{\vert\psi_1\rangle,\ldots,\vert\psi_n\rangle\}$ is an orthonormal basis of a (real or complex) vector space $\mathcal{V}.$ For every vector $\vert \phi\rangle \in \mathcal{V}$ we have
  $\vert \langle \psi_1\vert\phi\rangle\vert^2 + \cdots + \vert \langle \psi_n \vert \phi \rangle\vert^2 = \| \vert\phi\rangle \|^2.$
</Figure>
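This fact is easy to test numerically. The sketch below assumes NumPy, picks the dimension $n = 4$ arbitrarily, and uses a random orthonormal basis obtained from a QR decomposition.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 4

# The columns of Q form an orthonormal basis of the n-dimensional space.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))

# An arbitrary vector |phi>, not necessarily of unit length.
phi = rng.normal(size=n) + 1j * rng.normal(size=n)

# Sum over k of |<psi_k|phi>|^2. Note that np.vdot conjugates its first argument.
lhs = sum(abs(np.vdot(Q[:, k], phi)) ** 2 for k in range(n))
rhs = np.linalg.norm(phi) ** 2
print(np.isclose(lhs, rhs))
```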

We can apply this theorem to determine the probabilities as follows.
The probability to get $0$ is

$$
\begin{aligned}
\frac{1}{2}\vert \langle 0 \vert U \vert 0 \rangle \vert^2
+ \frac{1}{2}\vert \langle 0 \vert U \vert 1 \rangle \vert^2
& = \frac{1}{2} \Bigl(
\vert \langle 0 \vert U \vert 0 \rangle \vert^2
+ \vert \langle 0 \vert U \vert 1 \rangle \vert^2
\Bigr) \\[2mm]
& = \frac{1}{2} \Bigl(
\vert \langle 0 \vert U^{\dagger} \vert 0 \rangle \vert^2
+ \vert \langle 1 \vert U^{\dagger} \vert 0 \rangle \vert^2
\Bigr)\\[2mm]
& = \frac{1}{2} \bigl\| U^{\dagger} \vert 0 \rangle \bigr\|^2
\end{aligned}
$$

and the probability to get $1$ is

$$
\begin{aligned}
\frac{1}{2}\vert \langle 1 \vert U \vert 0 \rangle \vert^2
+ \frac{1}{2}\vert \langle 1 \vert U \vert 1 \rangle \vert^2
& = \frac{1}{2} \Bigl(
\vert \langle 1 \vert U \vert 0 \rangle \vert^2
+ \vert \langle 1 \vert U \vert 1 \rangle \vert^2
\Bigr) \\[2mm]
& = \frac{1}{2} \Bigl(
\vert \langle 0 \vert U^{\dagger} \vert 1 \rangle \vert^2
+ \vert \langle 1 \vert U^{\dagger} \vert 1 \rangle \vert^2
\Bigr)\\[2mm]
& = \frac{1}{2} \bigl\| U^{\dagger} \vert 1 \rangle \bigr\|^2.
\end{aligned}
$$

Because $U$ is unitary, we know that $U^{\dagger}$ is unitary as well, implying that both $U^{\dagger} \vert 0 \rangle$ and $U^{\dagger} \vert 1 \rangle$ are unit vectors.
Both probabilities are therefore equal to $1/2.$
This means that no matter how we choose $U,$ we're just going to get a uniform random bit from the measurement.

We can perform a similar verification for any other pair of orthonormal states in place of $\vert 0\rangle$ and $\vert 1\rangle.$
For example, because $\{\vert + \rangle, \vert - \rangle\}$ is an orthonormal basis, the probability to obtain the measurement outcome $0$ in the second procedure is

$$
\frac{1}{2}\vert \langle 0 \vert U \vert + \rangle \vert^2
+ \frac{1}{2}\vert \langle 0 \vert U \vert - \rangle \vert^2
= \frac{1}{2} \bigl\| U^{\dagger} \vert 0 \rangle \bigr\|^2 = \frac{1}{2}
$$

and the probability to get $1$ is

$$
\frac{1}{2}\vert \langle 1 \vert U \vert + \rangle \vert^2
+ \frac{1}{2}\vert \langle 1 \vert U \vert - \rangle \vert^2
= \frac{1}{2} \bigl\| U^{\dagger} \vert 1 \rangle \bigr\|^2 = \frac{1}{2}.
$$

In particular, we obtain exactly the same output statistics as we did for the states $\vert 0\rangle$ and $\vert 1\rangle.$
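The whole argument can be replayed numerically for a particular unitary. In this sketch, which assumes NumPy, the unitary $U$ is generated at random from a QR decomposition, and the function name `outcome_probabilities` is our own.

```python
import numpy as np

rng = np.random.default_rng(seed=11)

# A unitary matrix U, obtained from the QR decomposition of a random complex matrix.
U, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ket_plus = (ket0 + ket1) / np.sqrt(2)
ket_minus = (ket0 - ket1) / np.sqrt(2)

def outcome_probabilities(states):
    """Outcome probabilities for a uniformly chosen state, after U and a standard basis measurement."""
    # The entries of |U psi|^2 are |<0|U|psi>|^2 and |<1|U|psi>|^2.
    per_state = [np.abs(U @ psi) ** 2 for psi in states]
    return np.mean(per_state, axis=0)

probs_standard = outcome_probabilities([ket0, ket1])
probs_plus_minus = outcome_probabilities([ket_plus, ket_minus])
print(probs_standard, probs_plus_minus)
```

Both procedures should give probabilities equal to $1/2$ for each outcome, up to floating-point error, whichever unitary is used.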

## Probabilistic states

Classical states can be represented by density matrices.
In particular, for each classical state $a$ of a system $\mathsf{X},$ the density matrix

$$
\rho = \vert a\rangle \langle a \vert
$$

represents $\mathsf{X}$ being definitively in the classical state $a.$
For qubits we have

$$
\vert 0\rangle \langle 0 \vert = \begin{pmatrix}1 & 0 \\ 0 & 0\end{pmatrix}
\quad\text{and}\quad
\vert 1\rangle \langle 1 \vert = \begin{pmatrix}0 & 0 \\ 0 & 1\end{pmatrix},
$$

and in general we have a single $1$ on the diagonal in the position corresponding to the classical state we have in mind, with all other entries zero.

We can then take convex combinations of these density matrices to represent probabilistic states.
Supposing for simplicity that our classical state set is $\{0,\ldots,n-1\},$ if $\mathsf{X}$ is in the state $a$ with probability $p_a$ for each $a\in\{0,\ldots,n-1\},$ then the density matrix we obtain is

$$
\rho = \sum_{a = 0}^{n-1} p_a \vert a\rangle \langle a \vert
= \begin{pmatrix}
p_0 & 0 & \cdots & 0\\
0 & p_1 & \ddots & \vdots\\
\vdots & \ddots & \ddots & 0\\
0 & \cdots & 0 & p_{n-1}
\end{pmatrix}.
$$

Going in the other direction, any diagonal density matrix can naturally be identified with the probabilistic state we obtain by simply reading the probability vector off from the diagonal.
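Here is a short sketch of this correspondence in both directions, assuming NumPy. The probability vector is an arbitrary example with $n = 4.$

```python
import numpy as np

# An example probability vector over the classical states 0, 1, 2, 3.
p = np.array([0.5, 0.25, 0.125, 0.125])
n = len(p)

# Build sum_a p_a |a><a|, where row a of the identity matrix is the vector |a>.
identity = np.eye(n)
rho = sum(p[a] * np.outer(identity[a], identity[a]) for a in range(n))

# The result is the diagonal matrix with p on its diagonal ...
print(np.allclose(rho, np.diag(p)))

# ... and reading off the diagonal recovers the probability vector.
recovered = np.diag(rho)
print(recovered)
```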

To be clear, when a density matrix is diagonal, it's not necessarily the case that we're talking about a classical system, or that the system must have been prepared through the random selection of a classical state, but rather that the state *could* have been obtained through the random selection of a classical state.

The fact that probabilistic states are represented by diagonal density matrices is consistent with the intuition suggested at the start of the lesson that off-diagonal entries describe the degree to which the two classical states corresponding to the row and column of that entry are in quantum superposition.
Here, all of the off-diagonal entries are zero, so we just have classical randomness and nothing is in quantum superposition.

## Density matrices and the spectral theorem

We've seen that if we take a convex combination of pure states,

$$
\rho = \sum_{k = 0}^{m-1} p_k \vert \psi_k\rangle \langle \psi_k \vert,
$$

we obtain a density matrix.
Every density matrix $\rho,$ in fact, can be expressed as a convex combination of pure states like this.
That is, there will always exist a collection of unit vectors $\{\vert\psi_0\rangle,\ldots,\vert\psi_{m-1}\rangle\}$ and a probability vector $(p_0,\ldots,p_{m-1})$ for which the equation above is true.

We can, moreover, always choose the number $m$ so that it agrees with the number of classical states of the system being considered, and we can select the quantum state vectors to be orthogonal.
The spectral theorem, which we encountered in the "Foundations of quantum algorithms" course, allows us to conclude this.
Here's a restatement of the spectral theorem for convenience.

<Figure title="Theorem">
  Spectral theorem: Let $M$ be a *normal* $n\times n$ complex matrix.
  There exists an orthonormal basis of $n$-dimensional complex vectors $\{\vert\psi_0\rangle,\ldots,\vert\psi_{n-1}\rangle \}$ along with complex numbers $\lambda_0,\ldots,\lambda_{n-1}$ such that

  $$
  M = \lambda_0 \vert \psi_0\rangle\langle \psi_0\vert + \cdots + \lambda_{n-1} \vert \psi_{n-1}\rangle\langle \psi_{n-1}\vert.
  $$
</Figure>

(Recall that a matrix $M$ is *normal* if it satisfies $M^{\dagger} M = M M^{\dagger}.$ In words, normal matrices are matrices that commute with their own conjugate transpose.)

We can apply the spectral theorem to any given density matrix $\rho$ because density matrices are always Hermitian and therefore normal.
This allows us to write

$$
\rho = \lambda_0 \vert \psi_0\rangle\langle \psi_0\vert + \cdots + \lambda_{n-1} \vert \psi_{n-1}\rangle\langle \psi_{n-1}\vert
$$

for some orthonormal basis $\{\vert\psi_0\rangle,\ldots,\vert\psi_{n-1}\rangle\}.$
It remains to verify that $(\lambda_0,\ldots,\lambda_{n-1})$ is a probability vector, which we can then rename to $(p_0,\ldots,p_{n-1})$ if we wish.

The numbers $\lambda_0,\ldots,\lambda_{n-1}$ are the eigenvalues of $\rho,$ and because $\rho$ is positive semidefinite, these numbers must therefore be nonnegative real numbers.
We can conclude that $\lambda_0 + \cdots + \lambda_{n-1} = 1$ from the fact that $\rho$ has trace equal to $1.$
Going through the details will give us an opportunity to point out the following important and very useful property of the trace.

<Figure title="Theorem">
  Cyclic property of the trace: For any two matrices $A$ and $B$ whose product $AB$ is a square matrix, the equality $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$ is true.
</Figure>

Note that this theorem works even if $A$ and $B$ are not themselves square matrices.
That is, we may have that $A$ is $n\times m$ and $B$ is $m\times n,$ for some choice of positive integers $n$ and $m,$ so that $AB$ is an $n\times n$ square matrix and $BA$ is $m\times m.$

In particular, if we let $A$ be a column vector $\vert\phi\rangle$ and let $B$ be the row vector $\langle \phi\vert,$ then we see that

$$
\operatorname{Tr}\bigl(\vert\phi\rangle\langle\phi\vert\bigr)
= \operatorname{Tr}\bigl(\langle\phi\vert\phi\rangle\bigr) = \langle\phi\vert\phi\rangle.
$$

The second equality follows from the fact that $\langle\phi\vert\phi\rangle$ is a scalar, which we can also think of as a $1\times 1$ matrix whose trace is its single entry.
Using this fact, we can conclude that $\lambda_0 + \cdots + \lambda_{n-1} = 1$ by the linearity of the trace function.

$$
\begin{gathered}
1 = \operatorname{Tr}(\rho) =
\operatorname{Tr}\bigl(\lambda_0 \vert \psi_0\rangle\langle \psi_0\vert + \cdots + \lambda_{n-1} \vert \psi_{n-1}\rangle\langle \psi_{n-1}\vert\bigr)\\[2mm]
= \lambda_0 \operatorname{Tr}\bigl(\vert \psi_0\rangle\langle \psi_0\vert\bigr) + \cdots + \lambda_{n-1} \operatorname{Tr}\bigl(\vert \psi_{n-1}\rangle\langle \psi_{n-1}\vert\bigr)
= \lambda_0 + \cdots + \lambda_{n-1}
\end{gathered}
$$

Alternatively, we can reach the same conclusion by using the fact that the trace of a square matrix (even one that isn't normal) is equal to the sum of its eigenvalues, counted with algebraic multiplicity.
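The trace facts used in this argument can be spot-checked numerically. This is a sketch assuming NumPy, with randomly chosen matrices and a small non-normal matrix `M` picked by us as an example.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# A is 3x2 and B is 2x3, so AB is 3x3 while BA is 2x2.
A = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))
B = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
cyclic_ok = np.isclose(np.trace(A @ B), np.trace(B @ A))

# The trace of |phi><phi| equals the inner product <phi|phi>.
phi = rng.normal(size=3) + 1j * rng.normal(size=3)
outer_ok = np.isclose(np.trace(np.outer(phi, phi.conj())), np.vdot(phi, phi))

# The trace equals the sum of the eigenvalues, even for a non-normal matrix.
M = np.array([[1.0, 2.0], [0.0, 3.0]])
eigen_ok = np.isclose(np.trace(M), np.sum(np.linalg.eigvals(M)))

print(cyclic_ok, outer_ok, eigen_ok)
```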

We have therefore concluded that any given density matrix $\rho$ can be expressed as a convex combination of pure states.
We also see that we can, moreover, take the pure states to be *orthogonal*.
This means, in particular, that we never need the number $m$ of pure states to be larger than the size of the classical state set of $\mathsf{X}.$

In general, it must be understood that there will be different ways to write a density matrix as a convex combination of pure states, not just the ways that the spectral theorem provides.
A previous example illustrates this.

$$
\frac{1}{2} \vert 0\rangle\langle 0 \vert + \frac{1}{2} \vert +\rangle\langle + \vert
= \begin{pmatrix}
\frac{3}{4} & \frac{1}{4}\\[2mm]
\frac{1}{4} & \frac{1}{4}
\end{pmatrix}
$$

This is not a spectral decomposition of this matrix because $\vert 0\rangle$ and $\vert + \rangle$ are not orthogonal.
Here's a spectral decomposition:

$$
\begin{pmatrix}
\frac{3}{4} & \frac{1}{4}\\[2mm]
\frac{1}{4} & \frac{1}{4}
\end{pmatrix}
= \cos^2(\pi/8) \vert \psi_{\pi/8} \rangle \langle \psi_{\pi/8}\vert +
\sin^2(\pi/8) \vert \psi_{5\pi/8} \rangle \langle \psi_{5\pi/8}\vert,
$$

where $\vert \psi_{\theta} \rangle = \cos(\theta)\vert 0\rangle + \sin(\theta)\vert 1\rangle.$
The eigenvalues are numbers that will likely look familiar:

$$
\cos^2(\pi/8) = \frac{2+\sqrt{2}}{4} \approx 0.85 \quad\text{and}\quad \sin^2(\pi/8) = \frac{2-\sqrt{2}}{4} \approx 0.15.
$$

The eigenvectors can be written explicitly like this.

$$
\begin{aligned}
  \vert\psi_{\pi/8}\rangle & =
  \frac{\sqrt{2 + \sqrt{2}}}{2}\vert 0\rangle + \frac{\sqrt{2 - \sqrt{2}}}{2}\vert 1\rangle \\[3mm]
  \vert\psi_{5\pi/8}\rangle & =
  -\frac{\sqrt{2 - \sqrt{2}}}{2}\vert 0\rangle + \frac{\sqrt{2 + \sqrt{2}}}{2}\vert 1\rangle
\end{aligned}
$$
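A spectral decomposition like this one can also be computed numerically. The following sketch assumes NumPy and uses `np.linalg.eigh`, which is intended for Hermitian matrices and returns the eigenvalues in ascending order together with orthonormal eigenvectors as columns.

```python
import numpy as np

rho = np.array([[0.75, 0.25], [0.25, 0.25]])

# Eigenvalues in ascending order, eigenvectors as the columns of the second output.
eigenvalues, eigenvectors = np.linalg.eigh(rho)
print(eigenvalues)  # about 0.146 and 0.854

# Rebuild rho as a convex combination of the orthogonal pure states.
rebuilt = sum(
    lam * np.outer(vec, vec.conj())
    for lam, vec in zip(eigenvalues, eigenvectors.T)
)
print(np.allclose(rebuilt, rho))
```

The eigenvectors returned numerically may differ from the ones written above by a sign, which is a global phase and doesn't affect the density matrices $\vert\psi\rangle\langle\psi\vert.$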

As another, more general example, suppose $\vert \phi_0\rangle,\ldots,\vert \phi_{99} \rangle$ are quantum state vectors representing states of a single qubit, chosen arbitrarily — so we're not assuming any particular relationships among these vectors.
We could then consider the state we obtain by choosing one of these $100$ states uniformly at random:

$$
\rho = \frac{1}{100} \sum_{k = 0}^{99} \vert \phi_k\rangle\langle \phi_k \vert.
$$

Because we're talking about a qubit, the density matrix $\rho$ is $2\times 2,$ so by the spectral theorem we could alternatively write

$$
\rho = p \vert\psi_0\rangle\langle\psi_0\vert + (1 - p) \vert\psi_1\rangle\langle\psi_1\vert
$$

for some real number $p\in[0,1]$ and an orthonormal basis $\{\vert\psi_0\rangle,\vert\psi_1\rangle\}$ — but naturally the existence of this expression doesn't prohibit us from writing $\rho$ as an average of 100 pure states if we choose to do that.
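As an illustration, the sketch below draws $100$ qubit state vectors at random, averages their density matrices, and then recovers a two-term expression from the spectral decomposition. It assumes NumPy; the way the vectors are sampled and the seed are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)

# 100 arbitrary qubit state vectors: random complex vectors, normalized.
vectors = rng.normal(size=(100, 2)) + 1j * rng.normal(size=(100, 2))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# Uniform average of the 100 pure-state density matrices.
rho = sum(np.outer(v, v.conj()) for v in vectors) / 100

# Spectral decomposition: two nonnegative eigenvalues summing to 1,
# with orthonormal eigenvectors as the columns of the second output.
eigenvalues, eigenvectors = np.linalg.eigh(rho)
two_term = sum(
    lam * np.outer(vec, vec.conj())
    for lam, vec in zip(eigenvalues, eigenvectors.T)
)
print(eigenvalues, np.allclose(two_term, rho))
```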



© IBM Corp., 2017-2025