Quantum Journal

Thursday, February 8, 2024

Shor's algorithm

Shor's algorithm is a quantum algorithm for factoring in polynomial time. For sources, I am using this Qiskit tutorial and Wikipedia.

I'm mostly interested in the quantum part of the algorithm, so instead I will take as a starting point the following problem. Let $N$ be some modulus, and $a \in (\mathbb{Z}/N\mathbb{Z})^\times$ (i.e. $a, N$ must be coprime). We wish to compute the (multiplicative) order of $a$ in this group, i.e. the smallest $r > 0$ such that $a^r \equiv 1 \pmod{N}$.

First, let's forget the fact that we have to use qubits and pretend we can produce an $N$-ary quantum bit, e.g. when $N = 2^n$ this is just $n$ qubits. That is, suppose we have a basis for our quantum system:

$\lvert 0 \rangle = \lvert 0^n \rangle, \; \lvert 1 \rangle = \lvert 0^{n-1} 1 \rangle, \ldots, \; \lvert N-1 \rangle = \lvert 1^n \rangle.$
Implicitly, $\lvert m \rangle$ is a representative for $m \in \mathbb{Z}/N\mathbb{Z}$. We define an operator $U_a$ on this system
$U_a\lvert m \rangle = \lvert am \rangle.$
This is a permutation matrix, so it is unitary. In practice we need to think about how to implement the set-up above, i.e. to represent $\mathbb{Z}/N\mathbb{Z}$ using qubits and the operator $U_a$ using quantum gates. This is a completely classical algorithm, and it is known that classical algorithms can be implemented using quantum gates, so we will not address this directly.

Let $r$ be the period of $a$; by construction, $U_a^r = \mathbb{I}$, so all eigenvalues are $r$th roots of unity. For convenience, let $\omega = e^{2 \pi i / r}$. Some interesting eigenvectors and eigenvalues for this operator are for various $s \in \mathbb{Z}/r\mathbb{Z}$:

$v_s := \displaystyle \frac{1}{\sqrt{r}}\sum_{k=0}^{r-1} \omega^{-ks} \lvert a^k \rangle, \;\;\;\;\;\;\;\;\;\; \lambda_s = \omega^s.$
The renormalized sum, over $s = 0, \ldots, r-1$, of the above eigenvectors is $\lvert 1 \rangle$, i.e. we have an eigenvector decomposition
$\lvert 1 \rangle = \displaystyle \frac{1}{\sqrt{r}}(v_0 + v_1 + \cdots + v_{r-1}).$
We note in passing that the rest of the eigenvectors and eigenvalues are shifted from this one by various other elements of $\mathbb{Z}/N\mathbb{Z}$; we don't care about them.

Recall that phase estimation allowed us, given a unitary operator $U$ and an eigenvector, to extract its eigenvalue. We do not have an eigenvector. However, we do have $\lvert 1 \rangle$, which is a sum of eigenvectors with equal weights. Running phase estimation on this superposition amounts to doing so on one of the eigenvectors chosen randomly, say with eigenvalue $s/r$, and after iterating phase estimation our estimate will converge to one of these with high probability. In fact, this estimation allows us to pinpoint the precise fraction. First, we have a bound on $r$, i.e. $r \leq N-1$. Roughly, there are finitely many fractions of the form $s/r$ where $r \leq N-1$, so they should be distinguishable by intervals of some length, which turns out to be (inverse) quadratic in $N$. This is polynomial, but for computing runtime in $n$ we take the logarithm, so it is linear in $n$, which is good enough. The algorithm for computing the $s/r$ is the continued fractions algorithm.

Now, knowing one of the fractions $s/r$ only reveals $r$ when $\gcd(s, r) = 1$; otherwise it is some factor of $r$. We can keep track of the denominators appearing and take their least common multiple, and this will yield $r$ with probability exponentially approaching $1$.

Finally, it should be remarked that the algorithm works in $O(n^3)$ time. An accounting:

We need to implement phase estimation circuit. This involves constructing the multiplication operators $U_a^{2^i}$ for $i = 0, 1, \ldots, n-1$, which is $O(n)$ operations of $O(n^2)$ and the QFT gate which is order $O(n^2)$. In total, we have $O(n^3)$.
We need to do continued fractions, which is $O(n^3)$.

Quantum (and discrete) Fourier transform

First, I want to note there is a nice discussion with pictures at the Qiskit tutorials. I won't try to replicate the pictures.

Recall the basic idea behind Pontryagin and Fourier duality. Given some abelian group $G$, we have the dual group $\hat{G} = \text{Hom}(G, \mathbb{C}^\times)$, the group of characters, i.e. continuous homomorphisms to the multiplicative group of complex numbers. For expository clarity, let us assume that $G$ is compact and that $\hat{G} = \text{Hom}(G, S^1)$ is discrete; in our only case of interest, both will be both.

There is a tautological character pairing $\hat{G} \times G \rightarrow \mathbb{C}^\times$. For example, taking $G = \mathbb{Z} /N\mathbb{Z} $, we have $\hat{G} = \mu_N \subset \mathbb{C}^\times$, the group of $N$th roots of unity, and the pairing is

$\mu_N \times \mathbb{Z}/N\mathbb{Z} \ni (z, m) \mapsto z^m.$
Given a function $f: G \rightarrow \mathbb{C}$, we have its Fourier transform:

$\hat{f}: \hat{G} \rightarrow \mathbb{C}, \;\;\;\;\;\;\; \hat{f}(\xi) = \displaystyle\frac{1}{|G|}\int_{x \in G} \xi^{-1}(x) f(x) \; dx.$
and given a function $g: \hat{G} \rightarrow \mathbb{C}$, we have its inverse Fourier transform:

$\hat{g}: G \rightarrow \mathbb{C}, \;\;\;\;\;\;\; \hat{g}(x) = \displaystyle\sum_{\xi \in \hat{G}} \xi(x) f(x).$
We chose $G$ to be compact so there is no quesiton of convergence of the integral, and $G$ to be discrete so that the integral on $\hat{G}$ is a (possibly infinite) sum for which we ignore all convergence issues. Aside from this, the Fourier transform and its inverse are not entirely symmetric; they differ by an inversion in the character pairing.

Moreover, Fourier inversion says that (pretend that, say, $G$ is compact and $\hat{G}$ is discrete; for us, they will be finite thus both):

$\hat{\hat{f}} = f, \;\;\;\;\;\;\; f(x) = \displaystyle\sum_{\xi \in \hat{G}} \hat{f}(\xi) \xi(x).$
Now, let us consider the case where $G = \mu_N$ and $\hat{G} = \mathbb{Z}/N\mathbb{Z}$. We represent the characteristic function on $m \in \mathbb{Z}/N\mathbb{Z}$ by $\lvert m \rangle$ (we will see why later). A function $f$ on $G$ is then a linear combination, i.e. given that $f(m) = a_m \lvert m \rangle$, we write

$f = \displaystyle\sum_{m \in \mathbb{Z}/N\mathbb{Z}} a_m \lvert m \rangle.$
Letting $\theta \in \displaystyle\frac{1}{N} \mathbb{Z}$, the Fourier transform takes the characteristic function at $e^{2 \pi i \theta}$ on $G = \mu_N$ to

$\lvert \phi_\theta \rangle := \displaystyle\frac{1}{N} \sum_{m \in \mathbb{Z}/N\mathbb{Z}} e^{2 \pi i \theta m} \lvert m \rangle.$
Note that in our convention, we will normalize using $1/\sqrt{N}$ rather than $\frac{1}{N}$. In this way, the inverse transform also picks up a normalization of $1/\sqrt{N}$, and both are unitary. This matrix performs a base change between the standard basis of characteristic functions on $\hat{G} = \mathbb{Z}/n\mathbb{Z}$ and the basis coming from applying the Fourier transform to charactersitic functions on $G = \mu_N$. We let $QFT_N$ denote the unitary transformation such that

$QFT_N \lvert \phi_{m/N} \rangle = \lvert m \rangle.$
But by Fourier inversion, we know that it is

$QFT_N = \displaystyle\frac{1}{\sqrt{N}}\sum_{x, y \in \mathbb{Z}/N\mathbb{Z}} e^{2 \pi ixy/N} \lvert x \rangle \langle y \rvert.$
Note that this matrix is symmetric, not conjugate symmetric. This is due to the difference between Fourier transform and inverse Fourier transform, i.e. the inversion in the character pairing. Its inverse is then the conjugate transpose

$QFT_N^{-1} = \displaystyle\frac{1}{\sqrt{N}}\sum_{x, y \in \mathbb{Z}/N\mathbb{Z}} e^{-2 \pi ixy/N} \lvert y \rangle \langle x \rvert.$

Now, recall we were trying to do phase estimation given a unitary operator $U$ and eigenvector $\lvert \psi \rangle$ and eigenvalue $\lambda = e^{2 \pi i \theta}$. The rightmost measurement on

$(\mathbb{I} \otimes H) \circ CU \circ (\mathbb{I} \otimes H)\lvert \psi 0 \rangle$
will, with certainty, measure $0$ if $\theta = 0$ and $1$ if $\theta = 1/2$. Otherwise, we get 0 or 1 according to some probability distribution. We can iterate this:

$(\mathbb{I} \otimes H) \circ CU^2 \circ (\mathbb{I} \otimes H)\lvert \psi 0 \rangle$
which will measure $0$ if $\theta = 0, 1/2$ (i.e. $4\theta \cong 0 \pmod{2}$) and $1$ if $\theta = 1/4, 3/4$ (i.e. $4\theta \cong 1 \pmod{2}$). More generally,

$(\mathbb{I} \otimes H) \circ CU^{n-1} \circ (\mathbb{I} \otimes H)\lvert \psi 0 \rangle$
will measure $0$ if $\theta = x/2^n$ for even $m$ (i.e. $2^k\theta \cong 0 \pmod{2}$) and $1$ if $\theta = x/2^k$ for odd $x$ (i.e. $2^n\theta \cong 1 \pmod{2}$).

Now, we want to put these together. Theoretically, doing the above should be able to identify with certainty eigenvalues of the form $e^{2 \pi i \theta}$ with $\theta = x/2^n$ for any $x \in \mathbb{Z}$. To begin with we follow our nose: instead of a single register on the right, we will have $n$ registers. We let $C_kU$ mean to apply the controlled $U$ gate with the control on the $k$th register from the right. We take the $n$ rightmost measurements on:

$(\mathbb{I} \otimes H^{\otimes n}) \circ (C_{n-1}U)^{2^{n-1}} \circ \cdots\circ (C_2U)^{2^1} \circ (C_1U)^{2^0} \circ (\mathbb{I} \otimes H^{\otimes n})\lvert \psi 0^n \rangle.$
Since $\psi$ is an eigenvector, this state is separable into the left part and right part, and we focus only on the right part. A calculation shows that, writing $x/2^n = 0.a_1a_2\ldots a_{n}$ and letting $\lvert x \rangle = \lvert a_n \ldots a_2 a_1 \rangle$ represent the corresponding state, we have that the resulting state is:

$\displaystyle\frac{1}{2^n}\sum_{x=0}^{2^n - 1} e^{2 \pi i \theta x} \lvert x \rangle.$
As discussed earlier, this is precisely the Fourier transform applied to the characteristic function on $e^{2 \pi i \theta}$, given that this is a $2^n$th root of unity. To recover $\theta$, one merely has to apply the inverse Fourier transform as defined above. So, assuming we can build it, the circuit becomes

$(\mathbb{I} \otimes QFT_{2^n}^{-1})\circ (\mathbb{I} \otimes H^{\otimes n}) \circ (C_nU)^n \circ \cdots\circ (C_2U)^2 \circ C_1U \circ (\mathbb{I} \otimes H^{\otimes n})\lvert \psi 0^n \rangle$
with the final state being

$QFT^{-1}_{2^n} \displaystyle\frac{1}{\sqrt{2^n}}\sum_{x=0}^{2^n - 1} e^{2 \pi i \theta x} \lvert x \rangle = \displaystyle \frac{1}{ \sqrt{2^n}\sqrt{2^n}} \sum_{x,t,y=0}^{2^n - 1} e^{-2 \pi i y t/2^n} \lvert y \rangle \langle t \rvert e^{2 \pi i \theta x} \lvert x \rangle = \displaystyle \frac{1}{2^n} \sum_{x,y=0}^{2^n - 1} e^{2 \pi i x (\theta - y/2^n)} \lvert y \rangle.$
Adding it and using standard formulas for geometric series, the coefficient to $\lvert y \rangle$ is:

$\displaystyle \frac{1 - e^{2^n 2 \pi i (\theta - y^/2^n)}}{1 - e^{2 \pi i (\theta - y/2^n)}} = \frac{1 - e^{2 \pi i (2^n\theta - y^/2^n)}}{1 - e^{2 \pi i (\theta - y/2^n)}}.$
giving the final formula

$\displaystyle \frac{1}{2^n} \sum_{y=0}^{2^n - 1} \frac{1 - e^{2 \pi i (2^n\theta - y^/2^n)}}{1 - e^{2 \pi i (\theta - y/2^n)}} \lvert y \rangle.$
Now, if $\theta \in \displaystyle \frac{1}{2^n}\mathbb{Z}$, then a measurement on the $n$ rightmost qubits will reveal the corresponding numerator with certainty. This leaves two remaining questions: what happens if the denominator is not a power of 2, and how to implement this.

Before answering those questions, we also need to discuss the case where the input $\lvert \psi\rangle$ is in fact not an eigenvector for $U$, but a sum of eigenvectors. In this case, the resulting quantum state from applying the above algorithm is no longer separable but entangled. To see this, assume that the $\theta_j$ is of the form $m_j/2^n$ as above. Suppose that
$\lvert \psi \rangle = \displaystyle\sum_{j=1}^\ell a_j \lvert \psi_j \rangle$
where $\lvert \psi_j \rangle$ are (unit) eigenvectors with eigenvalue $\lambda_j = e^{2 \pi i \theta_j}$, and $\sum |a_j|^2 = 1$. By linearity, running phase estimation on $\lvert \psi \rangle$ will return $m_j$ with probability $||a_j||$. Note it's possible that the $m_j$ coincide, in which case the probabilities add.

Intuitively, if all the eigenvalues are distinct and we measure $m$ by running the algorithm above, this means this measurement should project $\lvert \psi \rangle$ to $\lvert \psi_i \rangle$ where $m_i = m$. Mathematically, applying the above algorithm to $\lvert \psi \rangle$ gives
$\displaystyle\sum_{j=1}^\ell a_j \lvert \psi_j \rangle \lvert m_j \rangle$.
After measurement, we see that the only $\lvert \psi_j \rangle$ appearing are those where $m_j = m$, more or less by definition, i.e.
$\displaystyle\frac{1}{\sum_{m_j = m} ||a_j||} \displaystyle\sum_{m_j = m} a_j \lvert \psi_j \rangle \lvert m \rangle$
In particular, if we ran the algorithm again on the output, we would find the same eigenvalue $\lambda_i$ again.

Let's address the first question of what happens when $\theta \not\in \displaystyle \frac{1}{2^n} \mathbb{Z}$. I'll spoil the answer. Suppose we take a measurement and get a fraction $m/2^n$.

The probability that $m/2^n$ is the fraction in $\displaystyle\frac{1}{2^n}\mathbb{Z}$ which is closest (or tied to being closest) to $\theta$ is at least $4/\pi^2$, i.e. at least 40%.
The probability that $m/2^n$ is not the closest to $\theta$ is false is at most $1/4$, i.e. 25%.

Since these bounds are constant in $n$, this means that for any probability cutoff for having found the closest $m/2^n$ to $\theta$, the number of iterations we need to take to get within the cutoff is constant.

Let us briefly derive this. Recall that the coefficient to $\lvert y \rangle$ is:

$\displaystyle \frac{1}{2^n} \displaystyle \frac{1 - e^{2^n 2 \pi i (\theta - y^/2^n)}}{1 - e^{2 \pi i (\theta - y/2^n)}} = \displaystyle \frac{1}{2^n} \frac{1 - e^{2 \pi i (2^n\theta - y^/2^n)}}{1 - e^{2 \pi i (\theta - y/2^n)}}.$
We can visualize a circle with $2^n$ tick marks, one of which corresponds to $y/2^n$, while $\theta$ is some arbitrary point. The numerator says to take the points modulo the tick marks, then blow up the segment between ticks to take the whole circle, and then take the distance between them. The denominator says to simply take the distance between them. Using the fact that the cord length is at least the arc length, we can deduce the above inequalities.

Now, we return to the previous question of what happens when $\lvert \psi \rangle$ is not an eigenvector, but now we do not assume that $r = 2^n$. Suppose $\lambda_j = e^{2 \pi i \theta_j}$. By linearity, the state is

$\displaystyle\frac{1}{2^n}\sum_{y=0}^{2^n-1} \displaystyle\sum_{j=1}^\ell \displaystyle a_j \frac{1 - e^{2 \pi i (2^n\theta_j - y/2^n)}}{1 - e^{2 \pi i (\theta_j - y/2^n)}}\lvert \psi_j \rangle \lvert y \rangle .$
Unlike before, if we measure $m$ in the right registers, this doesn't cause all terms involving $\lvert \psi_j \rangle$ for $m_j \ne m$ to vanish. We still have contributions from all $\vert \psi_j \rangle$:

$\displaystyle\sum_{j=1}^\ell \displaystyle a_j \frac{1 - e^{2 \pi i (2^n\theta_j - m/2^n)}}{1 - e^{2 \pi i (\theta_j - m/2^n)}}\lvert \psi_j \rangle \lvert m \rangle .$
In the above formula, we did not renormalize. Roughly speaking, once we do, what happens is that the $\theta_j$ near $m$ gain weight, while those further from it lose weight. In particular, with enough iterations, the vector $\lvert \psi \rangle$ begins to converges exponentially to an eigenvector in the eigenspace for $\lambda = e^{2 \pi i m/2^n}$ with probability exponentially approaching $1$.

Now, to implement it. We ignore all normalizations. We denote our input as $\lvert a_{n-1} \cdots a_0 \rangle = \lvert m \rangle \lvert a_0 \rangle = \lvert 2m + a_0 \rangle$. Say we've already implemented the gate $QFT_{2^{n-1}}$. Then, applying it to $\lvert m \rangle$, we have:

$(QFT_{2^{n-1}} \otimes \mathbb{I}) \lvert m \rangle \lvert a_0 \rangle = \displaystyle\sum_{x=0}^{2^{n-1} -1} e^{2 \pi i \cdot m \cdot (x/2^{n-1})} \lvert x \rangle \lvert a_0 \rangle = \displaystyle\sum_{x=0}^{2^{n-1} -1} e^{2 \pi i \cdot 2m \cdot (x/2^{n})} \lvert 2x + a_0 \rangle$
In the next step, we want to tweak $m$ so that it becomes $2m + a_0$. To do this, for $x = 2^k$, i.e. the $k+1$th register, we need to add the phase $(x/2^n)a_0$. That is, we apply a controlled phase gate, with the rightmost register as the control, with phase $2 \pi i \cdot (2^{k-1}/2^n)$ to the $k$th register:

$\displaystyle\sum_{x=0}^{2^{n-1} -1} e^{2 \pi i \cdot (2m + a_0) \cdot (x/2^{n})} \lvert 2x + a_0 \rangle.$
We next apply the QFT to the remaining register, i.e. the Hadamard gate to the leftmost register:

$\displaystyle\sum_{x=0}^{2^{n-1} -1} e^{2 \pi i \cdot (2m + a_0) \cdot (x/2^{n})} (\lvert 2x \rangle + (-1)^{a_0} \lvert 2x + 1 \rangle) = \displaystyle\sum_{x=0}^{2^{n-1} -1} \sum_{y = 0}^1 e^{2 \pi i \cdot (2m + a_0) \cdot (x/2^{n}) + \pi i a_0 y} \lvert 2x + y\rangle.$
Intuitvely, this is applying a Fourier transform of degree $2$ to the rightmost bit. There are two ways to interpret $\lvert a_{n-1} \ldots a_0 \rangle$:

the integer $a = a_0 + 2a_1 + \cdots + 2^{n-1} a_{n-1}$, i.e. as the binary expression $a_{n-1} \cdots a_0$,
the root of unity $e^{2 \pi i \theta}$ where $\theta = a/2^n$, i.e. as the binary expression $0.a_{n-1} \cdots a_0$.

In particular, it is not the rightmost bit that should be $2$th root of unity but the leftmost bit. So we rewrite

$= \displaystyle\sum_{x=0}^{2^{n-1} -1} \sum_{y = 0}^1 e^{2 \pi i \cdot (2m + a_0) \cdot (x + 2^{n-1} y)/2^{n}} \lvert 2x + y\rangle.$
and rotate it there using swap gates:

$\displaystyle\sum_{x=0}^{2^{n-1} -1} \sum_{y = 0}^1 e^{2 \pi i \cdot (2m + a_0) \cdot (x + 2^{n-1} y)/2^{n}} \lvert 2^{n-1} y + x\rangle = \displaystyle\sum_{z=0}^{2^n-1} e^{2 \pi i \cdot (2m+a_0) z} \lvert z \rangle.$
This is the desired formula.

Tuesday, February 6, 2024

Some standard quantum gates

There appear to be some quantum states and gates that are more or less standard. I'm not sure what the theoretical or physical reason for this is, but I'll fill this post out as I learn more.

Plus and minus states for a qubit: $\lvert + \rangle = \displaystyle\frac{1}{\sqrt{2}}\left( \lvert 0 \rangle + \lvert 1 \rangle \right), \;\;\;\;\;\;\;\;\;\; \lvert-\rangle = \displaystyle\frac{1}{\sqrt{2}}\left( \lvert 0 \rangle - \lvert 1 \rangle \right).$
Bell states/basis on two qubits: $\lvert \phi^+ \rangle = \displaystyle\frac{1}{\sqrt{2}}\left( \lvert 00 \rangle + \lvert 11 \rangle \right)$
$\lvert \phi^- \rangle = \displaystyle\frac{1}{\sqrt{2}}\left( \lvert 00 \rangle - \lvert 11 \rangle \right)$
$\lvert \psi^+ \rangle = \displaystyle\frac{1}{\sqrt{2}}\left( \lvert 01 \rangle + \lvert 10 \rangle \right)$
$\lvert \psi^- \rangle = \displaystyle\frac{1}{\sqrt{2}}\left( \lvert 01 \rangle - \lvert 10 \rangle \right)$
These are essentially variants of entangled states. The Bell state $\lvert \phi^+ \rangle$ can be created from the qubit $\lvert 00 \rangle$ by applying the operator $XC \circ \mathbb{I}H$ (where $XC$ is the controlled NOT, with control on the right factor). The other bell states may be obtained by applying $X$ and $Z$ to this one.
The Hamamard gate, which acts on a single qubit $H = \displaystyle\frac{1}{\sqrt{2}}\begin{pmatrix}1 &1\\1&-1\end{pmatrix}$ which transforms a qubit between the standard basis and the plus/minus basis. Note that $H^2 = \mathbb{I}$.
Phase operations: $P_\theta = \begin{pmatrix}1&0\\0&e^{i\theta}\end{pmatrix}, \;\;\;\;\; S = P_{\pi/2} = \begin{pmatrix}1&0\\0&i\end{pmatrix}, \;\;\;\;\; T = P_{\pi/4} = \begin{pmatrix}1&0\\0&(1+i)/\sqrt{2}\end{pmatrix}$
Pauli operations, which act on a single qubit $\mathbb{I} = \begin{pmatrix} 1&0\\0&1\end{pmatrix}, \;\;\;\;\;X = \begin{pmatrix} 0&1\\1&0\end{pmatrix}, \;\;\;\;\;Y = \begin{pmatrix} 0&-i\\i&0\end{pmatrix}, \;\;\;\;\;Z = \begin{pmatrix} 1&0\\0&-1\end{pmatrix}.$ The operation $X$ is also called the NOT operation, and $Z = P_{\pi}$ is a phase operation. These have the relations $X^2 = Y^2 = Z^2 = \mathbb{I},$
$XY = iZ, YZ = iX, ZX = iY,$
$XZ = -iY, ZY = -iX, YX = -iZ.$
Rotations in the Bloch sphere. The Bloch sphere is the space of states for a qubit, modulo global phase, i.e. $S^3/S^1 \simeq S^2$. In spherical coordinates $\theta \in [0, \pi], \psi \in [0, 2\pi)$: $\cos(\theta/2) \lvert 0 \rangle + e^{i \psi} \sin(\theta/2) \lvert 1 \rangle.$ That is, given a qubit $\alpha \lvert 0 \rangle + \beta \lvert 1 \rangle$, we can use the global phase change to make $\alpha$ a real positive number in $[0, 1]$, which we parameterize by $\cos(\theta/2)$. Then, $\beta$ has magnitude $\sin(\theta/2)$ and arbitrary phase. We can also embed the Bloch sphere in $\mathbb{R}^3$ using the usual spherical coordinates: $(\theta, \psi) \mapsto (\sin \theta \cos \phi, \sin\theta \sin \phi, \cos \theta).$ The rotation about the three axes then can be interpreted as an action on qubits, giving rise to the rotation gates.
- $R_x(\theta) = \begin{pmatrix} \cos(\theta/2) & -i \sin(\theta/2) \\ -i \sin(\theta/2) & \cos(\theta/2) \end{pmatrix}$
- $R_y(\theta) = \begin{pmatrix} \cos(\theta/2) & -\sin(\theta/2) \\ \sin(\theta/2) & \cos(\theta/2) \end{pmatrix}$
- $R_z(\theta) = \begin{pmatrix} e^{-\theta/2} & 0 \\ 0 & e^{\theta/2} \end{pmatrix}$
We also have $X = iR_x(\pi), Y = iR_y(\pi), Z = iR_z(\pi),$
$R_z(\theta) \sim P_\theta.$
The SWAP gate, which swaps tensor factors: $\text{SWAP}\lvert x \rangle \lvert y \rangle = \lvert y \rangle \lvert x \rangle$
Controlled operations: given a unitary operation $U$ on any quantum system $R$, the associated controlled operation $CU$ acts on $Q \otimes R$ where $Q$ is a single qubit by $CU = \lvert 0 \rangle \langle 0 \rvert \otimes \mathbb{I}_R + \lvert 1 \rangle \langle 1 \rvert \otimes U,$
i.e. the qubit $Q$ controls whether or not (or with what probability) the operation $U$ is performed.
- The controlled NOT operation $CX$ sends the standard basis to the Bell basis (with a slight sign difference): $CU\lvert 00 \rangle = \lvert \phi^+\rangle, \;\;\;\;\; CU\lvert 01 \rangle = \lvert \phi^- \rangle, \;\;\;\;\; CU\lvert 10 \rangle = \lvert \psi^+ \rangle, \;\;\;\;\; CU\lvert 11 \rangle = -\lvert \psi^- \rangle$

Monday, February 5, 2024

Quantum mechanics vs. probability

A few similarities and differences between quantum systems and probabilistic systems.

A probability distribution on a set $X$ is a function $\rho: X \rightarrow \mathbb{R}$ such that $\int_X \rho(x) \; dx = 1$. A quantum state in a set $X$ is a norm 1 element of the free vector space $\mathbb{C}\{X\}$. An operator on a probability distribution is a stochastic matrix, i.e. a matrix of non-negative real entries whose columns sum to 1. An operator on a quantum system is a unitary matrix.
Given two probability distributions on $X$ and $Y$, one can form the product $\rho_X \times \rho_Y: X \times Y \rightarrow \mathbb{R}$, which has $\int_{X \times Y} \rho_X \rho_Y = 1$ by Fubini's theorem. Given two quantum states $v \in V$ and $w \in W$, one can form $v \otimes w \in V \otimes W$, which is also a unit vector.
Given a probability distribution on a product $X \times Y$, one can project the distribution to either $X$ or $Y$ by integration. However, given a tensor product of vector spaces $V \otimes W$, there is no way to project to $V$ or $W$.
There is a notion of conditional probability, corresponding to the notion of partial measurement of a quantum state. Note that we have to renormalize the vector to be a unit vector in this case.
The property of independence of distributions corresponds to the property of a quantum state being separable, i.e. a rank one tensor. The notion of having non-zero correlation corresponds to entanglement.
Stochastic matrices often have no stochastic square root, i.e. one can not take half steps. The square roots of unitary matrices are always unitary.
Stochasitc matrices can never turn uncertainty into certainty, i.e. there is no stochastic matrix $A$ such that $A \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. However, unitary matrices act transitively on the unit vectors, e.g. the Hadamard operator $H = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix}$ has $H \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$.
We can measure the the probability of some event $E \subset X$ by taking $P(E) = \int_E \rho(x) \; dx$. For quantum systems, measurements on a complex inner product space $V$ correspond to Hermitian projections (i.e. self-adjoint idempotents).
- More precisely, a measurement should be viewed as a "complete" collection of such self-adjoint idempotents that sum to the identity. This is mostly semantic and refers to the fact that we usually do not simply ask if a coin flip is heads, we ask if it is heads or tails. Of course, any non-complete collection of "orthogonal" projections can be completed by adding the projection to the orthogonal complement of the sums of the images.

Tuesday, September 27, 2022

Phase Kickback

Phase kickback appears to refer to a technique in quantum computing where one alters one factor of a rank 1 tensor in a way dependent on the other factor $(\sum v) \otimes w \mapsto \sum v \otimes T_v(w)$ and then, because the $w$ are eigenvectors for the $T_v$, one can rewrite the result as a rank 1 tensor again $\sum v \otimes T_v(w) = (\sum \lambda_v v) \otimes w$. In some sense, the right tensor factor is "auxilliary" and the effect is to attach coefficients we want to the left tensor factor. We prefer rank 1 tensors because we can then perform measurements separately on the factors. Let's give a few examples, from the Qiskit tutorial.

Bernstein-Vazirani Algorithm: Let's first discuss the mathematics. Consider a function $f: \{0, 1\}^n \rightarrow \{0, 1\}$ which is of the form $f(v) = s \cdot v$ for some fixed $s \in \{0, 1\}^n$. We want to determine $s$. Let $V = \mathrm{Fun}(\{0, 1\}, \mathbb{C})$ be the state space with characteristic functions $e_0, e_1$. For $v \in \{0, 1\}^n$, we let $e_v \in V^{\otimes n}$ denote the corresponding basis vector. Recall the Hadamard gate $H = \displaystyle\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & - 1 \end{pmatrix}$, whose higher tensors act by: $H^{\otimes n} e_v = \displaystyle\sum_{w \in \{0, 1\}^n} (-1)^{w \cdot v} e_w, \;\;\;\;\;\;\;\;\;\; H^{\otimes n} e_0 = \displaystyle\sum_{w \in \{0, 1\}^n} e_w$.
There is a standard way to attach a unitary (in fact, permutation) operator $U_f$, acting on $n+m$ qubits, to a map $f: \{0, 1\}^n \rightarrow \{0, 1\}^m$: $U_f(e_v \otimes e_i) = e_v \otimes e_{i + f(v)}.$ Here, $m=1$. Let us quickly observe its eigenvectors: they are $e_x \otimes (e_0 + e_1)$ with eigenvalue $1$,
$e_x \otimes (e_0 - e_1)$ with eigenvalue $(-1)^{f(x)}$. This is important. One notes that the right tensors are the plus/minus basis, thus we might expect to do a change of basis using the Hadamard operator.

Using the above description of eigenvectors, we have the following: $(U_f \circ H^{\otimes n+1})(e_0^{\otimes n} \otimes e_1) =\displaystyle\frac{1}{\sqrt{2^{n+1}}} U_f( \sum_{x \in \{0, 1\}^n} e_x \otimes (e_0 - e_1))$
$ = \displaystyle\frac{1}{\sqrt{2^{n+1}}}\displaystyle \sum_{x \in \{0, 1\}^n} (-1)^{f(x)} e_x \otimes (e_{0} - e_{1}).$ This is a rank 1 tensor, which essentially happens because $H(e_1)$ is an eigenvector of $U_f$. Thus, the tensor factors are independent, so they can be dealt with separately. Applying $H^{\otimes n+1}$, we obtain $H^{\otimes n+1} \left(\displaystyle\frac{1}{\sqrt{2^{n+1}}}\displaystyle \sum_{x \in \{0, 1\}^n} (-1)^{f(x)} e_x \otimes (e_{0} - e_{1}) \right) = e_s \otimes e_1$ and, measuring the first $n$ registers, we are done! Note that we did not need to apply $H$ to the last tensor factor, so the quantum circuit we want to apply is: $(H^{\otimes n} \otimes \mathbb{I}) \circ U_f \circ H^{\otimes n+1} \vert 0 \cdots 0 1\rangle$ and then we measure the left $n$ registers, discarding the rightmost register. As a practical matter, we need to be able to implement all these gates. Maybe we believe $H$ can be implemented since it is so standard, but it is less clear for $U_f$.

Deutsch-Jozsa Algorithm: This is the same problem as above, except the function $f$ is either constant or balanced (half inputs take value 0, half take 1), and we want to determine which it is. The procedure is the same, except at the very end, where when we apply $H^\otimes$ to the eigenvectors we get $\displaystyle\frac{1}{\sqrt{2^{n+1}}}\displaystyle \sum_{x \in \{0, 1\}^n} (-1)^{f(x)} e_x \otimes (e_{0} - e_{1}).$ Let us ignore the right tensor factor (and renormalize). Applying $H^{\otimes n}$ to this expression we get $\displaystyle\frac{1}{\sqrt{2^{2n}}}\displaystyle \sum_{x \in \{0, 1\}^n} (-1)^{f(x)} \sum_{y \in \{0,1\}^n} (-1)^{x \cdot y} e_y = \displaystyle\frac{1}{\sqrt{2^{2n}}}\displaystyle \sum_{x, y \in \{0, 1\}^n} (-1)^{f(x) + y \cdot x} e_y.$ Now, taking $y = 0$, we note that the coefficient $\pm 1$ is the function is constant, and $0$ if the function is balanced. So, if we take a measurement and get $\langle 0\cdots 0 \vert$, then the function is constant, and if we get anything else, then the function is not constant, so in this case, balanced.

In summary, we apply the same circuit as above: $(H^{\otimes n} \otimes \mathbb{I}) \circ U_f \circ H^{\otimes n+1} \vert 0 \cdots 0 1\rangle$ and also measure the rightmost $n$ registers. Before, under the assumption that $f(x) = s \cdot x$, these registers recovered $s$. Now, under the assumption that $f$ is either balanced or constant, if these registers are measured to be $0^n$ then the function is constant, and otherwise it is balanced. A few remarks:

The only function which is both of the form $f(x) = s \cdot x$ and constant is the zero function, in which case $s = 0^n$. This is in agreement with both problems.
A function of the form $f(x) = s \cdot x$ is balanced if and only if $s \ne 0^n$. This is in agreement with both problems.

Phase Estimation: Now, suppose we have a unitary operator $U$ on $n$ qubits. We have some state and we observe it, so it becomes some eigenvector $\lvert psi \rangle$ of $U$. The question is, can we extract the eigenvalue $\lambda = e^{2\pi i \theta}$, or at least estimate it?

The idea is to apply the following unitary operator, letting $CU$ be the controlled unitary operator with the control by convention to the right: $(\mathbb{I}^{\otimes n} \otimes H) \circ CU \circ (\mathbb{I}^{\otimes n} \otimes H)$. We apply this to the state $\langle \psi 0 \rangle$ and get, first ignoring renormalization: $\lvert \psi 0 \rangle \mapsto \lvert \psi 0 \rangle + \lvert \psi 1 \rangle \mapsto \lvert \psi 0 \rangle + \lambda \lvert \psi 1 \rangle \mapsto (1 + \lambda) \lvert \psi 0 \rangle + (1 - \lambda) \lvert \psi 1 \rangle.$ Renormalizing, we have $\displaystyle\frac{1}{2} \lvert \psi \rangle \left((1 + \lambda) \lvert 0 \rangle + (1 - \lambda)\lvert 1 \rangle\right)$ Now, recalling that $\lambda = e^{2 \pi i \theta}$, the probability that we measure 0 and 1 for the rightmost qubit are respectively: $p_0 := \displaystyle\frac{1}{2}||1 + e^{2 \pi i \theta}|| = \displaystyle\frac{2 + e^{2 \pi i \theta} + e^{-2 \pi i \theta}}{2} = 1 + \cos(2\pi \theta) = \cos(\pi \theta)^2,$ $p_1 :=\displaystyle\frac{1}{2}||1- e^{2 \pi i \theta}|| = \displaystyle\frac{2 - e^{2 \pi i \theta} - e^{-2 \pi i \theta}}{2} = 1 - \cos(2\pi \theta) = \sin(\pi \theta)^2.$ Now, assuming we can measure these probabilities perfectly (which we can't, but we can get probably get close if we do this enough times) allows us to determine $|\theta|$, i.e. we cannot distinguish $\theta$ and $1 - \theta$. In practice, I don't think people do this, because it requires running the algorithm a bunch of times to get within an error, and one cannot extract algebraic information from this. However, we note that if $\theta = 0, 1/2$, we can extract precise algebraic information. This will be the subject of a future post on the quantum Fourier transform.

Monday, September 26, 2022

Why does quantum mechanics involve complex numbers?

First, I am using this helpful expository paper by Ricardo Karam. He gives four reasons, labeled (A) through (D). I want to discuss his third reason, (C), based on the Stern-Gerlach experiment, and my understanding of it.

First, there is an experiment which does the following; I'll be as vague as possible to avoid getting details wrong. A particle is sent through various boxes which observe its spin in the $x$, $y$ and $z$ directions, with the "equal filtering probability" property, i.e. that if a pure state for one observable is then observed by any other, it has equal probability of $1/2$ of being positive or negative spin.

Let's introduce some notation. First, the eigenvectors for the $z$-observation $S_z$ are $\lvert + \rangle$ and $\lvert - \rangle$, and we take this to be our standard basis. After performing a $x$-observation $S_x$ (or a $y$-observation $S_y$), we obtain the states $\lvert + \rangle_x$ and $\lvert - \rangle_x$ (respectively $\lvert + \rangle_y, \lvert - \rangle_y$). The matrices for the observables in this standard basis are
$A_z = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \;\;\; A_x = \begin{pmatrix} \vert & \vert \\ \lvert + \rangle_x & \lvert - \rangle_x \\ \vert & \vert \end{pmatrix}, \;\;\; A_y = \begin{pmatrix} \vert & \vert \\ \lvert + \rangle_y & \lvert - \rangle_y \\ \vert & \vert \end{pmatrix}$.
For convenience, we record that a solution (up to various symmetries) for the problem is: $A_z = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \;\;\; A_x = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \;\;\; A_y = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}$. We observe that these matrices are unitary.

Because I'm not too familiar with bra-ket notation, I need to make a digression. Vectors of the form $\langle \bullet \rvert$ are covectors, e.g. ${}_x\langle + \rvert$ is the first column of the conjugate transpose $A_x^*$. What is the meaning of these covectors? By definition, to pair with $\langle + \vert$ and $\langle - \vert$ gives the proportion of particles which have positive or negative $z$-spin. We generalize: ${}_y\langle +\vert - \rangle_x$ is a number whose square magnitude is the probability that a particle which has been observed to have negative $x$-spin will then be observed to have positive $y$-spin. Note that we compute the quantity ${}_y\langle +\vert - \rangle_x$ by taking the corresponding entry of the matrix $A_y^* A_x$.

Why square magnitude and not magnitude? All our matrices will have the property that $A^* A = I$, i.e. they are unitary, because repeating an observation should result in the same thing. The columns of unitary matrices are have norm one, which satisfy the property that the sum of the squares of the magnitudes equals one. Thus the correct quantity to interpret as a probability (i.e. the thing that sums to one) are the squares.

Anyway, the point is, the "equal probability" property means that any pair of eigenvectors (from different observables) should pair to value $1/2$ (and the fact that repeated measurements give the same value means for a single observable the eigenvalues must be orthogonal). If they were all real eigenvectors, this means that their squared dot product is $1/2$, i.e. the angle between the vectors is $45^\circ$. But this is impossible to do with three pairs of eigenvectors in $\mathbb{R}^2$, so the eigenvectors must be complex. Moreover, the matrices have complex entries with respect to the standard basis which we have fixed.

Finally, it's worth constrasting this situation with polarizing filters; I saw an argument on Twitter at some point about whether polarization is a quantum phenomenon or not. I'm a bit confused by this. On the one hand, one can clearly model light polarization in terms of projections, but it appears there is both a quantum and a classical model for polarization of photons. I want to say this is due to the fact that photon polarization only involves two observables, but I haven't really checked this.

Sunday, September 25, 2022

Introduction

This is a learning blog for quantum computing from a mathematical perspective. Anything written here has a high likelihood of being completely wrong. Use at your own risk.