Thursday, February 8, 2024

Quantum (and discrete) Fourier transform

First, I want to note there is a nice discussion with pictures at the Qiskit tutorials. I won't try to replicate the pictures.
Recall the basic idea behind Pontryagin and Fourier duality. Given some abelian group $G$, we have the dual group $\hat{G} = \text{Hom}(G, \mathbb{C}^\times)$, the group of characters, i.e. continuous homomorphisms to the multiplicative group of complex numbers. For expository clarity, let us assume that $G$ is compact and that $\hat{G} = \text{Hom}(G, S^1)$ is discrete; in our only case of interest, both will be both.

There is a tautological character pairing $\hat{G} \times G \rightarrow \mathbb{C}^\times$. For example, taking $G = \mathbb{Z} /N\mathbb{Z} $, we have $\hat{G} = \mu_N \subset \mathbb{C}^\times$, the group of $N$th roots of unity, and the pairing is

$\mu_N \times \mathbb{Z}/N\mathbb{Z} \ni (z, m) \mapsto z^m.$

Given a function $f: G \rightarrow \mathbb{C}$, we have its Fourier transform:

$\hat{f}: \hat{G} \rightarrow \mathbb{C}, \;\;\;\;\;\;\; \hat{f}(\xi) = \displaystyle\frac{1}{|G|}\int_{x \in G} \xi^{-1}(x) f(x) \; dx.$

and given a function $g: \hat{G} \rightarrow \mathbb{C}$, we have its inverse Fourier transform:

$\hat{g}: G \rightarrow \mathbb{C}, \;\;\;\;\;\;\; \hat{g}(x) = \displaystyle\sum_{\xi \in \hat{G}} \xi(x) f(x).$

We chose $G$ to be compact so there is no quesiton of convergence of the integral, and $G$ to be discrete so that the integral on $\hat{G}$ is a (possibly infinite) sum for which we ignore all convergence issues. Aside from this, the Fourier transform and its inverse are not entirely symmetric; they differ by an inversion in the character pairing.

Moreover, Fourier inversion says that (pretend that, say, $G$ is compact and $\hat{G}$ is discrete; for us, they will be finite thus both):

$\hat{\hat{f}} = f, \;\;\;\;\;\;\; f(x) = \displaystyle\sum_{\xi \in \hat{G}} \hat{f}(\xi) \xi(x).$

Now, let us consider the case where $G = \mu_N$ and $\hat{G} = \mathbb{Z}/N\mathbb{Z}$. We represent the characteristic function on $m \in \mathbb{Z}/N\mathbb{Z}$ by $\lvert m \rangle$ (we will see why later). A function $f$ on $G$ is then a linear combination, i.e. given that $f(m) = a_m \lvert m \rangle$, we write

$f = \displaystyle\sum_{m \in \mathbb{Z}/N\mathbb{Z}} a_m \lvert m \rangle.$

Letting $\theta \in \displaystyle\frac{1}{N} \mathbb{Z}$, the Fourier transform takes the characteristic function at $e^{2 \pi i \theta}$ on $G = \mu_N$ to

$\lvert \phi_\theta \rangle := \displaystyle\frac{1}{N} \sum_{m \in \mathbb{Z}/N\mathbb{Z}} e^{2 \pi i \theta m} \lvert m \rangle.$

Note that in our convention, we will normalize using $1/\sqrt{N}$ rather than $\frac{1}{N}$. In this way, the inverse transform also picks up a normalization of $1/\sqrt{N}$, and both are unitary. This matrix performs a base change between the standard basis of characteristic functions on $\hat{G} = \mathbb{Z}/n\mathbb{Z}$ and the basis coming from applying the Fourier transform to charactersitic functions on $G = \mu_N$. We let $QFT_N$ denote the unitary transformation such that

$QFT_N \lvert \phi_{m/N} \rangle = \lvert m \rangle.$

But by Fourier inversion, we know that it is

$QFT_N = \displaystyle\frac{1}{\sqrt{N}}\sum_{x, y \in \mathbb{Z}/N\mathbb{Z}} e^{2 \pi ixy/N} \lvert x \rangle \langle y \rvert.$

Note that this matrix is symmetric, not conjugate symmetric. This is due to the difference between Fourier transform and inverse Fourier transform, i.e. the inversion in the character pairing. Its inverse is then the conjugate transpose

$QFT_N^{-1} = \displaystyle\frac{1}{\sqrt{N}}\sum_{x, y \in \mathbb{Z}/N\mathbb{Z}} e^{-2 \pi ixy/N} \lvert y \rangle \langle x \rvert.$


Now, recall we were trying to do phase estimation given a unitary operator $U$ and eigenvector $\lvert \psi \rangle$ and eigenvalue $\lambda = e^{2 \pi i \theta}$. The rightmost measurement on

$(\mathbb{I} \otimes H) \circ CU \circ (\mathbb{I} \otimes H)\lvert \psi 0 \rangle$

will, with certainty, measure $0$ if $\theta = 0$ and $1$ if $\theta = 1/2$. Otherwise, we get 0 or 1 according to some probability distribution. We can iterate this:

$(\mathbb{I} \otimes H) \circ CU^2 \circ (\mathbb{I} \otimes H)\lvert \psi 0 \rangle$

which will measure $0$ if $\theta = 0, 1/2$ (i.e. $4\theta \cong 0 \pmod{2}$) and $1$ if $\theta = 1/4, 3/4$ (i.e. $4\theta \cong 1 \pmod{2}$). More generally,

$(\mathbb{I} \otimes H) \circ CU^{n-1} \circ (\mathbb{I} \otimes H)\lvert \psi 0 \rangle$

will measure $0$ if $\theta = x/2^n$ for even $m$ (i.e. $2^k\theta \cong 0 \pmod{2}$) and $1$ if $\theta = x/2^k$ for odd $x$ (i.e. $2^n\theta \cong 1 \pmod{2}$).

Now, we want to put these together. Theoretically, doing the above should be able to identify with certainty eigenvalues of the form $e^{2 \pi i \theta}$ with $\theta = x/2^n$ for any $x \in \mathbb{Z}$. To begin with we follow our nose: instead of a single register on the right, we will have $n$ registers. We let $C_kU$ mean to apply the controlled $U$ gate with the control on the $k$th register from the right. We take the $n$ rightmost measurements on:

$(\mathbb{I} \otimes H^{\otimes n}) \circ (C_{n-1}U)^{2^{n-1}} \circ \cdots\circ (C_2U)^{2^1} \circ (C_1U)^{2^0} \circ (\mathbb{I} \otimes H^{\otimes n})\lvert \psi 0^n \rangle.$

Since $\psi$ is an eigenvector, this state is separable into the left part and right part, and we focus only on the right part. A calculation shows that, writing $x/2^n = 0.a_1a_2\ldots a_{n}$ and letting $\lvert x \rangle = \lvert a_n \ldots a_2 a_1 \rangle$ represent the corresponding state, we have that the resulting state is:

$\displaystyle\frac{1}{2^n}\sum_{x=0}^{2^n - 1} e^{2 \pi i \theta x} \lvert x \rangle.$

As discussed earlier, this is precisely the Fourier transform applied to the characteristic function on $e^{2 \pi i \theta}$, given that this is a $2^n$th root of unity. To recover $\theta$, one merely has to apply the inverse Fourier transform as defined above. So, assuming we can build it, the circuit becomes

$(\mathbb{I} \otimes QFT_{2^n}^{-1})\circ (\mathbb{I} \otimes H^{\otimes n}) \circ (C_nU)^n \circ \cdots\circ (C_2U)^2 \circ C_1U \circ (\mathbb{I} \otimes H^{\otimes n})\lvert \psi 0^n \rangle$

with the final state being

$QFT^{-1}_{2^n} \displaystyle\frac{1}{\sqrt{2^n}}\sum_{x=0}^{2^n - 1} e^{2 \pi i \theta x} \lvert x \rangle = \displaystyle \frac{1}{ \sqrt{2^n}\sqrt{2^n}} \sum_{x,t,y=0}^{2^n - 1} e^{-2 \pi i y t/2^n} \lvert y \rangle \langle t \rvert e^{2 \pi i \theta x} \lvert x \rangle = \displaystyle \frac{1}{2^n} \sum_{x,y=0}^{2^n - 1} e^{2 \pi i x (\theta - y/2^n)} \lvert y \rangle.$

Adding it and using standard formulas for geometric series, the coefficient to $\lvert y \rangle$ is:

$\displaystyle \frac{1 - e^{2^n 2 \pi i (\theta - y^/2^n)}}{1 - e^{2 \pi i (\theta - y/2^n)}} = \frac{1 - e^{2 \pi i (2^n\theta - y^/2^n)}}{1 - e^{2 \pi i (\theta - y/2^n)}}.$

giving the final formula

$\displaystyle \frac{1}{2^n} \sum_{y=0}^{2^n - 1} \frac{1 - e^{2 \pi i (2^n\theta - y^/2^n)}}{1 - e^{2 \pi i (\theta - y/2^n)}} \lvert y \rangle.$

Now, if $\theta \in \displaystyle \frac{1}{2^n}\mathbb{Z}$, then a measurement on the $n$ rightmost qubits will reveal the corresponding numerator with certainty. This leaves two remaining questions: what happens if the denominator is not a power of 2, and how to implement this.
Before answering those questions, we also need to discuss the case where the input $\lvert \psi\rangle$ is in fact not an eigenvector for $U$, but a sum of eigenvectors. In this case, the resulting quantum state from applying the above algorithm is no longer separable but entangled. To see this, assume that the $\theta_j$ is of the form $m_j/2^n$ as above. Suppose that
$\lvert \psi \rangle = \displaystyle\sum_{j=1}^\ell a_j \lvert \psi_j \rangle$

where $\lvert \psi_j \rangle$ are (unit) eigenvectors with eigenvalue $\lambda_j = e^{2 \pi i \theta_j}$, and $\sum |a_j|^2 = 1$. By linearity, running phase estimation on $\lvert \psi \rangle$ will return $m_j$ with probability $||a_j||$. Note it's possible that the $m_j$ coincide, in which case the probabilities add.

Intuitively, if all the eigenvalues are distinct and we measure $m$ by running the algorithm above, this means this measurement should project $\lvert \psi \rangle$ to $\lvert \psi_i \rangle$ where $m_i = m$. Mathematically, applying the above algorithm to $\lvert \psi \rangle$ gives
$\displaystyle\sum_{j=1}^\ell a_j \lvert \psi_j \rangle \lvert m_j \rangle$.

After measurement, we see that the only $\lvert \psi_j \rangle$ appearing are those where $m_j = m$, more or less by definition, i.e.
$\displaystyle\frac{1}{\sum_{m_j = m} ||a_j||} \displaystyle\sum_{m_j = m} a_j \lvert \psi_j \rangle \lvert m \rangle$

In particular, if we ran the algorithm again on the output, we would find the same eigenvalue $\lambda_i$ again.


Let's address the first question of what happens when $\theta \not\in \displaystyle \frac{1}{2^n} \mathbb{Z}$. I'll spoil the answer. Suppose we take a measurement and get a fraction $m/2^n$.
  • The probability that $m/2^n$ is the fraction in $\displaystyle\frac{1}{2^n}\mathbb{Z}$ which is closest (or tied to being closest) to $\theta$ is at least $4/\pi^2$, i.e. at least 40%.
  • The probability that $m/2^n$ is not the closest to $\theta$ is false is at most $1/4$, i.e. 25%.
Since these bounds are constant in $n$, this means that for any probability cutoff for having found the closest $m/2^n$ to $\theta$, the number of iterations we need to take to get within the cutoff is constant.

Let us briefly derive this. Recall that the coefficient to $\lvert y \rangle$ is:

$\displaystyle \frac{1}{2^n} \displaystyle \frac{1 - e^{2^n 2 \pi i (\theta - y^/2^n)}}{1 - e^{2 \pi i (\theta - y/2^n)}} = \displaystyle \frac{1}{2^n} \frac{1 - e^{2 \pi i (2^n\theta - y^/2^n)}}{1 - e^{2 \pi i (\theta - y/2^n)}}.$

We can visualize a circle with $2^n$ tick marks, one of which corresponds to $y/2^n$, while $\theta$ is some arbitrary point. The numerator says to take the points modulo the tick marks, then blow up the segment between ticks to take the whole circle, and then take the distance between them. The denominator says to simply take the distance between them. Using the fact that the cord length is at least the arc length, we can deduce the above inequalities.
Now, we return to the previous question of what happens when $\lvert \psi \rangle$ is not an eigenvector, but now we do not assume that $r = 2^n$. Suppose $\lambda_j = e^{2 \pi i \theta_j}$. By linearity, the state is

$\displaystyle\frac{1}{2^n}\sum_{y=0}^{2^n-1} \displaystyle\sum_{j=1}^\ell \displaystyle a_j \frac{1 - e^{2 \pi i (2^n\theta_j - y/2^n)}}{1 - e^{2 \pi i (\theta_j - y/2^n)}}\lvert \psi_j \rangle \lvert y \rangle .$

Unlike before, if we measure $m$ in the right registers, this doesn't cause all terms involving $\lvert \psi_j \rangle$ for $m_j \ne m$ to vanish. We still have contributions from all $\vert \psi_j \rangle$:

$\displaystyle\sum_{j=1}^\ell \displaystyle a_j \frac{1 - e^{2 \pi i (2^n\theta_j - m/2^n)}}{1 - e^{2 \pi i (\theta_j - m/2^n)}}\lvert \psi_j \rangle \lvert m \rangle .$

In the above formula, we did not renormalize. Roughly speaking, once we do, what happens is that the $\theta_j$ near $m$ gain weight, while those further from it lose weight. In particular, with enough iterations, the vector $\lvert \psi \rangle$ begins to converges exponentially to an eigenvector in the eigenspace for $\lambda = e^{2 \pi i m/2^n}$ with probability exponentially approaching $1$.
Now, to implement it. We ignore all normalizations. We denote our input as $\lvert a_{n-1} \cdots a_0 \rangle = \lvert m \rangle \lvert a_0 \rangle = \lvert 2m + a_0 \rangle$. Say we've already implemented the gate $QFT_{2^{n-1}}$. Then, applying it to $\lvert m \rangle$, we have:

$(QFT_{2^{n-1}} \otimes \mathbb{I}) \lvert m \rangle \lvert a_0 \rangle = \displaystyle\sum_{x=0}^{2^{n-1} -1} e^{2 \pi i \cdot m \cdot (x/2^{n-1})} \lvert x \rangle \lvert a_0 \rangle = \displaystyle\sum_{x=0}^{2^{n-1} -1} e^{2 \pi i \cdot 2m \cdot (x/2^{n})} \lvert 2x + a_0 \rangle$

In the next step, we want to tweak $m$ so that it becomes $2m + a_0$. To do this, for $x = 2^k$, i.e. the $k+1$th register, we need to add the phase $(x/2^n)a_0$. That is, we apply a controlled phase gate, with the rightmost register as the control, with phase $2 \pi i \cdot (2^{k-1}/2^n)$ to the $k$th register:

$\displaystyle\sum_{x=0}^{2^{n-1} -1} e^{2 \pi i \cdot (2m + a_0) \cdot (x/2^{n})} \lvert 2x + a_0 \rangle.$

We next apply the QFT to the remaining register, i.e. the Hadamard gate to the leftmost register:

$\displaystyle\sum_{x=0}^{2^{n-1} -1} e^{2 \pi i \cdot (2m + a_0) \cdot (x/2^{n})} (\lvert 2x \rangle + (-1)^{a_0} \lvert 2x + 1 \rangle) = \displaystyle\sum_{x=0}^{2^{n-1} -1} \sum_{y = 0}^1 e^{2 \pi i \cdot (2m + a_0) \cdot (x/2^{n}) + \pi i a_0 y} \lvert 2x + y\rangle.$

Intuitvely, this is applying a Fourier transform of degree $2$ to the rightmost bit. There are two ways to interpret $\lvert a_{n-1} \ldots a_0 \rangle$:
  • the integer $a = a_0 + 2a_1 + \cdots + 2^{n-1} a_{n-1}$, i.e. as the binary expression $a_{n-1} \cdots a_0$,
  • the root of unity $e^{2 \pi i \theta}$ where $\theta = a/2^n$, i.e. as the binary expression $0.a_{n-1} \cdots a_0$.
In particular, it is not the rightmost bit that should be $2$th root of unity but the leftmost bit. So we rewrite

$= \displaystyle\sum_{x=0}^{2^{n-1} -1} \sum_{y = 0}^1 e^{2 \pi i \cdot (2m + a_0) \cdot (x + 2^{n-1} y)/2^{n}} \lvert 2x + y\rangle.$

and rotate it there using swap gates:

$\displaystyle\sum_{x=0}^{2^{n-1} -1} \sum_{y = 0}^1 e^{2 \pi i \cdot (2m + a_0) \cdot (x + 2^{n-1} y)/2^{n}} \lvert 2^{n-1} y + x\rangle = \displaystyle\sum_{z=0}^{2^n-1} e^{2 \pi i \cdot (2m+a_0) z} \lvert z \rangle.$

This is the desired formula.

No comments:

Post a Comment

Shor's algorithm

Shor's algorithm is a quantum algorithm for factoring in polynomial time. For sources, I am using this Qiskit tutorial and Wikipedia ...