forked from Scinawa/quantumalgorithms.org
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtoolbox.Rmd
1095 lines (724 loc) · 85 KB
/
toolbox.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
output:
pdf_document: default
html_document: default
---
# A useful toolbox {#chap:toolbox}
## Finding the minimum {#subsec:findmin}
We now want to show an algorithm that finds the minimum among $N$ unsorted values $u_{j\in[N]}$. As the Grover's algorithm, we work in the oracle model, so we assume to have quantum access to a the vector $u$. Without loss of generality we can take $N=2^n$, but if the length of our list is not a power of 2, we can just pad the rest of the elements in the list with zeroes. The statement of the theorem is the following.
```{lemma, find-minimum-orig, name="Quantum minimum finding [@durr1996quantum]"}
Given quantum access to a vector $u \in [0,1]^N$ via the operation $\ket j \ket{\bar 0} \to \ket j \ket{ u_j}$ on $\Ord{\log N}$ qubits, where $u_j$ is encoded to additive accuracy $\Ord{1/N}$. Then, we can find the minimum $u_{\min} = \min_{j\in[N]} u_j$ with success probability $1-\delta$ with $\Ord{\sqrt N \log \left (\frac{1}{\delta}\right) }$ queries and $\tOrd{\sqrt N \log \left( \frac{1}{\delta}\right )}$ quantum gates.
```
This algorithm utilizes a subroutine called quantum exponential searching algorithm (QESA), which is again composed of amplitude amplification (Grover) and amplitude estimation. For simplicity, we assume those values are distinct. The general idea is marking indices of values below a chosen threshold, returning and picking one of them as the new threshold with equal probability. After some iterations, it is expected to yield the lowest value. Note that the step of marking lower values is taken as an oracle with gate complexity of $O(1)$. We first present to algorithm and keep the explanation for later.
```{r, min-finding, fig.cap="Finding the minimum", echo=FALSE}
knitr::include_graphics("algpseudocode/min-finding.png")
```
```{r, qesa, fig.cap="Quantum Exponential Searching Algorithm", echo=FALSE}
knitr::include_graphics("algpseudocode/qesa.png")
```
In the basic Grover's algorithm with one solution, $m = CI\left(\frac{\pi}{4}\sqrt{N}\right)$ iterations give the true output with error probability at most $1/N$ for $N \gg 1$, where $CI(x)$ is the closest integer to $x$. The result can be generalized in the case of $t \ll N$ solutions that the error rate is reduced to at most $t/N$ after exactly $m = CI\left(\frac{\pi-2\arcsin{\sqrt{t/N}}}{4\arcsin{\sqrt{t/N}}}\right) = \left \lfloor \frac{\pi}{4\arcsin{\sqrt{t/N}}} \right \rfloor$ iterations (and thus oracle calls), with an upper bound $m \leq \frac{\pi}{4}\sqrt{\frac{N}{t}}$. However, a problem arises when we have to amplify amplitudes corresponding to values below the threshold. In practice, the number of candidates is unknown and varies for each threshold. QESA is a generalized algorithm to find a solution for unknown $t$ with probability at least $1/4$ in $O(\sqrt{N/t})$, which is motivated by the following observation.
```{proposition, qesa-observation, name="Quantum Exponential Searching Algorithm [@boyer1998tight]"}
Let $t$ be the (unknown) number of solutions and let $\theta$ be such that $\sin^2{\theta} = t/N$. Let $m$ be an arbitrary positive integer. Let $k$ be an integer chosen at random according to the uniform distribution between $0$ and $m − 1$. If we observe the register after applying $k$ iterations of Grover’s algorithm starting from the initial state $\sum_{i}\frac{1}{\sqrt{N}}|i\rangle$, the probability of obtaining a solution is exactly $P_m = \frac{1}{2} - \frac{\sin{(4m\theta)}}{4m\sin{(2\theta)}}$.
In particular, $P_m \geq 1/4$ when $m \geq 1/\sin{(2\theta)}$.
```
As QESA is expected to be done in $O(\sqrt{N/t})$ queries, one can deduce that the expected number of queries for the minimum-finding algorithm with success probability at least $1/2$ is $O(\sqrt{N})$. Repeating the algorithm $c$ times increases the success probability to $1-1/2^c$. In terms of quantum gates, the Grover part and the initialization part use $O(\sqrt{N})$ and $O(\log{N})$ respectively.
## Quantum linear algebra {#subsec:linearalgebra}
In our algorithms, we will also use subroutines for quantum linear algebra. From the first work of \cite{HarrowHassidim2009HHL} (known in the literature as HHL algorithm) that proposed a quantum algorithm for matrix inversion, a lot of progress has been made. In this section, we briefly recall some of the results in quantum linear algebra. We conclude by citing the state-of-the art techniques for performing not only matrix inversion and matrix multiplication, but also for applying a certain class of functions to the singular values of a matrix. A notable result after HHL was the ability to perform a quantum version of the singular value decomposition. This idea is detailed in the following theorem.
```{theorem, sve-theorem, name="Singular Value Estimation [@kerenidis2017quantumsquares]"}
Let $M \in \mathbb{R}^{n \times d}$ be a matrix with singular value decomposition $M =\sum_{i} \sigma_i u_i v_i^T$ for which we have quantum access. Let $\varepsilon > 0$ the precision parameter. There is an algorithm with running time $\tilde{O}(\mu(M)/\varepsilon)$ that performs the mapping $\sum_{i} \alpha_i \ket{v_i} \to \sum_{i}\alpha_i \ket{v_i} \ket{\tilde{\sigma}_i}$, where $| \tilde{\sigma_i} - \sigma_i | \leq \varepsilon$ for all $i$ with probability at least $1-1/poly(n)$.
```
Recall that quantum access to a matrix is defined in theorem \@ref(def:KP-trees), and the parameter $\mu$ is defined in definition \@ref(def:mu).
The relevance of theorem \@ref(thm:sve-theorem) for quantum machine learning is the following: if we are able to estimate the singular values of a matrix, then we can perform a conditional rotation controlled by these singular values and hence perform a variety of linear algebraic operations, including matrix inversion, matrix multiplication or projection onto a subspace. Based on this result, quantum linear algebra was done using the theorem stated below.
``` {theorem, matrix-algebra-sve, nome="(OLD) Quantum linear algebra"}
Let $M := \sum_{i} \sigma_iu_iv_i^T \in \mathbb{R}^{d \times d}$ such that $\norm{M}_2 = 1$, and a vector $x \in \mathbb{R}^d$ for which we have quantum access. There exist quantum algorithms that with probability at least $1-1/poly(d)$ return
- a state $\ket{z}$ such that $| \ket{z} - \ket{Mx}| \leq \epsilon$ in time $\tilde{O}(\kappa^2(M)\mu(M)/\epsilon)$
- a state $\ket{z}$ such that $|\ket{z} - \ket{M^{-1}x}| \leq \epsilon$ in time $\tilde{O}(\kappa^2(M)\mu(M)/\epsilon)$
- a state $\ket{M_{\leq \theta, \delta}^+M_{\leq \theta, \delta}x}$ in time $\tilde{O}(\frac{ \mu(M) \norm{x}}{\delta \theta \norm{M^{+}_{\leq \theta, \delta}M_{\leq \theta, \delta}x}})$
One can also get estimates of the norms with multiplicative error $\eta$ by increasing the running time by a factor $1/\eta$.
```
Recall that we denote as $V_{\geq \tau}$ the matrix $\sum_{i=0}^\ell \sigma_i u_i v_i^T$ where $\sigma_\ell$ is the smallest singular value which is greater than $\tau$.
For a matrix $M$ and a vector $x$, we define as $M^{+}_{\leq \theta, \delta}M_{\leq \theta, \delta} x$ the projection of $x$ onto the space spanned by the singular vectors of $M$ whose corresponding singular values are smaller than $\theta$, and some subset of singular vectors whose corresponding singular values are in the interval $[\theta, (1+\delta)\theta]$.
<!--
# TODO Finalize draft of algorithm and proof for linear algebra with SVE
# It commented - but its there - the original latex coming from the paper of gradient descent
# for the algo and the proof of the linear system of equation.
# It should be integrated with the proof of projection is subspace.
# I have already an algorithm encompassing all the three cases in my notes.
# label:
-->
<!-- ```{r, svd-linalg, fig.cap="Finding the minimum", echo=FALSE} -->
<!-- knitr::include_graphics("algpseudocode/min-finding.png") -->
<!-- ``` -->
<!-- We next analyze the correctness and provide the running time for the above algorithm. -->
<!-- \begin{theorem} \label{lqmat} -->
<!-- Parts (i) and (ii) of Algorithm \ref{qmat} produce as output a -->
<!-- state $\ket{z}$ such that -->
<!-- $\norm{ \ket{\mathcal{A}x} - \ket{z}} \leq \delta$ in expected time -->
<!-- $\widetilde{O}(\frac{\kappa^{2}(A) \mu(A)}{ \delta } )$ for $\mathcal{A}= A$ and $\mathcal{A}= A^{-1}$ respectively. -->
<!-- \end{theorem} -->
<!-- {proof} -->
<!-- We first analyze matrix multiplication. The unnormalized solution state is $Ax= \sum_{i} \beta_{i} \lambda_{i} v_{i}$, while the unnormalized output of step 1 of the algorithm which performs SVE to precision $\epsilon_{1}$ is $z= \sum_{i} ( \lambda_{i} \pm \widetilde{\epsilon}_{i}) \beta_{i} v_{i}$ such that $|\widetilde{\epsilon}_{i}| \leq \epsilon_{1}$ for all $i$. As the $v_{i}$ are orthonormal, we have -->
<!-- $\norm{ Ax - z} \leq \epsilon_{1} \norm{x}$ and by Claim \ref{unnorm}, we have $\norm{ \ket{Ax} - \ket{z}} \leq \frac{\sqrt{2}\epsilon_{1}\norm{x}}{\norm{Ax}}\leq \sqrt{2} \epsilon_{1} \kappa(A)$. -->
<!-- We next analyze linear systems. The unnormalized solution state is $A^{-1}x= \sum_{i} \frac{\beta_{i}}{\lambda_{i}} v_{i}$. %where $\norm{\beta}=1$. -->
<!-- The unnormalized output is $z= \sum_{i} \frac{\beta_{i}}{\lambda_{i} \pm \tilde{\epsilon}_{i}} v_{i}$ for $|\widetilde{\epsilon}_{i}| \leq \epsilon_{1}$. We have -->
<!-- the bound -->
<!-- \begin{eqnarray*} -->
<!-- \norm{ A^{-1} x - z}^{2} & \leq & \sum_{i} \beta_{i}^{2} \en{ \frac{1}{\lambda_{i}} - \frac{1}{ \lambda_{i} \pm \tilde{\epsilon}_{i}} }^{2} \leq \epsilon_{1}^{2} \sum_{i} \frac{\beta_{i}^{2}}{\lambda_{i}^{2} (\lambda_{i} - \epsilon_{1})^{2} } \leq \frac{\epsilon_{1}^{2} \kappa^{2}(A) \norm{A^{-1} x}^{2}}{ (1-\kappa(A) \epsilon_{1})^{2}} \\ -->
<!-- & \leq & 4\epsilon_{1}^{2} \kappa^{2}(A) \norm{A^{-1} x}^{2} -->
<!-- \end{eqnarray*} -->
<!-- assuming -->
<!-- that $\kappa(A) \epsilon_{1} \leq 1/2$. Applying Claim \ref{unnorm} we obtain $\norm{ \ket{A^{-1} x} -\ket{ z}} \leq 2\sqrt{2}\kappa(A) \epsilon_{1}$ for $\kappa(A) \epsilon_{1} \leq 1/2$. -->
<!-- We can therefore use the SVE algorithm with precision $\epsilon_{1} = \frac{\delta} { \kappa(A)} $ for both cases to obtain a solution state $\norm{ \ket{\mathcal{A}x} - \ket{z}} \leq \delta$. The success probability for step 2 of the algorithm is $\frac{1}{\kappa^{2}(A)}$ for both matrix multiplication and matrix inversion. We apply Amplitude Amplification as in theorem \ref{tampa} with the unitary $V$ that represents the first two steps of the algorithm to obtain an expected running time $\widetilde{O}(\frac{\kappa^2 (A) \mu(A)}{ \delta } )$. -->
<!-- ``` -->
For a symmetric matrix $M \in \mathbb{R}^{d\times d}$ with spectral norm $\|M\|=1$ for which we have quantum access, the running time of these algorithms depends on the condition number $\kappa(M)$ of the matrix, that can be replaced by $\kappa_\tau(M)$, a condition threshold where we keep only the singular values bigger than $\tau$, and the parameter $\mu(M)$, a matrix dependent parameter defined in definition \@ref(def:mu). The running time also depends logarithmically on the relative error $\epsilon$ of the final outcome state. Recall that these linear algebra procedures above can also be applied to any rectangular matrix $V \in \mathbb{R}^{n \times d}$ by considering instead the symmetric matrix
$$ \overline{V} = \left ( \begin{matrix}
0 &V \\
V^{T} &0 \\
\end{matrix} \right ).$$
## Linear combination of unitaries, normal forms, and singular value transformations {#subsec:svt}
We continue our journey in quantum linear algebra by discussing the state-of-the-art technique beneath quantum linear algebra, called singular value transformation.
The research of quantum algorithms for machine learning has always used techniques developed in other areas of quantum algorithms. Among the many, we cite quantum algorithms for Hamiltonian simulation and quantum random walks. In fact, using quantum random walks, it is possible to decrease the dependence on the error parameter, from polynomial to $\text{polylog}(1/\epsilon)$ [@childs2012hamiltonian]. Stemming from the research in Hamiltonian simulation [@berry2015hamiltonian], [@low2019hamiltonian], [@berry2015simulating], [@low2017hamiltonian],[@subramanian2019implementing], these techniques have been further optimized, pushing them to the limit of almost optimal time and query complexity. Significant progress in the direction of quantum algorithms for linear algebra was the so-called LCU, or linear combination of unitaries [@childs2012hamiltonian], which again was developed in the context of the Hamiltonian simulation problem.
```{lemma, lcu, name="Linear combination of unitaries [@Childs2015]"}
Let $M = \sum_{i} \alpha_i U_i$ be a linear combination of unitaries $U_i$ with $\alpha_i >0$ for all $i$. Let $V$ be any operator that satisfies $V\ket{0^{\otimes m}} := \frac{1}{\sqrt{\alpha}}\sum_i \sqrt{\alpha_i}\ket{i}$, where $\alpha := \sum_i \alpha_i$.
Then $W := V^{\dagger}UV$ satisfies
\begin{equation}
W\ket{0^{\otimes m}}\ket{\psi} =\frac{1}{\alpha}\ket{0^{\otimes m}}M\ket{\psi} + \ket{\Psi^\perp}
\end{equation}
for all states $\ket{\psi}$, where $U := \sum_i \ket{i}\bra{i} \otimes U_i$ and $(\ket{0^{\otimes m}}\bra{0^{\otimes m}} \otimes I)\ket{\Psi^\perp}=0$.
```
In recent years a new framework for performing operations on matrices with a quantum computer was developed, which can be found in works [@CGJ18], [@gilyen2019quantum]. We now briefly go through the machinery behind these results, as it will be used throughout this thesis.
```{definition, def-block-encoding, name="Block encoding of a matrix"}
Let $A\in \mathbb{C}^{2^s \times 2^s}$. We say that a unitary $U \in \mathbb{C}^{(s+a)\times(s+a)}$ is a ($\alpha, a, \epsilon)$ block encoding of $A$ if:
$$\norm{A - \alpha (\bra{0}^a \otimes I )U( \ket{0}^a \otimes I) } \leq \epsilon$$
```
We will see that having quantum access to a matrix $A\in\mathbb{C}^{2^w \times 2^w}$, as described in the setting of theorem \@ref(def:qram), it is possible to implement a $(\mu(A), w+2, \text{polylog}(\epsilon))$ block-encoding of $A$ \footnote{This $\text{polylog}(\epsilon)$ in the block encoding is due to approximation error that one commits when creating quantum access to the classical data structures, i.e. is the approximation that derives from truncating a number $n \in \mathbb{R}$ (which rerepsent an entry of the matrix) up to a certain precision $\epsilon$ lemma 25 of [@CGJ18].}. Given matrix $U$ which is a $(\alpha, a, \delta)$ block encoding of $A$, and a matrix $V$ which is a $(\beta, b, \epsilon)$ block encoding of $B$, it is simple to obtain a $(\alpha\beta, a+b, \alpha\epsilon+\beta\delta)$ block encoding of $AB$.
For practical purposes, having a block encoding of a matrix $A$, allows one to manipulate its spectra using polynomial approximation of analytic functions.
In the following theorem, the notation $P_{\Re}(A)$ means that we apply the polynomial $P$ to the singular values of the matrix $A$, i.e. $P_{\Re}(A) = \sum_i^r P(\sigma_i)u_iv_i^T$.
<!-- TODO -->
<!-- explain better why we have a $ -->
```{theorem, SVT, name="Polynomial eigenvalue transformation of arbitrary parity [@gilyen2019quantum]"}
Suppose that $U$ is an $(\alpha,a,\epsilon)$-block encoding of the Hermitian matrix $A$. If $\delta\geq 0$ and $P_{\Re}\in\mathbb{R}[x]$ is a degree-$d$ polynomial satisfying that:
- for all $x\in[-1,1]$, $| P_{\Re}(x)|\leq \frac{1}{2}$.
Then there is a quantum circuit $\tilde{U}$, which is an $(1,a+2,4d\sqrt{\epsilon/\alpha}+\delta)$-encoding of $P_{\mathcal{R}}(A/\alpha)$, and consists of $d$ applications of $U$ and $U^\dagger$ gates, a single application of controlled-$U$ and $O((a+1)d)$ other one- and two-qubit gates. Moreover we can compute a description of such a circuit with a classical computer in time $O(\text{poly}d,\log(1/\delta))$.
```
For instance, matrix inversion can be seen as the problem of implementing the singular value transformation of $x \mapsto 1/x$. For this, one needs to get a polynomial approximation of the function $1/x$. While this might seem a simple task, there are small complications. First, one usually does not consider the whole interval $[-1, 1]$. In practice, one excludes the subset of the domain where the function has singularities (i.e. for $1/x$ is around zero). Then, it is preferable to pick a polynomial of a small degree (and small coefficients), as the depth of the circuit depends linearly on the degree of the polynomial.
Given a $(\alpha, a, \epsilon)$-block encoding for a matrix $A$ and a quantum state $\ket{b}$, we can obtain a good approximation of $A\ket{b}/\norm{Ab}$ by first creating the state $\ket{0^a,b}$ and then applying the block encoding of $A$ to it. Then, we can amplify the part of the subspace associated with the state $\ket{0}^{\otimes a}A\ket{b}$. Differently, one might use advanced amplification techniques and reach a similar result. This concept is detailed in the following lemma.
```{lemma, applicationBE, name="Applying a block-encoded matrix to a quantum state [@CGJ18]"}
Fix any $\varepsilon \in (0,1/2)$. Let $A \in \mathbb{C}^{N \times N}$ such that $\norm{A}\leq 1$ and $\ket{b}$ a normalized vector in $\mathbb{C}^N$, such that $\norm{A\ket{b}} \geq \gamma$. Suppose that $\ket{b}$ can be generated in complexity $T_b$ and there is a $(\alpha, a, \epsilon)$-block encoding of $A$ for some $\alpha \geq 1$, with $\epsilon \leq \varepsilon\gamma/2$, that can be implemented in cost $T_A$. Then there is a quantum algorithm with complexity
$$O\left(min(\frac{\alpha(T_A + T_b)}{\gamma}, \frac{\alpha T_A\log(1/\epsilon)+T_B}{\gamma})\right) $$
that terminates with success probability at least $2/3$, and upon success generates the state $A\ket{b}/\norm{A\ket{b}}$ to precision $\varepsilon$.
```
For sake of completeness, we briefly discuss how to prove the first upper bound. Generating $\ket{b}$ and applying the block encoding of $A$ to it, we create a state that is $(\epsilon/\alpha)$-close to:
$$\ket{0}^{\otimes a}(\frac{1}{\alpha}A\ket{b}) + \ket{0^\perp} $$
From the hypothesis, we know that $\norm{\frac{1}{\alpha}A\ket{b} } \geq \gamma/\alpha$. We can use $O(\alpha/\gamma)$ calls to amplitude amplification on the initial register being $\ket{0}^{\otimes a}$, to get $\frac{\epsilon}{\gamma}$ close to $\ket{0}^{\otimes a}\frac{A\ket{b}}{\norm{A}\ket{b}}$. The second upper bound is shown by other techniques based on amplitude amplification of singular values of block encoded matrices (i.e. \cite[lemma 47]{CGJ18}, \cite[theorem 2,8]{low2017hamiltonian}).
Regarding the usage of block-encodings for solving with a quantum computer a linear system of equations (i.e. multiplying a quantum state by the inverse of a matrix, and creating a state $\ket{x}$ proportional to $A^{-1}\ket{b}$) we can proceed in an analogous way. First, we need to create block encoding access to $A^{-1}$. Using the following lemma, (where they denoted with $\kappa$ the condition number of $A$) we can implement negative powers of Hermitian matrices.
```{lemma, negative-powers, name="Implementing negative powers of Hermitian matrices [@CGJ18] lemma 9" }
Let $c \in (0,\infty), \kappa \geq 2$, and let $A$ be an Hermitian matrix such that $I/\kappa \leq A \leq I$. Suppose that $\delta = o(\epsilon/( \kappa^{1+c}(1+c) \log^3\frac{k^{1+c}}{\epsilon}))$ and $U$ is an $(\alpha,a,\delta)$-block encoding of $A$ that can be implemented using $T_U$ gates. Then, for any $\epsilon$, we can implement a unitary $\tilde{U}$ that is a $(2\kappa^c, a, \epsilon)$-block encoding of $H^{-c}$ in cost:
$$O\left(\alpha \kappa(a+T_U)(1+c)\log^2(\frac{k^{1+c}}{\epsilon})\right)$$
```
Nevertheless, the algorithm that emerges by using the previous lemma has a quadratic dependence on $\kappa$. To decrease it to an algorithm linear in $\kappa$ the authors used variable time amplitude amplifications[@ambainis2010variable]. Hence, we can restate the theorem \@ref(thm:matrix-algebra-sve), with the improved runtimes, as follows.
```{theorem, qla, name="Quantum linear algebra [@CGJ18],[@gilyen2019quantum]"}
Let $M := \sum_{i} \sigma_iu_iv_i^T \in \mathbb{R}^{d \times d}$ such that $\norm{M}_2 =1$, and a vector $x \in \mathbb{R}^d$ for which we have quantum access in time $T_\chi$. There exist quantum algorithms that with probability at least $1-1/poly(d)$ return
- a state $\ket{z}$ such that $| \ket{z} - \ket{Mx}| \leq \epsilon$ in time $\tilde{O}(\kappa(M)(\mu(M)+T_\chi)\log(1/\epsilon))$
- a state $\ket{z}$ such that $|\ket{z} - \ket{M^{-1}x}| \leq \epsilon$ in time $\tilde{O}(\kappa(M)(\mu(M)+T_\chi)\log(1/\epsilon))$
- a state $\ket{M_{\leq \theta, \delta}^+M_{\leq \theta, \delta}x}$ in time $\tilde{O}(T_\chi \frac{ \mu(M) \norm{x}}{\delta \theta \norm{M^{+}_{\leq \theta, \delta}M_{\leq \theta, \delta}x}})$
One can also get estimates of the norms with multiplicative error $\eta$ by increasing the running time by a factor $1/\eta$.
```
Another important advantage of the new methods is that it provides easy ways to manipulate sums or products of matrices.
```{theorem, qlap, name="Quantum linear algebra for products [@CGJ18],[@gilyen2019quantum]"}
Let $M_1, M_2 \in \mathbb{R}^{d \times d}$ such that $\norm{M_1}_2= \norm{M_2}_2 =1$, $M=M_1M_2$,
and a vector $x \in \mathbb{R}^d$ for which we have quantum access. There exist quantum algorithms that with probability at least $1-1/poly(d)$ return
- a state $\ket{z}$ such that $| \ket{z} - \ket{Mx}| \leq \epsilon$ in time $\tilde{O}(\kappa(M)(\mu(M_1)+\mu(M_2))\log(1/\epsilon))$
- a state $\ket{z}$ such that $|\ket{z} - \ket{M^{-1}x}| \leq \epsilon$ in time $\tilde{O}(\kappa(M)(\mu(M_1)+\mu(M_2))\log(1/\epsilon))$
- a state $\ket{M_{\leq \theta, \delta}^+M_{\leq \theta, \delta}x}$ in time $\tilde{O}(\frac{ (\mu(M_1)+\mu(M_2)) \norm{x}}{\delta \theta \norm{M^{+}_{\leq \theta, \delta}M_{\leq \theta, \delta}x}})$
One can also get estimates of the norms with multiplicative error $\eta$ by increasing the running time by a factor $1/\eta$.
```
More generally, applying a matrix $M$ which is the product of $k$ matrices, i.e. $M=M_1\dots M_k$ will result in a runtime of $\kappa(M)(\sum_i^k \mu(M_i) )\log(1/\epsilon)$ factors in the runtime.
## Estimation of distances, inner products, norms, and quadratic forms {#sub:distances}
In this section, we report two lemmas that can be used to estimate the inner products, distances, and quadratic forms between vectors. The lemma \@ref(lem:innerproductestimation) has been developed in the work [@kerenidis2019qmeans]. The lemma \@ref(lem:qsp-l1norm) and the other lemmas for inner product estimation in the query model come from [@hamoudi2020quantum].
### Inner products and quadratic forms with KP-trees
We can rephrase this theorem saying that we have quantum access to the matrices, but for simplicity we keep the original formulation.
```{lemma, innerproductestimation, name="Distance and Inner Products Estimation [@kerenidis2019qmeans]"}
Assume for a matrix $V \in \mathbb{R}^{n \times d}$ and a matrix $C \in \mathbb{R}^{k \times d}$ that the following unitaries
$\ket{i}\ket{0} \mapsto \ket{i}\ket{v_i}$, and $\ket{j}\ket{0} \mapsto \ket{j}\ket{c_j}$
can be performed in time $T$ and the norms of the vectors are known. For any $\Delta > 0$ and $\epsilon>0$, there exists a quantum algorithm that computes
- $\ket{i}\ket{j}\ket{0} \mapsto \ket{i}\ket{j}\ket{\overline{d^2(v_i,c_j)}}$ where $|\overline{d^{2}(v_i,c_j)}-d^{2}(v_i,c_j)| \leqslant \epsilon$ w.p. $\geq 1-2\Delta$
- $\ket{i}\ket{j}\ket{0} \mapsto \ket{i}\ket{j}\ket{\overline{(v_i,c_j)}}$ where $|\overline{(v_i,c_j)}-(v_i,c_j)| \leqslant \epsilon$ w.p. $\geq 1-2\Delta$
in time $\widetilde{O}\left(\frac{ \norm{v_i}\norm{c_j} T \log(1/\Delta)}{ \epsilon}\right)$.
```
```{proof}
Let us start by describing a procedure to estimate the square $\ell_2$ distance between the normalized vectors $\ket{v_i}$ and $\ket{c_j}$. We start with the initial state
$$
\ket{\phi_{ij}} := \ket{i} \ket{j} \frac{1}{\sqrt{2}}(\ket{0}+ \ket{1})\ket{0}
$$
Then, we query the state preparation oracle controlled on the third register to perform the mappings
$\ket{i}\ket{j}\ket{0}\ket{0} \mapsto \ket{i}\ket{j}\ket{0}\ket{v_i}$ and $\ket{i}\ket{j}\ket{1}\ket{0} \mapsto \ket{i}\ket{j}\ket{1}\ket{c_j}$.
The state after this is given by,
$$
\frac{1}{\sqrt{2}}\left( \ket{i}\ket{j}\ket{0}\ket{v_i} + \ket{i}\ket{j}\ket{1}\ket{c_j}\right)
$$
Finally, we apply an Hadamard gate on the the third register to obtain,
$$\ket{i}\ket{j}\left( \frac{1}{2}\ket{0}\left(\ket{v_i} + \ket{c_j}\right) + \frac{1}{2}\ket{1}\left(\ket{v_i} -\ket{c_j}\right)\right)$$
The probability of obtaining $\ket{1}$ when the third register is measured is,
$$ p_{ij} = \frac{1}{4}(2 - 2\braket{v_i}{c_j}) = \frac{1}{4} d^2(\ket{v_i}, \ket{c_j}) = \frac{1 - \langle v_i | c_j\rangle}{2}$$
which is proportional to the square distance between the two normalized vectors.
We can rewrite $\ket{1}\left(\ket{v_i} - \ket{c_j}\right)$ as $\ket{y_{ij},1}$ (by swapping the registers), and hence we have the final mapping
\begin{equation}
A: \ket{i}\ket{j} \ket{0} \mapsto \ket{i}\ket{j}(\sqrt{p_{ij}}\ket{y_{ij},1}+\sqrt{1-p_{ij}}\ket{G_{ij},0})
(\#eq:QDE)
\end{equation}
where the probability $p_{ij}$ is proportional to the square distance between the normalized vectors and $G_{ij}$ is a garbage state. Note that the running time of $A$ is $T_A=\tilde{O}(T)$.
Now that we know how to apply the transformation described in Equation \@ref(eq:QDE), we can use known techniques to conclude our subroutine to perform the distance estimation within additive error $\epsilon$ with high probability. The method uses two tools, amplitude estimation, and the median evaluation lemma \@ref(lem:median) from [@wiebe2018quantum], which is a quantum version of the well-known powering-lemma \@ref(lem:powering-lemma).
First, using amplitude estimation (theorem \@ref(thm:thm-ampest-orig) ) with the unitary $A$ defined in Equation \ref{QDE}, we can create a unitary operation that maps
$$
\mathcal{U}: \ket{i}\ket{j} \ket{0} \mapsto \ket{i}\ket{j} \left( \sqrt{\alpha} \ket{ \overline{p_{ij}}, G, 1} + \sqrt{ (1-\alpha ) }\ket{G', 0} \right)
$$
where $G, G'$ are garbage registers, $|\overline{p_{ij}} - p_{ij} | \leq \epsilon$ and $\alpha \geq 8/\pi^2$.
The unitary $\mathcal{U}$ requires $P$ iterations of $A$ with $P=O(1/\epsilon)$. Amplitude estimation thus takes time $T_{\mathcal{U}} = \widetilde{O}(T/\epsilon)$. We can now apply theorem \@ref(lem:median) for the unitary $\mathcal{U}$ to obtain a quantum state $\ket{\Psi_{ij}}$ such that,
$$\|\ket{\Psi_{ij}}-\ket{0}^{\otimes L}\ket{\overline{p_{ij}}, G}\|_2\le \sqrt{2\Delta}$$
The running time of the procedure is $O( T_{\mathcal{U}} \ln(1/\Delta)) = \widetilde{O}( \frac{T }{\epsilon}\log (1/\Delta) )$.
Note that we can easily multiply the value $\overline{p_{ij}}$ by 4 in order to have the estimator of the square distance of the normalized vectors or compute $1-2\overline{p_{ij}}$ for the normalized inner product. Last, the garbage state does not cause any problem in calculating the minimum in the next step, after which this step is uncomputed.
The running time of the procedure is thus
$O( T_{\mathcal{U}} \ln(1/\Delta)) = O( \frac{T }{\epsilon}\log (1/\Delta) )$.
The last step is to show how to estimate the square distance or the inner product of the unnormalized vectors. Since we know the norms of the vectors, we can simply multiply the estimator of the normalized inner product with the product of the two norms to get an estimate for the inner product of the unnormalized vectors and a similar calculation works for the distance. Note that the absolute error $\epsilon$ now becomes $\epsilon \norm{v_i}\norm{c_j}$ and hence if we want to have in the end an absolute error $\epsilon$ this will incur a factor of $\norm{v_i}\norm{c_j}$ in the running time. This concludes the proof of the lemma.
```
It is relatively simple to extend the previous algorithm to one that computes an estimate of a quadratic form. We will consider the case where we have quantum access to a matrix $A$ and compute the quadratic forms $v^TAv$ and $v^TA^{-1}v$. The extension to the case when we have two different vectors, i.e. $v^TAu$ and $v^TA^{-1}u$ is trivial.
```{lemma, quadratic-forms, name="Estimation of quadratic forms"}
Assume to have quantum access to a symmetric positive definite matrix $A \in \mathbb{R}^{n \times n}$ such that $\norm{A}\leq 1$, and to a matrix $V \in \mathbb{R}^{n \times d}$. For $\epsilon > 0$, there is a quantum algorithm that performs the mapping $\ket{i}\ket{0} \mapsto \ket{i}\ket{\overline{s_i}}$, for $|s_i - \overline{s_i}| \leq \epsilon$, where $s_i$ is either:
- $(\ket{v_i},A\ket{v_i})$ in time $O(\frac{\mu(A)}{\epsilon})$
- $(\ket{v_i},A^{-1}\ket{v_i})$ in time $O(\frac{\mu(A)\kappa(A)}{\epsilon})$
The algorithm can return an estimate of $\overline{(v_i, Av_i)}$ such that $\overline{(v_i, Av_i)} - (v_i, Av_i) \leq \epsilon$ using quantum access to the norm of the rows of $V$ by increasing the runtime by a factor of $\norm{v_i}^2$.
```
```{proof}
We analyze first the case where we want to compute the quadratic form with $A$, and after the case for $A^{-1}$. Recall that the matrix $A$ can be decomposed in an orthonormal basis $\ket{u_i}$. We can use theorem \@ref(thm:qla) to perform the following mapping:
\begin{align}
\ket{i}\ket{v_i}\ket{0} = \ket{i}\frac{1}{N_i}\sum_j^n \alpha_{ij}\ket{u_j}\ket{0} \mapsto \ket{i}\frac{1}{N_i} \sum_i^n \Big(\lambda_i\alpha_{ij}\ket{u_i,0}+ \sqrt{1-\gamma^2}\ket{G,1} \Big) =\\
\ket{i} \Big(\norm{Av_i}\ket{Av_i,0}+\sqrt{1-\gamma^2}\ket{G,1} \Big)
=\ket{i}\ket{\psi_i},
\end{align}
where $N_i = \sqrt{\sum_j^n \alpha_{ij}^2}$.
We define $\ket{\phi_i} = \ket{v_i,0}$. Using controlled operations, we can then create the state:
\begin{equation}\label{eq:quadatic A}
\frac{1}{2}\ket{i}\left(\ket{0}(\ket{\phi_i}+\ket{\psi_i}) + \ket{1}(\ket{\phi_i}-\ket{\psi_i}) \right)
\end{equation}
It is simple to check that, for a given register $\ket{i}$, the probability of measuring $0$ is:
$$p_i(0) = \frac{1+\norm{Av_i}\braket{Av_i|v_i}}{2}$$
We analyze the case where we want to compute the quadratic form for $A^{-1}$. For a $C = O(1/\kappa(A))$, we create instead the state:
\begin{align}\label{eq:quadatic A minus 1}
\ket{i}\frac{1}{\sqrt{\sum_i^n \alpha_i^2}} \sum_i^n \Big(\frac{C}{\lambda_i}\alpha_i\ket{v_i,0} + \sqrt{1-\gamma^2}\ket{G,1} \Big) = \ket{i}\ket{\psi_i}
\end{align}
\begin{equation}
U_2 \ket{i}\ket{0} \mapsto \frac{1}{2}\ket{i} \left(\sqrt{\alpha}\ket{p_i(0),y_i,0} + \sqrt{1-\alpha}\ket{G_i,1} \right)
\end{equation}
and estimate $p_i(0)$ such that $|p_i(0) - \overline{p_i(0)}| < \epsilon$ for the case of $v_i^TAv_i$ and we choose a precision $\epsilon/C$ for the case of $v_i^TA^{-1}v_i$ to get the same accuracy. Amplitude estimation theorem, i.e. theorem \@ref(thm:ampest_orig) fails with probability $\leq \frac{8}{\pi^2}$. The runtime of this procedure is given by combining the runtime of creating the state $\ket{\psi_i}$, amplitude estimation, and the median lemma. Since the error in the matrix multiplication step is negligible, and assuming quantum access to the vectors is polylogarithmic, the final runtime is $O(\log(1/\delta)\mu(A)\log(1/\epsilon_2) / \epsilon)$, with an additional factor $\kappa(A)$ for the case of the quadratic form of $A^{-1}$.
Note that if we want to estimate a quadratic form of two unnormalized vectors, we can just multiply this result by their norms. Note also that the absolute error $\epsilon$ now becomes relative w.r.t the norms, i.e. $\epsilon \norm{v_i}^2$. If we want to obtain an absolute error $\epsilon'$, as in the case with normalized unit vectors, we have to run amplitude estimation with precision $\epsilon'=O(\epsilon/\norm{v_i}^2)$.
To conclude, this subroutine succeeds with probability $1-\gamma$ and requires time $O(\frac{\mu(A) \log (1/\gamma) \log (1/\epsilon_?)}{\epsilon_1})$, with an additional factor of $\kappa(A)$ if we were to consider the quadratic form for $A^{-1}$, and an additional factor of $\norm{v_i}^2$ if we were to consider the non-normalized vectors $v_i$. This concludes the proof of the lemma.
```
<!--
# TODO Fix and check epsilons in the proof for quandratic forms
# (it was epsilonmult) what is the correct epsilon to put in the log?
-->
Note that this algorithm can be extended by using another index register to query for other vectors from another matrix $W$, for which we have quantum access. This extends the capabilities to estimating inner products in the form $\ket{i}\ket{j}\ket{w_i^TA v_i}$.
### Inner product and l1-norm estimation with query access
```{lemma, qsp-l1norm, name="Quantum state preparation and norm estimation"}
Let $\eta >0$. Given a non-zero vector $u \in [0,1]^N$, with $\max_j u_j = 1$. Given quantum access to $u$ via the operation $\ket j \ket{\bar 0} \to \ket j \ket{ u_j}$ on $O(\log N + \log 1/ \eta)$ qubits, where $u_j$ is encoded to additive accuracy $\eta$. Then:
- There exists a unitary operator that prepares the state $\frac{1}{\sqrt{N}} \sum_{j=1}^N \ket j \left( \sqrt{ u_j } \ket{0} + \sqrt{1- u_j} \ket{1} \right)$ with two queries and number of gates $O(\log N + \log 1/\eta)$. Denote this unitary by $U_\chi$.
- Let $\epsilon >0$ such that $\eta \leq \epsilon/ (2N)$ and $\delta \in (0,1)$. There exists a quantum algorithm that provides an estimate $\Gamma_u$ of the $\ell_1$-norm $\Vert u \Vert_1$ such that $\left \vert \Vert u \Vert_1 - \Gamma_u\right \vert \leq \epsilon \Vert u \Vert_1$, with probability at least $1-\delta$. The algorithm requires $O(\frac{\sqrt{ N}}{\epsilon} \log(1/\delta))$ queries and $\widetilde{O}(\frac{\sqrt{ N }}{\epsilon} \log\left (1/\delta\right))$ gates.
- Let $\xi \in (0,1]$ such that $\eta \leq \xi /4N$ and $\delta \in(0,1)$. An approximation $\ket{ \tilde p} = \sum_{j=1}^N \sqrt{ \tilde p_j }\ket j$ to the state $\ket{u} := \sum_{j=1}^N \sqrt{ \frac{u_j}{\Vert u \Vert_1}}\ket j$ can be prepared with probability $1-\delta$, using $O(\sqrt{N} \log (1/\delta))$ calls to the unitary of (i) and $\widetilde{O}(\sqrt{N} \log (1/\xi)\log \left (1/\delta \right))$ gates. The approximation in $\ell_1$-norm of the probabilities is $\left \Vert \tilde p - \frac{u}{\Vert u\Vert_1} \right\Vert_1 \leq \xi$.
```
<!-- TODO: fix cite Brassard2002, Apeldoorn2017, and put it as in our book.bib
also add the notation used in this theorems, or standardize it with therest of the document, or add it briefly here..
-->
```{proof}
For the first point, prepare a uniform superposition of all $\ket j$ with $O(\log N)$ Hadamard gates. With the quantum query access, perform
$$\frac{1}{\sqrt{N}} \sum_{j=1}^N \ket {j} \ket {\bar 0} \to \frac{1}{\sqrt{N}} \sum_{j=1}^N \ket j \ket{ u_j} \ket { 0}$$
$$\to \frac{1}{\sqrt{N}} \sum_{j=1}^N \ket j \ket{ u_j} \left( \sqrt{ u_j} \ket{0} + \sqrt{1-u_j } \ket{1} \right).$$
The steps consist of an oracle query and a controlled rotation. The rotation is well-defined as $u_j \leq 1$ and costs $O( \log 1/\eta)$ gates. Then uncompute the data register $\ket{u_j}$ with another oracle query.
For part 2, define a unitary $\mathcal U = U_\chi \left(\mathbb{1} - 2 \ket{\bar 0}\bra{\bar 0}\right) \left(U_\chi\right)^\dagger$, with $U_\chi$ from part 1, and here $\mathbb{1}$ is the identiy matrix. Define another unitary by $\mathcal V = \mathbb {1}-2 \mathbb{1} \otimes \ket{0} \bra{0}$.
Using $K$ applications of $\mathcal U$ and $\mathcal V$, amplitude estimation \@ref(thm:thm-ampest-orig) allows to provide an estimate $\tilde a$ of the quantity $a = \frac{\Vert u \Vert_1}{N}$ to accuracy
$\vert \tilde{a} - a \vert \leq 2 \pi \frac{\sqrt{a(1-a)} }{K}+ \frac{\pi^2}{K^2}$.
Following the idea in [@van2020quantum] (of dividing the elements that we are summing using amplitude estimation by their maximum, that we can find using the finding the minimum subroutine), take $K> \frac{6 \pi }{\epsilon} \sqrt{ N}$, which obtains
\begin{align}
\left| \tilde a - a \right|
&\leq& \frac{\pi}{K}\left( 2\sqrt{a}+ \frac{\pi}{K} \right)
< \frac{\epsilon}{6 } \sqrt{ \frac{1}{N } } \left( 2\sqrt{a}+ \frac{\epsilon }{6}\sqrt{ \frac{1}{N } } \right) \nonumber \\
&\leq& \frac{\epsilon}{6 } \sqrt{ \frac{1}{N} } \left( 3 \sqrt{a} \right)
= \frac{\epsilon \sqrt{\Vert u \Vert_1} }{2 N}.
\end{align}
Since $\Vert u \Vert_1 \geq 1$ by assumption, we have $\left| \tilde{a} - a \right| \leq \frac{\epsilon \Vert u \Vert_1 }{2 N}$.
Also, there is an inaccuracy arising from the additive error $\eta$ of each $u_j$. As it was assumed that
$\eta\leq \epsilon / (2N)$, the overall multiplicative error $\epsilon$ is obtained for the estimation.
For performing a single run of amplitude estimation with $K$ steps,
we require
$O(K) =O(\frac{\sqrt{ N }}{\epsilon} )$
queries to the oracles and
$O\left(\frac{\sqrt{ N }}{\epsilon} \left(\log N + \log (N / \epsilon) \right) \right)$
gates.
For part 3, rewrite the state from part 1
as
\begin{align}
\sqrt{ \frac{{\Vert u \Vert_1} }{N}} \sum_{j=1}^N& \sqrt{ \frac{u_j }{{\Vert u \Vert_1}}} \ket j \ket{0} + \sqrt{1-\frac{{\Vert u \Vert_1}}{N }} \sum_{j=1}^N \sqrt{\frac{1- u_j }{N -{\Vert u \Vert_1}} } \ket j \ket{1}.\nonumber
\end{align}
Now amplify the $\ket 0$ part using Amplitude Amplification [@brassard2002quantum] via the exponential search technique without knowledge of the normalization, to prepare
$\sum_{j=1}^N \ket j \sqrt{ \frac{u_j}{\Vert u \Vert_1}}$ with success probability $1-\delta$.
The amplification requires $O(\sqrt{ \frac{N }{ \Vert u \Vert_1}}\log (1/\delta))= O(\sqrt N \log (1/\delta))$ calls to the unitary of part 1, as ${\Vert u \Vert_1} \geq 1$. The gate complexity derives from the gate complexity of part 1.
Denote the $\eta$-additive approximation to $u_j$ by $\tilde u_j$,
and evaluate the $\ell_1$-distance of the probabilities.
First, $\left \vert \Vert u\Vert_1 - \Vert \tilde u\Vert_1\right \vert \leq N \eta$.
One obtains $\left \Vert \tilde p - \frac{u}{\Vert u\Vert_1} \right \Vert_1 =\left \Vert \frac{\tilde u}{{\Vert \tilde u\Vert_1}} - \frac{u}{\Vert u\Vert_1} \right \Vert_1 \leq \sum_j \left \vert \frac{\tilde u_j}{{\Vert \tilde u\Vert_1}} - \frac{u_j}{{\Vert \tilde u\Vert_1}} \right \vert + \sum_j \left \vert \frac{ u_j}{{\Vert \tilde u\Vert_1}} - \frac{u_j}{\Vert u\Vert_1}\right \vert \leq \frac{N \eta} {{\Vert \tilde u\Vert_1}} + \frac{ N \eta } {{\Vert \tilde u\Vert_1}}$.
We also obtain $$\frac{1}{ {\Vert \tilde u\Vert_1} } \leq \frac{1}{ \Vert u\Vert_1 -N\eta } \leq \frac{2}{ \Vert u\Vert_1}$$ for $\eta \leq \Vert u\Vert_1 / 2 N.$$
Since $\eta \leq \Vert u\Vert_1 \xi/(4N)$, the distance is $\left \Vert \tilde p - \frac{u}{\Vert u\Vert_1} \right \Vert_1 \leq \xi$ as desired.
```
```{lemma, inner-product-relative, name="Quantum inner product estimation with relative accuracy"}
Let $\epsilon,\delta \in(0,1)$.
Given quantum access to two vectors $u,v \in [0,1]^N$, where $u_j$ and $v_j$ are encoded to additive accuracy $\eta= O{1/N}$. Then, an estimate $I$ for the inner product can be provided such that $\vert I - u\cdot v /\Vert u\Vert_1 \vert \leq \epsilon\ u\cdot v /\Vert u\Vert_1$ with success probability $1-\delta$.
This estimate is obtained with $O\left(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right )\right)$ queries and $\widetilde{O}\left(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right ) \right)$ quantum gates.
```
```{proof}
Via lemma \@ref(lem:find-minimum-orig), determine $u_{\max}$ with success probability $1-\delta$ with $O\left(\sqrt N \log \frac{1}{\delta}\right)$ queries and $\widetilde{O}\left(\sqrt N \log \left( \frac{1}{\delta}\right )\right)$ quantum gates.
Apply lemma \@ref(lem:qsp-l1norm) with the vector $\frac{u}{ u_{\max}}$ to obtain an estimate $\Gamma_u$ of the norm $\left \Vert \frac{ u}{ u_{\max}} \right \Vert_1$ to relative accuracy $\epsilon_u= \epsilon/2$ with success probability $1-\delta$.
This estimation takes $O(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right ) )$ queries and $\widetilde{O}(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right ) )$ quantum gates.
Define the vector $z$ with $z_j = u_j v_j$.
Via lemma \@ref(lem:find-minimum-orig), determine $z_{\max}$ with success probability $1-\delta$ with $O\left(\sqrt N \log \frac{1}{\delta}\right)$ queries and $\widetilde{O}\left(\sqrt N \log \left( \frac{1}{\delta}\right )\right)$ quantum gates.
If $z_{\max} = 0$ up to numerical accuracy, the estimate is $I = 0$ and we are done.
Otherwise,
apply lemma \@ref(lem:qsp-l1norm) with the vector $\frac{ z}{ z_{\max}}$ to obtain an estimate $\Gamma_z$ of the norm $\left \Vert \frac{ z}{ z_{\max}} \right \Vert_1$ to relative accuracy $\epsilon_z = \epsilon/2$ with success probability $1-\delta$.
This estimation takes $O\left(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right ) \right)$ queries and $\widetilde{O}\left(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right) \right)$ quantum gates.
With lemma \@ref(lem:ratio-relative-general), which gives a nice bound for the ratio between two relative errors, we have
\begin{align}
\left \vert \frac{\Gamma_z}{\Gamma_u} - \frac{u_{\max}}{z_{\max}}\frac{u\cdot v}{\Vert u\Vert_1} \right \vert &\leq& \frac{u_{\max}}{z_{\max}} \frac{u\cdot v }{\Vert u\Vert_1 } \frac{\epsilon_z + \epsilon_u}{ (1-\epsilon_u)} \\
&\leq& 2\epsilon \frac{u_{\max}}{z_{\max}} \frac{u\cdot v }{\Vert u\Vert_1 },
\end{align}
since $\epsilon_u < 1/2$.
Set
\begin{align}
I= \frac{z_{\max}}{u_{\max}} \frac{\Gamma_z}{\Gamma_u},
\end{align}
and we have $\vert I - u\cdot v /\Vert u\Vert_1 \vert \leq 2\epsilon\ u\cdot v /\Vert u\Vert_1$.
The total success probability of the four probabilistic steps is at least $1-4\delta$ via a union bound (theorem \@ref(thm:unionbound)). Choosing $\epsilon \to \epsilon/2$ and $\delta \to \delta/4$ leads to the result.
```
```{lemma, inner-product-additive, name="Quantum inner product estimation with additive accuracy"}
Let $\epsilon,\delta \in(0,1)$.
Given quantum access to a non-zero vector $u \in [0,1]^N$ and another vector $v \in [-1,1]^N$, where $u_j$ and $v_j$ are encoded to additive accuracy $\eta= O{1/N}$. Then,
an estimate $I$ for the inner product can be provided such that $\vert I - u\cdot v / \Vert u\Vert_1 \vert \leq \epsilon$ with success probability $1-\delta$.
This estimate is obtained with $O\left(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right ) \right)$ queries and $\widetilde{O}\left(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right ) \right)$ quantum gates.
```
Note that as a byproduct, the value $u_{\max}$ and an estimate of $\Vert u /u_{\max} \Vert_1$ with relative accuracy $\epsilon$ can be provided with probability at least $1-\delta$.
```{proof}
Via lemma \@ref(lem:find-minimum-orig), determine $\Vert u\Vert_{\max}$ with success probability $1-\delta$ with $O\left(\sqrt N \log \frac{1}{\delta}\right)$ queries and $\widetilde{O}\left(\sqrt N \log \left( \frac{1}{ \eta}\right) \log \left( \frac{1}{\delta}\right )\right)$ quantum gates.
Apply lemma \@ref(lem:qsp-l1norm) with the vector $\frac{u}{ u_{\max}}$ to obtain an estimate $\Gamma_u$ of the norm $\left \Vert \frac{ u}{ u_{\max}} \right \Vert_1$ to relative accuracy $\epsilon_u= \epsilon/2$ with success probability $1-\delta$.
This estimation takes $O\left(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right ) \right)$ queries and $\widetilde{O}\left(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right ) \right)$ quantum gates.
Similarily, consider the vector $z$ with elements $z_j := u_j \left (v_j+3\right) \in [0,4]$.
Determine $\Vert z\Vert_{\max}$ with success probability $1-\delta$ with $O\left(\sqrt N \log \frac{1}{\delta}\right)$ queries and $\widetilde{O}\left(\sqrt N \log \left( \frac{1}{\delta}\right )\right)$ quantum gates.
Apply lemma \@ref(lem:qsp-l1norm) with the vector $z/ z_{\max}$ to obtain an estimate $\Gamma_z$ of the norm $\Vert z/ z_{\max} \Vert_1$ to relative accuracy $\epsilon_z = \epsilon/2$ with success probability $1-\delta$.
This estimation takes $O\left(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right ) \right)$ queries and $\widetilde{O}\left(\frac{ \sqrt{N} }{\epsilon} \log \left (\frac{1}{\delta} \right ) \right)$.
It takes some steps to see that the exact quantities are related via
\begin{align}
\frac{u\cdot v}{\Vert u\Vert_1} = \frac{z_{\max} }{u_{\max} } \frac{ \Vert \frac{z}{ z_{\max}} \Vert_1 } { \Vert \frac{ u}{ u_{\max}} \Vert_1} - 3.
\end{align}
Considering the estimator $I = \frac{z_{\max} }{u_{\max} } \frac{ \Gamma_z } { \Gamma_u} - 3$, from lemma \@ref(lem:ratio-relative-general), we have
\begin{align}
\left \vert I - \frac{u\cdot v}{ \Vert u\Vert_1 } \right \vert &=& \frac{z_{\max} }{u_{\max} } \left \vert \frac{ \Gamma_z } { \Gamma_u} - \frac{ \Vert \frac{z}{ z_{\max}} \Vert_1 } { \Vert \frac{ u}{ u_{\max}} \Vert_1} \right \vert \\ &\leq&
\frac{ \epsilon_u + \epsilon_{z}}{1- \epsilon_u} \frac{ \Vert z \Vert_1 } { \Vert u\Vert_1} \leq 8 \epsilon. \nonumber
\end{align}
In the last steps we have used the observation that
\begin{align}
\frac{ \Vert z \Vert_1 } { \Vert u\Vert_1} \equiv \frac{\sum_j u_j (v_j+3)}{\sum_j u_j } \leq\frac{4 \sum_j u_j }{\sum_j u_j } = 4,
\end{align}
and $\epsilon_u < 1/2$.
All steps together take $O\left(\frac{\sqrt N}{\epsilon} \log \frac{1}{\delta}\right)$ queries and $\widetilde{O}\left(\frac{\sqrt N}{\epsilon} \log \left( \frac{1}{\delta}\right )\right)$ gates.
The total success probability of all the probabilistic steps is at least $1-4\delta$ via a union bound. Choosing $\epsilon \to \epsilon/8$ and $\delta \to \delta/4$ leads to the result.
```
## SVE-based quantum algorithms
In the following section, we will cover some quantum algorithms based on singular value estimation. Some of them are here just because they are simple enough to have a good pedagogical value, while some of them we believe will be really useful in performing data analysis.
### Estimation of the spectral norm and the condition number
<!--
# TODO Estimation of spectral norm and condition number from quantum access
# Let's understand properly the procedure from quantum gradient descent paper
# labels: help wanted,
-->
We will elaborate more on this result soon. For the moment, we report the main statement.
```{theorem, spectral-norm-estimation, name="Spectral norm estimation"}
Let there be quantum access to the matrix $A \in \R ^{n\times m}$, and let $\epsilon>0$ be a precision parameter. There exists a quantum algorithm that estimates $\|A\|$ to error $\epsilon\|A\|_F$ in time $\widetilde{O}\left(\frac{\log(1/\epsilon)}{\epsilon}\frac{\|A\|_F}{\|A\|}\right)$.
```
<!-- ```{proof}
TODO!
```
-->
### Explained variance {#sec:explainedvariance}
<!--
# TODO Add algos for estimating variance
# From Armando's thesis!
# labels: help wanted, enhancement
-->
The content of this section is extracted from a M.Sc. thesis [@bellante2020quantum] and from an soon-to-appear work.
The interested reader may wish to consult both for more details.
Let $A = U\Sigma V^T$ be the singluar value decomposition of a matrix $A \in \mathbb{R}^{n \times m}$. We call *factor scores* of $A$, and denote them with $\lambda_i = \sigma_i^2$, the squares of its singular values. Similarly, we call *factor score ratios* the relative magnitudes of the factor scores $\lambda^{(i)} = \frac{\lambda_i}{\sum_j^{rank(A)} \lambda_j} = \frac{\sigma_i^2}{\sum_j^{rank(A)}\sigma_j^2}$. The factor score ratios are a measure of the amount of variance explained by the singular values.
We state here some nice examples of SVE algorithms: the first allows us to assess whether a few singular values/factor scores explain most of the variance of the matrix (as it is the case in many visualization algorithms e.g. PCA, CA, etc.); the second one allows computing the cumulative sum of the factor score ratios associated to singular values grater or equal than a certain threshold; the third one is a modified version of the spectral norm estimation result and allows us to define a threshold for the smallest singular value such that the the sum of the above explains more than a given percentage of the total variance; finally, the last two algorithms allow retrieving a classical description of the singular vectors that correspond to the most relevant singular values.
The main intuition behind these algorithms is that it is possible to create the state
$\sum_{i}^{r} \sqrt{\lambda^{(i)}} \ket{u_i}\ket{v_i}\ket{\overline{\sigma}_i}$.
The first algorithm uses a parameter $\gamma$ to control how big a factor score ratio should be for the corresponding singular value/factor score to be measured.
If we choose $\gamma$ bigger than the least factor scores ratio of interest, only the biggest singular values/factor scores will be likely to be measured.
```{theorem, factor-score-estimation, name="Quantum factor score ratio estimation"}
Let there be quantum access to a matrix $A \in \R^{n \times m}$, with singular value decomposition $A = \sum_i\sigma_i u_i v^T_i$ and $\sigma_{max} \leq 1$.
Let $\gamma, \epsilon$ be precision parameters.
There exists a quantum algorithm that, in time $\widetilde{O}\left(\frac{1}{\gamma^2}\frac{\mu(A)}{\epsilon}\right)$, estimates:
- the factor score ratios $\lambda^{(i)}$, such that $\|\lambda^{(i)} - \overline{\lambda}^{(i)}\| \leq \gamma$, with high probability;
- the correspondent singular values $\sigma_i$, such that $\|\sigma_i - \overline{\sigma}_i\| \leq \epsilon$, with probability at least $1-1/\text{poly}(n)$;
- the correspondent factor scores $\lambda_i$, such that $\|\lambda_i - \overline{\lambda}_i\| \leq 2\epsilon$, with probability at least $1-1/\text{poly}(n)$.
```
```{proof}
We provide an algorithm that satisfies the above guarantees.
As a first step, one creates the state
$$\ket{A} = \frac{1}{\|A\|_F}\sum_i^n\sum_j^m a_{ij}\ket{i}\ket{j} = \frac{1}{\sum_j^r \sigma_j^2}\sum_i^r \sigma_i \ket{u_i}\ket{v_i}.$$
This step costs $\widetilde{O}(1)$, assuming that the data are stored in an adeguate data structure.
From this state we apply SVE in time $\widetilde{O}(\mu(A)/\epsilon)$
$$\ket{A'} = \frac{1}{\sum_j^r \sigma_j^2}\sum_i^r \sigma_i \ket{u_i}\ket{v_i}\ket{\overline{\sigma}_i}$$
encoding the singular values with absolute precision $\epsilon$.
If we ignore the first two registers, we have the state $\ket{A''} = \sum_i^r \sqrt{\lambda^{(i)}}\ket{\overline{\sigma}_i}$, from which we can measure the singular values (and factor scores) with probability equal to their factor score ratios.
To evaluate the number $S$ of measurements needed on $\ket{A''}$ to satisfy the guarantees of the theorem, we can model the measurement process as performing $r$ Bernoulli trials: one for each $\overline{\lambda}^{(i)}$, so that if we measure $\overline{\sigma}_i$ it is a success for the $i^{th}$ Bernoulli trial and a failure for all the others.
We use the estimator
$\overline{\lambda}^{(i)} = \frac{\zeta_{\overline{\sigma}_i}}{S}$,
where $\zeta_{\overline{\sigma}_i}$ is the number of times $\overline{\sigma}_i$ appears in the measurements and $S$ is the number of total measurements.
Given a confidence level $z$ and an absolute error $\gamma$, it is possible to use the Wald confidence interval to determine a value for $S$ such that $\|\lambda^{(i)} - \overline{\lambda}^{(i)}\| \leq \gamma$ with confidence level $z$.
It is possible to show that $\gamma \leq \frac{z}{2\sqrt{S}}$ \cite{quantumsupervisedschuld}, from which we get $S=\frac{z^2}{4\gamma^2}$.
Since $z$ is a small number, we can state that the complexity of the algorithm is
$\widetilde{O}\left(\frac{1}{\gamma^2}\frac{\mu(A)}{\epsilon}\right)$.
Finally, note that the error on the factor scores is $2\epsilon$. Suppose that we run SVE with precision $\tau$.
For each $\lambda_i$, the worst estimate is
$\overline{\lambda_i} = (\sigma_i \pm \tau)^2 = \sigma_i^2 \pm 2\tau\sigma_i + \tau^2$ and since $0 \leq \sigma_i \leq 1$, we can say that the worst case is
$\sigma_i^2 + (2\tau + \tau^2)$.
Solving the equation
$2\tau + \tau^2=\epsilon$ for $\tau>0$
leads to
$\tau = \sqrt{1+\epsilon} - 1$. Finally,
$$\tau = \sqrt{1+\epsilon} - 1 = \frac{(\sqrt{1+\epsilon} - 1)(\sqrt{1+\epsilon} + 1)}{(\sqrt{1+\epsilon} + 1)} = \frac{1+\epsilon - 1} {\sqrt{1+\epsilon} + 1} = \frac{\epsilon}{\sqrt{1+\epsilon} + 1} \sim \frac {\epsilon}{2}$$
which proves the error guarantees.
```
One limitation of the algorithm used in the proof is that the Wald confidence interval is not a simultaneous confidence interval.
For this reason, it is not guaranteed that all the factor score ratios estimates are computed with precision $\gamma$ with the same confidence.
We have stated the proof using the Wald confidence interval for the sake of simplicity and because we believe that it provides good estimates in practice.
To overcome this problem it is possible to adjust the confidence level according to the number of estimated factor scores (e.g. with a Bonferroni correction) or to slightly modify the algorithm to use $\ell_\infty$ tomography (Theorem \@ref(thm:tomellinfinity)) on $\ket{A''}$.
The modified version of the algorithm still has an asymptotic run-time of $\widetilde{O}\left(\frac{1}{\gamma^2}\frac{\mu(A)}{\epsilon}\right)$, while providing that all $\|\lambda^{(i)} - \overline{\lambda}^{(i)}\| \leq \gamma$ with probability at least $1 - 1/\text{poly}(r)$, where $r$ is the rank of $A$.
Often in data representations, the cumulative sum of the factor score ratios is a measure of the quality of the representation. By modifying the above proof to leverage $\ell_\infty$ tomography, the cumulative sum can be estimated as $\|\sum_i^k \lambda^{(i)} - \sum_i^k \overline{\lambda}^{(i)}\| \leq k\epsilon$ with probability $1-1/\text{poly}(r)$.
However, a slight variation of the algorithm of Theorem \@ref(thm:spectral-norm-estimation) provides a more accurate estimation in less time, given a threshold $\theta$ for the smallest singular value to retain.
```{theorem, check-explained-variance, name="Quantum check on the factor score ratios' sum"}
Let there be efficient quantum access to the matrix $A \in \mathbb{R}^{n \times m}$, with singular value decomposition $A = \sum_i\sigma_iu_iv^T_i$. Let $\eta, \epsilon$ be precision parameters and $\theta$ be a threshold for the smallest singular value to consider.
There exists a quantum algorithm that estimates $p = \frac{\sum_{i: \overline{\sigma}_i \geq \theta} \sigma_i^2}{\sum_j^r \sigma_j^2}$, where $\|\sigma_i - \overline{\sigma}_i\| \leq \epsilon$, to relative error $\eta$ in time $\widetilde{O}\left(\frac{\mu(A)}{\epsilon}\frac{1}{\eta \sqrt{p}}\right)$.
```
```{proof}
As reported in the proof above, creating the state
$\ket{A''} = \sum_i^r \sqrt{\lambda^{(i)}}\ket{\overline{\sigma}_i}$ costs $\widetilde{O}(\mu(A)/\epsilon)$.
From this state it is possible to append a quantum register that is $\ket{0}$ if $\overline{\sigma}_i < \theta$ and $\ket{1}$ otherwhise: $\ket{\varphi} = \sum_{i: \overline{\sigma}_i \geq \theta} \sqrt{\lambda^{(i)}}\ket{\overline{\sigma}_i}\ket{0} + \sum_{j: \overline{\sigma}_j < \theta} \sqrt{\lambda^{(i)}}\ket{\overline{\sigma}_i}\ket{1}$.
The probability of measuring $\ket{0}$ is $p=\sum_{i: \overline{\sigma}_i \geq \theta} \lambda^{(i)} = \frac{\sum_{i: \overline{\sigma}_i \geq \theta}\sigma_i^2}{\sum_j^r \sigma_j^2}$. Using amplitude estimation (Lemma \@ref(lem:amp-amp-est-simple)), we can estimate $p$ in time $\widetilde{O}\left(\frac{\mu(A)}{\sqrt{p}\epsilon}\right)$.
```
Moreover, an observation on the algorithm for spectral norm estimation allows to perform a binary search of $\theta$ given the desired sum of factor score ratios.
```{theorem, explained-variance-binarysearch, name="Quantum binary search for the singular value threshold"}
Let there be efficient quantum access to the matrix $A \in \mathbb{R}^{n \times m}$. Let $p$ be the factor ratios sum to retain. The threshold $\theta$ for the smallest singular value to retain can be estimated to absolute error $\epsilon$ in time $\widetilde{O}\left(\frac{\log(1/\epsilon)\mu(A)}{\epsilon\sqrt{p}}\right)$.
```
Both these procedures are very fast.
We will see that in problems such as PCA, CA, and LSA, the desired sum of factor score ratios to retain is a number in the range $p \in [1/3, 1]$ with precision up to the second decimal digit.
In practice, the complexity of these last two algorithms scales as $\widetilde{O}\left(\frac{\mu(A)}{\epsilon}\right)$.
After introducing the procedures to test for the most relevant singular values, factor scores and factor score ratios of $A$, we present an efficient routine to extract the corresponding right/left singular vectors.
Additionally, this subroutine can provide an estimation of the singular values greater than $\theta$, to absolute error $\epsilon$.
```{theorem, top-k-sv-extraction, name="Top-k singular vectors extraction"}
Let there be efficient quantum access to the matrix $A \in \mathbb{R}^{n \times m}$, with singular value decomposition $A = \sum_i^r \sigma_i u_i v_i^T$ and $\sigma_{max} \leq 1$. Let $\delta > 0$ be a precision parameter for the singular vectors, $\epsilon>0$ a precision parameter for the singular values, and $\theta>0$ be a threshold such that $A$ has $k$ singular values greater than $\theta$. Define $p=\frac{\sum_{i: \overline{\sigma}_i \geq \theta} \sigma_i^2}{\sum_j^r \sigma_j^2}$. There exist quantum algorithms that estimate:
- The top $k$ left singular vectors $u_i$ of $A$ with unit vectors $\overline{u}_i$ such that $||u_i-\overline{u}_i||_2 \leq \delta$ with probability at least $1-1/poly(n)$, in time $\widetilde{O}\left(\frac{1}{\theta}\frac{1}{\sqrt{p}}\frac{\mu(A)}{\epsilon}\frac{kn}{\delta^2}\right)$;
- The top $k$ right singular vectors $v_i$ of $A$ with unit vectors $\overline{v}_i$ such that $||v_i-\overline{v}_i||_2 \leq \delta$ with probability at least $1-1/poly(m)$, in time $\widetilde{O}\left(\frac{1}{\theta}\frac{1}{\sqrt{p}}\frac{\mu(A)}{\epsilon}\frac{km}{\delta^2}\right)$;
- The top $k$ singular values $\sigma_i$ and factor scores $\lambda_i$ of $A$ to precision $\epsilon$ and $2\epsilon$ with probability at least $1 - 1/\text{poly}(m)$, in time $\widetilde{O}\left(\frac{1}{\theta}\frac{1}{\sqrt{p}}\frac{\mu(A)k}{\epsilon}\right)$ or any of the two above.
```
```{proof}
As reported in the proofs above, creating the state
$\ket{A'} = \sum_i^r \sqrt{\lambda^{(i)}}\ket{u_i}\ket{v_i}\ket{\overline{\sigma}_i}$ costs $\widetilde{O}(\mu(A)/\epsilon)$.
On this state we append an additional register, set to $\ket{0}$ if $\overline{\sigma}_i \geq \theta$ and to $\ket{1}$ otherwise. The cost of this operation is negligible as it only depends on the binary encoding of $\ket{\overline{\sigma}_i}$.
On this state we perform amplitude amplification (Lemma \@ref(lem:amp-amp-est-simple)) on $\ket{0}$ and obtain the state
$$\ket{\varphi} = \sum_i^k \sqrt{\lambda^{(i)}}\ket{u_i}\ket{v_i}\ket{\overline{\sigma}_i}$$
with a superposition of the $k$ most relevant singular values and vectors. Creating $\ket{\varphi}$ costs $\widetilde{O}\left(\frac{\mu(A)}{\sqrt{p}\epsilon}\right)$.
On this state we append an additional register and perform a conditional rotation
$$\frac{C}{\sum_i^k\sigma_i^2}\sum_i^k \frac{\sigma_i}{\overline{\sigma}_i} \ket{u_i}\ket{v_i}\ket{\overline{\sigma}_i}\ket{0} +\frac{1}{\sum_i^k\sigma_i^2}\sum_i^k \sqrt{1-\frac{C^2}{\overline{\sigma}_i^2}}\ket{u_i}\ket{v_i}\ket{\overline{\sigma}_i}\ket{1}$$
The constant $C$ is a normalization factor in the order of $\widetilde{O}(1/\kappa(A^{(k)}))$ where $\kappa(A^{(k)}) = \frac{\sigma_{max}}{\sigma_{min}}$
is the condition number of the low-rank ($k$-rank) matrix
$A^{(k)}$.
Since for construction
$\sigma_{max} \leq 1$ and $\sigma_{min} \geq \theta$,
we can bound the condition number
$\kappa(A^{(k)}) \leq \frac{1}{\theta}.$
From the famous work of Harrow, Hassidim and Lloyd [@HarrowHassidim2009HHL] we know that applying amplitude amplification on the state above, with the the third register being
$\ket{0}$,
would cost
$\widetilde{O}(\kappa(A^{(k)}) T(U)$, where $T(U)$ is the run-time to create the state.
In our case, the cost till this step is
$\widetilde{O}\left(\frac{1}{\theta}\frac{1}{\sqrt{p}}\frac{\mu(A)}{\epsilon}\right)$, and
amplitude amplification leaves the registers in the state
$$\frac{1}{\sqrt{\sum_{i}^k \frac{\sigma_i^2}{\overline{\sigma}_i^2}}}\sum_i^k\frac{\sigma_i}{\overline{\sigma}_i} \ket{u_i}\ket{v_i}\ket{\overline{\sigma}_i} \sim
\frac{1}{\sqrt{k}}\sum_i^k \ket{u_i}\ket{v_i}\ket{\overline{\sigma}_i}
$$
where
$\overline{\sigma}_i \in [\sigma_i - \epsilon, \sigma_i + \epsilon]$
and
$\frac{\sigma_i}{\sigma_i \pm \epsilon} \rightarrow 1$
for
$\epsilon \rightarrow 0$.
The last step of the proof is to compute the time complexity of the state-vector tomography on $\frac{1}{\sqrt{k}}\sum_i^k \ket{u_i}\ket{v_i}\ket{\overline{\sigma}_i}$.
When measuring the last register of state in the computational basis and obtaining $\ket{\overline{\sigma}_i}$, the first two registers collapse in the state $\ket{u_i}\ket{v_i}$.
On $\ket{u_i}\ket{v_i}$, it is possible to perform $\ell_2$ vector-state tomography using Theorem \@ref(thm:tomelle2) either on the first register, to retrieve $\overline{u}_i$, or on the second one, to retrieve $\overline{v}_i$.
Since $u_i \in R^n$ and $v_i \in R^m$, performing state-vector tomography
on the first register
takes time
$O(\frac{n\log{n}}{\delta^2})$
and performing it on the second takes time
$O(\frac{m\log{m}}{\delta^2})$.
Using a coupon collector's argument, if the $k$ states
$\ket{\overline{\sigma}_i}$
are uniformly distributed, to get all the $k$ possible couples
$\ket{u_i}\ket{v_i}$
at least once we would need
$k\log k$
measurements on average.
This proves that it is possible to estimate all the singular values greater than $\theta$, with the guarantees above, in time $\widetilde{O}(\frac{1}{\theta}\frac{1}{\sqrt{p}}\frac{\mu(A)k}{\epsilon})$.
To perform tomography on each state-vector, one should satisfy the coupon collector the same number of times as the measurements needed by the tomography procedure.
The costs of the tomography for all the vectors $\{\overline{u}_i\}_i^k$ and $\{\overline{v}_i\}_i^k$ are
$O\left(T(U)\frac{k\log{k}\cdot n\log{n}}{\delta^2}\right)$,
and
$O\left(T(U)\frac{k\log{k}\cdot m\log{m}}{\delta^2}\right)$, where $T(U)$ is the run-time to create the state on which to perform tomography.
It easily follows that the following complexities are proven:
$\widetilde{O}\left(\frac{1}{\theta}\frac{1}{\sqrt{p}}\frac{\mu(A)}{\epsilon}\frac{kn}{\delta^2}\right), \widetilde{O}\left(\frac{1}{\theta}\frac{1}{\sqrt{p}}\frac{\mu(A)}{\epsilon}\frac{km}{\delta^2}\right).$
```
In the experiments paragraph \@ref(sec:runtime-analysis), we provide experiments that show that the coupon collector's prediction is accurate for practical $\epsilon$.
```{corollary, top-k-sv-linfty, name="Fast top-k singular vectors extraction"}
The run-times of the theorem above can be improved to
$\widetilde{O}\left(\frac{1}{\theta}\frac{1}{\sqrt{p}}\frac{\mu(A)}{\epsilon}\frac{k}{\delta^2}\right)$ with estimation guarantees on the $\ell_\infty$ norms.
```
The proof of this corollary consists in a variant of the proof above that uses $\ell_\infty$ tomography (Theorem \@ref(thm:tomellinfinity)) to extract the singular vectors.
Besides $p$ being negligible, it is interesting to note that the parameter $\theta$ can be computed using:
- the procedures of Theorems \@ref(thm:factor-score-estimation) and \@ref(thm:check-explained-variance);
- the binary search of Theorem \@ref(thm:explained-variance-binarysearch);
- the available literature on the type of data stored in the input matrix $A$.
About the latter, for instance, the experiments of [@KL18] on the run-time parameters of the polynomial expansions of the MNIST dataset support this expectation:
even though in qSFA they keep the $k$ smallest singular values and refer to $\theta$ as the biggest singular value to retain, this value does not vary much when the the dimensionality of their dataset grows.
In the experiments chapter, we observe similar $\theta$s in different datasets for image classification.
Note that, given a vector with $d$ non-zero entries, performing $\ell_\infty$ tomography with error $\frac{\delta}{\sqrt{d}}$ provides the same guarantees of $\ell_2$ tomography with error $\delta$.
In practice, this result implies that the extraction of the singular vectors, with $\ell_2$ guarantees, can be faster if we can assume some prior assumptions on their sparseness: $\widetilde{O}\left(\frac{1}{\theta}\frac{1}{\sqrt{p}}\frac{\mu(A)}{\epsilon}\frac{kd}{\delta^2}\right)$.
### Singular value estimation of a product of two matrices
This is an example of an algorithm that has been superseded by recent development in singular value transformation. Nevertheless, it is a non-trivial way of using SVE, which a nice mathematical error analysis.
```{theorem, sve-product, name="SVE of product of matrices"}
Assume to have quantum access to matrices $P \in \mathbb{R}^{d\times d}$ and $Q \in \mathbb{R}^{d \times d}$. Define $W=PQ = U\Sigma V^T$ and $\epsilon > 0$ an error parameter. There is a quantum algorithm that with probability at least $1-poly(d)$ performs the mapping $\sum_{i}\alpha\ket{v_i} \to \sum_{i}\alpha_i\ket{v_i}\ket{\overline{\sigma_i}}$ where $\overline{\sigma_i}$ is an approximation of the eigenvalues $\sigma_i$ of $W$ such that $|\sigma_i - \overline{\sigma}_i | \leq \epsilon$, in time $\tilde{O}\left(\frac{ ( \kappa(P) + \kappa(Q) ) (\mu(P)+\mu(Q))}{\varepsilon}\right)$.
```
```{proof}
We start by noting that for each singular value $\sigma_{i}$ of $W$ there is a corresponding eigenvalue $e^{-i\sigma_i}$ of the unitary matrix $e^{-iW}$. Also, we note that we know how to multiply by $W$ by applying theorem \@ref(thm:matrix-algebra-sve) sequentially with $Q$ and $P$. This will allow us to approximately apply the unitary $U=e^{-iW}$. The last step will consist of the application of phase estimation to estimate the eigenvalues of $U$ and hence the singular values of $W$. Note that we need $W$ to be a symmetric matrix because of the Hamiltonian simulation part. In case $W$ is not symmetric, we redefine it as
$$
W =
\begin{bmatrix}
0 & PQ\\
(PQ)^T & 0
\end{bmatrix}
$$
Note we have $W=M_1M_2$ for the matrices $M_1, M_2$ stored in QRAM and defined as
$$
M_1 = \begin{bmatrix}
P & 0\\
0 & Q^T
\end{bmatrix},
M_2 = \begin{bmatrix}
0 & Q\\
P^T & 0
\end{bmatrix}.
$$
We now show how to approximately apply $U=e^{-iW}$ efficiently. Note that for a symmetric matrix $W$ we have $W=V\Sigma V^T$ and using the Taylor expansion of the exponential function we have
$$U = e^{-iW} = \sum_{j=0}^{\infty} \frac{(-iW)^j}{j!} = V \sum_{j=0}^{\infty} \frac{(-i\Sigma)^j}{j!} V^T$$
With $\widetilde{U}$ we denote our first approximation of $U$, where we truncate the sum after $\ell$ terms.
$$\widetilde{U} = \sum_{j=0}^{\ell} \frac{(-iW)^j}{j!} = V \sum_{j=0}^{\ell} \frac{(-i\Sigma)^j}{j!} V^T$$
We want to chose $\ell$ such that $\norm{U - \widetilde{U}} < \epsilon/4$. We have:
\begin{eqnarray*}
\norm{U - \widetilde{U}} & \leq & || \sum_{j=0}^{\infty} \frac{(-iW)^j}{j!} - \sum_{j=0}^{\ell} \frac{(-iW)^j}{j!} || \leq
|| \sum_{j={\ell+1}}^{\infty} \frac{(-iW)^j}{j!} || \leq \sum_{j={\ell+1}}^{\infty} || \frac{(-iW)^j}{j!} || \leq \sum_{j={\ell+1}}^{\infty} \frac{1}{j!} \\
& \leq & \sum_{j={\ell+1}}^{\infty} \frac{1}{2^{j-1}} \leq 2^{-\ell +1}
\end{eqnarray*}
where we used triangle inequality and that $\norm{W^j}\leq 1$. Choosing $\ell = O(\log 1/ \varepsilon)$ makes the error less than $\epsilon/4$.
%We can approximate a positive series where the term $a_n$ satisfy the following two conditions: $0 \leq a_n \leq Kr^n$ with $K>0, 0<r<1$ by expressing the error as the geometric series $\frac{Kr^{N+1}}{1-r}$. In our case $K=1$ and $r=1/2$. For a given $\epsilon$ we have to find $\ell$ such that $\frac{(\frac{1}{2})^{\ell+1}}{1-(\frac{1}{2})} \leq \epsilon$. By taking $\ell = O(\log 1/\epsilon)$ we can easily satisfy the error guarantee.
In fact, we cannot apply $\widetilde{U}$ exactly but only approximately, since we need to multiply with the matrices $W^j, j\in[\ell]$ and we do so by using the matrix multiplication algorithm for the matrices $M_1$ and $M_2$. For each of these matrices, we use an error of $\frac{\epsilon}{8 \ell}$ which gives an error for $W$ of $\frac{\epsilon}{4 \ell}$ and an error for $W^j$ of at most $\frac{\epsilon}{4}$. The running time for multiplying with each $W^j$ is at most $O(\ell ( \kappa(M_1)\mu(M_1) \log( 8\ell/\epsilon) + \kappa(M_2)\mu(M_2) \log( 8\ell/\epsilon) ))$ by multiplying sequentially. Hence, we will try to apply the unitary ${U}$ by using the Taylor expansion up to level $\ell$ and approximating each $W^j, j\in [\ell]$ in the sum through our matrix multiplication procedure that gives error at most $\frac{\epsilon}{4}$.
In order to apply $U$ on a state $\ket{x} = \sum_{i} \alpha_i \ket{v_i}$, let's assume $\ell+1$ is a power of two and define $N_l = \sum_{j=0}^l (\frac{(-i)^j}{j!})^2$. We start with the state
$$\frac{1}{\sqrt{N_l}}\sum_{j=0}^l \frac{-i^j}{j!}\ket{j}\ket{x}$$
Controlled on the first register we use our matrix multiplication procedure to multiply with the corresponding power of $W$ and get a state at most $\epsilon/4$ away from the state
$$\frac{1}{\sqrt{N_l}}\sum_{j=0}^l \frac{-i^j}{j!}\ket{j}\ket{W^jx}.$$
We then perform a Hadamard on the first register and get a state $\epsilon/4$ away from the state
$$\frac{1}{\sqrt{\ell}} \ket{0} \left( \frac{1}{\sqrt{N'}} \sum_{j=0}^l \frac{-i^j}{j!} \ket{W^jx}\right) + \ket{0^\bot} \ket{G}$$
where $N'$ just normalizes the state in the parenthesis. Note that after the Hadamard on the first register, the amplitude corresponding to each $\ket{i}$ is the first register is the same. We use this procedure inside an amplitude amplification procedure to increase the amplitude $1/\sqrt{\ell}$ of $\ket{0}$ to be close to 1, by incurring a factor $\sqrt{\ell}$ in the running time. The outcome will be a state $\epsilon/4$ away from the state
$$\left( \frac{1}{\sqrt{N'}} \sum_{j=0}^l \frac{-i^j}{j!} \ket{W^jx}\right) = \ket{\tilde{U}x}$$
which is the application of $\widetilde{U}$. Since $\norm{U - \widetilde{U}} \leq \epsilon/4$, we have that the above procedure applies a unitary $\overline{U}$ such that $\norm{U - \overline{U}} \leq \epsilon/2$. Note that the running time of this procedure is given by the amplitude amplification and the time to multiply with $W^j$, hence we have that the running time is
$$O(\ell^{3/2} ( \kappa(M_1)\mu(M_1) \log( 8\ell/\epsilon) + \kappa(M_2)\mu(M_2) \log( 8\ell/\epsilon) )$$
Now that we know how to apply $\overline{U}$, we can perform phase estimation on it with error $\epsilon/2$. This provides an algorithm for estimating the singular values of $W$ with overall error of $\epsilon$. The final running time is
$$O(\frac{\ell^{3/2}}{\epsilon} ( \kappa(M_1)\mu(M_1) \log( 8\ell/\epsilon) + \kappa(M_2)\mu(M_2) \log( 8\ell/\epsilon) )$$
We have $\mu(M_1)=\mu(M_2)= \mu(P)+\mu(Q)$ and $\kappa(M_1)=\kappa(M_2) = \frac{max \{ \lambda^{P}_{max}, \lambda^{Q}_{max} \}}{ min \{ \lambda^{P}_{min}, \lambda^{Q}_{min} \}} \leq \kappa(P)+\kappa(Q)$, and since $\ell=O(\log 1/\epsilon)$the running time can be simplified to
$$\tilde{O}(\frac{ ( \kappa(P) + \kappa(Q))(\mu(P)+\mu(Q))}{\epsilon} ).$$
```
## Estimating average and variance of a function
Albeit the ideas treated in this post are somehow well-digested in the mind of many quantum algorithms developers, I decided to write this page inspired by [@woerner2019quantum]. Here I want to describe just the main technique employed by their algorithm (namely, how to use amplitude estimation to get useful information out of a function).
Suppose we have a random variable $X$ described by a certain probability distribution over $N$ different outcomes, and a function $f: \{0,\cdots N\} \to [0,1]$ defined over this distribution. How can we use quantum computers to evaluate some properties of $f$ such as expected value and variance faster than classical computers?
Let's start by translating into the quantum realm these two mathematical objects. The probability distribution is (surprise surprise) represented in our quantum computer by a quantum state over $n=\lceil \log N \rceil$ qubits.
$$\ket{\psi} = \sum_{i=0}^{N-1} \sqrt{p_i} \ket{i} $$
where the probability of measuring the state $\ket{i}$ is $p_i,$ for $p_i \in [0, 1]$. Basically, each bases of the Hilbert space represent an outcome of the random variable.
The quantization of the function $f$ is represented by a linear operator $F$ acting on a new ancilla qubit (here on the right) as:
$$ F: \ket{i}\ket{0} \to \ket{i}\left(\sqrt{1-f(i)}\ket{0} + \sqrt{f(i)}\ket{1}\right) $$
If we apply $F$ with $\ket{\psi}$ as input state we get:
$$ \sum_{i=0}^{N-1} \sqrt{1-f(i)}\sqrt{p_i}\ket{i}\ket{0} + \sum_{i=0}^{N-1} \sqrt{f(i)}\sqrt{p_i}\ket{i}\ket{1} $$
Observe that the probability of measuring $\ket{1}$ in the ancilla qubit is $\sum_{i=0}^{N-1}p_if(i)$, which is $E[f(X)]$.
By sampling the ancilla qubit we won't get any speedup compared to a classical randomized algorithm with oracle access to the function, but applying [amplitude estimation](https://arxiv.org/abs/quant-ph/0005055) [@brassard2002quantum %] to the ancilla qubit on the right, we can get an estimate of $E[F(X)]$ with precision $\epsilon$, quadratically faster than a classical computer: in only $O(\frac{1}{\epsilon})$ queries to $f$.
Finally, observe that:
- if we chose $f(i)=\frac{i}{N-1}$ we are able to estimate $E[\frac{X}{N-1}]$ (which, since we know $N$ gives us an estimate of the expected value $E[X]$)
- if we chose $f(i)=\frac{i^2}{(N-1)^2}$ instead, we can estimate $E[X^2]$ and using this along with the previous choice of $f$ we can estimate the variance of $X$: $E[X^2] - E[X]^2$.
### Log-determinant
A very simple example of the utility of the SVE subroutines is to estimate quantities associated to a given matrix. In this case we are going to study the log-determinant. As the name sais, this is just the logarithm of the determinant of a (symmetric positive definite) SPD matrix.
```{definition, log-determinant, name="Log-determinant of an SPD matrix"}
Let $A\in \mathbb{R}^{n \times n}$ be a SPD matrix with singular value decomposition $A=U\SigmaV^T$. The log-determinant of $A$ is defined as:
$$\log\det(A)=\log(\prod_i^n \sigma_i) = \sum_i^n \log(\sigma_i)$$
```
Please keep in mind that this is *not* the fastest algorithm for estimating the log-determinant (we will see that in the appropriate chapter on spectral sums), but it's worth mentioning here because it perhaps the first thing one would try to do in order to estimate this quantity. It also is a good example of the power of quantum singular value estimation, and checking the correctness of this proof might be a good exercise to learn more some mathematical tricks that are very useful to upper bound quantities that appear in the error analysis or the runtime analysis of algorithms.
```{r, algo-logdet-sve, fig.cap="SVE based algorithm to estimatethe log-determinant of a matrix", echo=FALSE}
knitr::include_graphics("algpseudocode/logdet-sve.png")
```
```{theorem, logdet-sve, name="SVE based algorithm for log-determinant"}
Assuming to have quantum access to an SPD matrix $A$, the algorithm in figure \@ref(algo:logdet-sve) returns
$\overline{\log\det(A)}$ such that $|\overline{\log\det(A)} - \log\det(A)| < \epsilon |\log\det(A)|$ in time
$\widetilde{O}(\mu \kappa^3/\epsilon^2).$
```
```{proof}
We can rewrite the quantum state encoding the representation of $A$ (which we can create with quantum access to $A$) as follow:
\begin{equation}
|A\rangle = \frac{1}{\|A\|_F} \sum_{i,j=1}^n a_{ij}|i,j\rangle = \frac{1}{\|A\|_F} \sum_{j=1}^n \sigma_j |u_j\rangle|u_j\rangle.
\end{equation}
Starting from the state $\ket{A}$, we can apply SVE (see lemma \@ref(thm:sve-theorem) to $\ket{A}$ up to precision $\epsilon_1$ to obtain
$$\frac{1}{\|A\|_F} \sum_{j=1}^n \sigma_j |u_j\rangle|u_j\rangle |\tilde{\sigma}_j\rangle,$$
where $|\tilde{\sigma}_j-\sigma_j|\leq \epsilon_1$.
Since $\norm{A} \leq 1$, using controlled operations, we can prepare
\begin{align}
\frac{1}{\|A\|_F} \sum_{i=1}^n \sigma_j |u_j\rangle|u_j\rangle
\ket{\tilde{\sigma}_j}
\left(
C\frac{\sqrt{-\log \tilde{\sigma}_j}}{\tilde{\sigma}_j}\ket{0}
+ \ket{0^\bot}
\right),
(\#eq:tracelogdet)
\end{align}
where $C=\min_j \tilde{\sigma}_j/\sqrt{|\log \tilde{\sigma}_j|} \approx \sigma_{\min}/\sqrt{|\log \sigma_{\min}|}
=1/\kappa\sqrt{\log \kappa}$.
The probability of $\ket{0}$ is
$$P = -\frac{C^2}{\|A\|_F^2} \sum_{j=1}^n \frac{\sigma_j^2}{\tilde{\sigma}_j^2} \log \tilde{\sigma}_j.$$
First, we do the error analysis.
Note that
\begin{align}
\left| \sum_{j=1}^n \frac{\sigma_j^2}{\tilde{\sigma}_j^2} \log \tilde{\sigma}_j - \sum_{j=1}^n \log \sigma_j \right|
&\leq& \left|\sum_{j=1}^n \frac{\sigma_j^2}{\tilde{\sigma}_j^2} \log \tilde{\sigma}_j - \sum_{j=1}^n \frac{\sigma_j^2}{\tilde{\sigma}_j^2}\log \sigma_j \right|
+ \left|\sum_{j=1}^n \frac{\sigma_j^2}{\tilde{\sigma}_j^2} \log\sigma_j - \sum_{j=1}^n \log \sigma_j \right| \\
&\leq&
\sum_{j=1}^n \frac{\sigma_j^2}{\tilde{\sigma}_j^2} |\log \tilde{\sigma}_j - \log \sigma_j | + \sum_{j=1}^n \frac{|\sigma_j^2-\tilde{\sigma}_j^2|}{\tilde{\sigma}_j^2} |\log \sigma_j | \\
&\leq& \sum_{j=1}^n (1 + \frac{\epsilon_1}{\tilde{\sigma}_j})^2 (\frac{\epsilon_1}{\tilde{\sigma}_j}+O(\frac{\epsilon_1^2}{\tilde{\sigma}_j^2})) + (2\kappa\epsilon_1 +\kappa^2\epsilon_1^2)
|\log\det(A)| \\
&\leq& n (\kappa\epsilon_1+O(\kappa^2\epsilon_1^2)) + (2\kappa\epsilon_1 +\kappa^2\epsilon_1^2) |\log \det(A)| \\
&=& (n+2|\log \det(A)|) (\kappa\epsilon_1+O(\kappa^2\epsilon_1^2)).
\end{align}
In the third inequality, we use the result
that $\sigma_j\leq \tilde{\sigma}_j+\epsilon_1$.
Denote $P'$ as the $\epsilon_2$-approximation of $P$ obtained by amplitude estimation, then the above analysis shows that
$-\|A\|_F^2P'/C^2$ is an $(n+2|\log \det(A)|) (\kappa \epsilon_1 + O(\kappa^2 \epsilon_1^2)) +\epsilon_2\|A\|_F^2/C^2$
approximation of $\log\det(A)$. Note that
\begin{align}
&& (n+2|\log \det(A)|) (\kappa \epsilon_1 +
O(\kappa^2 \epsilon_1^2))
+\epsilon_2\|A\|_F^2/C^2 \\
&=&
(n+2|\log \det(A)|) (\kappa \epsilon_1 +
O(\kappa^2 \epsilon_1^2))
+\epsilon_2 \|A\|_F^2 \kappa^2\log \kappa \\
&\leq&
(n+2n\log \kappa) (\kappa \epsilon_1 +
O(\kappa^2 \epsilon_1^2))
+n\epsilon_2 \kappa^2\log \kappa \\
&=& O(n\epsilon_1\kappa\log \kappa+n\epsilon_2\kappa^2\log \kappa).
\end{align}
To make sure the above error is bounded by $n\epsilon$ it suffcies to choose
$\epsilon_1=\epsilon/\kappa\log \kappa$
and $\epsilon_2=\epsilon/\kappa^2\log \kappa$.
Now we do the runtime analysis. The runtime of the algorithm mainly comes from the using of SVE and the performing of amplitude estimation on the state in (\@ref(eq:tracelogdet)).
Using quantum singular value estimation, the complexity to obtain the state
\@ref(eq:tracelogdet) is $\widetilde{O}(\mu /\epsilon_1)$.
The complexity to perform amplitude estimation is $\widetilde{O}(\mu /\epsilon_1\epsilon_2)=\widetilde{O}(\mu \kappa^3(\log \kappa)^2/\epsilon^2)$.
```
## Hamiltonian simulation
These notes based on Childs' [Lecture notes](https://www.cs.umd.edu/~amchilds/qa/), i.e. [@ChildsLectureNotes], Section 5
### Introduction to Hamiltonians
The only way possible to start a chapter on Hamiltonian simulation would
be to start from the work of Feynman, who had the first intuition on the
power of quantum mechanics for simulating physics with computers. We
know that the Hamiltonian dynamics of a closed quantum system, weather