<!DOCTYPE html><html xmlns:dc="http://purl.org/dc/terms/" itemscope itemtype="http://schema.org/Article"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><title>Supplemental Notes for Bernoulli Factory Algorithms</title><meta name="citation_title" content="Supplemental Notes for Bernoulli Factory Algorithms"><meta name="citation_pdf_url" content="https://peteroupc.github.io/bernsupp.pdf"><meta name="citation_url" content="https://peteroupc.github.io/bernsupp.html"><meta name="citation_date" content="2025/02/21"><meta name="citation_online_date" content="2025/02/21"><meta name="og:title" content="Supplemental Notes for Bernoulli Factory Algorithms"><meta name="og:type" content="article"><meta name="og:url" content="https://peteroupc.github.io/bernsupp.html"><meta name="og:site_name" content="peteroupc.github.io"><meta name="twitter:title" content="Supplemental Notes for Bernoulli Factory Algorithms"><meta name="author" content="Peter Occil"/><meta name="citation_author" content="Peter Occil"/><meta name="viewport" content="width=device-width"><link rel=stylesheet type="text/css" href="/style.css">
<script type="text/x-mathjax-config"> MathJax.Hub.Config({"HTML-CSS": { availableFonts: ["STIX","TeX"], linebreaks: { automatic:true }, preferredFont: "TeX" },
tex2jax: { displayMath: [ ["$$","$$"], ["\\[", "\\]"] ], inlineMath: [ ["$", "$"], ["\\\\(","\\\\)"] ], processEscapes: true } });
</script><script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS_HTML-full"></script></head><body> <div class="header">
<nav><p><a href="#navigation">Menu</a> - <a href="#top">Top</a> - <a href="/">Home</a></nav></div>
<div class="mainarea" id="top">
<h1 id="supplemental-notes-for-bernoulli-factory-algorithms">Supplemental Notes for Bernoulli Factory Algorithms</h1>
<p><a href="mailto:[email protected]"><strong>Peter Occil</strong></a></p>
<p><a id="Contents"></a></p>
<h2 id="contents">Contents</h2>
<ul>
<li><a href="#Contents"><strong>Contents</strong></a></li>
<li><a href="#About_This_Document"><strong>About This Document</strong></a></li>
<li><a href="#Definitions"><strong>Definitions</strong></a></li>
<li><a href="#General_Factory_Functions"><strong>General Factory Functions</strong></a>
<ul>
<li><a href="#Building_the_Lower_and_Upper_Polynomials"><strong>Building the Lower and Upper Polynomials</strong></a></li>
<li><a href="#Another_General_Algorithm"><strong>Another General Algorithm</strong></a></li>
<li><a href="#Request_for_Additional_Methods"><strong>Request for Additional Methods</strong></a></li>
</ul>
</li>
<li><a href="#Approximate_Bernoulli_Factories"><strong>Approximate Bernoulli Factories</strong></a></li>
<li><a href="#Achievable_Simulation_Rates"><strong>Achievable Simulation Rates</strong></a></li>
<li><a href="#Notes"><strong>Notes</strong></a></li>
<li><a href="#Appendix"><strong>Appendix</strong></a>
<ul>
<li><a href="#Examples_of_Well_Behaved_Functions"><strong>Examples of Well-Behaved Functions</strong></a></li>
<li><a href="#Results_Used_in_Approximate_Bernoulli_Factories"><strong>Results Used in Approximate Bernoulli Factories</strong></a></li>
<li><a href="#How_Many_Coin_Flips_Are_Needed_to_Simulate_a_Polynomial"><strong>How Many Coin Flips Are Needed to Simulate a Polynomial?</strong></a></li>
<li><a href="#Proofs_for_Polynomial_Building_Schemes"><strong>Proofs for Polynomial-Building Schemes</strong></a>
<ul>
<li><a href="#A_Conjecture_on_Polynomial_Approximation"><strong>A Conjecture on Polynomial Approximation</strong></a></li>
<li><a href="#Example_of_Polynomial_Building_Scheme"><strong>Example of Polynomial-Building Scheme</strong></a></li>
</ul>
</li>
<li><a href="#Which_functions_don_t_require_outside_randomness_to_simulate"><strong>Which functions don’t require outside randomness to simulate?</strong></a></li>
<li><a href="#Multiple_Output_Bernoulli_Factory"><strong>Multiple-Output Bernoulli Factory</strong></a></li>
<li><a href="#Pushdown_Automata_and_Algebraic_Functions"><strong>Pushdown Automata and Algebraic Functions</strong></a>
<ul>
<li><a href="#Finite_State_and_Pushdown_Generators"><strong>Finite-State and Pushdown Generators</strong></a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#License"><strong>License</strong></a></li>
</ul>
<p><a id="About_This_Document"></a></p>
<h2 id="about-this-document">About This Document</h2>
<p><strong>This is an open-source document; for an updated version, see the</strong> <a href="https://github.com/peteroupc/peteroupc.github.io/raw/master/bernsupp.md"><strong>source code</strong></a> <strong>or its</strong> <a href="https://github.com/peteroupc/peteroupc.github.io/blob/master/bernsupp.md"><strong>rendering on GitHub</strong></a><strong>. You can send comments on this document on the</strong> <a href="https://github.com/peteroupc/peteroupc.github.io/issues"><strong>GitHub issues page</strong></a><strong>. See</strong> “<a href="https://peteroupc.github.io/bernreq.html"><strong>Open Questions on the Bernoulli Factory Problem</strong></a>” <strong>for a list of things about this document that I seek answers to.</strong></p>
<p>My audience for this article is <strong>computer programmers with mathematics knowledge, but little or no familiarity with calculus</strong>. It should be read in conjunction with the article “<a href="https://peteroupc.github.io/bernoulli.html"><strong>Bernoulli Factory Algorithms</strong></a>”.</p>
<p>I encourage readers to implement any of the algorithms given in this page, and report their implementation experiences. In particular, <a href="https://github.com/peteroupc/peteroupc.github.io/issues/18"><strong>I seek comments on the following aspects</strong></a>:</p>
<ul>
<li>Are the algorithms in this article (in conjunction with “Bernoulli Factory Algorithms”) easy to implement? Is each algorithm written so that someone could write code for that algorithm after reading the article?</li>
<li>Does this article have errors that should be corrected?</li>
<li>Are there ways to make this article more useful to the target audience?</li>
</ul>
<p>Comments on other aspects of this document are welcome.</p>
<p><a id="Definitions"></a></p>
<h2 id="definitions">Definitions</h2>
<p>This section describes certain mathematical terms used on this page, for the benefit of programmers.</p>
<p>The <em>closed unit interval</em> (written as [0, 1]) means the set consisting of 0, 1, and every real number in between.</p>
<p>The following terms can describe a function $f(x)$, specifically how “well-behaved” $f$ is.</p>
<ul>
<li>A <em>continuous</em> function $f$ has the property that there is a function $h(x, \epsilon)$ (where $x$ is in $f$’s domain and $\epsilon>0$), such that $f(x)$ and $f(y)$ are less than $\epsilon$ apart whenever $x$ and $y$ are in $f$’s domain and less than $h(x, \epsilon)$ apart.<br />Roughly speaking, for each $x$ in $f$’s domain, $f(x)$ and $f(y)$ are “close” if $x$ and $y$ are “close” and belong in the domain.</li>
<li>If $f$ is continuous, its <em>derivative</em> is, roughly speaking, its “slope” function or “velocity” function or “instantaneous-rate-of-change” function. The derivative (or <em>first derivative</em>) is denoted $f’$ or $f^{(1)}$. The <em>second derivative</em> (“slope-of-slope”) of $f$, denoted $f’’$ or $f^{(2)}$, is the derivative of $f’$; the <em>third derivative</em>, denoted $f^{(3)}$, is the derivative of $f’’$; and so on. The <em>0-th derivative</em> of a function $f$ is $f$ itself and denoted $f^{(0)}$.</li>
<li>A <a href="https://en.wikipedia.org/wiki/Hölder_condition"><strong><em>Hölder continuous</em></strong></a> function (with <em>M</em> being the <em>Hölder constant</em> and <em>α</em> being the <em>Hölder exponent</em>) is a continuous function <em>f</em> such that <em>f</em>(<em>x</em>) and <em>f</em>(<em>y</em>) are no more than <em>M</em>*<em>δ</em><sup><em>α</em></sup> apart whenever <em>x</em> and <em>y</em> are in the function’s domain and no more than <em>δ</em> apart.<br />Here, <em>α</em> satisfies 0 < <em>α</em> ≤ 1.<br />Roughly speaking, the function’s “steepness” is no greater than that of <em>M</em>*<em>x</em><sup><em>α</em></sup>.</li>
<li>A <em>Lipschitz continuous</em> function with constant <em>L</em> (the <em>Lipschitz constant</em>) is Hölder continuous with Hölder exponent 1 and Hölder constant <em>L</em>.<br />Roughly speaking, the function’s “steepness” is no greater than that of <em>L</em>*<em>x</em>.<br />If the function has a derivative on its domain, <em>L</em> can be the maximum of the absolute value of that derivative.</li>
<li>A <em>convex</em> function $f$ has the property that $f((x+y)/2) \le (f(x)+f(y))/2$ whenever $x$, $y$, and $(x+y)/2$ are in the function’s domain.<br />Roughly speaking, if the function’s “slope” never goes down, then it’s convex.</li>
<li>A <em>concave</em> function $f$ has the property that $f((x+y)/2) \ge (f(x)+f(y))/2$ whenever $x$, $y$, and $(x+y)/2$ are in the function’s domain.<br />Roughly speaking, if the function’s “slope” never goes up, then it’s concave.</li>
<li>The function $f$ is <em>bounded</em> if there are two numbers $a$ and $b$ such that $a\le f(x)\le b$ whenever $x$ is in the function’s domain.</li>
</ul>
<blockquote>
<p><strong>Note</strong>: The “good behavior” of a function can be important when designing Bernoulli factory algorithms. This page mostly cares how $f$ behaves when its domain is the closed unit interval, that is, when $0 \le x \le 1$.</p>
</blockquote>
<p><a id="General_Factory_Functions"></a></p>
<h2 id="general-factory-functions">General Factory Functions</h2>
<p>As a reminder, the <em>Bernoulli factory problem</em> is: Suppose there is a coin that shows heads with an unknown probability, <em>λ</em>, and the goal is to use that coin (and possibly also a fair coin) to build a “new” coin that shows heads with a probability that depends on <em>λ</em>, call it <em>f</em>(<em>λ</em>).</p>
<p>The algorithm for <a href="https://peteroupc.github.io/bernoulli.html#General_Factory_Functions"><strong>general factory functions</strong></a>, described in my main article on Bernoulli factory algorithms, works by building randomized upper and lower bounds for a function <em>f</em>(<em>λ</em>), based on flips of the input coin. Roughly speaking, the algorithm works as follows:</p>
<ol>
<li>Generate a random variate, <em>U</em>, uniformly distributed, greater than 0 and less than 1.</li>
<li>Flip the input coin, then build an upper and lower bound for <em>f</em>(<em>λ</em>), based on the outcomes of the flips so far.</li>
<li>If <em>U</em> is less than or equal to the lower bound, return 1. If <em>U</em> is greater than the upper bound, return 0. Otherwise, go to step 2.</li>
</ol>
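<p>The loop above can be sketched in Python as follows. This is a simplified illustration, not a rigorous implementation: it compares the bounds against a floating-point uniform variate (a rigorous version would compare digits of <em>U</em> lazily to arbitrary precision), and the example <code>fbelow</code>/<code>fabove</code> functions are my own instantiation of the polynomial-building schemes discussed below, for a concave <em>f</em> whose derivative is Lipschitz continuous.</p>

```python
import random
from math import sin

def simulate(coin, fbelow, fabove, rng=random):
    """Sketch of the general factory-function loop: compare a uniform
    variate against randomized bounds built from flips of the input coin."""
    u = rng.random()           # step 1: uniform variate (float approximation)
    heads, flips, degree = 0, 0, 1
    while True:
        while flips < degree:  # step 2: flip the coin up to `degree` times
            heads += coin()
            flips += 1
        if u <= fbelow(degree, heads):   # step 3: compare with the bounds
            return 1
        if u > fabove(degree, heads):
            return 0
        degree *= 2            # polynomial degrees 1, 2, 4, 8, ...

# Example bounds for f(x) = (2 + sin(x))/8, which is concave on [0, 1]
# and whose derivative cos(x)/8 is Lipschitz continuous with constant 1/8.
M = 1 / 8
f = lambda x: (2 + sin(x)) / 8

def fbelow(n, k):
    return f(k / n)  # f is concave, so f(k/n) works as a lower coefficient

def fabove(n, k):
    if n < 4:
        return max(fabove(4, j) for j in range(5))
    return f(k / n) + M / (7 * n)  # gap shrinks as the degree n grows
```

For instance, <code>simulate(lambda: 1 if random.random() &lt; 0.3 else 0, fbelow, fabove)</code> should return 1 with probability close to <em>f</em>(0.3) ≈ 0.287.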
<p>These randomized upper and lower bounds come from two sequences of polynomials as follows:</p>
<ol>
<li>One sequence approaches the function <em>f</em>(<em>λ</em>) from above, the other from below, and both sequences must converge to <em>f</em>.</li>
<li>For each sequence, the first polynomial has degree 1 (so is a linear function), and each other polynomial’s degree is 1 higher than the previous.</li>
<li>The <em>consistency requirement</em> must be met, namely—
<ul>
<li>the difference between the degree-(<em>n</em>−1) upper polynomial and the degree-<em>n</em> upper polynomial, and</li>
<li>the difference between the degree-<em>n</em> lower polynomial and the degree-(<em>n</em>−1) lower polynomial,</li>
</ul>
<p>must have nonnegative Bernstein coefficients, once each of these differences is rewritten as a polynomial of degree exactly <em>n</em>. (For more on Bernstein coefficients and the Bernstein form of polynomials, see “<a href="https://peteroupc.github.io/bernoulli.html#Certain_Polynomials"><strong>Certain Polynomials</strong></a>” in the main article.)</p>
</li>
</ol>
<p>The consistency requirement ensures that the upper polynomials “decrease” and the lower polynomials “increase”. Unfortunately, the reverse is not true in general; even if the upper polynomials “decrease” and the lower polynomials “increase” to <em>f</em>, this does not ensure the consistency requirement by itself.</p>
<blockquote>
<p><strong>Example (Nacu & Peres [2005]<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>):</strong> The polynomial $x^2+(1-x)^2$ is of degree 2 with Bernstein coefficients [1, 0, 1]. The polynomial $x(1-x)$ is of degree 2 with Bernstein coefficients [0, 1/2, 0]. Although $(x^2+(1-x)^2)$ minus $(x(1-x))$ is nonnegative, this difference’s Bernstein coefficients of degree 2 are not all nonnegative, namely [1, -1/2, 1].</p>
</blockquote>
<p>In this document, <strong>fbelow</strong>(<em>n</em>, <em>k</em>) and <strong>fabove</strong>(<em>n</em>, <em>k</em>) mean the <em>k</em><sup>th</sup> Bernstein coefficient for the lower or upper degree-<em>n</em> polynomial, respectively, where 0 ≤ <em>k</em> ≤ <em>n</em> is an integer.</p>
<p>The section “Building the Lower and Upper Polynomials” describes ways to build sequences of polynomials that appropriately converge to a factory function if that function meets certain conditions.</p>
<p>To determine if the methods are right for <em>f</em>(<em>λ</em>), a deep mathematical analysis of <em>f</em> is required; it would be helpful to plot that function using a computer algebra system to help check whether <em>f</em> belongs to one of the classes of functions described in the next section.</p>
<p><a id="Building_the_Lower_and_Upper_Polynomials"></a></p>
<h3 id="building-the-lower-and-upper-polynomials">Building the Lower and Upper Polynomials</h3>
<p>The rest of this section assumes <em>f</em>(<em>λ</em>) is not a constant. For examples of functions, see “<a href="#Examples_of_Well_Behaved_Functions"><strong>Examples of Well-Behaved Functions</strong></a>”, in the appendix.</p>
<p><strong>Concave functions.</strong> If <em>f</em> is concave, then <strong>fbelow</strong>(<em>n</em>, <em>k</em>) can equal <em>f</em>(<em>k</em>/<em>n</em>), thanks to Jensen’s inequality.</p>
<p><strong>Convex functions.</strong> If <em>f</em> is convex, then <strong>fabove</strong>(<em>n</em>, <em>k</em>) can equal <em>f</em>(<em>k</em>/<em>n</em>), thanks to Jensen’s inequality.</p>
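<p>As a concrete check of the concave case (a sketch using only the Python standard library; the helper <code>bernstein</code> and the example <em>f</em> are mine, not from the article): the polynomial whose Bernstein coefficients are <em>f</em>(<em>k</em>/<em>n</em>) never exceeds a concave <em>f</em>, so those coefficients are valid <strong>fbelow</strong> values.</p>

```python
from math import comb

def bernstein(coeffs, x):
    """Evaluate a polynomial in Bernstein form at x."""
    n = len(coeffs) - 1
    return sum(c * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k, c in enumerate(coeffs))

f = lambda x: x * (1 - x)  # a concave function on the closed unit interval

# By Jensen's inequality, the polynomial with coefficients f(k/n)
# stays at or below f everywhere on [0, 1].
for n in (2, 4, 8, 16):
    coeffs = [f(k / n) for k in range(n + 1)]
    assert all(bernstein(coeffs, i / 100) <= f(i / 100) + 1e-12
               for i in range(101))
```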
<p><strong>Hölder and Lipschitz continuous functions.</strong> I have found a way to extend the results of Nacu and Peres (2005)<sup id="fnref:1:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> to certain functions with a slope that tends to a vertical slope. The following scheme, proved in the appendix, implements <strong>fabove</strong> and <strong>fbelow</strong> if <em>f</em>(<em>λ</em>)—</p>
<ul>
<li>is <a href="https://en.wikipedia.org/wiki/Hölder_condition"><strong><em>Hölder continuous</em></strong></a> on the closed unit interval, with Hölder constant <em>m</em> or less and Hölder exponent <em>α</em> (see “<a href="#Definitions"><strong>Definitions</strong></a>” as well as “<a href="#Examples_of_Well_Behaved_Functions"><strong>Examples of Well-Behaved Functions</strong></a>”, in the appendix), and</li>
<li>on the closed unit interval—
<ul>
<li>has a minimum of greater than 0 and a maximum of less than 1, or</li>
<li>is convex and has a minimum of greater than 0, or</li>
<li>is concave and has a maximum of less than 1.</li>
</ul>
</li>
</ul>
<p>For every integer <em>n</em> that’s a power of 2:</p>
<ul>
<li><em>D</em>(<em>n</em>) = <em>m</em>*(2/7)<sup><em>α</em>/2</sup>/((2<sup><em>α</em>/2</sup>−1)*<em>n</em><sup><em>α</em>/2</sup>).</li>
<li><strong>fbelow</strong>(<em>n</em>, <em>k</em>) = <em>f</em>(<em>k</em>/<em>n</em>) if <em>f</em> is concave; otherwise, min(<strong>fbelow</strong>(4,0), <strong>fbelow</strong>(4,1), …, <strong>fbelow</strong>(4,4)) if <em>n</em> < 4; otherwise, <em>f</em>(<em>k</em>/<em>n</em>) − <em>D</em>(<em>n</em>).</li>
<li><strong>fabove</strong>(<em>n</em>, <em>k</em>) = <em>f</em>(<em>k</em>/<em>n</em>) if <em>f</em> is convex; otherwise, max(<strong>fabove</strong>(4,0), <strong>fabove</strong>(4,1), …, <strong>fabove</strong>(4,4)) if <em>n</em> < 4; otherwise, <em>f</em>(<em>k</em>/<em>n</em>) + <em>D</em>(<em>n</em>).</li>
</ul>
<blockquote>
<p><strong>Notes:</strong></p>
<ol>
<li>If <em>α</em> is 1, <em>D</em>(<em>n</em>) can be <em>m</em>*322613/(250000*sqrt(<em>n</em>)), which is an upper bound. If <em>α</em> is 1/2, <em>D</em>(<em>n</em>) can be <em>m</em>*154563/(40000*<em>n</em><sup>1/4</sup>), which is an upper bound.</li>
<li>The function $f(\lambda)=\min(\lambda t, 1-\epsilon)$, where $\epsilon\ge 0$ and $t\ge 1$, is Lipschitz continuous with Lipschitz constant <em>t</em>. Because $f$ is linear between 0 and 1/<em>t</em>, ways to build polynomials that converge to this kind of function were discussed by Thomas and Blanchet (2012)<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> and Nacu & Peres (2005)<sup id="fnref:1:2" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</li>
</ol>
</blockquote>
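<p>A sketch of this scheme in Python follows. The example function <em>f</em>(<em>λ</em>) = (1 + <em>λ</em>(1−<em>λ</em>))/4 is my own choice: it is concave, stays between 1/4 and 5/16 on the closed unit interval, and is Lipschitz continuous (Hölder with exponent <em>α</em> = 1) with constant <em>m</em> = 1/4.</p>

```python
def D(n, m, alpha):
    """Gap added to (or subtracted from) f(k/n) at degree n in the scheme."""
    return m * (2 / 7) ** (alpha / 2) / ((2 ** (alpha / 2) - 1) * n ** (alpha / 2))

M, ALPHA = 1 / 4, 1  # Hölder (here, Lipschitz) constant and exponent for f
f = lambda x: (1 + x * (1 - x)) / 4  # concave; 1/4 <= f(x) <= 5/16 on [0, 1]

def fbelow(n, k):
    return f(k / n)  # f is concave

def fabove(n, k):
    if n < 4:
        return max(fabove(4, j) for j in range(5))
    return f(k / n) + D(n, M, ALPHA)  # f is not convex
```

As a sanity check, for <em>α</em> = 1 the value of <code>D(n, m, 1)</code> stays at or below the bound <em>m</em>*322613/(250000*sqrt(<em>n</em>)) given in note 1.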
<p><strong>Functions with a Lipschitz continuous derivative.</strong> The following method, proved in the appendix, implements <strong>fabove</strong> and <strong>fbelow</strong> if <em>f</em>(<em>λ</em>)—</p>
<ul>
<li>has a Lipschitz continuous derivative (see “<a href="#Definitions"><strong>Definitions</strong></a>” as well as “<a href="#Examples_of_Well_Behaved_Functions"><strong>Examples of Well-Behaved Functions</strong></a>”, in the appendix), and</li>
<li>in the closed unit interval—
<ul>
<li>has a minimum of greater than 0 and a maximum of less than 1, or</li>
<li>is convex and has a minimum of greater than 0, or</li>
<li>is concave and has a maximum of less than 1.</li>
</ul>
</li>
</ul>
<p>Let <em>m</em> be the Lipschitz constant of <em>f</em>’s derivative, or a greater number than that constant. Then for every integer <em>n</em> that’s a power of 2:</p>
<ul>
<li>For every $n$ such that <strong>fbelow</strong>(<em>n</em>,<em>k</em>) is 0 or greater for every <em>k</em>, <strong>fbelow</strong>(<em>n</em>, <em>k</em>) = <em>f</em>(<em>k</em>/<em>n</em>) if <em>f</em> is concave; otherwise, min(<strong>fbelow</strong>(4,0), <strong>fbelow</strong>(4,1), …, <strong>fbelow</strong>(4,4)) if <em>n</em> < 4; otherwise, <em>f</em>(<em>k</em>/<em>n</em>) − <em>m</em>/(7*<em>n</em>). For every other $n$, <strong>fbelow</strong>(<em>n</em>,<em>k</em>) = 0.</li>
<li>For every $n$ such that <strong>fabove</strong>(<em>n</em>,<em>k</em>) is 1 or less for every <em>k</em>, <strong>fabove</strong>(<em>n</em>, <em>k</em>) = <em>f</em>(<em>k</em>/<em>n</em>) if <em>f</em> is convex; otherwise, max(<strong>fabove</strong>(4,0), <strong>fabove</strong>(4,1), …, <strong>fabove</strong>(4,4)) if <em>n</em> < 4; otherwise, <em>f</em>(<em>k</em>/<em>n</em>) + <em>m</em>/(7*<em>n</em>). For every other $n$, <strong>fabove</strong>(<em>n</em>,<em>k</em>) = 1.</li>
</ul>
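<p>A sketch of this method in Python follows, for an example function of my own choosing: <em>f</em>(<em>λ</em>) = (<em>λ</em><sup>2</sup> + 1)/4, which is convex, has a minimum of 1/4, and has derivative <em>λ</em>/2, which is Lipschitz continuous with constant <em>m</em> = 1/2.</p>

```python
f = lambda x: (x * x + 1) / 4  # convex; f' = x/2 has Lipschitz constant 1/2
m = 1 / 2

def fabove(n, k):
    return f(k / n)  # f is convex, so f(k/n) works as an upper coefficient

def _raw_below(n, k):
    if n < 4:
        return min(_raw_below(4, j) for j in range(5))
    return f(k / n) - m / (7 * n)

def fbelow(n, k):
    # The scheme sets fbelow(n, .) to 0 at any degree n where the
    # formula would dip below 0; for this f, that never happens.
    if any(_raw_below(n, j) < 0 for j in range(n + 1)):
        return 0
    return _raw_below(n, k)
```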
<p><strong>Certain functions that equal 0 at 0.</strong> This approach involves transforming the function <em>f</em> so that it no longer equals 0 at the point 0. This can be done by dividing <em>f</em> by a function (<code>High</code>(<em>λ</em>)) that “dominates” <em>f</em> everywhere on the closed unit interval. Unlike for the original function, there might be a polynomial-building scheme described earlier in this section for the transformed function.</p>
<p>More specifically, <code>High</code>(<em>λ</em>) must meet the following requirements:</p>
<ul>
<li><code>High</code>(<em>λ</em>) is continuous on the closed unit interval.</li>
<li><code>High</code>(0) = 0. (This is required to ensure correctness in case <em>λ</em> is 0.)</li>
<li>1 ≥ <code>High</code>(1) ≥ <em>f</em>(1) ≥ 0.</li>
<li>1 > <code>High</code>(<em>λ</em>) > <em>f</em>(<em>λ</em>) > 0 whenever 0 < <em>λ</em> < 1.</li>
<li>If <em>f</em>(1) = 0, then <code>High</code>(1) = 0. (This is required to ensure correctness in case <em>λ</em> is 1.)</li>
</ul>
<p>Also, <code>High</code> is a Bernoulli factory function that should admit a simple Bernoulli factory algorithm. For example, <code>High</code> can be the following degree-<em>n</em> polynomial: 1−(1−<em>λ</em>)<sup><em>n</em></sup>, where <em>n</em>≥1 is an integer.<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup></p>
<p>The algorithm is now described.</p>
<p>Let <em>g</em>(<em>λ</em>) = lim<sub><em>ν</em>→<em>λ</em></sub> <em>f</em>(<em>ν</em>)/<code>High</code>(<em>ν</em>) (roughly speaking, the value that <em>f</em>(<em>ν</em>)/<code>High</code>(<em>ν</em>) approaches as <em>ν</em> approaches <em>λ</em>). If—</p>
<ul>
<li><em>f</em>(0) = 0 and <em>f</em>(1) < 1, and</li>
<li><em>g</em>(<em>λ</em>) is continuous on the closed unit interval and belongs in one of the classes of functions given earlier,</li>
</ul>
<p>then <em>f</em> can be simulated using the following algorithm:</p>
<ol>
<li>Run a Bernoulli factory algorithm for <code>High</code>. If the call returns 0, return 0. (For example, if <code>High</code>(<em>λ</em>) = <em>λ</em>, then this step amounts to the following: “Flip the input coin. If it returns 0, return 0.”)</li>
<li>Run a Bernoulli factory algorithm for <em>g</em>(.) and return the result of that algorithm. This can be one of the <a href="https://peteroupc.github.io/bernoulli.html#General_Factory_Functions"><strong>general factory function algorithms</strong></a> if there is a way to calculate polynomials that converge to <em>g</em>(.) in a manner needed for that algorithm (for example, if <em>g</em> is described earlier in this section).</li>
</ol>
<blockquote>
<p><strong>Notes:</strong></p>
<ol>
<li>It may happen that <em>g</em>(0) = 0. In this case, step 2 of this algorithm can involve running this algorithm again, but with new <em>g</em> and <code>High</code> functions that are found based on the current <em>g</em> function. (This will eventually result in <em>g</em>(0) being nonzero if <em>f</em> is a nonconstant Bernoulli factory function.) See the second example below.</li>
<li><code>High</code>(<em>λ</em>) can also simply equal 1, rather than a function described in this section. That leads to the original Bernoulli factory algorithm for <em>f</em>(<em>λ</em>).</li>
</ol>
<p><strong>Examples:</strong></p>
<ol>
<li>If <em>f</em>(<em>λ</em>) = (sinh(<em>λ</em>)+cosh(<em>λ</em>)−1)/4, then <em>f</em> is less than or equal to <code>High</code>(<em>λ</em>) = <em>λ</em>, so <em>g</em>(<em>λ</em>) is 1/4 if <em>λ</em> = 0, and (exp(<em>λ</em>) − 1)/(4*<em>λ</em>) otherwise. The following code in Python that uses the SymPy computer algebra library computes this example: <code>fx = (sinh(x)+cosh(x)-1)/4; h = x; pprint(Piecewise((limit(fx/h,x,0), Eq(x,0)), ((fx/h).simplify(), True)))</code>.</li>
<li>
<p>If <em>f</em>(<em>λ</em>) = cosh(<em>λ</em>) − 1, then <em>f</em> is less than or equal to <code>High</code>(<em>λ</em>) = <em>λ</em>, so <em>g</em>(<em>λ</em>) is 0 if <em>λ</em> = 0, and (cosh(<em>λ</em>)−1)/<em>λ</em> otherwise. Now, since <em>g</em>(0) = 0, find new functions <em>g</em> and <code>High</code> based on the current <em>g</em>. The current <em>g</em> is less than or equal to <code>High</code>(<em>λ</em>) = <em>λ</em>*3*(2−<em>λ</em>)/5 (a degree-2 polynomial that has Bernstein coefficients [0, 6/10, 6/10]), so <em>G</em>(<em>λ</em>) = 5/12 if <em>λ</em> = 0, and −(5*cosh(<em>λ</em>) − 5)/(3*<em>λ</em><sup>2</sup>*(<em>λ</em>−2)) otherwise. <em>G</em> is bounded away from 0 and 1, resulting in the following algorithm:</p>
<ol>
<li>(Simulate <code>High</code>.) Flip the input coin. If it returns 0, return 0.</li>
<li>(Simulate <code>High</code>.) Flip the input coin twice. If both flips return 0, return 0. Otherwise, with probability 4/10 (that is, 1 minus 6/10), return 0.</li>
<li>Run a Bernoulli factory algorithm for <em>G</em> (which might involve building polynomials that converge to <em>G</em>, noticing that <em>G</em>’s derivative is Lipschitz continuous) and return the result of that algorithm.</li>
</ol>
</li>
</ol>
</blockquote>
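<p>A quick numeric check of Example 1 in Python: the probability that step 1 and step 2 both return 1 is <code>High</code>(<em>λ</em>)*<em>g</em>(<em>λ</em>), which must reproduce <em>f</em>(<em>λ</em>), and <em>g</em> itself must stay strictly between 0 and 1 to be simulable.</p>

```python
from math import sinh, cosh, exp

f = lambda x: (sinh(x) + cosh(x) - 1) / 4
high = lambda x: x
g = lambda x: 0.25 if x == 0 else (exp(x) - 1) / (4 * x)

for i in range(1, 100):
    lam = i / 100
    # P(step 1 returns 1) * P(step 2 returns 1) equals f(lam)
    assert abs(high(lam) * g(lam) - f(lam)) < 1e-12
    assert 0 < g(lam) < 1  # g must itself be a Bernoulli factory function
```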
<p><strong>Certain functions that equal 0 at 0 and 1 at 1.</strong> Let <em>f</em>, <em>g</em>, and <code>High</code> be functions as defined earlier, except that <em>f</em>(0) = 0 and <em>f</em>(1) = 1. Define the following additional functions:</p>
<ul>
<li><code>Low</code>(<em>λ</em>) is a function that meets the following requirements:
<ul>
<li><code>Low</code>(<em>λ</em>) is continuous on the closed unit interval.</li>
<li><code>Low</code>(0) = 0 and <code>Low</code>(1) = 1.</li>
<li>1 > <em>f</em>(<em>λ</em>) > <code>Low</code>(<em>λ</em>) > 0 whenever 0 < <em>λ</em> < 1.</li>
</ul>
</li>
<li><em>q</em>(<em>λ</em>) = lim<sub><em>ν</em>→<em>λ</em></sub> <code>Low</code>(<em>ν</em>)/<code>High</code>(<em>ν</em>).</li>
<li><em>r</em>(<em>λ</em>) = lim<sub><em>ν</em>→<em>λ</em></sub> (1−<em>g</em>(<em>ν</em>))/(1−<em>q</em>(<em>ν</em>)).</li>
</ul>
<p>Roughly speaking, <code>Low</code> is a function that bounds <em>f</em> from below, just as <code>High</code> bounds <em>f</em> from above. <code>Low</code> is a Bernoulli factory function that should admit a simple Bernoulli factory algorithm; one example is the polynomial <em>λ</em><sup><em>n</em></sup> where <em>n</em>≥1 is an integer. If both <code>Low</code> and <code>High</code> are polynomials of the same degree, <em>q</em> will be a ratio of polynomials with a relatively simple Bernoulli factory algorithm (see “<a href="https://peteroupc.github.io/bernoulli.html#Certain_Rational_Functions"><strong>Certain Rational Functions</strong></a>”).</p>
<p>Now, if <em>r</em>(<em>λ</em>) is continuous on the closed unit interval, then <em>f</em> can be simulated using the following algorithm:</p>
<ol>
<li>Run a Bernoulli factory algorithm for <code>High</code>. If the call returns 0, return 0. (For example, if <code>High</code>(<em>λ</em>) = <em>λ</em>, then this step amounts to the following: “Flip the input coin. If it returns 0, return 0.”)</li>
<li>Run a Bernoulli factory algorithm for <em>q</em>(.). If the call returns 1, return 1.</li>
<li>Run a Bernoulli factory algorithm for <em>r</em>(.), and return 1 minus the result of that call. The Bernoulli factory algorithm can be one of the <a href="https://peteroupc.github.io/bernoulli.html#General_Factory_Functions"><strong>general factory function algorithms</strong></a> if there is a way to calculate polynomials that converge to <em>r</em>(.) in a manner needed for that algorithm (for example, if <em>r</em> is described earlier in this section).</li>
</ol>
<blockquote>
<p><strong>Notes:</strong></p>
<ol>
<li>Quick proof: Rewrite $f=\text{High}\cdot(q\cdot1+(1-q)\cdot(1-r))+(1-\text{High})\cdot0$.</li>
<li><code>High</code>(<em>λ</em>) is allowed to equal 1 if the <em>r</em>(.) in step 3 is allowed to equal 0 at 0.</li>
</ol>
<p><strong>Example:</strong> If <em>f</em>(<em>λ</em>) = (1−exp(<em>λ</em>))/(1−exp(1)), then <em>f</em> is less than or equal to <code>High</code>(<em>λ</em>) = <em>λ</em>, and greater than or equal to <code>Low</code>(<em>λ</em>) = <em>λ</em><sup>2</sup>. As a result, <em>q</em>(<em>λ</em>) = <em>λ</em>, and <em>r</em>(<em>λ</em>) = (2 − exp(1))/(1 − exp(1)) if <em>λ</em> = 0; 1/(exp(1)−1) if <em>λ</em> = 1; and (−<em>λ</em>*(1 − exp(1)) − exp(<em>λ</em>) + 1)/(<em>λ</em>*(1 − exp(1))*(<em>λ</em> − 1)) otherwise. This can be computed using the following code in Python that uses the SymPy computer algebra library: <code>fx=(1-exp(x))/(1-exp(1)); high=x; low=x**2; q=(low/high); r=(1-fx/high)/(1-q); r=Piecewise((limit(r, x, 0), Eq(x,0)), (limit(r,x,1),Eq(x,1)), (r,True)).simplify(); pprint(r)</code>.</p>
</blockquote>
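<p>A quick numeric check of this example in Python, against the identity in note 1 above ($f=\text{High}\cdot(q\cdot1+(1-q)\cdot(1-r))+(1-\text{High})\cdot0$):</p>

```python
from math import exp

e = exp(1)
f = lambda x: (1 - exp(x)) / (1 - e)
high = lambda x: x
q = lambda x: x  # Low(x)/High(x) = x**2 / x

def r(x):
    if x == 0:
        return (2 - e) / (1 - e)
    if x == 1:
        return 1 / (e - 1)
    return (-x * (1 - e) - exp(x) + 1) / (x * (1 - e) * (x - 1))

for i in range(1, 100):
    lam = i / 100
    # f = High * (q*1 + (1-q)*(1-r)), per the "quick proof" note
    composed = high(lam) * (q(lam) + (1 - q(lam)) * (1 - r(lam)))
    assert abs(composed - f(lam)) < 1e-12
```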
<p><strong>Other functions that equal 0 or 1 at the endpoint 0, 1, or both.</strong> The table below accounts for these Bernoulli factory functions:</p>
<table>
<thead>
<tr>
<th>If <em>f</em>(0) =</th>
<th>And <em>f</em>(1) =</th>
<th>Method</th>
</tr>
</thead>
<tbody>
<tr>
<td>> 0 and < 1</td>
<td>1</td>
<td>Use the algorithm for <strong>certain functions that equal 0 at 0</strong>, but with <em>f</em>(<em>λ</em>) = 1 − <em>f</em>(1−<em>λ</em>).<br /><em>Inverted coin</em>: Instead of the usual input coin, use a coin that does the following: “Flip the input coin and return 1 minus the result.”<br /><em>Inverted result:</em> If the overall algorithm would return 0, it returns 1 instead, and vice versa.</td>
</tr>
<tr>
<td>> 0 and < 1</td>
<td>0</td>
<td>Algorithm for <strong>certain functions that equal 0 at 0</strong>, but with <em>f</em>(<em>λ</em>) = <em>f</em>(1−<em>λ</em>). (For example, cosh(<em>λ</em>)−1 becomes cosh(1−<em>λ</em>)−1.)<br />Inverted coin.</td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>Algorithm for <strong>certain functions that equal 0 at 0 and 1 at 1</strong>, but with <em>f</em>(<em>λ</em>) = 1−<em>f</em>(<em>λ</em>).<br />Inverted result.</td>
</tr>
<tr>
<td>1</td>
<td>> 0 and ≤ 1</td>
<td>Algorithm for <strong>certain functions that equal 0 at 0</strong>, but with <em>f</em>(<em>λ</em>) = 1−<em>f</em>(<em>λ</em>).<br />Inverted result.</td>
</tr>
</tbody>
</table>
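<p>The “inverted coin” and “inverted result” transformations in the table can be sketched as follows (a minimal illustration; the example function <em>g</em>(<em>λ</em>) = <em>λ</em><sup>2</sup> and all names are mine).</p>

```python
import random

def g_factory(coin):
    # Bernoulli factory for g(λ) = λ²: heads only if two flips both land heads.
    return coin() & coin()

def f_factory(coin):
    # Simulates f(λ) = 1 − g(1−λ) by combining the inverted coin
    # (flip, then take 1 minus the result) with the inverted result.
    inverted_coin = lambda: 1 - coin()
    return 1 - g_factory(inverted_coin)
```

<p>With an input coin whose probability of heads is <em>λ</em> = 0.3, <code>f_factory</code> returns 1 with probability 1 − 0.7<sup>2</sup> = 0.51.</p>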
<p><a id="Another_General_Algorithm"></a></p>
<h3 id="another-general-algorithm">Another General Algorithm</h3>
<p>The algorithm I’ve developed in this section simulates $f(\lambda)$ when $f$ belongs to a large class of functions, as long as the following is known:</p>
<ul>
<li>$f$ is continuous and has a minimum of greater than 0 and a maximum of less than 1.</li>
<li>There is a family of polynomials ($L_{1}(f)$, $L_{2}(f)$, $L_{4}(f)$, $L_{8}(f)$, …) that come close to $f$ with a known error bound, where the number after $L$ is the degree of the polynomial.</li>
<li>There is a way to find the <em>Bernstein coefficients</em> of each polynomial $L_{n}(f)$ in the family of polynomials.</li>
</ul>
<p>For examples of suitable polynomials, see <a href="https://peteroupc.github.io/bernapprox.html"><strong>“Approximations in Bernstein Form”</strong></a>.</p>
<p>In effect, the algorithm writes $f$ as an infinite sum of polynomials, whose maximums must sum to 1 or less (called <em>T</em> in the algorithm below), then simulates an appropriate <a href="https://peteroupc.github.io/bernoulli.html#Convex_Combinations"><strong>convex combination</strong></a> (weighted sum) of these polynomials. To build the convex combination, each polynomial in the infinite sum is divided by an upper bound on its maximum (which is why error bounds on $L_{n}(f)$ are crucial here).<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> To simulate $f$, the algorithm—</p>
<ul>
<li>selects a polynomial in the convex combination with probability proportional to its upper bound, or a “leftover” zero polynomial with probability <em>T</em>, then</li>
<li>simulates the chosen polynomial (which is easy to do; see Goyal and Sigman (2012)<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>).</li>
</ul>
<hr />
<ul>
<li>In the algorithm, denote:
<ul>
<li>$\epsilon(f, n)$ as an upper bound on the absolute value of the difference between $f$ and the degree-$n$ polynomial $L_{n}(f)$. $\epsilon(f, n)$ must increase nowhere as $n$ increases, and must converge to 0.
<ul>
<li>For best results, this should be written as $\epsilon(f, n) = C/n^r$, where $C$ is a constant and $r>0$ is a multiple of 1/2, since then it’s easy to find the value of ErrShift($f, n$), defined later. In this case, the algorithm should be limited to functions with a continuous $(2r)$-th derivative or a Lipschitz continuous $(2r-1)$-th derivative (see “<a href="#Achievable_Simulation_Rates"><strong>Achievable Simulation Rates</strong></a>”, later). (For example, if the error bound is $C/n^2$, the function $f$ should have a continuous fourth derivative or a Lipschitz continuous third derivative.)</li>
<li>For examples of error bounds, see <a href="https://peteroupc.github.io/bernapprox.html"><strong>“Approximations in Bernstein Form”</strong></a>.</li>
</ul>
</li>
<li>ErrShift($f, m$) as 1.01 $\cdot\sum_{i\ge m} \epsilon(f, 2^i)$. The factor 1.01 is needed to ensure each difference polynomial is strictly between 0 and 1.
<ul>
<li><strong>Example:</strong> If $\epsilon(f, n) = C/n^r$, then ErrShift($f, m)$ = $1.01\cdot C\cdot 2^r/(((2^r)-1)\cdot 2^{rm})$.</li>
</ul>
</li>
<li>DiffWidth($f, m$) as 1.01 $\cdot 2 (\epsilon(f, 2^m)$ + $\epsilon(f, 2^{m+1}))$. This is an upper bound on the maximum difference between the shifted degree-$2^m$ polynomial and the shifted degree-$(2^{m+1})$ polynomial.</li>
</ul>
</li>
<li>The technique breaks $f$ into a <strong>starting polynomial</strong> and a family of <strong>difference polynomials</strong>.<br />To find the <strong>starting polynomial</strong>:
<ol>
<li>Set $m$ to 0.</li>
<li>Find the Bernstein coefficients of $L_{2^m}$, then subtract ErrShift($f, m$) from them. If those coefficients now all lie in the closed unit interval, go to the next step. Otherwise, add 1 to <em>m</em> and repeat this step.</li>
<li>Calculate <strong>StartWidth</strong> as ceil($c\cdot 65536$)/65536, where $c$ is the maximum Bernstein coefficient from step 2, then divide each Bernstein coefficient by <strong>StartWidth</strong>. (65536 is arbitrary and ensures <strong>StartWidth</strong> is a rational number that is close to, but no lower than, the maximum Bernstein coefficient, for convenience.)</li>
<li>The <strong>starting polynomial</strong> now has Bernstein coefficients found in step 3. Set <strong>StartOrder</strong> to <em>m</em>.</li>
</ol>
</li>
<li>To find the <strong>difference polynomial</strong> of order $m$:
<ol>
<li>Find the Bernstein coefficients of $L_{2^m}$, then subtract ErrShift($f, m$) from them. Rewrite them to Bernstein coefficients of degree $2^{m+1}$. Call the coefficients <strong>LowerCoeffs</strong>.</li>
<li>Find the Bernstein coefficients of $L_{2^{m+1}}$, then subtract ErrShift($f, m+1$) from them. Call the resulting values <strong>UpperCoeffs</strong>.</li>
<li>Subtract <strong>LowerCoeffs</strong> from <strong>UpperCoeffs</strong>, and call the result <strong>DiffCoeffs</strong>. Divide each coefficient in <strong>DiffCoeffs</strong> by DiffWidth($f, m$). The result is the Bernstein coefficients of a positive polynomial of degree $2^{m+1}$ whose values are bounded by 0 and 1, but these coefficients themselves are not necessarily bounded by 0 and 1. Thus, if any coefficient in <strong>DiffCoeffs</strong> is less than 0 or greater than 1, add 1 to <em>m</em> and rewrite <strong>DiffCoeffs</strong> to Bernstein coefficients of degree $2^{m+1}$ until no coefficient is less than 0 or greater than 1 anymore.</li>
<li>The <strong>difference polynomial</strong> now has Bernstein coefficients given by <strong>DiffCoeffs</strong>.</li>
</ol>
</li>
<li>The probabilities for <em>X</em> are as follows:
<ul>
<li>First, find the <strong>starting polynomial</strong>, then calculate <em>T</em> as <strong>StartWidth</strong> + $\sum_{i\ge 0}$ DiffWidth($f$, <strong>StartOrder</strong>+<em>i</em>). If <em>T</em> is greater than 1, this algorithm can’t be used.
<ul>
<li><strong>Example:</strong> If $\epsilon(f, n) = C/n^r$, then <em>T</em> = <strong>StartWidth</strong> + 1.01 $\cdot(2 \cdot 2^{-r\cdot\text{StartOrder}} C (1 + 2^{- r}))/(1 - 2^{- r})$.</li>
</ul>
</li>
<li><em>X</em> is 0 with probability 1 − <em>T</em>.</li>
<li><em>X</em> is 1 with probability equal to <strong>StartWidth</strong>.</li>
<li>For each <em>m</em> ≥ 2, <em>X</em> is <em>m</em> with probability equal to DiffWidth($f$,<strong>StartOrder</strong> + <em>m</em> − 2).</li>
</ul>
</li>
</ul>
<p>Then an algorithm to toss heads with probability equal to $f$ would be:</p>
<ol>
<li>Generate <em>X</em> at random with the probabilities given earlier.</li>
<li>If <em>X</em> is 0, return 0. Otherwise, if <em>X</em> is 1, find the <strong>starting polynomial</strong> and its Bernstein coefficients. Otherwise (if <em>X</em> is 2 or greater), find the <strong>difference polynomial</strong> of order <em>m</em> and its Bernstein coefficients, where <em>m</em> = (<em>X</em>−2) + <strong>StartOrder</strong>.</li>
<li>Flip the input coin (with probability of heads $\lambda$), $n - 1$ times, where $n$ is the number of Bernstein coefficients in the polynomial found in step 2 (its degree plus one), and let $j$ be the number of heads.</li>
<li>Return 1 with probability equal to the polynomial’s $j$th Bernstein coefficient ($j$ starts at 0), or 0 otherwise (see also Goyal and Sigman 2012 for an algorithm to simulate polynomials).</li>
</ol>
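<p>Steps 2 through 4 amount to simulating a polynomial given its Bernstein coefficients. A minimal sketch of that subroutine, following Goyal and Sigman’s approach (function names are mine):</p>

```python
import random

def simulate_polynomial(coeffs, lam, rng=random):
    # coeffs: Bernstein coefficients (each in [0, 1]) of a polynomial
    # of degree len(coeffs) - 1.
    degree = len(coeffs) - 1
    # Flip the λ-coin once per unit of degree and count the heads.
    heads = sum(1 for _ in range(degree) if rng.random() < lam)
    # Return 1 with probability equal to the heads-th coefficient.
    return 1 if rng.random() < coeffs[heads] else 0
```

<p>For example, the coefficients [0, 1/2, 1] represent the polynomial <em>λ</em> in degree-2 Bernstein form, so the output is 1 with probability <em>λ</em>.</p>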
<p>If <em>T</em> turns out to be greater than 1 in this algorithm, but still finite, one way to simulate $f$ is to create a coin that simulates $f(\lambda)/T$ instead, and use that coin as the input to a <a href="https://peteroupc.github.io/bernoulli.html#Linear_Bernoulli_Factories"><strong><em>linear Bernoulli factory</em></strong></a> that simulates $T\cdot (f(\lambda)/T)$. (This is possible especially because $f(\lambda)$ is assumed to have a maximum of less than 1.)</p>
<blockquote>
<p><strong>Example</strong>: The following parameters allow this algorithm to work if $f$ is concave, has a maximum of less than 1, and has a Lipschitz-continuous derivative with Lipschitz constant <em>M</em> or less. In this case, it is allowed that $f(0)=0$, $f(1)=0$, or both.</p>
<ul>
<li>The family of polynomials $L_n(f)$ is simply the family of Bernstein polynomials of $f$. The Bernstein coefficients of $L_n(f)$ are $f(0/n), f(1/n), …, f(n/n)$.</li>
<li>The error bound is $\epsilon(f, n) = M/(8n)$ (Lorentz 1963)<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>.</li>
<li>The <strong>starting polynomial</strong> is found as follows. Let <em>c</em> = max($f(0), f(1)$). Then the starting polynomial has two Bernstein coefficients both equal to $c$; <strong>StartWidth</strong> is equal to ceil($c\cdot 65536$)/65536, and <strong>StartOrder</strong> is equal to 0.</li>
<li>ErrShift($f,m$) = 0. The reason for 0 is that $f$ is concave, so its Bernstein polynomials naturally “increase” with increasing degree (Temple 1954)<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup>, (Moldovan 1962)<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>.</li>
<li>DiffWidth($f,m$) = $1.01\cdot 3 M/(8\cdot 2^m)$. For the same reason as the previous point, and because the Bernstein polynomials are always “below” $f$, DiffWidth($f,m$) can also equal 1.01 $\cdot \epsilon(f, 2^{m})$ = $1.01\cdot M/(8\cdot 2^m)$. This is what is used to calculate <em>T</em>, later.</li>
<li><em>T</em> is calculated as <strong>StartWidth</strong> + $1.01\cdot M/4$.</li>
</ul>
</blockquote>
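<p>Several steps above rewrite Bernstein coefficients to a higher degree (degree elevation), which leaves the polynomial itself unchanged. A sketch using exact rational arithmetic (function names are mine):</p>

```python
from fractions import Fraction
from math import comb

def elevate(coeffs, r):
    # Rewrite degree-n Bernstein coefficients as equivalent coefficients
    # of degree n + r; the underlying polynomial is unchanged.
    n = len(coeffs) - 1
    return [sum(coeffs[j] * comb(n, j) * comb(r, k - j)
                for j in range(max(0, k - r), min(n, k) + 1))
            / Fraction(comb(n + r, k))
            for k in range(n + r + 1)]

def bern_eval(coeffs, lam):
    # Evaluate a polynomial in Bernstein form at λ.
    n = len(coeffs) - 1
    return sum(c * comb(n, k) * lam**k * (1 - lam)**(n - k)
               for k, c in enumerate(coeffs))
```

<p>For example, <code>elevate([Fraction(0), Fraction(1)], 1)</code> rewrites the polynomial <em>λ</em> from degree 1 to degree 2, giving the coefficients [0, 1/2, 1].</p>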
<p><a id="Request_for_Additional_Methods"></a></p>
<h3 id="request-for-additional-methods">Request for Additional Methods</h3>
<p>Readers are requested to let me know of additional solutions to the following problem:</p>
<p><em>Let $f(\lambda)$ be continuous and satisfy $0\lt f(\lambda)\lt 1$ whenever $0\le\lambda\le 1$. Given that $f(\lambda)$ belongs to a large class of functions (for example, it has a continuous, Lipschitz continuous, concave, or nowhere decreasing $k$-th derivative for some integer $k$, or any combination of these), compute the Bernstein coefficients for two sequences of polynomials as follows: one of them approaches $f(\lambda)$ from above, the other from below, and the consistency requirement must be met (see “<a href="#General_Factory_Functions"><strong>General Factory Functions</strong></a>”).</em></p>
<p><em>The polynomials need to be computed only for degrees 2, 4, 8, 16, and higher powers of 2.</em></p>
<p><em>The rate of convergence must be no slower than $1/n^{r/2}$ if the specified class has only functions with continuous $r$-th derivative.</em></p>
<p><em>Methods that use only integer arithmetic and addition and multiplication of rational numbers are preferred (thus, methods that involve cosines, sines, $\pi$, $\exp$, and $\ln$ are not preferred).</em></p>
<p>See also the <a href="https://peteroupc.github.io/bernreq.html#Polynomials_that_approach_a_factory_function_fast"><strong>open questions</strong></a>.</p>
<p><a id="Approximate_Bernoulli_Factories"></a></p>
<h2 id="approximate-bernoulli-factories">Approximate Bernoulli Factories</h2>
<p>An <strong>approximate Bernoulli factory</strong> for a function <em>f</em>(<em>λ</em>) is a Bernoulli factory algorithm that simulates another function, <em>g</em>(<em>λ</em>), that approximates <em>f</em> in some sense.</p>
<p>Usually <em>g</em> is a polynomial, but can also be a rational function (ratio of polynomials) or another function with an easy-to-implement Bernoulli factory algorithm.</p>
<p>Meanwhile, <em>f</em>(<em>λ</em>) can be any function that maps the closed unit interval to itself, even if it isn’t continuous or a factory function (examples include the “step function” 0 if <em>λ</em> < 1/2 and 1 otherwise, or the function 2*min(<em>λ</em>, 1 − <em>λ</em>)). If the function is continuous, it can be approximated arbitrarily well by an approximate Bernoulli factory (as a result of the so-called “Weierstrass approximation theorem”), but generally not if the function is discontinuous.</p>
<p>To build an approximate Bernoulli factory with a polynomial:</p>
<ol>
<li>
<p>First, find a polynomial in Bernstein form of degree <em>n</em> that is close enough to the desired function <em>f</em>(<em>λ</em>).</p>
<p>The simplest choice for this polynomial, known simply as a <em>Bernstein polynomial</em>, has <em>n</em>+1 Bernstein coefficients and its <em>j</em><sup>th</sup> coefficient (starting at 0) is found as <em>f</em>(<em>j</em>/<em>n</em>). For this choice, if <em>f</em> is continuous, the polynomial can be brought arbitrarily close to <em>f</em> by choosing <em>n</em> high enough.</p>
<p>Other choices for this polynomial are given in the page <a href="https://peteroupc.github.io/bernapprox.html"><strong>“Approximations in Bernstein Form”</strong></a>.</p>
<p>Whatever polynomial is used, the polynomial’s Bernstein coefficients must all lie on the closed unit interval.</p>
<p>The polynomial can be in <em>homogeneous form</em> (also known as <em>scaled Bernstein</em> form (Farouki and Rajan 1988)<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>) instead of in Bernstein form, with <em>scaled Bernstein coefficients</em> $s_0, …, s_n$, as long as $0\le s_i\le{n\choose i}$ where $0\le i\le n$.</p>
</li>
<li>
<p>The rest of the process is to toss heads with probability equal to that polynomial, given its Bernstein coefficients. To do this, first flip the input coin <em>n</em> times, and let <em>j</em> be the number of times the coin returned 1 this way.</p>
</li>
<li>
<p>Then, with probability equal to—</p>
<ul>
<li>the polynomial’s Bernstein coefficient at position <em>j</em> (which will be $f(j/n)$ in the case of the Bernstein polynomial $B_n(f)$), or</li>
<li>the polynomial’s scaled Bernstein coefficient at position <em>j</em>, divided by choose(<em>n</em>, <em>j</em>),</li>
</ul>
<p>return 1. Otherwise, return 0. Here, 0≤<em>j</em>≤<em>n</em>.</p>
<p>If the probability can be an irrational number, see “<a href="https://peteroupc.github.io/bernoulli.html#Algorithms_for_General_Irrational_Constants"><strong>Algorithms for General Irrational Constants</strong></a>” for ways to exactly sample a probability equal to that irrational number.</p>
</li>
</ol>
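<p>The three steps above can be sketched as follows for the simplest choice, the Bernstein polynomial with coefficients <em>f</em>(<em>j</em>/<em>n</em>) (a minimal illustration; names are mine):</p>

```python
import random

def approx_bernoulli(f, n, lam, rng=random):
    # Approximate Bernoulli factory via the Bernstein polynomial B_n(f),
    # whose j-th Bernstein coefficient is f(j/n).
    # Step 2: flip the input coin n times; j counts the heads.
    j = sum(1 for _ in range(n) if rng.random() < lam)
    # Step 3: return 1 with probability equal to the coefficient f(j/n).
    return 1 if rng.random() < f(j / n) else 0
```

<p>For <em>f</em>(<em>λ</em>) = <em>λ</em><sup>2</sup>, the simulated probability is the Bernstein polynomial <em>λ</em><sup>2</sup> + <em>λ</em>(1−<em>λ</em>)/<em>n</em>, which approaches <em>f</em> as <em>n</em> grows.</p>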
<blockquote>
<p><strong>Notes:</strong></p>
<ol>
<li>More sophisticated ways to implement steps 2 and 3 are found in the section “<a href="https://peteroupc.github.io/bernoulli.html"><strong>Certain Polynomials</strong></a>” in the main article “Bernoulli Factory Algorithms”.</li>
<li>There are other kinds of functions, besides polynomials and rational functions, that serve to approximate continuous functions. But many of them work poorly as approximate Bernoulli factory functions because their lack of “smoothness” means there is no simple Bernoulli factory for them. For example, a <em>spline</em>, which is a continuous function made up of a finite number of polynomial pieces, is generally not “smooth” at the points where the spline’s pieces meet.</li>
<li>Bias and variance are the two sources of error in a randomized estimation algorithm. Let <em>g</em>(<em>λ</em>) be an approximation of <em>f</em>(<em>λ</em>). The original Bernoulli factory for <em>f</em>, if it exists, has bias 0 and variance <em>f</em>(<em>λ</em>)*(1−<em>f</em>(<em>λ</em>)), but the approximate Bernoulli factory has bias <em>g</em>(<em>λ</em>) − <em>f</em>(<em>λ</em>) and variance <em>g</em>(<em>λ</em>)*(1−<em>g</em>(<em>λ</em>)). (“Variance reduction” methods are outside the scope of this document.) An estimation algorithm’s <em>mean squared error</em> equals variance plus square of bias.</li>
<li>There are two known approximations to the linear function $f(\lambda) = 2\lambda$ using a polynomial in Bernstein form of degree $n$ that maps the open interval (0, 1) to itself. In each case, if <em>g</em>(<em>λ</em>) is that polynomial and if $0\le\lambda\le 1/2$, then the error in approximating <em>f</em>(<em>λ</em>) is no greater than 1−<em>g</em>(1/2).
<ul>
<li>In Henderson and Glynn (2003, Remark 4)<sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">12</a></sup>, the polynomial’s <em>j</em><sup>th</sup> Bernstein coefficient (starting at 0) is min((<em>j</em>/<em>n</em>)*2, 1−1/<em>n</em>). The polynomial <em>g</em> can be computed with the SymPy computer algebra library as follows: <code>from sympy.stats import *; g=2*E( Min(sum(Bernoulli(("B%d" % (i)),z) for i in range(n))/n,(S(1)-S(1)/n)/2))</code>.</li>
<li>In Nacu and Peres (2005, section 6)<sup id="fnref:1:3" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, the polynomial’s <em>j</em><sup>th</sup> Bernstein coefficient (starting at 0) is min((<em>j</em>/<em>n</em>)*2, 1). It corresponds to the following algorithm: Flip the input coin <em>n</em> times or until the ratio of “heads” to “flips” becomes at least 1/2, whichever comes first, then if <em>n</em> flips were made without the ratio becoming at least 1/2, return 0; otherwise, return 1.</li>
</ul>
</li>
</ol>
</blockquote>
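<p>The Nacu–Peres approximation described in the second bullet of note 4 translates directly into code (a sketch; the function name is mine):</p>

```python
import random

def approx_two_lambda(coin, n):
    # Nacu–Peres (2005, section 6) approximation to f(λ) = 2λ:
    # flip up to n times, returning 1 as soon as the ratio of heads
    # to flips reaches 1/2, and 0 if that never happens.
    heads = 0
    for flips in range(1, n + 1):
        heads += coin()
        if 2 * heads >= flips:
            return 1
    return 0
```

<p>Note that a coin that always lands heads makes the algorithm return 1 after one flip, and a coin that always lands tails makes it return 0 after <em>n</em> flips.</p>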
<p><a id="Achievable_Simulation_Rates"></a></p>
<h2 id="achievable-simulation-rates">Achievable Simulation Rates</h2>
<p>In general, the number of input coin flips needed by any Bernoulli factory algorithm for a factory function <em>f</em>(<em>λ</em>) depends on how “smooth” the function <em>f</em> is.</p>
<p>The following table summarizes the rate of simulation (in terms of the number of input coin flips needed) that can be achieved <em>in theory</em> depending on <em>f</em>(<em>λ</em>), assuming the input coin’s probability of heads is unknown. In the following table:</p>
<ul>
<li><em>λ</em>, the unknown probability of heads, is <em>ε</em> or greater and (1−<em>ε</em>) or less for some <em>ε</em> > 0.</li>
<li>The simulation makes use of a <em>fair coin</em> (a coin that shows 1 or 0 with equal probability) in addition to input coin flips.</li>
<li><em>Δ</em>(<em>n</em>, <em>r</em>, <em>λ</em>) = max(sqrt(<em>λ</em>*(1−<em>λ</em>)/<em>n</em>),1/<em>n</em>)<sup><em>r</em></sup>.</li>
</ul>
<table>
<thead>
<tr>
<th>Property of simulation</th>
<th>Property of <em>f</em></th>
</tr>
</thead>
<tbody>
<tr>
<td>Requires no more than <em>n</em> input coin flips.</td>
<td>If and only if <em>f</em> can be written as a polynomial of degree <em>n</em> with Bernstein coefficients in the closed unit interval (Goyal and Sigman 2012)<sup id="fnref:7:1" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>.</td>
</tr>
<tr>
<td>Requires a finite number of flips on average. Also known as “realizable” by Flajolet et al. (2010)<sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">13</a></sup>.</td>
<td>Only if <em>f</em> is Lipschitz continuous (Nacu and Peres 2005)<sup id="fnref:1:4" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.<br />Whenever <em>f</em> admits a fast simulation (Mendo 2019)<sup id="fnref:14" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">14</a></sup>.</td>
</tr>
<tr>
<td>Number of flips required, raised to power of <em>r</em>, is bounded by a finite number on average and has a tail that drops off uniformly over <em>f</em>’s domain.</td>
<td>Only if <em>f</em> has continuous <em>r</em>-th derivative (Nacu and Peres 2005)<sup id="fnref:1:5" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</td>
</tr>
<tr>
<td>Requires more than <em>n</em> flips with at most a probability proportional to <em>Δ</em>(<em>n</em>, <em>r</em> + 1, <em>λ</em>), for integer <em>r</em> ≥ 0 and every <em>λ</em>, and for large enough <em>n</em>. (The greater <em>r</em> is, the faster the simulation.)</td>
<td>Only if <em>f</em> has an <em>r</em>-th derivative that is continuous and in the Zygmund class (see note below) (Holtz et al. 2011, Theorem 13)<sup id="fnref:15" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">15</a></sup>.</td>
</tr>
<tr>
<td>Requires more than <em>n</em> flips with at most a probability proportional to <em>Δ</em>(<em>n</em>, <em>α</em>, <em>λ</em>), for noninteger <em>α</em> > 0 and every <em>λ</em>, and for large enough <em>n</em>. (The greater <em>α</em> is, the faster the simulation.)</td>
<td>If and only if <em>f</em> has an <em>r</em>-th derivative that is Hölder continuous with Hölder exponent (<em>α</em> − <em>r</em>) or greater, where <em>r</em> = floor(<em>α</em>) (Holtz et al. 2011, Theorem 8)<sup id="fnref:15:1" role="doc-noteref"><a href="#fn:15" class="footnote" rel="footnote">15</a></sup>. Assumes <em>f</em> is bounded away from 0 and 1.</td>
</tr>
<tr>
<td>“Fast simulation” (requires more than <em>n</em> flips with a probability that decays exponentially or faster as <em>n</em> gets large). Also known as “strongly realizable” by Flajolet et al. (2010)<sup id="fnref:13:1" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">13</a></sup>.</td>
<td>If and only if <em>f</em> is analytic at every point in its domain (see note below) (Nacu and Peres 2005)<sup id="fnref:1:6" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</td>
</tr>
<tr>
<td>Average number of flips greater than or equal to (<em>f′</em>(<em>λ</em>))<sup>2</sup>*<em>λ</em>*(1−<em>λ</em>)/(<em>f</em>(<em>λ</em>)*(1−<em>f</em>(<em>λ</em>))), where <em>f′</em> is the first derivative of <em>f</em>.</td>
<td>Whenever <em>f</em> admits a fast simulation (Mendo 2019)<sup id="fnref:14:1" role="doc-noteref"><a href="#fn:14" class="footnote" rel="footnote">14</a></sup>.</td>
</tr>
</tbody>
</table>
<blockquote>
<p><strong>Note:</strong> A function $f(\lambda)$ is:</p>
<ul>
<li>
<p><em>Analytic</em> at a point $z$ if there is a positive number $r$ such that $f$ is writable as—</p>
\[f(\lambda)=f(z)+f^{(1)}(z)(\lambda-z)^1/1! + f^{(2)}(z)(\lambda-z)^2/2! + ...,\]
<p>for every point $\lambda$ satisfying abs($\lambda-z$) < $r$, where $f^{(i)}$ is the $i$-th derivative of $f$.</p>
</li>
<li>
<p>In the <em>Zygmund class</em> if it is continuous and there is a positive number $D$ with the following property: For each step size $\epsilon>0$, abs($f(x-h) + f(x+h) - 2f(x)$) $\le D\times\epsilon$ wherever the left-hand side is defined and $0\lt h\le\epsilon$. The Zygmund class includes the two “smaller” classes of Lipschitz continuous functions (see “Definitions”) and functions with a continuous derivative.</p>
</li>
</ul>
</blockquote>
<p><a id="Notes"></a></p>
<h2 id="notes">Notes</h2>
<p><a id="Appendix"></a></p>
<h2 id="appendix">Appendix</h2>
<p><a id="Examples_of_Well_Behaved_Functions"></a></p>
<h3 id="examples-of-well-behaved-functions">Examples of Well-Behaved Functions</h3>
<p>In the examples given in this section, <em>f</em>(<em>λ</em>) is a function defined on the closed unit interval.</p>
<p>Generally, how well-behaved a function is depends on how many continuous derivatives it has, and whether those derivatives are Lipschitz continuous or Hölder continuous, among other things. The following lists several classes of functions, from worst to best behaved and from “largest” to “smallest”:</p>
<p>Functions continuous except possibly at one point → Continuous functions → Hölder continuous functions → Lipschitz continuous functions → With continuous first derivative → With Hölder continuous first derivative → With Lipschitz continuous first derivative → With continuous second derivative → With infinitely many derivatives → Analytic functions.</p>
<p><strong>Concave and convex functions.</strong> The following table shows examples of functions that are convex, concave, or neither. All these functions are continuous. Also, review the <a href="#Definitions"><strong>definitions</strong></a>.</p>
<table>
<thead>
<tr>
<th>Function <em>f</em>(<em>λ</em>)</th>
<th>Convex or concave?</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>1− <em>λ</em><sup>2</sup>.</td>
<td>Concave.</td>
<td> </td>
</tr>
<tr>
<td><em>λ</em><sup>2</sup>.</td>
<td>Convex.</td>
<td> </td>
</tr>
<tr>
<td><em>λ</em><sup>2</sup> − <em>λ</em><sup>3</sup>.</td>
<td>Neither.</td>
<td> </td>
</tr>
<tr>
<td><em>λ</em><sup><em>z</em></sup>, where 0< <em>z</em>≤ 1.</td>
<td>Concave.</td>
<td> </td>
</tr>
<tr>
<td><em>λ</em><sup><em>z</em></sup>, where <em>z</em>≥ 1.</td>
<td>Convex.</td>
<td> </td>
</tr>
<tr>
<td>exp(−<em>λ</em>/4).</td>
<td>Concave.</td>
<td> </td>
</tr>
</tbody>
</table>
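<p>Entries like these can be double-checked numerically with a midpoint test on a grid (for continuous functions, midpoint convexity is equivalent to convexity; the helper name is mine):</p>

```python
def midpoint_convex_on_grid(f, steps=100):
    # f is midpoint convex if f((x+y)/2) <= (f(x)+f(y))/2 for all x, y
    # in the domain; the small tolerance absorbs floating-point error.
    pts = [i / steps for i in range(steps + 1)]
    return all(f((x + y) / 2) <= (f(x) + f(y)) / 2 + 1e-12
               for x in pts for y in pts)
```

<p>Applying the same test to −<em>f</em> checks concavity instead; a function failing both tests is neither convex nor concave.</p>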
<p><strong>Hölder and Lipschitz continuous functions.</strong> The following table shows some functions that are Hölder continuous and some that are not. Also, review the <a href="#Definitions"><strong>definitions</strong></a>.</p>
<table>
<thead>
<tr>
<th>Function <em>f</em>(<em>λ</em>):</th>
<th>Hölder exponent (<em>α</em>) and an upper bound of the Hölder constant (<em>L</em>):</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\lambda^z\cdot t$.</td>
<td><em>α</em>=<em>z</em>.<br /><em>L</em>=abs(<em>t</em>).</td>
<td>$0\lt z\le 1$, $t\ne 0$.</td>
</tr>
<tr>
<td>$\lambda^z\cdot t$.</td>
<td><em>α</em>=1 (Lipschitz continuous).<br /><em>L</em>=<em>z</em>*abs(<em>t</em>).</td>
<td>$z\ge 1$, $t$ is a real number.</td>
</tr>
<tr>
<td>$\lambda^{1/3}/4 + \lambda^{2/3}/5$.</td>
<td><em>α</em>=1/3.<br /><em>L</em>=9/20.</td>
<td><em>α</em> is the minimum of Hölder exponents, min(1/3, 2/3), and <em>L</em> is the sum of Hölder constants, 1/4+1/5.</td>
</tr>
<tr>
<td>$1/2-(1-2\lambda)^{z}/2$ if $\lambda<1/2$ and $1/2+(2\lambda-1)^{z}/2$ otherwise.</td>
<td><em>α</em>=<em>z</em>.<br /><em>L</em>=$2^z/2$.</td>
<td>$0\lt z\le 1$. In this example, $f$ has a “vertical” slope at 1/2, unless <em>z</em> is 1.</td>
</tr>
<tr>
<td>$3/4-\sqrt{\lambda(1-\lambda)}$.</td>
<td><em>α</em>=1/2.<br /><em>L</em>=1.</td>
<td>Has a “vertical” slope at 0 and 1.</td>
</tr>
<tr>
<td>Continuous and <strong>piecewise linear</strong>.</td>
<td><em>α</em>=1.<br /><em>L</em> is the greatest absolute value of the slope among all pieces’ slopes.</td>
<td><em>f</em>(<em>λ</em>) is <em>piecewise linear</em> if it’s made up of multiple linear functions defined on a finite number of “pieces”, or nonempty subintervals, that together make up $f$’s domain (in this case, the closed unit interval).</td>
</tr>
<tr>
<td>Piecewise linear; equals 0 at 0, 3/4 at 2/3 and 1/4 at 1, and these points are connected by linear functions.</td>
<td><em>α</em>=1.<br /><em>L</em> = 1.5.</td>
<td><em>L</em> = $\max(\text{abs}((3/4-0)/(2/3))$, $\text{abs}((1/4-3/4)/(1/3)))$.<br />Concave because the first piece’s slope is greater than the second piece’s.</td>
</tr>
<tr>
<td>min(<em>λ</em>*<em>mult</em>, 1−<em>ε</em>).</td>
<td><em>α</em>=1.<br /><em>L</em> = <em>mult</em>.</td>
<td><em>mult</em> > 0, <em>ε</em> > 0. Piecewise linear; equals 0 at 0, 1−<em>ε</em> at (1−<em>ε</em>)/<em>mult</em>, and 1−<em>ε</em> at 1.<br />Concave.</td>
</tr>
<tr>
<td>1/10 if <em>λ</em> is 0; −1/(2*ln(<em>λ</em>/2)) + 1/10 otherwise.</td>
<td>Not Hölder continuous.</td>
<td>Has a slope near 0 that’s steeper than every “nth” root.</td>
</tr>
</tbody>
</table>
<blockquote>
<p><strong>Note:</strong> A Hölder continuous function with Hölder exponent <em>α</em> and Hölder constant <em>L</em> is also Hölder continuous with Hölder exponent <em>β</em> and Hölder constant bounded above by <em>L</em>, where 0 < <em>β</em> < <em>α</em>.</p>
</blockquote>
<p><strong>Finding parameters <em>α</em> and <em>L</em> for Hölder continuous functions.</strong> If $f(\lambda)$ is continuous, the following is one way to find the Hölder exponent (<em>α</em>) and Hölder constant (<em>L</em>) of $f$, to determine whether $f$ is Hölder continuous, not just continuous.</p>
<p>First, if $f$ has a bounded and continuous first derivative on its domain, then <em>α</em> is 1 ($f$ is <strong>Lipschitz continuous</strong>) and <em>L</em> is the maximum of the absolute value of that derivative.</p>
<p>Otherwise, consider the function $h(\lambda, c)=\text{abs}(f(\lambda)-f(c))/((\text{abs}(\lambda-c))^\alpha)$, or 0 if $\lambda=c$, where $0\lt\alpha\le 1$ is a Hölder exponent to test. For a given $\alpha$, let $g(\lambda)$ be the maximum of $h(\lambda,c)$ over all points $c$ where $f$ has a “vertical” slope or where its slope is steepest. If $g(\lambda)$ is bounded for a given $\alpha$ on $f$’s domain (in this case, the closed unit interval), then $f$ is Hölder continuous with Hölder exponent $\alpha$ and a Hölder constant (<em>L</em>) equal to or greater than the maximum value of $g(\lambda)$ on its domain.</p>
<p>The following example, which uses the SymPy computer algebra library, plots $\max(h(\lambda,0),h(\lambda,1))$ when $f=\sqrt{\lambda(1-\lambda)}$ and $\alpha=1/2$: <code>lamda,c=symbols('lamda c'); func=sqrt(lamda*(1-lamda)); alpha=S(1)/2; h=Abs(func-func.subs(lamda,c))/Abs(lamda-c)**alpha; plot(Max(h.subs(c, 0), h.subs(c,1)), (lamda, 0, 1))</code>.</p>
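<p>The same idea can be carried out numerically, by maximizing $h(\lambda,c)$ over a grid of points (a rough estimate only; the function name is mine):</p>

```python
import math

def holder_constant_estimate(f, alpha, steps=200):
    # Numerically estimate the smallest Hölder constant for exponent
    # alpha by maximizing |f(x) - f(y)| / |x - y|**alpha over a grid.
    pts = [i / steps for i in range(steps + 1)]
    return max(abs(f(x) - f(y)) / abs(x - y) ** alpha
               for x in pts for y in pts if x != y)
```

<p>For <em>f</em> = sqrt(<em>λ</em>(1−<em>λ</em>)) with <em>α</em> = 1/2, the estimate approaches 1 from below, consistent with the Hölder constant given in the table for 3/4 − sqrt(<em>λ</em>(1−<em>λ</em>)).</p>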
<p><strong>Functions with a Hölder continuous or Lipschitz continuous derivative.</strong> The following table shows some functions whose derivatives are Hölder continuous, and others where that is not the case. (In the SymPy library, a function’s derivative can be found using the <code>diff</code> method.) In the following table, if $f$ has a bounded and continuous <em>second</em> derivative on its domain, then <em>α</em> is 1 (the first derivative is Lipschitz continuous) and <em>L</em> is the maximum of the absolute value of that <em>second</em> derivative.</p>
<table>
<thead>
<tr>
<th>Function $f(\lambda)$</th>
<th>Derivative’s Hölder exponent (<em>α</em>) and an upper bound of the derivative’s Hölder constant (<em>L</em>):</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td><em>λ</em><sup>1+<em>β</em></sup></td>
<td><em>α</em>=<em>β</em>.<br /><em>L</em> = 1+<em>β</em>.</td>
<td>0 < <em>β</em> ≤ 1.</td>
</tr>
<tr>
<td>$3/4-\sqrt{\lambda(1-\lambda)}$.</td>
<td>Derivative is not Hölder continuous.</td>
<td>Derivative is not Hölder continuous because $f$ is not Lipschitz continuous.</td>
</tr>
<tr>
<td>cosh(<em>λ</em>) − 3/4.</td>
<td><em>α</em>=1 (derivative is Lipschitz continuous).<br /><em>L</em> = cosh(1).</td>
<td>Continuous second derivative, namely cosh(<em>λ</em>). Convex. <code>cosh</code> is the hyperbolic cosine function.</td>
</tr>
<tr>
<td>$\lambda\cdot\sin(z\lambda)/4+1/2$.</td>
<td><em>α</em>=1.<br /><em>L</em> = $z(2+z)/4$.</td>
<td>Continuous second derivative. <em>L</em> is an upper bound of its absolute value. $z>0$.</td>
</tr>
<tr>
<td>$\sin(z\lambda)/4+1/2$.</td>
<td><em>α</em>=1.<br /><em>L</em> = $(z^2)/4$.</td>
<td>Continuous second derivative; <em>L</em> is an upper bound of its absolute value, namely abs($-\sin(z\lambda)\cdot z^2/4$). $z>0$.</td>
</tr>
<tr>
<td><em>λ</em><sup>2</sup>/2 + 1/10 if <em>λ</em> ≤ 1/2; <em>λ</em>/2 − 1/40 otherwise.</td>
<td><em>α</em>=1.<br /><em>L</em> = 1.</td>
<td>Lipschitz continuous derivative, with Lipschitz constant 1.</td>
</tr>
<tr>
<td>exp(−<em>λ</em>).</td>
<td><em>α</em>=1.<br /><em>L</em> = 1.</td>
<td>Lipschitz continuous derivative, with Lipschitz constant 1.<sup id="fnref:16" role="doc-noteref"><a href="#fn:16" class="footnote" rel="footnote">16</a></sup></td>
</tr>
<tr>
<td><em>λ</em>/2 if <em>λ</em> ≤ 1/2; (4*<em>λ</em> − 1)/(8*<em>λ</em>) otherwise.</td>
<td><em>α</em>=1.<br /><em>L</em> = 2.</td>
<td>Concave. Lipschitz continuous derivative with Lipschitz constant 2.</td>
</tr>
</tbody>
</table>
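<p>As a spot check of the table's third row, SymPy's <code>diff</code> method confirms that the second derivative of cosh(<em>λ</em>) − 3/4 is cosh(<em>λ</em>), whose absolute value on the closed unit interval is at most cosh(1). A sketch assuming SymPy is installed:</p>

```python
from sympy import symbols, cosh, diff, S, simplify

lamda = symbols('lamda')
f = cosh(lamda) - S(3)/4
second = diff(f, lamda, 2)    # second derivative: cosh(lamda)
print(second)
# cosh is increasing on [0, 1], so |f''| peaks at lamda = 1:
L = second.subs(lamda, 1)     # cosh(1), about 1.543
print(L)
```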
<p><a id="Results_Used_in_Approximate_Bernoulli_Factories"></a></p>
<h3 id="results-used-in-approximate-bernoulli-factories">Results Used in Approximate Bernoulli Factories</h3>
<p>See the appendix of the page <a href="https://peteroupc.github.io/bernapprox.html"><strong>“Approximations in Bernstein Form”</strong></a>.</p>
<p><a id="How_Many_Coin_Flips_Are_Needed_to_Simulate_a_Polynomial"></a></p>
<h3 id="how-many-coin-flips-are-needed-to-simulate-a-polynomial">How Many Coin Flips Are Needed to Simulate a Polynomial?</h3>
<p>Let $p(\lambda)$ be a polynomial that maps the closed unit interval to itself and satisfies $0\lt p(\lambda)\lt 1$ whenever $0\lt\lambda\lt 1$.</p>
<p>Then $p$’s <em>coin-flipping degree</em> (Wästlund 1999)<sup id="fnref:17" role="doc-noteref"><a href="#fn:17" class="footnote" rel="footnote">17</a></sup> is the smallest value of $n$ such that $p$’s Bernstein coefficients of degree $n$ lie in the closed unit interval. <sup id="fnref:18" role="doc-noteref"><a href="#fn:18" class="footnote" rel="footnote">18</a></sup> (This is broader than the use of the term in Wästlund, where a polynomial can have a coin-flipping degree only if its “power” coefficients are integers.) The coin-flipping degree is the smallest value of $n$ such that the algorithm of Goyal and Sigman (2012)<sup id="fnref:7:2" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup> can toss heads with probability $p(\lambda)$ using exactly $n$ biased coin flips (in addition to a fair coin).</p>
<p>The following results give upper bounds on $p$’s coin-flipping degree.</p>
<p>Suppose $p$ is in Bernstein form of degree $m$ with Bernstein coefficients $b_0, …, b_m$. Then:</p>
<ul>
<li>If $0\le\min(b_0, …, b_m)\le\max(b_0, …, b_m)\le 1$, then the coin-flipping degree is bounded above by $m$.</li>
<li>
<p>If $0\le\min(b_0, …, b_m)$ and $\max(b_0, …, b_m)\gt 1$, then the coin-flipping degree is bounded above by—</p>
\[m+\text{iceil}\left(\frac{m(m-1)}{2}\frac{\max(1-b_0, ..., 1-b_m)}{1-\text{Pmax}} - m\right),\]
<p>where iceil($x$) is $x+1$ if $x$ is an integer, or ceil($x$) otherwise, and where $\text{Pmax}$ is the maximum value of $p(\lambda)$ on the closed unit interval (Powers and Reznick 2001)<sup id="fnref:19" role="doc-noteref"><a href="#fn:19" class="footnote" rel="footnote">19</a></sup>.</p>
</li>
<li>
<p>If $\min(b_0, …, b_m)\lt 0$ and $\max(b_0, …, b_m)\le 1$, then the coin-flipping degree is bounded above by—</p>
\[m+\text{iceil}\left(\frac{m(m-1)}{2}\frac{\max(b_0, ..., b_m)}{\text{Pmin}} - m\right),\]
<p>where $\text{Pmin}$ is the <em>minimum</em> value of $p(\lambda)$ on the closed unit interval (Powers and Reznick 2001)<sup id="fnref:19:1" role="doc-noteref"><a href="#fn:19" class="footnote" rel="footnote">19</a></sup>.</p>
</li>
<li>
<p>Suppose $m\ge 2$, that $b_0=0$ or $b_m=0$ or both, and that the following necessary conditions are satisfied (Mok and To 2008; Theorem 1 and Corollary 3)<sup id="fnref:20" role="doc-noteref"><a href="#fn:20" class="footnote" rel="footnote">20</a></sup>:</p>
<ul>
<li>For every $i$ such that $b_i\lt 0$, if $b_m=0$, there must be $j\gt i$ such that $b_j\gt 0$.</li>
<li>For every $i$ such that $b_i\lt 0$, if $b_0=0$, there must be $j\lt i$ such that $b_j\gt 0$.</li>
<li>For every $i$ such that $b_i\gt 1$, if $b_m=1$, there must be $j\gt i$ such that $b_j\lt 1$.</li>
<li>For every $i$ such that $b_i\gt 1$, if $b_0=1$, there must be $j\lt i$ such that $b_j\lt 1$.</li>
</ul>
<p>Then the coin-flipping degree is bounded above by—</p>
\[\max(M(b_0, ..., b_m), M(1-b_0, ..., 1-b_m)),\]
<p>where—</p>
\[M(\beta_0, ..., \beta_m) = \text{ceil}\left(\max\left(2m\mu, \frac{m(m-1)\mu^m}{2(1-c)}\frac{a_{\max}}{a_{\min}}\right)\right),\]
<p>and where:</p>
<ul>
<li>$a_{\max} = \max(\max(0,\beta_0), …, \max(0, \beta_m))$.</li>
<li>$b_{\max} = \max(-\min(0, \beta_0{m\choose 0}), …, -\min(0, \beta_m{m\choose m}))$.</li>
<li>$a_{\min}$ is the minimum of $(\beta_i{m\choose i})$ over all values of $i$ such that $\beta_i>0$.</li>
<li>$\mu = m+Lb_{\max}/a_{\min}$ where $L$ is the number of values of $\beta_i$ that satisfy $\beta_i<0$.</li>
<li>$c$ is the smallest number $r$ that satisfies $FN(\lambda)/FP(\lambda)\le r$ whenever $0\lt\lambda\lt 1$; $c$ can also be any greater number, as long as it is less than 1.</li>
<li>$FP(\lambda) = \sum_{k=0}^m \max(0,\beta_k){m\choose k}\lambda^k(1-\lambda)^{m-k}$.</li>
<li>$FN(\lambda) = \sum_{k=0}^m -\min(0,\beta_k){m\choose k}\lambda^k(1-\lambda)^{m-k}$.</li>
</ul>
<p>(Mok and To 2008; Theorem 2 and remark 1.5(v))<sup id="fnref:20:1" role="doc-noteref"><a href="#fn:20" class="footnote" rel="footnote">20</a></sup>.</p>
<p>Additional results are found in Leroy (2008)<sup id="fnref:51" role="doc-noteref"><a href="#fn:51" class="footnote" rel="footnote">21</a></sup>.</p>
</li>
</ul>
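<p>The first of the bounds above can be computed mechanically. The following sketch applies it to a hypothetical example, $p(\lambda)=(12/5)\lambda(1-\lambda)$, whose degree-2 Bernstein coefficients are $(0, 6/5, 0)$ and whose maximum on the closed unit interval is $\text{Pmax}=3/5$ (the helper names are illustrative, and the caller must supply $\text{Pmax}$):</p>

```python
from fractions import Fraction
from math import ceil

def iceil(x):
    """x + 1 if x is an integer; ceil(x) otherwise."""
    c = ceil(x)
    return c + 1 if c == x else c

def degree_bound_above_one(bcoeffs, pmax):
    """Powers-Reznick upper bound on the coin-flipping degree, for the
    case where all Bernstein coefficients are nonnegative but some
    exceed 1.  pmax is the polynomial's maximum on [0, 1]."""
    m = len(bcoeffs) - 1
    maxdiff = max(1 - b for b in bcoeffs)
    return m + iceil(Fraction(m*(m-1), 2) * maxdiff / (1 - pmax) - m)

# p(lamda) = (12/5)*lamda*(1-lamda): Bernstein coefficients (0, 6/5, 0)
b = [Fraction(0), Fraction(6, 5), Fraction(0)]
print(degree_bound_above_one(b, Fraction(3, 5)))   # prints 3
```

<p>Elevating this polynomial to degree 3 gives Bernstein coefficients $(0, 4/5, 4/5, 0)$, all lying in the closed unit interval, so the bound is attained in this case.</p>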
<blockquote>
<p><strong>Examples:</strong></p>
<ol>
<li>Let $p(\lambda)=1 -8\lambda +20\lambda^2 -13\lambda^3$, a polynomial of degree $m=3$. $p$’s Bernstein coefficients are $b_0=1, b_1=-5/3, b_2=7/3, b_3=0$, and its coin-flipping degree is 46 (Wästlund 1999, Example 4.4)<sup id="fnref:17:1" role="doc-noteref"><a href="#fn:17" class="footnote" rel="footnote">17</a></sup>.</li>
<li>An exhaustive search shows that 46 is the highest possible coin-flipping degree for a degree-3 polynomial whose “power” coefficients are integers.</li>
<li>The degree-4 polynomial $-43\lambda^4 + 81\lambda^3 - 47\lambda^2 + 9\lambda$ has a coin-flipping degree of 5284.</li>
</ol>
<p><strong>Note:</strong> If a polynomial’s “power” coefficients can be rational numbers (ratios of two integers), even a degree-2 polynomial can have an arbitrarily high coin-flipping degree. An example is the family of degree-2 polynomials $r\lambda-r\lambda^2$, where $r$ is a rational number greater than 0 and less than 4.</p>
</blockquote>
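<p>Example 1 can be verified directly from the definition of coin-flipping degree, using exact rational arithmetic. A sketch (the helper names are illustrative):</p>

```python
from fractions import Fraction
from math import comb

def bernstein_coeffs(power_coeffs, n):
    """Degree-n Bernstein coefficients of p(x) = sum(a[i] * x**i)."""
    return [sum(Fraction(comb(j, i), comb(n, i)) * a
                for i, a in enumerate(power_coeffs) if i <= j)
            for j in range(n + 1)]

def coin_flipping_degree(power_coeffs, max_n=1000):
    """Smallest n whose degree-n Bernstein coefficients all lie in [0, 1]."""
    for n in range(len(power_coeffs) - 1, max_n + 1):
        if all(0 <= b <= 1 for b in bernstein_coeffs(power_coeffs, n)):
            return n
    return None

# p(lamda) = 1 - 8*lamda + 20*lamda**2 - 13*lamda**3
p = [1, -8, 20, -13]
print(bernstein_coeffs(p, 3))    # degree-3 coefficients 1, -5/3, 7/3, 0
print(coin_flipping_degree(p))   # prints 46
```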
<p><strong>Lemma:</strong> Let $p(\lambda)=a_0 \lambda^0 + … + a_n \lambda^n$ be a polynomial that maps the closed unit interval to itself. Then the values $a_0, …, a_n$ must sum to a value that is 0 or greater and 1 or less.</p>
<p><em>Proof</em>: This can be seen by evaluating $p(1) = a_0 + … + a_n$. If $p(1)$ is less than 0 or greater than 1, then $p$ does not meet the hypothesis of the lemma. □</p>
<p>In the following lemmas, let $p(\lambda)=a_0 \lambda^0 + … + a_n \lambda^n$ be a polynomial that maps the closed unit interval to itself and satisfies $0\lt p(\lambda)\lt 1$ whenever $0\lt\lambda\lt 1$.</p>
<p><strong>Lemma:</strong> If $p$’s coin-flipping degree is $n$, then $\text{abs}(a_i)\le 2^i {n\choose i}$.</p>
<p><em>Proof</em>: Consider the matrix that transforms a polynomial’s Bernstein coefficients to “power” coefficients, which is $(n+1)\times(n+1)$ if the polynomial’s degree is $n$ (Ray and Nataraj 2012, eq. (8))<sup id="fnref:21" role="doc-noteref"><a href="#fn:21" class="footnote" rel="footnote">22</a></sup>. Given the hypothesis of the lemma, each Bernstein coefficient must lie in the closed unit interval, and $n$ here is $p$’s coin-flipping degree. For each row of the matrix ($0\le i\le n$), the corresponding “power” coefficient of the polynomial equals a linear combination of that row with a vector of Bernstein coefficients. Thus, the $i$-th power coefficient equals $a_i$, and its absolute value is bounded above by $\sum_{m=0}^i {n\choose m}{n-m\choose i-m} = 2^i {n\choose i}$. □</p>
<p><strong>Lemma:</strong> $\text{abs}(a_i)\le \text{abs}(b_i)$, where $b_i$ is the corresponding power coefficient of the following polynomial:</p>
\[q(\lambda) = b_0 \lambda^0 + ... + b_n\lambda^n = (T_n(1-2\lambda)+1)/2,\]
<p>and where $T_n(x)$ is the <a href="https://mathworld.wolfram.com/ChebyshevPolynomialoftheFirstKind.html"><strong>Chebyshev polynomial of the first kind</strong></a> of degree $n$.</p>
<p>See <em>MathOverflow</em> for a <a href="https://mathoverflow.net/questions/449135"><strong>proof of this lemma</strong></a> by Fedor Petrov.</p>
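<p>For $n=3$, the lemma can be illustrated with SymPy's <code>chebyshevt</code> function: the power coefficients of $q$ are $1, -9, 24, -16$, which dominate in absolute value those of the degree-3 polynomial from Example 1 ($1, -8, 20, -13$). A sketch assuming SymPy is installed:</p>

```python
from sympy import symbols, chebyshevt, expand, Poly

lam = symbols('lamda')
n = 3
# q(lamda) = (T_n(1 - 2*lamda) + 1)/2
q = expand((chebyshevt(n, 1 - 2*lam) + 1) / 2)
b = Poly(q, lam).all_coeffs()[::-1]   # b[i] = coefficient of lam**i
print(b)                              # [1, -9, 24, -16]
# Power coefficients of the degree-3 polynomial from Example 1:
a = [1, -8, 20, -13]
print(all(abs(ai) <= abs(bi) for ai, bi in zip(a, b)))   # True
```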
<p><a id="Proofs_for_Polynomial_Building_Schemes"></a></p>
<h3 id="proofs-for-polynomial-building-schemes">Proofs for Polynomial-Building Schemes</h3>
<p>This section shows mathematical proofs for some of the polynomial-building schemes of this page.</p>
<p>In the following results:</p>
<ul>
<li>A <em>strictly bounded factory function</em> means a continuous function on the closed unit interval, with a minimum of greater than 0 and a maximum of less than 1.</li>
<li>A function <em>f</em>(<em>λ</em>) is <em>polynomially bounded</em> if both <em>f</em>(<em>λ</em>) and 1−<em>f</em>(<em>λ</em>) are greater than or equal to min(<em>λ</em><sup><em>n</em></sup>, (1−<em>λ</em>)<sup><em>n</em></sup>) for some integer <em>n</em> (Keane and O’Brien 1994)<sup id="fnref:22" role="doc-noteref"><a href="#fn:22" class="footnote" rel="footnote">23</a></sup>. For examples, see “<a href="https://peteroupc.github.io/bernoulli.html#About_Bernoulli_Factories"><strong>About Bernoulli Factories</strong></a>”.</li>
<li>A <em>modulus of continuity</em> of a function <em>f</em> means a nonnegative and nowhere decreasing function <em>ω</em> on the closed unit interval, for which <em>ω</em>(0) = 0, and for which abs(f(<em>x</em>) − f(<em>y</em>)) ≤ <em>ω</em>(abs(<em>x</em>−<em>y</em>)) whenever <em>x</em> and <em>y</em> are in <em>f</em>’s domain. Loosely speaking, a modulus of continuity <em>ω</em>(<em>δ</em>) is greater than or equal to <em>f</em>’s maximum range in a window of size <em>δ</em>.</li>
</ul>
<p><strong>Lemma 1.</strong> Omitted.</p>
<p>Lemma 6(i) of Nacu and Peres (2005)<sup id="fnref:1:7" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> can be applied to continuous functions beyond just Lipschitz continuous functions. This includes the larger class of <em>Hölder continuous</em> functions (see “<a href="#Definitions"><strong>Definitions</strong></a>”).</p>
<p><strong>Lemma 2.</strong> <em>Let f(λ) be a continuous function that maps the closed unit interval to itself, let X be a hypergeometric(2*n, k, n) random variable, and let $n\ge 1$ be an integer.</em></p>
<ol>
<li><em>Let ω(x) be a modulus of continuity of f. If ω is continuous and concave, then the expression—<br />abs(<strong>E</strong>[f(X/n)] − f(k/(2*n))), (1)<br />is less than or equal to each of the following:</em>
<ul>
<li><em>ω(sqrt(1/(8*n−4))).</em></li>
<li><em>ω(sqrt(1/(7*n))) if n≥4.</em></li>
<li><em>ω(sqrt(1/(2*n))).</em></li>
<li><em>ω(sqrt( (k/(2*n)) * (1−k/(2*n)) / (2*n−1) )).</em></li>
</ul>
</li>
<li><em>If f is Hölder continuous with Hölder constant M and with Hölder exponent α such that 0 < α ≤ 1, then the expression (1) is less than or equal to—</em>
<ul>
<li><em>M*(1/(2*n))<sup>α/2</sup>,</em></li>
<li><em>M*(1/(7*n))<sup>α/2</sup> if n≥4, and</em></li>
<li><em>M*(1/(8*n−4))<sup>α/2</sup>.</em></li>
</ul>
</li>
<li><em>If f has a Lipschitz continuous derivative with Lipschitz constant M, then the expression (1) is less than or equal to—</em>
<ul>
<li><em>(M/2)*(1/(7*n)) if n≥4, and</em></li>
<li><em>(M/2)*(1/(8*n−4)).</em></li>
</ul>
</li>
</ol>
<p><em>Proof.</em></p>
<ol>
<li><em>ω</em> is assumed to be nonnegative because absolute values are nonnegative. To prove the first and second bounds: abs(<strong>E</strong>[<em>f</em>(<em>X</em>/<em>n</em>)] − <em>f</em>(<em>k</em>/(2 * <em>n</em>))) ≤ <strong>E</strong>[abs(<em>f</em>(<em>X</em>/<em>n</em>) − <em>f</em>(<em>k</em>/(2 * <em>n</em>)))] ≤ <strong>E</strong>[<em>ω</em>(abs(<em>X</em>/<em>n</em> − <em>k</em>/(2 * <em>n</em>)))] (by the definition of <em>ω</em>) ≤ <em>ω</em>(<strong>E</strong>[abs(<em>X</em>/<em>n</em> − <em>k</em>/(2 * <em>n</em>))]) (by Jensen’s inequality and because <em>ω</em> is concave) ≤ <em>ω</em>(sqrt(<strong>E</strong>[abs(<em>X</em>/<em>n</em> − <em>k</em>/(2 * <em>n</em>))<sup>2</sup>])) = <em>ω</em>(sqrt(<strong>Var</strong>[<em>X</em>/<em>n</em>])) = <em>ω</em>(sqrt((<em>k</em>*(2 * <em>n</em>−<em>k</em>)/(4*(2 * <em>n</em>−1)*<em>n</em><sup>2</sup>)))) ≤ <em>ω</em>(sqrt((<em>n</em><sup>2</sup>/(4*(2 * <em>n</em>−1)*<em>n</em><sup>2</sup>)))) = <em>ω</em>(sqrt((1/(8*<em>n</em>−4)))) = <em>ρ</em>, and for every integer <em>n</em>≥4, <em>ρ</em> ≤ <em>ω</em>(sqrt(1/(7*<em>n</em>))). To prove the third bound: abs(<strong>E</strong>[<em>f</em>(<em>X</em>/<em>n</em>)] − <em>f</em>(<em>k</em>/(2 * <em>n</em>))) ≤ <em>ω</em>(sqrt(<strong>Var</strong>[<em>X</em>/<em>n</em>])) ≤ <em>ω</em>(sqrt(1/(2*<em>n</em>))). To prove the fourth bound: abs(<strong>E</strong>[<em>f</em>(<em>X</em>/<em>n</em>)] − <em>f</em>(<em>k</em>/(2 * <em>n</em>))) ≤ <em>ω</em>(sqrt(<strong>Var</strong>[<em>X</em>/<em>n</em>])) = <em>ω</em>(sqrt((<em>k</em>*(2 * <em>n</em>−<em>k</em>)/(4*(2 * <em>n</em>−1)*<em>n</em><sup>2</sup>)))) = <em>ω</em>(sqrt( (<em>k</em>/(2*<em>n</em>)) * (1−<em>k</em>/(2*<em>n</em>)) / (2*<em>n</em>−1) )).</li>
<li>By the definition of Hölder continuous functions, take <em>ω</em>(<em>x</em>) = <em>M</em>*<em>x</em><sup><em>α</em></sup>. Because <em>ω</em> is a concave modulus of continuity on the closed unit interval, the result follows from part 1.</li>
<li>
<p>(Much of this proof builds on Nacu and Peres 2005, Proposition 6(ii)<sup id="fnref:1:8" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.) The expected value (see note 1) of $X/n$ is $E[X/n]=k/(2n)$. Since $E[X/n-k/(2n)] = 0$, it follows that $E[f'(k/(2n))(X/n-k/(2n))] = 0$. Moreover, $\text{abs}(f(x)-f(s)-f'(s)(x-s))\le (M/2)(x-s)^2$ (see Micchelli 1973, Theorem 3.2)<sup id="fnref:23" role="doc-noteref"><a href="#fn:23" class="footnote" rel="footnote">24</a></sup>, so—</p>
\[\text{abs}(E[f(X/n)]-f(k/(2n)))=\text{abs}(E[f(X/n)-f(k/(2n))-f'(k/(2n))(X/n-k/(2n))])\]
\[\le E[(M/2)(X/n-k/(2n))^2] = (M/2)\text{Var}[X/n].\]
<p>By part 1’s proof, it follows that (<em>M</em>/2)*<strong>Var</strong>[<em>X</em>/<em>n</em>] = (<em>M</em>/2)*(<em>k</em>*(2 * <em>n</em>−<em>k</em>)/(4*(2 * <em>n</em>−1)*<em>n</em><sup>2</sup>)) ≤ (<em>M</em>/2)*(<em>n</em><sup>2</sup>/(4*(2 * <em>n</em>−1)*<em>n</em><sup>2</sup>)) = (<em>M</em>/2)*(1/(8*<em>n</em>−4)) = <em>ρ</em>. For every integer <em>n</em>≥4, <em>ρ</em> ≤ (<em>M</em>/2)*(1/(7*<em>n</em>)). □</p>
</li>
</ol>
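<p>Part 3's bound can be checked numerically. If $f(\lambda)=\lambda^2$, then $f$'s derivative is Lipschitz continuous with constant $M=2$, and the expression (1) equals <strong>Var</strong>[<em>X</em>/<em>n</em>] exactly. The following sketch verifies the bound $(M/2)(1/(8n-4))$ with an exact hypergeometric expectation (the helper name is illustrative):</p>

```python
from fractions import Fraction
from math import comb

def hypergeom_expect(f, n, k):
    """E[f(X/n)] for X ~ Hypergeometric(2n, k, n), computed exactly."""
    total = comb(2*n, n)
    lo, hi = max(0, k - n), min(n, k)
    return sum(Fraction(comb(k, x) * comb(2*n - k, n - x), total)
               * f(Fraction(x, n)) for x in range(lo, hi + 1))

f = lambda lam: lam * lam    # f(lamda) = lamda**2; f' has Lipschitz constant M = 2
M = 2
for n in [4, 8, 16]:
    for k in range(2*n + 1):
        expr1 = abs(hypergeom_expect(f, n, k) - f(Fraction(k, 2*n)))
        assert expr1 <= Fraction(M, 2) * Fraction(1, 8*n - 4)
print("bound verified")
```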
<blockquote>
<p><strong>Notes:</strong></p>
<ol>
<li><strong>E</strong>[.] means expected value (“long-run average”), and <strong>Var</strong>[.] means variance. A hypergeometric(2 * <em>n</em>, <em>k</em>, <em>n</em>) random variable is the number of “good” balls out of <em>n</em> balls taken uniformly at random, all at once, from a bag containing 2 * <em>n</em> balls, <em>k</em> of which are “good”.</li>
<li>Parts 1 through 3 exploit a tighter bound on <strong>Var</strong>[<em>X</em>/<em>n</em>] than the bound given in Nacu and Peres (2005, Lemma 6(i) and 6(ii), respectively)<sup id="fnref:1:9" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. However, for technical reasons, different bounds are proved for different ranges of integers <em>n</em>.</li>
<li>All continuous functions that map the closed unit interval to itself, including all of them that admit a Bernoulli factory, have a modulus of continuity. The proof of part 1 remains valid even if <em>ω</em>(0) > 0, because the bounds proved remain correct even if <em>ω</em> is overestimated. The following functions have a simple modulus of continuity that satisfies the lemma:
<ol>
<li>If <em>f</em> is strictly increasing and convex, <em>ω</em>(<em>x</em>) can equal <em>f</em>(1) − <em>f</em>(1−<em>x</em>) (Gal 1990)<sup id="fnref:24" role="doc-noteref"><a href="#fn:24" class="footnote" rel="footnote">25</a></sup>; (Gal 1995)<sup id="fnref:25" role="doc-noteref"><a href="#fn:25" class="footnote" rel="footnote">26</a></sup>.</li>
<li>If <em>f</em> is strictly decreasing and convex, <em>ω</em>(<em>x</em>) can equal <em>f</em>(0) − <em>f</em>(<em>x</em>) (Gal 1990)<sup id="fnref:24:1" role="doc-noteref"><a href="#fn:24" class="footnote" rel="footnote">25</a></sup>; (Gal 1995)<sup id="fnref:25:1" role="doc-noteref"><a href="#fn:25" class="footnote" rel="footnote">26</a></sup>.</li>
<li>If <em>f</em> is strictly increasing and concave, <em>ω</em>(<em>x</em>) can equal <em>f</em>(<em>x</em>) − <em>f</em>(0) (by symmetry with 2).</li>
<li>If <em>f</em> is strictly decreasing and concave, <em>ω</em>(<em>x</em>) can equal <em>f</em>(1−<em>x</em>) − <em>f</em>(1) (by symmetry with 1).</li>
<li>If <em>f</em> is concave and is strictly increasing then strictly decreasing, then <em>ω</em>(<em>h</em>) can equal <em>f</em>(min(<em>h</em>, <em>σ</em>))+(<em>f</em>(1−min(<em>h</em>, 1−<em>σ</em>))−<em>f</em>(1)), where <em>σ</em> is the point where <em>f</em> stops increasing and starts decreasing (Anastassiou and Gal 2012)<sup id="fnref:26" role="doc-noteref"><a href="#fn:26" class="footnote" rel="footnote">27</a></sup>.</li>
</ol>
</li>
</ol>
</blockquote>
<p>There are weaker bounds for Lemma 2, part 1, which work even if $f$’s modulus of continuity $\omega$ is not concave. According to Pascu et al. (2017, Lemma 5.1)<sup id="fnref:27" role="doc-noteref"><a href="#fn:27" class="footnote" rel="footnote">28</a></sup>:</p>
\[\text{abs}(\mathbb{E}[f(Y)] - f(\mathbb{E}[Y])) \le \omega(\delta) + \omega(\delta)\text{Var}[Y]/\delta^2,\]
<p>where $f$ is a continuous function, $Y$ is a discrete random variable on a closed interval, and $\delta>0$. Given that $Y = X/n$ (where $X$ is as in Lemma 2), taking $\delta=1/n^{1/2}$ leads to:</p>
\[\text{abs}(\mathbb{E}[f(Y)]-f(\mathbb{E}[Y]))=\text{abs}(\mathbb{E}[f(Y)]-f(k/(2n)))\le\omega(1/n^{1/2}) (1+n\cdot\text{Var}[Y]),\]
<p>and in turn, plugging in bounds for $\text{Var}[Y]$ leads to the following bounds for Lemma 2, part 1:</p>
<ul>
<li>$\omega(1/n^{1/2}) (1+n/(8n-4))$.</li>
<li>$\omega(1/n^{1/2}) (1+n/(7n)) = \frac{8}{7}\omega(1/n^{1/2})$ if n≥4.</li>
<li>$\omega(1/n^{1/2}) (1+n/(2n)) = \frac{3}{2}\omega(1/n^{1/2})$.</li>
</ul>
<p><strong>Lemma 2A.</strong> <em>Let f(λ) map the closed unit interval to itself, and let $C=15$. Suppose $f$ is in the Zygmund class with constant $D$ or less. Then, for every integer $n$ ≥ 1, the expression (1) in Lemma 2 is less than or equal to $(C/2) D\sqrt{1/(8n-4)}$.</em></p>
<p><em>Proof.</em> Strukov and Timan (1977)<sup id="fnref:28" role="doc-noteref"><a href="#fn:28" class="footnote" rel="footnote">29</a></sup> proved the following bound:</p>
\[\text{abs}(\mathbb{E}[f(Y)]-f(\mathbb{E}[Y]))\le C\omega_2((\text{Var}[Y])^{1/2}/2),\]
<p>where <em>Y</em> is a random variable and ω<sub>2</sub>(.) is a <em>second-order modulus of continuity</em> of <em>f</em> (see note below), and where $C$ is 3 if $Y$ takes on any value in the real line, or 15 if $Y$ takes on only values in a closed interval, such as the closed unit interval in this case.</p>
<p>Suppose <em>Y</em> = <em>X</em>/<em>n</em>, where <em>X</em> is as in Lemma 2. Then <em>Y</em>’s variance (<strong>Var</strong>[<em>Y</em>]) is less than or equal to 1/(8*<em>n</em>− 4), and the left-hand side of Strukov and Timan’s bound is the same as the expression (1).</p>
<p>Since <em>f</em> is in the Zygmund class, there is an $\omega_2$ for it such that $\omega_{2}(h)\le D h$. Therefore, applying Strukov and Timan’s bound and the bound on <em>Y</em>’s variance leads to—</p>
\[\text{abs}(\mathbb{E}[f(Y)]-f(\mathbb{E}[Y]))\le C\omega_2((\text{Var}[Y])^{1/2}/2)\]
\[\le CD ((\text{Var}[Y])^{1/2}/2) = CD\sqrt{1/(8n-4)}/2.\]
<p>□</p>
<blockquote>
<p><strong>Note:</strong> A <em>second-order modulus of continuity</em> is a nonnegative and nowhere decreasing function <em>ω</em><sub>2</sub>(<em>h</em>) with <em>h</em> ≥ 0, for which <em>ω<sub>2</sub></em>(0) = 0, and for which abs($f(x)+f(y)-2 f((x+y)/2)$) ≤ $\omega_2(\text{abs}((y-x)/2))$ whenever <em>f</em> is continuous and <em>x</em> and <em>y</em> are in <em>f</em>’s domain.</p>
</blockquote>
<p><strong>Theorem 1.</strong> <em>Let $f$ be a strictly bounded factory function, let $n_0\ge 1$ be an integer, and let $\phi(n)$ be a function that takes on a nonnegative value. Suppose $f$ is such that the expression (1) in Lemma 2 is less than or equal to $\phi(n)$ whenever $n\ge n_0$ is an integer power of 2. Let—</em></p>
\[\eta(n)=\sum_{k\ge \log_2(n)} \phi(2^k),\]
<p><em>for every integer n≥1 that’s a power of 2. If the series η(n) converges to a finite value for each such $n$, and if it converges to 0 as $n$ gets large, then the following scheme for f(λ) is valid in the following sense:</em></p>
<p><em>There are polynomials $g_n$ and $h_n$ (where $n\ge 1$ is an integer power of 2) as follows. The $k$-th Bernstein coefficient of $g_n$ and $h_n$ is <strong>fbelow</strong>(n, k) and <strong>fabove</strong>(n, k), respectively (where $0\le k\le n$), where:</em></p>
<p><em>If $n_0 = 1$:</em></p>
<ul>
<li><em><strong>fbelow</strong>(n, k) =</em> $f(k/n)-\eta(n)$.</li>
<li><em><strong>fabove</strong>(n, k) =</em> $f(k/n)+\eta(n)$.</li>
</ul>
<p><em>If $n_0 > 1$:</em></p>
<ul>
<li><em><strong>fbelow</strong>(n, k) =</em> min(<strong>fbelow</strong>($n_0$,0), <strong>fbelow</strong>($n_0$,1), …, <strong>fbelow</strong>($n_0$,$n_0$)) <em>if</em> $n < n_0$; $f(k/n)-\eta(n)$ <em>otherwise.</em></li>
<li><em><strong>fabove</strong>(n, k) =</em> max(<strong>fabove</strong>($n_0$,0), <strong>fabove</strong>($n_0$,1), …, <strong>fabove</strong>($n_0$,$n_0$)) <em>if</em> $n < n_0$; $f(k/n)+\eta(n)$ <em>otherwise.</em></li>
</ul>
<p><em>The polynomials $g_n$ and $h_n$ satisfy:</em></p>
<ol>
<li><em>$g_n \le h_n$.</em></li>
<li><em>$g_n$ and $h_n$ converge to $f$ as $n$ gets large.</em></li>
<li>$(g_{n+1}-g_n)$ <em>and</em> $(h_{n}-h_{n+1})$ <em>are polynomials with nonnegative Bernstein coefficients once they are rewritten to polynomials in Bernstein form of degree exactly $n+1$.</em></li>
</ol>
<p><em>Proof.</em> For simplicity, this proof assumes first that $n_0 = 1$.</p>
<p>For the series <em>η</em>(<em>n</em>) in the theorem: because $\phi(n)$ is nonnegative, each term of the series is nonnegative, so the series is nonnegative wherever it converges. Moreover, since <em>η</em>(2 * <em>n</em>) omits the nonnegative term <em>φ</em>(<em>n</em>) that <em>η</em>(<em>n</em>) includes, <em>η</em>(<em>n</em>) is nowhere increasing with increasing <em>n</em>.</p>
<p>Item 1 is trivial. If $n\ge n_0$, $g_n$ is simply the Bernstein polynomial of $f$ minus a nonnegative value, and $h_n$ is the Bernstein polynomial of $f$ plus that same value, and if $n$ is less than $n_0$, $g_n$ is a constant value not greater than the lowest point reachable by the lower polynomials, and $h_n$ is a constant value not less than the highest point reachable by the upper polynomials.</p>
<p>Item 2 is likewise trivial. A well known result is that the Bernstein polynomials of $f$ converge to $f$ as their degree $n$ gets large. And because the series <em>η</em> (in Theorem 1) sums to a finite value that goes to 0 as $n$ increases, the upper and lower shifts will converge to 0 so that $g_n$ and $h_n$ converge to the degree-$n$ Bernstein polynomials and thus to $f$.</p>
<p>Item 3 is the <em>consistency requirement</em> described earlier in this page. This is ensured as in Proposition 10 of Nacu and Peres (2005)<sup id="fnref:1:10" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> by bounding, from below, the offset by which to shift the approximating polynomials. This lower bound is <em>η</em>(<em>n</em>), a solution to the equation 0 = <em>η</em>(<em>n</em>) − <em>η</em>(2 * <em>n</em>) − <em>φ</em>(<em>n</em>) (see note below), where <em>φ</em>(<em>n</em>) is a function that takes on a nonnegative value.</p>
<p><em>φ</em>(<em>n</em>) is, roughly speaking, the minimum distance between one polynomial and the next so that the consistency requirement is met between those two polynomials. Compare the assumptions on <em>φ</em> in Theorem 1 with equations (10) and (11) in Nacu and Peres (2005).</p>
<p>The solution for $\eta(n)$ given in the statement of the theorem is easy to prove by noting that it satisfies the recursion <em>η</em>(<em>n</em>) = <em>φ</em>(<em>n</em>) + <em>η</em>(2 * <em>n</em>): the value at <em>n</em> is found by taking the value at 2 * <em>n</em> and adding <em>φ</em>(<em>n</em>) (which is nonnegative), in effect working backwards and recursively. This backward calculation is possible only if the series converges, hence the assumption of a converging series.</p>
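<p>As an illustration, take $\phi(n)=(M/2)(1/(7n))$ from Lemma 2, part 3 (valid when $n\ge 4$). Its tail sum over powers of 2 has the closed form $\eta(n)=M/(7n)$, and the equation 0 = <em>η</em>(<em>n</em>) − <em>η</em>(2 * <em>n</em>) − <em>φ</em>(<em>n</em>) then holds exactly. A sketch, with the arbitrary choice $M=1$:</p>

```python
from fractions import Fraction

M = Fraction(1)   # arbitrary example value for the Lipschitz constant of f'

def phi(n):
    """Bound from Lemma 2, part 3: (M/2)*(1/(7*n)), valid for n >= 4."""
    return (M / 2) * Fraction(1, 7 * n)

def eta(n):
    """Closed form of the tail sum of phi over powers of 2: M/(7*n)."""
    return M * Fraction(1, 7 * n)

# eta satisfies 0 = eta(n) - eta(2*n) - phi(n) at every power of 2
n = 4
while n <= 2**20:
    assert eta(n) - eta(2 * n) - phi(n) == 0
    n *= 2
print("recursion verified")
```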