From 3b027244ec241a75b2b78115470cfb32d383007f Mon Sep 17 00:00:00 2001
From: Greg Weber
Date: Thu, 24 Sep 2020 15:24:37 -0500
Subject: [PATCH 1/3] RFC for QoS: Quality of Service

---
 text/2020-09-24-QoS.md | 163 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 163 insertions(+)
 create mode 100644 text/2020-09-24-QoS.md

diff --git a/text/2020-09-24-QoS.md b/text/2020-09-24-QoS.md
new file mode 100644
index 00000000..44360e06
--- /dev/null
+++ b/text/2020-09-24-QoS.md
@@ -0,0 +1,163 @@
+# QoS: Quality of Service
+
+## Motivation
+
+Queries compete for resources and thus interfere with each other. Currently users can only deal with this in very time-consuming ways: by increasing cluster capacity or by altering their applications, the latter of which may take hours or days.
+
+Users want to ensure a quality of service for their queries. Some queries should be prioritized above others. For queries of the same priority, resources should be divided fairly among queries. When there are multiple tenants, the system should provide resource isolation but still allow for high utilization.
+
+## Summary
+
+This solution provides QoS at the level of the TiKV node. QoS is configured both globally in PD and dynamically by clients.
+
+ * QoS Policy is set in PD for region groups such as key spaces (tenant) and tables
+ * Larger region groups have more capacity allocated
+ * Allow an application (TiDB) to create its own policies by sending a QoS-Request that further prioritizes its own capacity.
+ * Analytics queries can request a low QoS
+ * Apply local back pressure on a TiKV node by rejecting queries using too much capacity
+
+
+## Terminology
+
+* QoS: a relative priority setting. This is not a quota: usage is always “bursting” to achieve high utilization.
+* Capacity: the total resources available to be prioritized
+* Key Space: in a multi-tenant setup, every tenant gets a distinct key space. More generally a key space is designed for applications with different data ownership.
+
+
+## Detailed design
+
+### Architectural and Implementation advantages
+
+Ti Components are loosely coupled:
+ * PD stores policies and communicates them to TiKV
+ * TiKV performs query admission
+ * TiDB can have its own QoS policies by just sending a header
+
+Iterative. We can produce a useful first version without:
+ * Bursting
+ * Fairness with a PD placement policy
+ * Back Pressure fairness with detailed resource usage measurements
+
+
+This is designed to be a minimal step towards supporting QoS sensitive workloads such as multi-tenant. Future work will be needed to create an improved scheduler and probably to provide a more global perspective.
+
+### TiKV Back Pressure
+
+#### Local Back Pressure at TiKV
+
+TiKV will have an admission controller component. This component will track the QoS status and reject queries before they are accepted.
+
+The downside to following this approach is that TiKV does not understand a multi-node query. One node blocking a query can slow down a larger transaction and end up slowing down the system as a whole. Trying to give TiKV global information won’t scale up well for a large cluster.
+
+#### Query inhibition
+
+Queries should be inhibited based on
+* The total capacity available on the node
+* The QoS policies that apply to the query
+* The estimated resources needed for the queries
+
+#### Resource Estimation
+
+The amount of inhibition required depends on the number of requests and amount of resources being requested. Effectively when resources are highly utilized we build up a queue of pending requests with a limited size where the overflow is rejected.
+
+Policy application is allowed to take into account resources that will be used
+
+* less intensive queries can be prioritized above more intensive queries, particularly for bursting
+* queries can be prioritized that together make for better resource utilization given the multiple dimensions of resource usage.
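Not part of the patch: the Resource Estimation text above describes a limited-size queue of pending requests whose overflow is rejected. A minimal Python sketch of that idea follows. The class name `AdmissionController`, the `max_pending` limit, the 0.9 utilization threshold, and the ordering rule (higher QoS first, then lower estimated cost, then arrival order) are all assumptions made for illustration; the RFC does not specify them.

```python
import heapq
import itertools


class AdmissionController:
    """Hypothetical sketch of a bounded pending queue with overflow rejection."""

    def __init__(self, max_pending, high_utilization=0.9):
        self.max_pending = max_pending            # assumed queue size limit
        self.high_utilization = high_utilization  # assumed threshold, not from the RFC
        self._pending = []                        # min-heap of (-qos, cost, arrival, query_id)
        self._arrival = itertools.count()         # tie-breaker that preserves arrival order

    def submit(self, query_id, qos, estimated_cost, utilization):
        """Return 'admitted', 'queued', or 'rejected' for one incoming query."""
        # Below the threshold and with nothing waiting, run the query immediately.
        if utilization < self.high_utilization and not self._pending:
            return "admitted"
        # The queue has a limited size; overflow is rejected.
        if len(self._pending) >= self.max_pending:
            return "rejected"
        entry = (-qos, estimated_cost, next(self._arrival), query_id)
        heapq.heappush(self._pending, entry)
        return "queued"

    def next_query(self):
        """Pop the pending query with the highest QoS, then the lowest estimated cost."""
        if not self._pending:
            return None
        return heapq.heappop(self._pending)[3]
```

In this sketch a full queue always rejects the newcomer, even one with a high QoS. Evicting the lowest-priority pending query instead would be an equally plausible reading of the text.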
+
+#### Resource measurement
+
+TiKV must measure the resource usage of the node as a whole.
+However, in our first version we do not take into account the actual usage of different policies. To improve our ability to estimate resource usage, we will need to measure the actual resources used by each policy being applied. These measurements can eventually be used to apply QoS more intelligently. For example, the effects of bad estimates can be corrected.
+
+### QoS Policy
+
+#### QoS Value and composition
+
+QoS is specified as an integer value on a linear scale. A greater value reflects a greater priority, and a value twice as large means twice the priority. Negative values are effectively treated as a fraction between 0 and 1.
+
+QoS values can compose in two different ways (these are also discussed in later sections):
+* Inner Override (replace): a table QoS value overrides a keyspace QoS setting
+* Inner Prioritization (greater specificity): a custom application request QoS value is a priority relative to other application requests. The application as a whole is still governed by the keyspace QoS value
+
+#### QoS Policy stored in PD
+
+A QoS policy is set by an administrator in PD. It is a combination of a region group and a QoS value. The main region group is a key space. Smaller region groups within a key space, such as a table, may also be specified, and their QoS setting takes precedence over that of the key space. These groups are dynamic (new regions can be added) and are translated to regions by PD, which has knowledge of tenant and table groupings.
+
+A default QoS request setting may be provided for applications that send a QoS value per request, see the QoS Request section.
+
+These QoS policies must be periodically (perhaps once a minute) communicated to TiKV.
+
+A first implementation can assume that all regions have the same QoS.
+
+#### QoS Request: Custom Application Policies
+
+In TiDB we would like to attach a QoS to a user, a role, or some other TiDB specific object. These application-specific policies should remain in TiDB rather than being pushed down to PD.
+
+The application will already have a QoS relative to other applications based on the number of regions in its key space and the QoS setting. Application-specific policies allow queries within the same QoS to be prioritized differently.
+
+Custom application policies are sent to TiKV by setting a “QoS-Request” field. The QoS request is relative to other requests using the same QoS Key and the value is not compared to the region-based QoS that it is dividing.
+
+A default value for QoS-Request may be set in PD as part of the QoS policy for a region group. Otherwise a default value is assumed.
+
+In a delegated authentication setup, the QoS-Request field should be received in a signed auth token. This proves the QoS was negotiated with the application owner.
+
+#### Policy Application
+
+When a node approaches full utilization capacity it inhibits queries by prioritizing queries based on QoS Policy. QoS Policy is a combination of recorded settings in PD and dynamic QoS requests that determine the relative QoS share of different region groups.
+
+The QoS share of a region group is specified by the QoS Policy in PD for that group or a low priority request QoS setting. That QoS is multiplied by the total number of regions in the group on the node so that larger groups get assigned more capacity.
+
+Queries of groups that are being inhibited then are prioritized according to their QoS request value (if these values are sent).
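Not part of the patch: the Policy Application rule above (a group's QoS multiplied by its region count on the node, then ordering by QoS-Request inside an inhibited group) can be sketched in Python as follows. The function names, the dictionary layout, and the restriction to positive QoS values are assumptions for illustration. How negative values map to fractions is not specified in the text, so the sketch leaves them out.

```python
def effective_qos(keyspace_qos, table_qos=None):
    """Inner Override: a table-level QoS value replaces the keyspace value."""
    return keyspace_qos if table_qos is None else table_qos


def capacity_shares(groups):
    """Map each region group to its fraction of node capacity.

    `groups` maps a group name to (qos_value, regions_on_node).
    Weight = QoS value * number of the group's regions on this node.
    """
    weights = {}
    for name, (qos, regions_on_node) in groups.items():
        if qos <= 0 or regions_on_node < 0:
            # Assumption: this sketch only covers positive QoS values.
            raise ValueError("sketch only handles positive QoS values")
        weights[name] = qos * regions_on_node
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: weight / total for name, weight in weights.items()}


def order_inhibited(requests):
    """Order (query_id, qos_request) pairs, highest QoS-Request first.

    sorted() is stable, so equal QoS-Request values keep arrival order.
    """
    return [query_id for query_id, _ in sorted(requests, key=lambda r: -r[1])]


# Hypothetical node: tenant_a uses its keyspace QoS, tenant_b has a table override.
shares = capacity_shares({
    "tenant_a": (effective_qos(keyspace_qos=2), 30),
    "tenant_b": (effective_qos(keyspace_qos=1, table_qos=4), 10),
})
```

Here `tenant_a` has weight 2 × 30 = 60 and `tenant_b` has weight 4 × 10 = 40, so they receive 60% and 40% of the node's capacity. The QoS-Request values only reorder queries inside one group and never change these shares.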
+
+
+### PD Placement for multi-tenancy
+
+This QoS solution is expected to perform poorly in the following scenario:
+ * multi-tenant where a tenant has few regions on a node
+ * a user has hot regions on one node and cold regions on another
+
+Here the user will not get to share capacity between their hot and cold regions.
+We can solve this with group-based node placement.
+
+A group is a set of regions from a single key space (tenant) that balances hot and cold regions.
+If multiple hotspot regions are in the same group, we should balance these regions to other groups.
+
+Leaders of regions in the same group are placed on the same node. For a small user with just one region group, this placement reduces the likelihood of a small availability incident occurring but greatly increases the probability of a large availability incident, which is undesirable. For a large user with many groups the overall availability may not be changed.
+
+See this [PD Github Issue](https://github.com/tikv/pd/issues/2950).
+
+
+### Dynamic QoS adjustment by PD
+
+Instead of altering physical data placement, PD can dynamically adjust the QoS value of regions. PD can tell a TiKV node that the QoS for a hot region is larger to make up for cold regions not utilizing capacity on another node. This approach requires hot regions to be balanced evenly across nodes.
+
+This creates a more active role for PD in which it needs to both
+* understand what regions are hot and cold (it already does this to some extent)
+* update TiKV nodes about QoS settings more frequently
+
+
+
+## Drawbacks
+
+ * No Global perspective
+ * Because different queries operate on different nodes, a query with a lower QoS request may effectively be prioritized above a query with a higher QoS.
+ * Tenants will experience degraded QoS due to tenant conflict in some cases, but this can be mitigated by rebalancing
+ * No integration with a resource scheduler
+ * No ability to stop queries once started
+
+
+
+## Alternatives
+
+Static quota enforcement. Users may prefer to communicate about QoS in terms of quota guarantees. However, static quotas can be inferred from QoS. A quota is the division of capacity according to the QoS settings.
+
+Bursting is important for high utilization. With QoS it is clear what should happen with bursting. With quotas there must be some assumptions about priority.
+
+
+## Unresolved questions
+
+The exact way to communicate that queries are rejected has not been specified yet. Clients should be able to recognize that they are getting rejected due to overloading of the server.
+
+The work required in the scheduler to allow for fair usage of resources and to stop queries that are using too many resources is unknown. Overall work will be limited for this proposal and improving resource scheduling will continue as an independent long-term project.
+
From 7c8c2215a8b8f77e5ad3546ac7f6784098ea64f0 Mon Sep 17 00:00:00 2001
From: Greg Weber
Date: Thu, 24 Sep 2020 15:34:22 -0500
Subject: [PATCH 2/3] add diagrams

---
 media/QoS-capacity-slice.png | Bin 0 -> 15075 bytes
 media/qos-architecture.png   | Bin 0 -> 16669 bytes
 text/2020-09-24-QoS.md       |   4 ++++
 3 files changed, 4 insertions(+)
 create mode 100644 media/QoS-capacity-slice.png
 create mode 100644 media/qos-architecture.png

diff --git a/media/QoS-capacity-slice.png b/media/QoS-capacity-slice.png
new file mode 100644
index 0000000000000000000000000000000000000000..ee4345864824d88d7131177197300910473588f8
GIT binary patch
[binary PNG data omitted: media/QoS-capacity-slice.png (15075 bytes) and media/qos-architecture.png (16669 bytes)]

diff --git a/text/2020-09-24-QoS.md b/text/2020-09-24-QoS.md
index 44360e06..05308460 100644
--- a/text/2020-09-24-QoS.md
+++ b/text/2020-09-24-QoS.md
@@ -16,6 +16,10 @@ This solution provide QoS at the level of the TiKV node.
QoS is configured both
  * Analytics queries can request a low QoS
  * Apply local back pressure on a TiKV node by rejecting queries using too much capacity

+![QoS Architecture](../media/qos-architecture.png)
+
+![QoS Capacity Slicing](../media/media/QoS-capacity-slice.png)
+

 ## Terminology

From f39e11a26232d9e1bae9ce30b1b617c4327556eb Mon Sep 17 00:00:00 2001
From: Greg Weber
Date: Fri, 9 Oct 2020 11:59:48 -0500
Subject: [PATCH 3/3] update with links to academic paper

---
 text/2020-09-24-QoS.md | 33 +++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/text/2020-09-24-QoS.md b/text/2020-09-24-QoS.md
index 05308460..36687b4e 100644
--- a/text/2020-09-24-QoS.md
+++ b/text/2020-09-24-QoS.md
@@ -34,16 +34,16 @@ This solution provide QoS at the level of the TiKV node. QoS is configured both

 Ti Components are loosely coupled:
   * PD stores policies and communicates them to TiKV
-  * TiKV performs query admission
-  * TiDB can have its own QoS policies by just sending a header
+  * TiKV performs query admission, providing localized back pressure
+  * TiDB can create its own QoS policies for its users/tables just by sending a header

-Iterative. We can produce a useful first version without:
+Iterative. We can try to produce a useful first version without:
   * Bursting
-  * Fairness with a PD placement policy
+  * Global Fairness with adjusted weighting and a PD placement policy
   * Back Pressure fairness with detailed resource usage measurements


-This is designed to be a minimal step towards supporting QoS sensitive workloads such as multi-tenant. Future work will be needed to create an improved scheduler and probably to provide a more global perspective.
+This is designed to be a minimal step towards supporting QoS sensitive workloads such as multi-tenant. Future work will be needed to create an improved scheduler and to improve global fairness.

 ### TiKV Back Pressure

@@ -114,6 +114,20 @@ The QoS share of a region group is specified by the QoS Policy in PD for that gr

 Queries of groups that are being inhibited then are prioritized according to their QoS request value (if these values are sent).

+## Global Fairness
+
+### Dynamic QoS adjustment by PD
+
+Instead of altering physical data placement for fairness, PD can dynamically adjust the QoS value of regions.
+PD can tell a TiKV node that the QoS for a hot region is larger to make up for cold regions not utilizing capacity on another node.
+When this can be used, we can think of it as achieving the same effect as placement without having to move the data.
+This approach is explained and measured in [this paper](https://www.usenix.org/system/files/conference/osdi12/osdi12-final-215.pdf) as adjusting local per-tenant weight (QoS).
+Data placement happens on a long-term time scale whereas dynamic adjustment can happen every few seconds.
+
+### Adjusted Follower Read
+
+The proposal assumes a single leader.
+When using follower read, we should account for it in our QoS weighting, similar to what [this paper](https://www.usenix.org/system/files/conference/osdi12/osdi12-final-215.pdf) outlines as replica selection.

 ### PD Placement for multi-tenancy

@@ -132,15 +146,6 @@ Leaders of regions in the same group are placed on the same node. For a small us

 See this [PD Github Issue](https://github.com/tikv/pd/issues/2950).

-### Dynamic QoS adjustment by PD
-
-Instead of altering physical data placement, PD can dynamically adjust the QoS value of regions. PD can tell a TiKV node that the QoS for a hot region is larger to make up for cold regions not utilizing capacity on another node. This approach requires hot regions to be balanced evenly across nodes.
-
-This creates a more active role for PD in which it needs to both
-* understand what regions are hot and cold (it already does this to some extent)
-* update TiKV nodes about QoS settings more frequently
-
-
 ## Drawbacks