-
Notifications
You must be signed in to change notification settings - Fork 2
/
P0901.bs
1225 lines (987 loc) · 51.2 KB
/
P0901.bs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<pre class='metadata'>
Title: Size feedback in operator new
Status: D
Shortname: D0901
Group: WG21
Revision: 11
Editor: Andrew Hunter, [email protected]
Editor: Chris Kennelly, Google, [email protected]
Editor: Thomas Köppe, Google DeepMind, [email protected]
Date: 2023-06-16
Audience: CWG, EWG, LWG
Abstract: Provide access to actual malloc buffer sizes for users.
URL: http://wg21.link/D0901R11
Markup Shorthands: markdown yes
Default Highlight: C++
</pre>
# Motivation # {#mot}
*Throughout this document, "malloc" refers to the* **implementation**
*of* `::operator new` *both as fairly standard practice for implementers, and to
make clear the distinction between the interface and the implementation.*
Everyone's favorite dynamic data structure, `std::vector`, allocates memory with
code that looks something like this (with many details, like `Allocator`,
templating for non `char`, and exception safety, elided):
<xmp>
void vector::reserve(size_t new_cap) {
if (capacity_ >= new_cap) return;
const size_t bytes = new_cap;
void *newp = ::operator new(new_cap);
memcpy(newp, ptr_, capacity_);
ptr_ = newp;
capacity_ = bytes;
}
</xmp>
Consider the sequence of calls:
<xmp>
std::vector<char> v;
v.reserve(37);
// ...
v.reserve(38);
</xmp>
All reasonable implementations of malloc round sizes, both for alignment
requirements and improved performance. It is extremely unlikely that malloc
provided us exactly 37 bytes. We do not need to invoke the allocator
here...except that we don't know that for sure, and to use the 38th byte would
be undefined behavior. We would like that 38th byte to be usable without a
roundtrip through the allocator.
This paper proposes an API making it safe to use that
byte, and explores many of the design choices (not all of which are obvious
without implementation experience.)
## nallocx: not as awesome as it looks## {#nallocx}
The simplest way to help here is to provide an informative API answering the
question "If I ask for N bytes, how many do I actually get?" [[jemalloc]] and
[[TCMalloc]] call this `nallocx`. We can then use that hint as a smarter
parameter for operator new:
<xmp>
void vector::reserve(size_t new_cap) {
if (capacity_ >= new_cap) return;
const size_t bytes = nallocx(new_cap, 0);
void *newp = ::operator new(bytes);
memcpy(newp, ptr_, capacity_);
ptr_ = newp;
capacity_ = bytes;
}
</xmp>
This is a good start, and does in fact work to allow vector and friends to use
the true extent of returned objects. But there are three significant problems
with this approach.
### nallocx must give a conservative answer ### {#whatval}
While many allocators have a deterministic map from requested size to allocated
size, it is by no means guaranteed that all do. Presumably they can make a
reasonably good guess, but if two calls to `::operator new(37)` might return 64
and 128 bytes, we'd definitely rather know the right answer, not a conservative
approximation.
### nallocx duplicates work ### {#speed}
Allocation is often a crucial limit on performance. Most allocators compute
the returned size of an object as part of fulfilling that allocation...but if
we make a second call to `nallocx`, we duplicate all that communication, and
also the overhead of the function call.
### nallocx hides information from malloc ### {#feedback}
The biggest problem (for the authors) is that `nallocx` discards information
malloc finds valuable (the user's intended allocation size.) That is: in our
running example, malloc normally knows that the user wants 37 bytes (then 38),
but with `nallocx`, we will only ever be told that they want 40 (or 48, or
whatever `nallocx(37)` returns.)
Google's malloc implementation ([[TCMalloc]]) rounds requests to one of a small
(<100) number of *sizeclasses*: we maintain local caches of appropriately sized
objects, and cannot do this for every possible size of object. Originally,
these sizeclasses were just reasonably evenly spaced among the range they
cover. Since then, we have used extensive telemetry on allocator use in the
wild to tune these choices. In particular, as we know (approximately) how many
objects of any given size are requested, we can solve a fairly simple
optimization problem to minimize the total internal fragmentation for any
choice of N sizeclasses.
Widespread use of `nallocx` breaks this. By the time TCMalloc's telemetry sees
a request that was hinted by nallocx, to the best of our knowledge the user
*wants* exactly as many bytes as we currently provide them. If a huge number
of callers wanted 40 bytes but were currently getting 48, we'd lose the ability
to know that and optimize for it.
Note that we can't take the same telemetry from `nallocx` calls: we have no
idea how many times the resulting hint will be used (we might not allocate at
all, or we might cache the result and make a million allocations guided by it.)
We would also lose important information in the stack traces we collect from
allocation sites.
Optimization guided by malloc telemetry has been one of our most effective
tools in improving allocator performance. It is important that we fix this
issue *without* losing the ground truth of what a caller of `::operator new`
wants.
These three issues explain why we don't believe `nallocx` is a sufficient
solution here.
## after allocation is too late ## {#afteralloc}
Another obvious suggestion is to add a way to inspect the size of an object
returned by `::operator new`. Most mallocs provide a way to do this; [[jemalloc]]
calls it `sallocx`. Vector would look like:
<xmp>
void vector::reserve(size_t new_cap) {
if (capacity_ >= new_cap) return;
void *newp = ::operator new(new_cap);
const size_t bytes = sallocx(newp);
memcpy(newp, ptr_, capacity_);
ptr_ = newp;
capacity_ = bytes;
}
</xmp>
This is worse than nallocx. It fixes the non-constant size problem, and avoids
a feedback loop, but the performance issue is worse (this is the major issue
*fixed* by [[SizedDelete]]!), and what's worse, the above code invokes UB as
soon as we touch byte `new_cap+1`. We could in principle change the standard,
but this would be an implementation nightmare.
## realloc's day has passed ## {#realloc}
We should also quickly examine why the classic C API `realloc` is insufficient.
<xmp>
void vector::reserve(size_t new_cap) {
if (capacity_ >= new_cap) return;
ptr_ = realloc(ptr_, new_cap);
capacity_ = new_cap;
}
</xmp>
In principle a realloc from 37 to 38 bytes wouldn't carry the full cost of
allocation. But it's dramatically more expensive than making no call at all.
What's more, there are a number of more complicated dynamic data structures that
store variable-sized chunks of data but are never actually resized. These data
structures still deserve the right to use all the memory they're paying for.
Furthermore, `realloc`'s original purpose was not to allow the use of more bytes
the caller already had, but to (hopefully) extend an allocation in place to
adjacent free space. In a classic malloc implementation this would actually be
possible...but most modern allocators use variants of slab allocation. Even if
the 65th byte in a 64-byte allocation isn't in use, they cannot be combined into
a single object; it's almost certainly required to be used for the next 64-byte
allocation. In the modern world, `realloc` serves little purpose.
# Proposal # {#proposal}
## Allocation functions
We propose adding new overloads of `::operator new` (so-called "size-returning allocation
functions") that directly inform the user of the size available to them. C++
makes `::operator new` replaceable (15.5.4.6), allowing a program to provide
its own version different from the standard library's implementation.
We note immediately that we are deliberately not providing this facility as an unrelated
free function, nor leave it as an implementation detail of `std::allocator`; see [[#alt-allocfn]]
for a detailed discussion.
The new overloads are selected by a tag argument of type `std::return_size_t`,
usually provided as the value `std::return_size`. (This is analogous to
`std::nothrow_t`/`std::nothrow` and to `std::align_val_t`.)
The new overloads return a new type, `std::sized_allocation_t`, which
stores a void pointer that corresponds to the pointer returned by the
existing allocation functions, as well as the new size feedback
information.
## *New-expression*s ## {#new}
In the previous revision, R10, we had proposed to make *new-expression*s be able to use the new
allocation functions, so that a placement *new-expression* of the form `new (std;:return_size)
T` would have a novel return type that includes the size feedback. However, during EWG discussion
it became clear that that is not a useful facility; see [[#alt-newexpr]] for details.
Consequently, we are not proposing to expose size-returning allocation functions to
*new-expression*s. We propose that a hypothetical (placement) *new-expression* that would find a
size-returning allocation function during lookup is ill-formed. (No such expressions can
currently exist.)
# Proposed Wording # {#prop}
We propose wording, relative to [[N4917]]:
* Amend [basic.stc.dynamic.general], paragraph 2:
<blockquote>
The library provides default definitions for the global allocation and deallocation functions.
Some global allocation and deallocation functions are replaceable ([new.delete]).
A C++ program shall provide at most one definition of a replaceable allocation or deallocation function.
Any such function definition replaces the default version provided in the library ([replacement.functions]).
The following allocation and deallocation functions ([support.dynamic]) are implicitly declared
in global scope in each translation unit of a program.
<pre>
[[nodiscard]] void* operator new(std::size_t);
[[nodiscard]] void* operator new(std::size_t, std::align_val_t);
<ins>[[nodiscard]] std::sized_allocation_t operator new(std::size_t, std::return_size_t);</ins>
<ins>[[nodiscard]] std::sized_allocation_t operator new(std::size_t, std::align_val_t,
std::return_size_t);</ins>
void operator delete(void*) noexcept;
void operator delete(void*, std::size_t) noexcept;
void operator delete(void*, std::align_val_t) noexcept;
void operator delete(void*, std::size_t, std::align_val_t) noexcept;
[[nodiscard]] void* operator new[](std::size_t);
[[nodiscard]] void* operator new[](std::size_t, std::align_val_t);
<ins>[[nodiscard]] std::sized_allocation_t operator new[](std::size_t, std::return_size_t);</ins>
<ins>[[nodiscard]] std::sized_allocation_t operator new[](std::size_t, std::align_val_t,
std::return_size_t);</ins>
void operator delete[](void*) noexcept;
void operator delete[](void*, std::size_t) noexcept;
void operator delete[](void*, std::align_val_t) noexcept;
void operator delete[](void*, std::size_t, std::align_val_t) noexcept;
</pre>
These implicit declarations introduce only the function names `operator new`, `operator new[]`, `operator delete`, and `operator delete[]`.
[Note 2: The implicit declarations do not introduce the names `std`, `std::size_t`, `std::align_val_t`, <ins>`std::return_size_t`,</ins> or any other names that the library uses to declare these names.
Thus, a *new-expression*, *delete-expression*, or function call that refers to one of these functions without importing or including the header `<new>` ([new.syn]) or importing a C++ library module ([std.modules]) is well-formed.
However, referring to `std` or `std::size_t` or `std::align_val_t`<ins> or `std::return_size_t`</ins> is ill-formed unless a standard library declaration ([...]) of that name precedes ([basic.lookup.general]) the use of that name.
— end note]
Allocation and/or deallocation functions may also be declared and defined for any class ([class.free]).
</blockquote>
* Amend [basic.stc.dynamic.allocation], paragraph 1:
<blockquote>
An allocation function that is not a class member function shall belong to the global scope and not have a name with internal linkage.
The return type shall be <ins>`std::sized_allocation_t` ([new.syn]) if the allocation function is a size-returning allocation function and </ins>`void*`<ins> otherwise</ins>.
The first parameter shall have type `std::size_t` ([support.types]).
The first parameter shall not have an associated default argument ([dcl.fct.default]).
The value of the first parameter is interpreted as the requested size of the allocation.
<ins>An allocation function is a *size-returning* allocation function if it has a second parameter of type `std::return_size_t`, or it has a second parameter of type `std::align_val_t` and a third parameter of type `std::return_size_t`.</ins>
An allocation function can be a function template.
Such a template shall declare its return type and first parameter as specified above (that is, template parameter types shall not be used in the return type and first parameter type).
Allocation function templates shall have two or more parameters<ins> as specified above</ins>.
</blockquote>
* Insert [basic.stc.dynamic.allocation], after paragraph 1:
<blockquote>
<ins>A size-returning allocation function returns a value with a member subobject that is a pointer suitable to hold the address of potentially allocated storage, and therefore all allocation functions are said to return a pointer value.</ins>
</blockquote>
* Amend [replacement.functions], paragraph 2:
<pre>
operator new(std::size_t)
operator new(std::size_t, std::align_val_t)
operator new(std::size_t, const std::nothrow_t&)
operator new(std::size_t, std::align_val_t, const std::nothrow_t&)
<ins>
operator new(std::size_t size, std::return_size_t)
operator new(std::size_t size, std::align_val_t al, std::return_size_t)
operator new(std::size_t size, std::return_size_t, std::nothrow_t)
operator new(std::size_t size, std::align_val_t al, std::return_size_t, std::nothrow_t)
</ins>
[...]
operator new[](std::size_t)
operator new[](std::size_t, std::align_val_t)
operator new[](std::size_t, const std::nothrow_t&)
operator new[](std::size_t, std::align_val_t, const std::nothrow_t&)
<ins>
operator new[](std::size_t size, std::return_size_t)
operator new[](std::size_t size, std::align_val_t al, std::return_size_t)
operator new[](std::size_t size, std::return_size_t, std::nothrow_t)
operator new[](std::size_t size, std::align_val_t al, std::return_size_t, std::nothrow_t)
</ins>
[...]
</pre>
* Amend header `<new>` synopsis in [new.syn]:
<pre>
namespace std {
class bad_alloc;
class bad_array_new_length;
struct destroying_delete_t {
explicit destroying_delete_t() = default;
};
inline constexpr destroying_delete_t destroying_delete{};
<ins>
// global operator new control
struct return_size_t {
explicit return_size_t() = default;
};
inline constexpr return_size_t return_size{};
struct sized_allocation_t {
void *ptr;
size_t bytes;
};
</ins>
enum class align_val_t : size_t {};
[...]
}
[[nodiscard]] void* operator new(std::size_t size);
[[nodiscard]] void* operator new(std::size_t size, std::align_val_t alignment);
[[nodiscard]] void* operator new(std::size_t size, const std::nothrow_t&) noexcept;
[[nodiscard]] void* operator new(std::size_t size, std::align_val_t alignment,
const std::nothrow_t&) noexcept;
<ins>
[[nodiscard]] std::sized_allocation_t operator new(
std::size_t size, std::return_size_t);
[[nodiscard]] std::sized_allocation_t operator new(
std::size_t size, std::align_val_t alignment, std::return_size_t);
[[nodiscard]] std::sized_allocation_t operator new(
std::size_t size, std::return_size_t, std::nothrow_t) noexcept;
[[nodiscard]] std::sized_allocation_t operator new(
std::size_t size, std::align_val_t alignment, std::return_size_t,
std::nothrow_t) noexcept;
</ins>
[...]
[[nodiscard]] void* operator new[](std::size_t size);
[[nodiscard]] void* operator new[](std::size_t size, std::align_val_t alignment);
[[nodiscard]] void* operator new[](std::size_t size, const std::nothrow_t&) noexcept;
[[nodiscard]] void* operator new[](std::size_t size, std::align_val_t alignment,
const std::nothrow_t&) noexcept;
<ins>
[[nodiscard]] std::sized_allocation_t operator new[](
std::size_t size, std::return_size_t);
[[nodiscard]] std::sized_allocation_t operator new[](
std::size_t size, std::align_val_t alignment, std::return_size_t);
[[nodiscard]] std::sized_allocation_t operator new[](
std::size_t size, std::return_size_t, std::nothrow_t) noexcept;
[[nodiscard]] std::sized_allocation_t operator new[](
std::size_t size, std::align_val_t alignment, std::return_size_t,
std::nothrow_t) noexcept;
</ins>
[...]
[[nodiscard]] void* operator new (std::size_t size, void* ptr) noexcept;
[[nodiscard]] void* operator new[](std::size_t size, void* ptr) noexcept;
void operator delete (void* ptr, void*) noexcept;
void operator delete[](void* ptr, void*) noexcept;
</pre>
* Insert [new.delete.single], after paragraph 4:
<blockquote>
<pre>
<ins>
[[nodiscard]] std::sized_allocation_t ::operator new(
std::size_t size, std::return_size_t);
[[nodiscard]] std::sized_allocation_t ::operator new(
std::size_t size, std::align_val_t alignment, std::return_size_t);
</ins>
</pre>
* <ins>*Effects*: Same as above, except these are called by a placement version of
a *new-expression* when a C++ program prefers the size-returning
allocation function.</ins>
* <ins>*Replaceable*: A C++ program may define functions with either of these
function signatures, and thereby displace the default versions defined by
the C++ standard library.</ins>
* <ins>*Required behavior*: Return a `sized_allocation_t` whose `ptr` member
represents the address of a region of *N* bytes of suitably aligned storage
([basic.stc.dynamic]) for some <math>*N* >= *size*</math>, and whose `bytes` member is
*N*, or else throw a `bad_alloc` exception. This requirement is binding on
any replacement versions of these functions.</ins>
* <ins>*Default behavior*: Returns `std::sized_allocation_t{operator new(size), N}` and
`std::sized_allocation_t{operator new(size, alignment), N}` respectively. If a
user-provided operator new is invoked directly or indirectly, N is `size`.</ins>
</blockquote>
* Insert [new.delete.single], after paragraph 9:
<blockquote>
<pre>
<ins>
[[nodiscard]] std::sized_allocation_t ::operator new(
std::size_t size, std::return_size_t, std::nothrow_t) noexcept;
[[nodiscard]] std::sized_allocation_t ::operator new(
std::size_t size, std::align_val_t alignment, std::return_size_t, std::nothrow_t) noexcept;
</ins>
</pre>
* <ins>*Effects*: Same as above, except these are called by a placement version of
a *new-expression* when a C++ program prefers the size-returning
allocation function and a null pointer result as an error indication.</ins>
* <ins>*Replaceable*: A C++ program may define functions with either of these
function signatures, and thereby displace the default versions defined by
the C++ standard library.</ins>
* <ins>*Required behavior*: Return a `sized_allocation_t` whose `ptr` member represents the
address of a region of *N* bytes of suitably aligned storage
([basic.stc.dynamic]) for some <math>*N* >= *size*</math>, and whose `bytes` member is *N*,
or else return `std::sized_allocation_t{nullptr, 0}`. Each of these nothrow
versions of `operator new` returns a pointer obtained as if acquired from
the (possibly replaced) corresponding non-placement function. This
requirement is binding on any replacement versions of these functions.</ins>
* <ins>*Default behavior*: Returns `std::sized_allocation_t{operator new(size), N}` and
`std::sized_allocation_t{operator new(size, alignment), N}` respectively. If a
user-provided operator new is invoked directly or indirectly, N is `size`.
If the call to `operator new` throws, returns
`std::sized_allocation_t{nullptr, 0}`.</ins>
</blockquote>
* Amend [new.delete.single], paragraph 11:
<pre>
void operator delete(void* ptr) noexcept;
void operator delete(void* ptr, std::size_t size) noexcept;
void operator delete(void* ptr, std::align_val_t alignment) noexcept;
void operator delete(void* ptr, std::size_t size, std::align_val_t alignment) noexcept;
</pre>
[...]
* If the `alignment` parameter is not present, `ptr` shall have been
returned by an allocation function without an alignment parameter. If
present, the `alignment` argument shall equal the alignment argument passed
to the allocation function that returned `ptr`. If present, the `size` argument
shall equal the `size` argument passed to the allocation function that
returned `ptr`<ins>, if `ptr` was not allocated by a size-returning
allocation function. If present, the `size` argument shall satisfy <math>*bytes* >=
*size* >= *requested*</math> if `ptr` was allocated by a size-returning allocation
function, where `bytes` is the size returned in `std::sized_allocation_t`
and `requested` is the size argument passed to the allocation
function</ins>.
[...]
</blockquote>
* Insert [new.delete.array], after paragraph 4:
<blockquote>
<pre>
<ins>
[[nodiscard]] std::sized_allocation_t ::operator new[](
std::size_t size, std::return_size_t);
[[nodiscard]] std::sized_allocation_t ::operator new[](
std::size_t size, std::align_val_t alignment, std::return_size_t);
</ins>
</pre>
* <ins>*Effects*: Same as above, except these are called by a placement version of
a *new-expression* when a C++ program prefers the size-returning
allocation function.</ins>
* <ins>*Replaceable*: A C++ program may define functions with either of these
function signatures, and thereby displace the default versions defined by
the C++ standard library.</ins>
* <ins>*Required behavior*: Return a `sized_allocation_t` whose `ptr` member
represents the address of a region of *N* bytes of suitably aligned storage
([basic.stc.dynamic]) for some <math>*N* >= *size*</math>, and whose `bytes` member is
*N*, or else throw a `bad_alloc` exception. This requirement is binding on
any replacement versions of these functions.</ins>
* <ins>*Default behavior*: Returns `std::sized_allocation_t{operator new[](size), N}` and
`std::sized_allocation_t{operator new[](size, alignment), N}` respectively. If a
user-provided operator new is invoked directly or indirectly, N is `size`.</ins>
</blockquote>
* Insert [new.delete.array], after paragraph 8:
<blockquote>
<pre>
<ins>
[[nodiscard]] std::sized_allocation_t ::operator new[](
std::size_t size, std::return_size_t, std::nothrow_t) noexcept;
[[nodiscard]] std::sized_allocation_t ::operator new[](
std::size_t size, std::align_val_t alignment, std::return_size_t, std::nothrow_t) noexcept;
</ins>
</pre>
* <ins>*Effects*: Same as above, except these are called by a placement
version of a *new-expression* when a C++ program prefers the
size-returning allocation function and a null pointer result as an error
indication.</ins>
* <ins>*Replaceable*: A C++ program may define functions with either of these
function signatures, and thereby displace the default versions defined by
the C++ standard library.</ins>
* <ins>*Required behavior*: Return a `sized_allocation_t` whose `ptr`
member represents the address of a region of *N* bytes of suitably aligned
storage ([basic.stc.dynamic]) for some <math>*N* >= *size*</math>, and whose `bytes`
member is *N*, or else return `std::sized_allocation_t{nullptr, 0}`.
This requirement is binding on any replacement versions of these
functions.</ins>
* <ins>*Default behavior*: Returns `std::sized_allocation_t{operator
new[](size), N}` and `std::sized_allocation_t{operator new[](size,
alignment), N}` respectively. If a user-provided operator new is invoked
directly or indirectly, N is `size`. If the call to `operator new[]`
throws, returns `std::sized_allocation_t{nullptr, 0}`.</ins>
</blockquote>
* Amend [new.delete.array], paragraph 10:
<pre>
void operator delete[](void* ptr) noexcept;
void operator delete[](void* ptr, std::size_t size) noexcept;
void operator delete[](void* ptr, std::align_val_t alignment) noexcept;
void operator delete[](void* ptr, std::size_t size, std::align_val_t alignment) noexcept;
</pre>
[...]
* If the `alignment` parameter is not present, `ptr` shall have been
returned by an allocation function without an `alignment` parameter. If
present, the `alignment` argument shall equal the `alignment` argument
passed to the allocation function that returned `ptr`. If present, the
`size` argument shall equal the `size` argument passed to the allocation
function that returned `ptr`<ins>, if `ptr` was not allocated by a
size-returning allocation function. If present, the `size` argument
shall satisfy <math>*bytes* >= *size* >= *requested*</math> if `ptr` was allocated by a
size-returning allocation function, where `bytes` is the size returned in
`std::sized_allocation_t` and `requested` is the size argument passed
to the allocation function</ins>.
[...]
</blockquote>
* Amend Table 19 ("Feature-test macros") [cpp.predefined]
<blockquote>
<table>
<tr><th>Name</th><th>Value</th></tr>
<tr><td><ins>`__cpp_size_returning_new`</ins></td><td><ins>PLACEHOLDER DATE</ins></td></tr>
</table>
</blockquote>
# Implementation, Deployment and Performance Experience # {#impldepl}
## Implementability
The fundamental size feedback logic is a shipping part of TCMalloc ([[MallocExtension]]),
as a set of functions such as:
<pre>
tcmalloc::sized_ptr_t tcmalloc_size_returning_operator_new(std::size_t size);
</pre>
We implemented the new `::operator new` overload in Clang. This
presented no noteworthy complication or obstacle. An `::operator new`
whose return type is not `void*` is novel, but entirely implementable.
Building a few large applications from Google's codebase with the
modified version of Clang showed nothing unexpected and continued
working.
## Deployment
We modified the libc++ standard library implementation with minimal effort to
use that function. (The implementation has a convenient
<a href="https://github.com/llvm/llvm-project/blob/ed8dae9a438dd8b76a9760506fb7333ce7057d96/libcxx/include/__memory/allocate_at_least.h#L55">central point</a>
at which the underlying allocation function is called.)
We were able to build a number of significant production binaries of the Google
codebase without problems.
As above, building a few large applications from Google's codebase
succeeded without incident.
## Performance
At Google, we modified libc++'s `basic_string` to use TCMalloc's size-returning allocation
function. This resulted in savings on heap allocations that we will describe below. In summary,
the proposed feature allows vectors and strings to grow more efficiently. While it is hard to
spot these effects in real-world benchmarks, where they are within the noise, we performed
micro-benchmarks and traced allocations to show that string allocations are reduced by about
4%. For the application deployment in Google's production systems, we extrapolate fleet-wide
memory savings from strings of about 0.2%.
One source of inefficiencies is the interplay between malloc and the implementation of
`basic_string`. The libc++ implementation of `basic_string` always allocates memory in multiples
of 16 bytes. The glibc malloc (ptmalloc2) allocates blocks of memory of size
24 + <em>n</em> × 16. For example, a call `reserve(35)` would make
`basic_string` allocate a rounded-up value of 48, but malloc would allocate 56 bytes. With size
feedback, `basic_string` would ask for "at least 35 bytes" and receive 40 bytes from malloc.
With TCMalloc the situation is different, but savings are still possible: its size classes are
8, 16, 32, 48, 64, followed by 8-byte increments up to 128. In this case, size feedback would
allow `basic_string` to hit every size bucket instead of only every even one for strings of
length 64‒128.
TCMalloc's size feedback is used in Google's implementation of
<a href="https://github.com/abseil/abseil-cpp/blob/1feab4fff90f904518e66cf80971063486fbc984/absl/strings/cord.h#L19-L23">`absl::Cord`</a>
and of <a href="https://protobuf.dev/reference/cpp/api-docs/google.protobuf.repeated_field/">`google::protobuf::RepeatedField`</a>.
# Alternative Designs Considered # {#alternatives}
## Parameters and return types
Another signature we could use would be:
<xmp>
enum class return_size_t : std::size_t {};
void* ::operator new(std::size_t size, std::return_size_t&);
</xmp>
(and so on.) This is slightly simpler to read as a signature, but arguably
worse in usage:
<xmp>
std::tie(obj.ptr, obj.size) = ::operator new(37, std::return_size_t{});
// ...vs...
// Presumably the object implementation wants to contain a size_t,
// not a return_size_t.
std::return_size_t rs;
obj.ptr = ::operator new(37, rs);
obj.size = rs;
</xmp>
More importantly, this form is less efficient. In practice, underlying malloc
implementations provide actual definitions of `::operator new` symbols which
are called like any other function. Passing a reference parameter requires us
to actually return the size via memory.
* Linux ABIs support returning at least two scalar values in registers (even if
they're members of a trivially copyable struct) which can be dramatically
more efficient.
* The [[MicrosoftABI]] returns large types by pointer, but this is no worse
than making the reference parameter an inherent part of the API.
Whether we use a reference parameter or a second returned value, the
interpretation is the same.
## Why `operator new`? ## {#alt-allocfn}
Given the added complexity of the `operator new` overload set, one might ask why we want to
expose this facility as such an overload, rather than something else. Concretely, we could
provide a completely unrelated free function, or we could ask that the new behavior be exposed
via `std::allocator<T>::allocate_at_least`.
Those alternatives fall short, however, since the `operator new` allocation functions have a
special place in C++ that the alternatives do not have: They are *replaceable* and can be chosen
by the application owner:
* Compared to an unrelated free function: an unrelated free function would not be used by
existing code, in particular, code that may be outside the user's control. The replaceable
allocation function, by contrast, can affect and benefit existing code. Moreover, it would
require additional integration to make the free function interoperate with `operator delete`,
whereas we have an existing framework for matching `operator new` and `operator delete`.
Finally, `operator new` enjoys various special properties regarding implicit lifetimes
which would need to be added explicitly to an unrelated free function.
* Compared to `std::allocator`: The implementation of `std::allocator` is generally under the
control of the standard library vendor, not the user. The vendor is not going to select any
one particular malloc implementation in their `std::allocator`. In fact, the user may want
to use different malloc implementations in different projects. By contrast, a replaceable
allocation function opens the choice of malloc implementation to the user on a
program-by-program basis. (Note that `std::allocator` is specified to use `::operator new`,
and while no further details (such as choice of overload) are specified, it is reasonable to
assume that a size-returning `operator new` would be picked up by
`std::allocator::allocate_at_least` as a matter of quality-of-implementation.)
Finally, a suggestion that came up during the review of R10 was that we could aim at a *lower*
level and propose a new `malloc`-like facility to both C and C++. While this would undoubtedly
be useful, we believe that it does not replace `operator new`. That is because `operator new` is
generally how one "obtains memory in C++"; it enjoys special properties in the language,
e.g. regarding implicit lifetimes, and also with regards to elision or merging of allocations.
As a data point, the codebase at Google uses TCMalloc and its `operator new` interface, not its
`malloc` interface, which are entirely different. A significant performance gain comes from the
use of `operator new` and sized-`operator delete`, compared to the `malloc`/`free` interface.
(The vast majority of memory allocations in that codebase come from `operator new` and not from
`malloc`.) Exposing the size feedback as part of this established framework makes the performance
benefits available in ways that a less integrated facility would not easily be able to do.
## New expressions ## {#alt-newexpr}
We considered previously to have "size-returning *new-expression*s that expose the excess
allocation space resulting from the allocation that's done as part of the
*new-expression*. However, it is exceedingly awkward to make use of the excess allocation space,
since the lifetime of the object(s) created by the `new-expression` would be distinct from
whatever objects one later creates manually in the excess space.
The following example was given in the previous revision:
<xmp>
auto [p, sz] = new (std::return_size) T[5];
for (int i = 5; i < sz / sizeof(T); i++) {
new (p[i]) T;
}
for (int i = 0; i < sz / sizeof(T); i++) {
p[i].DoStuff();
}
for (int i = 5; i < sz / sizeof(T); i++) {
p[i].~T();
}
delete[] p;
</xmp>
The example shows how the lifetime of the "requested" objects is managed by the delete
expression, but the lifetime of the manually constructed excess objects has to be handled
manually. This is awkward, easy to misuse, and we do not see any need for such a facility.
Non-array *new-expression*s would be similarly useless.
One conceivable use case, which we ultimately will not pursue, is perhaps worth describing: for
trivial types, such as `char`, one could imagine a chain of expressions `new (std::return_size)
char[N]`. These could successively return excess storage, but in addition, the compiler would be
allowed to merge the allocations, because of the special allowances made to how
*new-expression*s (don't) call allocation functions. This behavior is not achievable with
`operator new` function calls alone.
We also note in passing that in R10 `std::sized_allocation_t` was a template, whose template
parameter would take the type used in the *new-expression*. There were implementer objections to
having a core language facility having to instantiate templates.
### Which kind of "size" ### {#alt-newsize}
We considered alternatives for returning the size.
* We could return two pointers, the initial object and one past the end of
the array (minus the array allocation overhead).
<xmp>
auto [start, end] = new (std::return_size) T[5];
for (T* p = start + 5; p != end; p++) {
new (p) T;
}
for (T* p = start; p != end; p++) {
p->DoStuff();
}
for (T* p = start + 5; p != end; p++) {
p->~T();
}
delete[] start;
</xmp>
The pair of pointers provides convenience for use with iterator-oriented
algorithms. The problem we foresee is that a size-returning allocation
function may not provide a size that is an appropriate multiple of
`sizeof(T)` (or fail to be a multiple of `alignof(T)`, a possible, but
unlikely scenario). This imposes a need for more extensive logic around
the *new-expression* in handling the returned allocation size.
* We could return the size in units of `T`, this leads to an inconsistency
between the expected usage for `new` and `new[]`:
* For `new`, we may only end up fitting a single `T` into an allocator size
quanta, so the extra space remains unusable. If we can fit multiple `T`
into a single allocator size quanta, we now have an array from what was a
scalar allocation site. This cannot be foreseen by the compiler as
`::operator new` is a replaceable function.
* For `new[]`, the size in units of `T` can easily be derived from the
returned size in bytes.
* We could pass the size in units of `T` or bytes to the constructor of `T`:
* For `new`, this is especially useful for tail-padded arrays, but
neglects default-initialized `T`.
* For `new[]`, a common use case is expected to be the allocation of
arrays of `char`, `int`, etc. The size of the overall array is
irrelevant for the individual elements.
* We could return the size via a reference parameter:
<xmp>
std::return_end<T> end;
T* p = new (end) T[5];
for (T* p = start + 5; p != end; p++) {
new (p) T;
}
for (T* p = start; p != end; p++) {
p->DoStuff();
}
for (T* p = start + 5; p != end; p++) {
p->~T();
}
</xmp>
or, demonstrated with bytes:
<xmp>
std::return_size_t size;
T* p = new (size) T[5];
for (int i = 5; i < size / sizeof(T); i++) {
new (p[i]) T;
}
for (int i = 0; i < size / sizeof(T); i++) {
p[i].DoStuff();
}
for (int i = 5; i < size / sizeof(T); i++) {
p[i].~T();
}
delete[] p;
</xmp>
(Casts omitted for clarity.)
As discussed for `::operator new` in [[#prop]], a reference parameter poses
difficulties for optimizers and involves returning the size via
memory (depending on ABI).
For `new[]` expressions, we considered alternatively initializing the returned
(`sz / sizeof(T)`) number of elements.
* This would avoid the need to explicitly construct / destruct the elements
with the additional returned space (if any).
The *new-initializer* is invoked for the returned number of elements,
rather than the requested number of elements. This allows `delete[]` to
destroy the correct number of elements (by storing `sz / sizeof(T)` in the
array allocation overhead).
* The presented proposal (leaving this space uninitialized) was chosen for
consistency with `new`.
# Discussion # {#discuss}
## *How* many `::operator new`'s? ## {#splode}
It is unfortunate that we have so many permutations of `::operator new`--eight
seems like far more than we should really need! But there really isn't any
significant runtime cost for having them. Use of raw calls to `::operator new`
is relatively rare: It's a *building block* for low-level libraries, allocators
([[P0401R4]]), and so on, so the cognitive burden on C++ users is low.
The authors have considered other alternatives to the additional overloads. At
the Jacksonville meeting, EWG suggested looking at parameter packs.
* Parameter packs do not reduce the number of symbols introduced.
Implementers still need to provide implementations each of the n overloads.
* Retrofitting parameter packs leaves us with *more* mangled variants.
Implementers need to provide both the legacy symbols as well as the
parameter pack-mangled symbols.
The authors have also considered APIs where all parameters are passed, thereby
requiring a single new overload. This adds further overhead for
implementations, as it moves compile-time decisions (is the alignment at or
below the minimum guaranteed by `operator new`) into runtime ones.
The alternative to modifying the handling of *new-expressions* invoking
deallocation functions (when an exception is thrown) would require additional
overloads for `operator delete` / `operator delete[]` whose sole purpose would
be to accept and discard the `std::return_size_t`.
## Implementation difficulty ## {#trivial}
It's worth reiterating that there's a perfectly good trivial implementation of these
functions:
<xmp>
std::sized_allocation_t ::operator new(std::size_t n, std::return_size_t) {
return {::operator new(n), n};
}
</xmp>
Malloc implementations are free to properly override this with a more impactful
definition, but this paper poses no significant difficulty for toolchain
implementers.
Implementation Experience:
* TCMalloc has developed an implementation opensourced on GitHub
([[MallocExtension]]). While this requires mapping from an integer size
class to the true number of bytes, combining this lookup with the
allocation is more efficient as we avoid recomputing the sizeclass itself
(given a request) or deriving it from the object's address.
* jemalloc is prototyping a `smallocx` function providing a C API for this
functionality [[smallocx]].
## Interaction with Sized Delete ## {#sizeddelete}
For allocations made with `sized_allocation_t`-returning `::operator new`, we need to
relax `::operator delete`'s size argument (16.6.2.1 and 16.6.2.2). For
allocations of `T`, the size quanta used by the allocator may not be a multiple
of `sizeof(T)`, leading to both the original and returned sizes being
unrecoverable at the time of deletion.
Consider the memory allocated by:
<xmp>
using T = std::aligned_storage<16, 8>::type;
std::vector<T> v(4);
</xmp>
The underlying heap allocation is made with `::operator new(64,
std::return_size_t)`.
* The memory allocator may return a 72 byte object: Since there is no `k`
such that `sizeof(T) * k = 72`, we can't provide that value to `::operator
delete(void*, size_t)`. The only option would be storing 72 explicitly,
which would be wasteful.
* The memory allocator may instead return an 80 byte object (5 `T`'s): We
now cannot represent the original request when deallocating without
additional storage.
For allocations made with
<xmp>
std::tie(p, m) = ::operator new(n, std::return_size_t{});
</xmp>
we permit `::operator delete(p, s)` where <math>*n* <= *s* <= *m*</math>.
This behavior is consistent with [[jemalloc]]'s `sdallocx`, where the
deallocation size must fall between the request (`n`) and the actual allocated
size (`m`) inclusive. It is also consistent with the standard library's
allocator interface (as amended by [[P0401R4]]).
## Advantages ## {#advantages}
It's easy to see that this approach nicely solves the problems posed by other methods:
* We pay almost nothing in speed to return an actual-size parameter. For
TCMalloc and jemalloc, this is typically a load from to map from
*sizeclass* to size. This cost is strictly smaller than with `nallocx` or
the like, as that same translation must be performed in addition to the
duplicative work previously discussed.
* We are told exactly the size we have, without risk of UB. We can avoid
subsequent reallocations when growing to a buffer to an already-allocated
size.
* Allocator telemetry knows actual request sizes exactly.
## Naming ## {#naming}
The library support type is named `sized_allocation_t`, based on
LEWG's naming suggestions and for consistency (in choice of `_t`) with the
other allocation library support types (`size_t`, `align_val_t`, `nothrow_t`,
etc.). We expect this to be spelled rarely.
Its members are `ptr` and `bytes`. `ptr` for the memory returned is an
intuitive name. `bytes` conveys units in its name (`bytes`). Since new
expressions can return additional storage under this proposal, this
distinguishes it from returning the amount of storage available for *whole*
objects. This contrasts with [[P0401R4]], which does return storage for whole
objects and whose support type's field is named `count`.
# Related work # {#rel}
[[P0401R4]] considers this problem at the level of the `Allocator` concept.
Ironically, the lack of the above API was one significant problem in its
original revision: how could an implementation of `std::allocator` provide the
requested feedback in a way that would work with any underlying malloc
implementation? See also [[#alt-allocfn]].
# History # {#history}
## R10 → D11 ## {#R11}
Support for *new-expression*s is removed, and the class template
`sized_return_t` is changed to a non-template class with a `void*`
member.
Rationale added why the facility needs to be provided as an allocation
function, rather than some unrelated free function or as a detail of
`std::allocator`.
Reports on implementation and deployment experience and impact on
runtime allocations have been added.
## R9 → R10 ## {#R10}
A detailed design of *new-expression*s has been added. Previously,
changes to *new-expression*s had only been outlined at a high level.
As a result, previously suggested changes to overload resolution rules
for finding the allocation function (which was allowed to fall back
to a non-size-returning allocation functon, with unclear consequences)
have been removed. The relation between *new-expression*s and allocation
functions is now described explicitly.
Sections are reorganized: alternative designs are now separate from
a discussion of the chosen design, and an expanded section on *new-expressions*
has been moved up before the proposed wording.
## R8 → R9 ## {#R9}