- NOTE: This repo has moved to https://github.com/cryptonector/ctp
-
- Q: What is it? A: A "thread-safe global variable" (TSGV) for C
- ---------------------------------------------------------------
-
- This repository's main feature is a thread-safe global variable (TSGV)
- for C. More C thread-primitives may be added in the future, thus the
- repository's name.
-
- A TSGV lets readers keep using a value read from it until they read the
- next value. Memory management is automatic: values are automatically
- destroyed when the last reference is released (explicitly, or implicitly
- at the next read, or when a reader thread exits). Reads are *lock-less*
- and fast, and _never block writes_. Writes are serialized but otherwise
- interact with readers without locks, thus writes *do not block reads*.
-
- This is not unlike a Clojure "ref". It's also similar to RCU, but
- unlike RCU, this has a much simpler API with nothing like
- `synchronize_rcu()`, and doesn't require any cross-CPU calls nor the
- ability to make CPUs/threads run, and it has no application-visible
- concept of critical sections, therefore it works in user-land with no
- special kernel support.
-
- - One thread needs to create the variable by calling
-   `pthread_var_init_np()` and providing a value destructor. There is
-   no static initializer, though one could be added.
- - Most threads only ever need to call `pthread_var_get_np()`, and maybe
-   once `pthread_var_wait_np()` to wait until at least one value has
-   been set.
- - One or more threads may call `pthread_var_set_np()` to publish new
-   values.
+ > NOTE: This repo is mirrored at https://github.com/cryptonector/ctp and https://github.com/nicowilliams/ctp
+
+ # Q: What is it? A: A user-land-RCU-like API for C
+
+ This repository's only current feature is a read-copy-update (RCU)-like,
+ thread-safe variable (TSV) for C. More C thread-primitives may be added
+ in the future, thus the repository's name.
+
+ A TSV lets readers safely keep using a value read from the TSV until
+ they read the next value. Memory management is automatic: values are
+ automatically destroyed when the last reference to a value is released,
+ whether explicitly, or implicitly at the next read, or when a reader
+ thread exits. Reads are _lock-less_ and fast, and _never block
+ writers_. Writers are serialized but otherwise interact with readers
+ without locks, thus writes *do not block reads*.
+
+ This is not unlike a Clojure `ref`, or like a Haskell `msync`. It's
+ also similar to RCU, but unlike RCU, this has a very simple API with
+ nothing like `synchronize_rcu()`, and doesn't require any cross-CPU
+ calls nor the ability to make CPUs/threads run, and it has no
+ application-visible concept of critical sections, therefore it works in
+ user-land with no special kernel support.
+
+ - One thread needs to create each variable (as many as desired) once by
+   calling `thread_safe_var_init()` and providing a value destructor.
+
+   > There is currently no static initializer, though one could be
+   > added. One would typically do this early in `main()` or in a
+   > `pthread_once()` initializer.
+
+ - Most threads only ever need to call `thread_safe_var_get()`.
+
+   > Reader threads _may_ also call `thread_safe_var_release()` to allow
+   > a value to be freed sooner than otherwise.
+
+ - One or more threads may call `thread_safe_var_set()` to set new
+   values on the TSVs.

The API is:

-     typedef struct pthread_var_np *pthread_var_np_t;
-     typedef void (*pthread_var_destructor_np_t)(void *);
-     int pthread_var_init_np(pthread_var_np_t *var, pthread_var_destructor_np_t value_destructor);
-     void pthread_var_destroy_np(pthread_var_np_t var);
-     int pthread_var_get_np(pthread_var_np_t var, void **valuep, uint64_t *versionp);
-     int pthread_var_set_np(pthread_var_np_t var, void *value, uint64_t *versionp);
-     int pthread_var_wait_np(pthread_var_np_t var);
-     void pthread_var_release_np(pthread_var_np_t var);
+ ```C
+ typedef struct thread_safe_var *thread_safe_var; /* TSV */
+
+ typedef void (*thread_safe_var_dtor_f)(void *);  /* Value destructor */
+
+ /* Initialize a TSV with a given value destructor */
+ int thread_safe_var_init(thread_safe_var *, thread_safe_var_dtor_f);
+
+ /* Destroy a TSV */
+ void thread_safe_var_destroy(thread_safe_var);
+
+ /* Get the current value of the TSV and a version number for it */
+ int thread_safe_var_get(thread_safe_var, void **, uint64_t *);
+
+ /* Release the reference to the last value read by this thread from the TSV */
+ void thread_safe_var_release(thread_safe_var);
+
+ /* Wait for a value to be set on the TSV */
+ int thread_safe_var_wait(thread_safe_var);
+
+ /* Set a new value on the TSV (outputs the new version) */
+ int thread_safe_var_set(thread_safe_var, void *, uint64_t *);
+ ```

Value version numbers increase monotonically when values are set.
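
Below is a minimal usage sketch based on the declarations above. It is
illustrative only: error handling is mostly elided, and the header name
`thread_safe_global.h` plus the helper function names are assumptions,
not part of the documented API.

```C
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#include "thread_safe_global.h"   /* assumed header name for the API above */

static thread_safe_var cfg_var;

/* Value destructor: called when the last reference to a value is dropped */
static void free_config(void *value)
{
    free(value);
}

/* Called once, e.g., early in main() or from a pthread_once() initializer */
static int setup(void)
{
    return thread_safe_var_init(&cfg_var, free_config);
}

/* Writer: publish a new value; the TSV takes ownership and will pass it
 * to the destructor when the last reference to it goes away. */
static int publish_config(const char *s)
{
    uint64_t version;
    char *value = strdup(s);

    if (value == NULL)
        return ENOMEM;
    return thread_safe_var_set(cfg_var, value, &version);
}

/* Reader: the returned value stays valid until this thread's next
 * thread_safe_var_get(), its next thread_safe_var_release(), or its exit. */
static const char *current_config(void)
{
    void *value;
    uint64_t version;

    (void) thread_safe_var_wait(cfg_var);   /* until at least one value is set */
    if (thread_safe_var_get(cfg_var, &value, &version) != 0)
        return NULL;
    return value;
}
```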

- Why? Because read-write locks are teh worst
- --------------------------------------------
+ # Why? Because read-write locks are terrible
+
+ So you have rarely-changing, typically-global data (e.g., loaded
+ configuration, plugin lists, ...), and you have many threads that read
+ this, and you want reads to be fast. Worker threads need stable
+ configuration/whatever while doing work; when they pick up another
+ task they can get a newer configuration if there is one.
+
+ How would one implement that?

- So you have rarely-changing global data (e.g., loaded configuration,
- plugins, ...), and you have many threads that read this, and you want
- reads to be fast. Worker threads need stable configuration/whatever
- while doing work, then when they pick up another task they can get a
- newer configuration if there is one. How would you implement that? A
- safe answer is: read-write locks around reading/writing the global
- variable, and reference count the data. But read-write locks are
- inherently bad: readers either can starve writers or can be blocked by
- writers.
+ A safe answer is: read-write locks around reading/writing the variable,
+ and reference count the data.

- A thread-safe global variable, on the other hand, is always fast to
- read, even when there's an active writer, and reading does not starve
- writers.
+ But read-write locks are inherently bad: readers either can starve
+ writers or can be blocked by writers. Either way read-write locks are a
+ performance problem.
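
For contrast, the read-write-lock-plus-refcount approach described above
looks roughly like the following sketch (hypothetical names, not part of
this library). Every read takes the read lock, so a writer holding the
write lock stalls all readers, and a steady stream of readers can starve
the writer:

```C
#include <pthread.h>
#include <stdlib.h>

struct cfg {
    unsigned refs;              /* protected by ref_lock */
    /* ... actual configuration data ... */
};

static pthread_rwlock_t cfg_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t  ref_lock   = PTHREAD_MUTEX_INITIALIZER;
static struct cfg *current_cfg;     /* protected by cfg_rwlock */

static void cfg_release(struct cfg *c)
{
    unsigned refs;

    pthread_mutex_lock(&ref_lock);
    refs = --c->refs;
    pthread_mutex_unlock(&ref_lock);
    if (refs == 0)
        free(c);                    /* the "value destructor" */
}

/* Reader: may block behind a writer holding the write lock */
static struct cfg *cfg_get(void)
{
    struct cfg *c;

    pthread_rwlock_rdlock(&cfg_rwlock);
    c = current_cfg;
    if (c != NULL) {
        pthread_mutex_lock(&ref_lock);
        c->refs++;
        pthread_mutex_unlock(&ref_lock);
    }
    pthread_rwlock_unlock(&cfg_rwlock);
    return c;                       /* caller must cfg_release(c) */
}

/* Writer: blocks until all readers have left the read lock */
static void cfg_set(struct cfg *newc)
{
    struct cfg *old;

    newc->refs = 1;                 /* the global's own reference */
    pthread_rwlock_wrlock(&cfg_rwlock);
    old = current_cfg;
    current_cfg = newc;
    pthread_rwlock_unlock(&cfg_rwlock);
    if (old != NULL)
        cfg_release(old);
}
```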

- How?
- ----
+ A "thread-safe variable", on the other hand, is always fast to read,
+ even when there's an active writer, and reading does not starve writers.
+
+ # How?

Two implementations are included at this time.

@@ -72,9 +95,9 @@ The two implementations have slightly different characteristics.
  reads and O(1) serialized writes.

  But readers call free() and the value destructor, and sometimes have
-   to signal a potentially-waiting writer -- a blocking operation,
-   though on an uncontended resource (so not really blocking, but it
-   does involve a system call).
+   to signal a potentially-waiting writer, which involves acquiring a
+   mutex -- a blocking operation, though on an uncontended resource, so
+   not really blocking.
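
The signalling path that makes this "not really blocking" is the usual
mutex-plus-condition-variable handshake, sketched below with hypothetical
internal names (this is not the library's actual code):

```C
#include <pthread.h>

/* State a writer may be waiting on until the old value is released */
struct waiter {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             released;
};

/* Reader side: the mutex is uncontended in the common case, so these
 * "blocking" calls normally return immediately. */
static void signal_waiting_writer(struct waiter *w)
{
    pthread_mutex_lock(&w->lock);
    w->released = 1;
    pthread_cond_signal(&w->cv);    /* wake the writer, if one is waiting */
    pthread_mutex_unlock(&w->lock);
}

/* Writer side: wait until a reader reports that the old value is free */
static void wait_for_release(struct waiter *w)
{
    pthread_mutex_lock(&w->lock);
    while (!w->released)
        pthread_cond_wait(&w->cv, &w->lock);
    pthread_mutex_unlock(&w->lock);
}
```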

  This implementation has a pair of slots, one containing the "current"
  value and one containing the "previous"/"next" value. Writers make the
@@ -93,10 +116,11 @@ The two implementations have slightly different characteristics.
  Values are reference counted and so released immediately when the
  last reference is dropped.

- - The other implementation ("slot list") has O(1) lock-less (but
-   spinning) reads, and O(N log(M)) serialized writes where N is the maximum
-   number of live threads that have read the variable and M is the
-   number of referenced values).
+ - The other implementation ("slot list") has O(1) lock-less reads, with
+   unreferenced values garbage collected by serialized writers in `O(N
+   log(M))` where N is the maximum number of live threads that have read
+   the variable and M is the number of values that have been set and
+   possibly released.

  Readers never call the allocator after the first read in any given
  thread, and writers never call the allocator while holding the writer
@@ -123,16 +147,14 @@ The first implementation written was the slot-pair implementation. The
slot-list design is much easier to understand on the read-side, but it
is significantly more complex on the write-side.

- Requirements
- ------------
+ # Requirements

- C89, POSIX threads (though TSGV should be portable to Windows),
+ C89, POSIX threads (though TSV should be portable to Windows),
compilers with atomics intrinsics and/or atomics libraries.

In the future this may be upgraded to a C99 or even C11 requirement.

- Testing
- -------
+ # Testing

A test program is included that hammers the implementation. Run it in a
loop, with or without valgrind, ASAN (address sanitizer), or other
@@ -147,27 +169,25 @@ of bugs during development. In both cases writes are, on average, 5x
slower than reads, and reads are in the ten microseconds range on an old
laptop, running under virtualization.

- Performance
- -----------
+ # Performance

- On an old i7 laptop, virtualized, reads on idle thread-safe global
- variables (i.e., no writers in sight) take about 15ns. This is because
- the fast path in both implementations consists of reading a thread-local
- variable and then performing a single acquire-fenced memory read.
+ On an old i7 laptop, virtualized, reads on idle thread-safe variables
+ (i.e., no writers in sight) take about 15ns. This is because the fast
+ path in both implementations consists of reading a thread-local variable
+ and then performing a single acquire-fenced memory read.

On that same system, when threads write very frequently then reads slow
down to about 8us (8000ns). (But the test had eight times more threads
than CPUs, so the cost of context switching is included in that number.)

- On that same system writes on a busy thread-safe global variable take
- about 50us (50000ns), but non-contending writes on an otherwise idle
- thread-safe global variable take about 180ns.
+ On that same system writes on a busy thread-safe variable take about
+ 50us (50000ns), but non-contending writes on an otherwise idle
+ thread-safe variable take about 180ns.

I.e., this is blindingly fast, especially for the intended use case
(infrequent writes).

- Install
- -------
+ # Install

Clone this repo, select a configuration, and make it.

@@ -190,16 +210,12 @@ Configuration variables:

  Values: `-DHAVE___ATOMIC`, `-DHAVE___SYNC`, `-DHAVE_INTEL_INTRINSICS`, `-DHAVE_PTHREAD`, `-DNO_THREADS`

- - `TSGV_IMPLEMENTATION`
+ - `TSV_IMPLEMENTATION`

-   Values: `-DUSE_TSGV_SLOT_PAIR_DESIGN`, `-DUSE_TSGV_SUBSCRIPTION_SLOTS_DESIGN`
+   Values: `-DUSE_TSV_SLOT_PAIR_DESIGN`, `-DUSE_TSV_SUBSCRIPTION_SLOTS_DESIGN`

- `CPPDEFS`

-   Other options, mainly: what yield() implementation to use
-   (`-DHAVE_PTHREAD_YIELD`, `-DHAVE_SCHED_YIELD`, or `-DHAVE_YIELD`).
-   This is needed for the slot-list implementation.
-
  `CPPDEFS` can also be used to set `NDEBUG`.

A build configuration system is needed, in part to select an atomic
@@ -214,8 +230,17 @@ Several atomic primitives implementations are available:
- global pthread mutex
- no synchronization (watch the test blow up!)

- TODO
- ----
+ # TODO
+
+ - Don't create a pthread-specific variable for each TSV. Instead share
+   one pthread-specific for all TSVs. This would require having the
+   pthread-specific value be a pointer to a structure that has a
+   pointer to an array of per-TSV elements, with
+   `thread_safe_var_init()` allocating an array index for each TSV.
+
+   This is important because there can be a maximum number of
+   pthread-specifics and we must not be the cause of exceeding that
+   maximum.

- Add an attributes optional input argument to the init function.

@@ -232,7 +257,7 @@ TODO
  the current value's version number is the given one.)

- Add an API for waiting for values older than some version number to
-   be released
+   be released?

  This is tricky for the slot-pair case because we don't have a list of
  extant values, but we need it in order to determine what is the
@@ -261,25 +286,17 @@ TODO
  Note too that both implementations can (or do) defer calling of the
  value destructor so that reading is fast. This should be an option.

- - Make this cache-friendly. In particular, the slot-list case could
-   have the growable slot-array be an array of pointers to slots instead
-   of an array of slots. Or slots could be sized more carefully, adding
-   padding if need be.
+ - Add a static initializer?

- - Maybe add a static initializer... as a function-style macro that
-   takes a destructor function argument. This basically means adding a
-   `pthread_once_t` to the variable. C99 would then be required though
-   (for the initializer).
+ - Add a better build system.

- - Add a proper build system.
- Add an implementation using read-write locks to compare performance
  with.
- - Parametrize the test program.
- - Use symbol names that don't use the `pthread_` prefix, or provide a
-   configuration feature for renaming them with C pre-processor macros.
- - Use symbol names that don't conflict with known atomics libraries (so
-   those can be used as an atomics backend). Currently the atomics
+
+ - Use symbol names that don't conflict with any known atomics libraries
+   (so those can be used as an atomics backend). Currently the atomics
  symbols are loosely based on Illumos atomics primitives.
+
- Support Win32 (perhaps by building a small pthread compatibility
  library; only mutexes and condition variables are needed).