Commit 57c927b

Long overdue updates
These changes were made long ago, but not pushed out.
1 parent 3d997c9 commit 57c927b

8 files changed, +570 -612 lines changed

.gitignore

-2
@@ -1,4 +1,3 @@
1-
21
t
32
*.o
43
*.so.*
@@ -7,7 +6,6 @@ t
76
*.gcov
87
*~
98
.*.sw[a-p]
10-
119
cscope.in.out
1210
cscope.out
1311
cscope.po.out

Makefile

+6-6
@@ -20,21 +20,21 @@ CSANFLAG = -fsanitize=address -fsanitize=undefined
2020
# -DNO_THREADS
2121
ATOMICS_BACKEND = -DHAVE___ATOMIC
2222

23-
# Implementations: -DUSE_TSGV_SLOT_PAIR_DESIGN (default),
24-
# -DUSE_TSGV_SUBSCRIPTION_SLOTS_DESIGN
25-
TSGV_IMPLEMENTATION =
23+
# Implementations: -DUSE_TSV_SLOT_PAIR_DESIGN (default),
24+
# -DUSE_TSV_SUBSCRIPTION_SLOTS_DESIGN
25+
TSV_IMPLEMENTATION =
2626

2727
CPPDEFS =
28-
CPPFLAGS = $(ATOMICS_BACKEND) $(TSGV_IMPLEMENTATION)
28+
CPPFLAGS = $(ATOMICS_BACKEND) $(TSV_IMPLEMENTATION)
2929
CFLAGS = -fPIC $(CSANFLAG) $(CDBGFLAG) $(COPTFLAG) $(CWARNFLAGS) $(CPPFLAGS) $(CPPDEFS)
3030

3131
LDLIBS = -lpthread -lrt #(but not on Windows, natch)
3232
LDFLAGS =
3333

34-
slotpair : TSGV_IMPLEMENTATION = -DUSE_TSGV_SLOT_PAIR_DESIGN
34+
slotpair : TSV_IMPLEMENTATION = -DUSE_TSV_SLOT_PAIR_DESIGN
3535
slotpair : t
3636

37-
slotlist : TSGV_IMPLEMENTATION = -DUSE_TSGV_SUBSCRIPTION_SLOTS_DESIGN
37+
slotlist : TSV_IMPLEMENTATION = -DUSE_TSV_SUBSCRIPTION_SLOTS_DESIGN
3838
slotlist : t
3939

4040
slotpairO0 : COPTFLAG = -O0

README.md

+118-101
@@ -1,68 +1,91 @@
11

2-
NOTE: This repo has moved to https://github.com/cryptonector/ctp
3-
4-
Q: What is it? A: A "thread-safe global variable" (TSGV) for C
5-
---------------------------------------------------------------
6-
7-
This repository's main feature is a thread-safe global variable (TSGV)
8-
for C. More C thread-primitives may be added in the future, thus the
9-
repository's name.
10-
11-
A TSGV lets readers keep using a value read from it until they read the
12-
next value. Memory management is automatic: values are automatically
13-
destroyed when the last reference is released (explicitly, or implicitly
14-
at the next read, or when a reader thread exits). Reads are *lock-less*
15-
and fast, and _never block writes_. Writes are serialized but otherwise
16-
interact with readers without locks, thus writes *do not block reads*.
17-
18-
This is not unlike a Clojure "ref". It's also similar to RCU, but
19-
unlike RCU, this has a much simpler API with nothing like
20-
`synchronize_rcu()`, and doesn't require any cross-CPU calls nor the
21-
ability to make CPUs/threads run, and it has no application-visible
22-
concept of critical sections, therefore it works in user-land with no
23-
special kernel support.
24-
25-
- One thread needs to create the variable by calling
26-
`pthread_var_init_np()` and providing a value destructor. There is
27-
no static initializer, though one could be added.
28-
- Most threads only ever need to call `pthread_var_get_np()`, and maybe
29-
once `pthread_var_wait_np()` to wait until at least one value has
30-
been set.
31-
- One or more threads may call `pthread_var_set_np()` to publish new
32-
values.
2+
> NOTE: This repo is mirrored at https://github.com/cryptonector/ctp and https://github.com/nicowilliams/ctp
3+
4+
# Q: What is it? A: A user-land-RCU-like API for C
5+
6+
This repository's only current feature is a read-copy-update (RCU) like,
7+
thread-safe variable (TSV) for C. More C thread-primitives may be added
8+
in the future, thus the repository's name.
9+
10+
A TSV lets readers safely keep using a value read from the TSV until
11+
they read the next value. Memory management is automatic: values are
12+
automatically destroyed when the last reference to a value is released
13+
whether explicitly, or implicitly at the next read, or when a reader
14+
thread exits. Reads are _lock-less_ and fast, and _never block
15+
writers_. Writers are serialized but otherwise interact with readers
16+
without locks, thus writes *do not block reads*.
17+
18+
This is not unlike a Clojure `ref`, or like a Haskell `msync`. It's
19+
also similar to RCU, but unlike RCU, this has a very simple API with
20+
nothing like `synchronize_rcu()`, and doesn't require any cross-CPU
21+
calls nor the ability to make CPUs/threads run, and it has no
22+
application-visible concept of critical sections, therefore it works in
23+
user-land with no special kernel support.
24+
25+
- One thread needs to create each variable (as many as desired) once by
26+
calling `thread_safe_var_init()` and providing a value destructor.
27+
28+
> There is currently no static initializer, though one could be
29+
> added. One would typically do this early in `main()` or in a
30+
> `pthread_once()` initializer.
31+
32+
- Most threads only ever need to call `thread_safe_var_get()`.
33+
34+
> Reader threads _may_ also call `thread_safe_var_release()` to allow
35+
> a value to be freed sooner than otherwise.
36+
37+
- One or more threads may call `thread_safe_var_set()` to set new
38+
values on the TSVs.
3339

3440
The API is:
3541

36-
typedef struct pthread_var_np *pthread_var_np_t;
37-
typedef void (*pthread_var_destructor_np_t)(void *);
38-
int pthread_var_init_np(pthread_var_np_t *var, pthread_var_destructor_np_t value_destructor);
39-
void pthread_var_destroy_np(pthread_var_np_t var);
40-
int pthread_var_get_np(pthread_var_np_t var, void **valuep, uint64_t *versionp);
41-
int pthread_var_set_np(pthread_var_np_t var, void *value, uint64_t *versionp);
42-
int pthread_var_wait_np(pthread_var_np_t var);
43-
void pthread_var_release_np(pthread_var_np_t var);
42+
```C
43+
typedef struct thread_safe_var *thread_safe_var; /* TSV */
44+
45+
typedef void (*thread_safe_var_dtor_f)(void *); /* Value destructor */
46+
47+
/* Initialize a TSV with a given value destructor */
48+
int thread_safe_var_init(thread_safe_var *, thread_safe_var_dtor_f);
49+
50+
/* Destroy a TSV */
51+
void thread_safe_var_destroy(thread_safe_var);
52+
53+
/* Get the current value of the TSV and a version number for it */
54+
int thread_safe_var_get(thread_safe_var, void **, uint64_t *);
55+
56+
/* Release the reference to the last value read by this thread from the TSV */
57+
void thread_safe_var_release(thread_safe_var);
58+
59+
/* Wait for a value to be set on the TSV */
60+
int thread_safe_var_wait(thread_safe_var);
61+
62+
/* Set a new value on the TSV (outputs the new version) */
63+
int thread_safe_var_set(thread_safe_var, void *, uint64_t *);
64+
```
4465
4566
Value version numbers increase monotonically when values are set.
4667
47-
Why? Because read-write locks are teh worst
48-
--------------------------------------------
68+
# Why? Because read-write locks are terrible
69+
70+
So you have rarely-changing typically-global data (e.g., loaded
71+
configuration, plugin lists, ...), and you have many threads that read
72+
this, and you want reads to be fast. Worker threads need stable
73+
configuration/whatever while doing work, then when they pick up another
74+
task they can get a newer configuration if there is one.
75+
76+
How would one implement that?
4977
50-
So you have rarely-changing global data (e.g., loaded configuration,
51-
plugins, ...), and you have many threads that read this, and you want
52-
reads to be fast. Worker threads need stable configuration/whatever
53-
while doing work, then when they pick up another task they can get a
54-
newer configuration if there is one. How would you implement that? A
55-
safe answer is: read-write locks around reading/writing the global
56-
variable, and reference count the data. But read-write locks are
57-
inherently bad: readers either can starve writers or can be blocked by
58-
writers.
78+
A safe answer is: read-write locks around reading/writing the variable,
79+
and reference count the data.
5980
60-
A thread-safe global variable, on the other hand, is always fast to
61-
read, even when there's an active writer, and reading does not starve
62-
writers.
81+
But read-write locks are inherently bad: readers either can starve
82+
writers or can be blocked by writers. Either way read-write locks are a
83+
performance problem.
6384
64-
How?
65-
----
85+
A "thread-safe variable", on the other hand, is always fast to read,
86+
even when there's an active writer, and reading does not starve writers.
87+
88+
# How?
6689
6790
Two implementations are included at this time.
6891
@@ -72,9 +95,9 @@ The two implementations have slightly different characteristics.
7295
reads and O(1) serialized writes.
7396
7497
But readers call free() and the value destructor and sometimes have
75-
to signal a potentially-waiting writer -- a blocking operation,
76-
though on an uncontended resource (so not really blocking, but it
77-
does involve a system call).
98+
to signal a potentially-waiting writer, which involves acquiring a
99+
mutex -- a blocking operation, though on an uncontended resource, so
100+
not really blocking.
78101
79102
This implementation has a pair of slots, one containing the "current"
80103
value and one containing the "previous"/"next" value. Writers make the
@@ -93,10 +116,11 @@ The two implementations have slightly different characteristics.
93116
Values are reference counted and so released immediately when the
94117
last reference is dropped.
95118
96-
- The other implementation ("slot list") has O(1) lock-less (but
97-
spinning) reads, and O(N log(M)) serialized writes where N is the maximum
98-
number of live threads that have read the variable and M is the
99-
number of referenced values).
119+
- The other implementation ("slot list") has O(1) lock-less reads, with
120+
unreferenced values garbage collected by serialized writers in `O(N
121+
log(N))` where N is the maximum number of live threads that have read
122+
the variable and M is the number of values that have been set and
123+
possibly released.
100124
101125
Readers never call the allocator after the first read in any given
102126
thread, and writers never call the allocator while holding the writer
@@ -123,16 +147,14 @@ The first implementation written was the slot-pair implementation. The
123147
slot-list design is much easier to understand on the read-side, but it
124148
is significantly more complex on the write-side.
125149
126-
Requirements
127-
------------
150+
# Requirements
128151
129-
C89, POSIX threads (though TSGV should be portable to Windows),
152+
C89, POSIX threads (though TSV should be portable to Windows),
130153
compilers with atomics intrinsics and/or atomics libraries.
131154
132155
In the future this may be upgraded to a C99 or even C11 requirement.
133156
134-
Testing
135-
-------
157+
# Testing
136158
137159
A test program is included that hammers the implementation. Run it in a
138160
loop, with or without valgrind, ASAN (address sanitizer), or other
@@ -147,27 +169,25 @@ of bugs during development. In both cases writes are, on average, 5x
147169
slower than reads, and reads are in the ten microseconds range on an old
148170
laptop, running under virtualization.
149171
150-
Performance
151-
-----------
172+
# Performance
152173
153-
On an old i7 laptop, virtualized, reads on idle thread-safe global
154-
variables (i.e., no writers in sight) take about 15ns. This is because
155-
the fast path in both implementations consists of reading a thread-local
156-
variable and then performing a single acquire-fenced memory read.
174+
On an old i7 laptop, virtualized, reads on idle thread-safe variables
175+
(i.e., no writers in sight) take about 15ns. This is because the fast
176+
path in both implementations consists of reading a thread-local variable
177+
and then performing a single acquire-fenced memory read.
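
To make the fast path described above concrete, here is a hedged sketch of its general shape: consult a thread-local cache, then do one acquire load of shared state. It is not the repository's code. It assumes C11 (`<stdatomic.h>`, `_Thread_local`), assumes the thread-local holds the last version this thread saw, uses hypothetical `demo_*` names, supports a single variable and a single writer, and leaves out all value lifetime management.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a version counter plus the currently published value. */
struct demo_var {
    _Atomic uint64_t version;   /* bumped by the writer with release ordering */
    void *_Atomic    value;
};

/* Per-thread cache of the last value this thread read (one variable only). */
static _Thread_local uint64_t cached_version;
static _Thread_local void    *cached_value;

/* Single writer assumed: store the value, then publish a new version. */
static void demo_publish(struct demo_var *v, void *value)
{
    atomic_store_explicit(&v->value, value, memory_order_relaxed);
    atomic_fetch_add_explicit(&v->version, 1, memory_order_release);
}

/* Sets *fast to 1 when the cached value was still current, else to 0. */
static void *demo_get(struct demo_var *v, int *fast)
{
    /* The one acquire-fenced read of shared memory on the fast path. */
    uint64_t now = atomic_load_explicit(&v->version, memory_order_acquire);

    if (now == cached_version) {
        *fast = 1;              /* nothing new was published since our last read */
        return cached_value;
    }
    /* Slow path: pick up the newer value and remember its version. */
    cached_value = atomic_load_explicit(&v->value, memory_order_acquire);
    cached_version = now;
    *fast = 0;
    return cached_value;
}
```

With no writer active, every call takes the first branch, which would be consistent with reads costing a few nanoseconds. A real implementation must also keep the cached value alive while the thread holds it, which this sketch does not attempt.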
157178
158179
On that same system, when threads write very frequently then reads slow
159180
down to about 8us (8000ns). (But the test had eight times more threads
160181
than CPUs, so the cost of context switching is included in that number.)
161182
162-
On that same system writes on a busy thread-safe global variable take
163-
about 50us (50000ns), but non-contending writes on an otherwise idle
164-
thread-safe global variable take about 180ns.
183+
On that same system writes on a busy thread-safe variable take about
184+
50us (50000ns), but non-contending writes on an otherwise idle
185+
thread-safe variable take about 180ns.
165186
166187
I.e., this is blindingly fast, especially for the intended use case
167188
(infrequent writes).
168189
169-
Install
170-
-------
190+
# Install
171191
172192
Clone this repo, select a configuration, and make it.
173193
@@ -190,16 +210,12 @@ Configuration variables:
190210
191211
Values: `-DHAVE___ATOMIC`, `-DHAVE___SYNC`, `-DHAVE_INTEL_INTRINSICS`, `-DHAVE_PTHREAD`, `-DNO_THREADS`
192212
193-
- `TSGV_IMPLEMENTATION`
213+
- `TSV_IMPLEMENTATION`
194214
195-
Values: `-DUSE_TSGV_SLOT_PAIR_DESIGN`, `-DUSE_TSGV_SUBSCRIPTION_SLOTS_DESIGN`
215+
Values: `-DUSE_TSV_SLOT_PAIR_DESIGN`, `-DUSE_TSV_SUBSCRIPTION_SLOTS_DESIGN`
196216
197217
- `CPPDEFS`
198218
199-
Other options, mainly: what yield() implementation to use
200-
(`-DHAVE_PTHREAD_YIELD`, `-DHAVE_SCHED_YIELD`, or `-DHAVE_YIELD`).
201-
This is needed for the slot-list implementation.
202-
203219
`CPPDEFS` can also be used to set `NDEBUG`.
204220
205221
A build configuration system is needed, in part to select an atomic
@@ -214,8 +230,17 @@ Several atomic primitives implementations are available:
214230
- global pthread mutex
215231
- no synchronization (watch the test blow up!)
216232
217-
TODO
218-
----
233+
# TODO
234+
235+
- Don't create a pthread-specific variable for each TSV. Instead share
236+
one pthread-specific for all TSVs. This would require having the
237+
pthread-specific values be a pointer to a structure that has a
238+
pointer to an array of per-TSV elements, with
239+
`thread_safe_var_init()` allocating an array index for each TSV.
240+
241+
This is important because there can be a maximum number of
242+
pthread-specifics and we must not be the cause of exceeding that
243+
maximum.
219244
220245
- Add an attributes optional input argument to the init function.
221246
@@ -232,7 +257,7 @@ TODO
232257
the current value's version number is the given one.)
233258
234259
- Add an API for waiting for values older than some version number to
235-
be released
260+
be released?
236261
237262
This is tricky for the slot-pair case because we don't have a list of
238263
extant values, but we need it in order to determine what is the
@@ -261,25 +286,17 @@ TODO
261286
Note too that both implementations can (or do) defer calling of the
262287
value destructor so that reading is fast. This should be an option.
263288
264-
- Make this cache-friendly. In particular, the slot-list case could
265-
have the growable slot-array be an array of pointers to slots instead
266-
of an array of slots. Or slots could be sized more carefully, adding
267-
padding if need be.
289+
- Add a static initializer?
268290
269-
- Maybe add a static initializer... as a function-style macro that
270-
takes a destructor function argument. This basically means adding a
271-
`pthread_once_t` to the variable. C99 would then be required though
272-
(for the initializer).
291+
- Add a better build system.
273292
274-
- Add a proper build system.
275293
- Add an implementation using read-write locks to compare performance
276294
with.
277-
- Parametrize the test program.
278-
- Use symbol names that don't use the `pthread_` prefix, or provide a
279-
configuration feature for renaming them with C pre-processor macros.
280-
- Use symbol names that don't conflict with known atomics libraries (so
281-
those can be used as an atomics backend). Currently the atomics
295+
296+
- Use symbol names that don't conflict with any known atomics libraries
297+
(so those can be used as an atomics backend). Currently the atomics
282298
symbols are loosely based on Illumos atomics primitives.
299+
283300
- Support Win32 (perhaps by building a small pthread compatibility
284301
library; only mutexes and condition variables are needed).
285302
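
The first TODO item in the new README (share one pthread-specific across all TSVs, with each TSV owning an index into a per-thread array) could be sketched as below. This is only an illustration under stated assumptions, not the planned implementation: `var_index_alloc()`, `var_slot()`, and `struct per_thread` are hypothetical names, indexes are never recycled, and the thread-exit destructor frees the array without releasing whatever the slots point to.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t   shared_key;              /* the single key shared by all variables */
static pthread_once_t  shared_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t index_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t          next_index;              /* next unallocated per-variable index */

struct per_thread {
    size_t  nslots;
    void  **slots;      /* slots[i] belongs to the variable that owns index i */
};

/* Runs at thread exit; a real version would also release each slot's value. */
static void per_thread_free(void *p)
{
    struct per_thread *pt = p;

    free(pt->slots);
    free(pt);
}

static void make_key(void)
{
    /* Error handling omitted for brevity in this sketch. */
    (void) pthread_key_create(&shared_key, per_thread_free);
}

/* Called once per variable, e.g. from an init function. */
static size_t var_index_alloc(void)
{
    size_t i;

    pthread_mutex_lock(&index_lock);
    i = next_index++;
    pthread_mutex_unlock(&index_lock);
    return i;
}

/*
 * Return the address of the calling thread's slot for index `idx`, growing
 * the per-thread array on demand. Returns NULL on allocation failure. The
 * returned pointer may be invalidated by a later call that grows the array.
 */
static void **var_slot(size_t idx)
{
    struct per_thread *pt;
    void **grown;
    size_t i;

    pthread_once(&shared_once, make_key);
    pt = pthread_getspecific(shared_key);
    if (pt == NULL) {
        if ((pt = calloc(1, sizeof(*pt))) == NULL)
            return NULL;
        if (pthread_setspecific(shared_key, pt) != 0) {
            free(pt);
            return NULL;
        }
    }
    if (idx >= pt->nslots) {
        grown = realloc(pt->slots, (idx + 1) * sizeof(*grown));
        if (grown == NULL)
            return NULL;
        for (i = pt->nslots; i <= idx; i++)
            grown[i] = NULL;        /* new slots start out empty */
        pt->slots = grown;
        pt->nslots = idx + 1;
    }
    return &pt->slots[idx];
}
```

However many variables exist, the process consumes one pthread key, so the library cannot be what pushes an application past the system's limit on pthread-specifics.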
