Commit 197435d

Merge pull request ElektraInitiative#4715 from kodebach/decisions-keyname
[decisions] keyname
2 parents 6f97507 + dba07bb commit 197435d

4 files changed

+183
-50
lines changed

doc/decisions/0b_rejected/separate_key_name.md

+1-1
@@ -34,7 +34,7 @@ Continue keeping 3 classes: `Key`, `KeySet` and `KDB`.
 ## Related Decisions
 
 - [Null](../5_implemented/null.md)
-- [Store the escaped and/or unescaped key name](../2_in_progress/store_name.md)
+- [Namespace and Name of Keys](../3_decided/keyname.md)
 
 ## Notes
 

doc/decisions/2_in_progress/store_name.md

-49
This file was deleted.

doc/decisions/3_decided/keyname.md

+180
@@ -0,0 +1,180 @@
# Namespace and Name of Keys

## Problem

A `Key` in Elektra is identified by its name, which consists of a namespace and a number of name parts.
There are two common representations for the name: escaped and unescaped.

The unescaped form is essentially a single namespace byte plus an arbitrary-length sequence of arbitrary bytes, in which a `\0` byte separates parts.
The escaped name is a `\0`-terminated string that maps 1:1 onto unescaped names.
More details can be found in [the relevant docs](/doc/KEYNAMES.md).

There is a conflict between these two forms in terms of API convenience and efficient execution.
Generally, the escaped form is more convenient to use, since it is entirely human-readable and just a "normal" string.
Implementing many tasks (like order comparisons) is, however, much simpler when using the unescaped name.
Additionally, using the unescaped name often results in more performant code as well.

A particularly common example that highlights the difficulties in handling escaped names is splitting the name into parts.
In the escaped name, this task requires correct handling of escape sequences, whereas in the unescaped name parts are always delimited by a `\0` byte.

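To illustrate why splitting is simple for unescaped names, here is a minimal C sketch that counts parts by scanning for `\0` bytes. The helper `countParts` is hypothetical and not part of Elektra. It assumes a layout of one namespace byte, a `\0`, and then `\0`-terminated parts, with `size` including the final `\0`. Edge cases such as root keys are not considered.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Elektra API): counts the parts of an
 * unescaped name laid out as <namespace byte> '\0' <part> '\0' <part> '\0'.
 * `size` is the total buffer size, including the final '\0'. */
static size_t countParts (const char * name, size_t size)
{
    size_t count = 0;
    size_t pos = 2; /* skip the namespace byte and the '\0' that follows it */
    while (pos < size)
    {
        const char * end = memchr (name + pos, '\0', size - pos);
        if (end == NULL) break; /* malformed input: missing terminator */
        ++count;
        pos = (size_t) (end - name) + 1; /* continue after the delimiter */
    }
    return count;
}
```
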
Before this decision, we stored both versions in every `Key`.
However, this resulted in too much memory use, so we need to find another solution.

The question now is which representations `libelektra-core` should use, and how.

## Constraints

- Because `KeySet` is ordered by name and stores `Key`s, the order comparison between the names of two `Key`s must be "fast enough".
  (see assumption below)
- We need a single pointer to a single buffer that contains the entire name of a `Key`.
  While there are other options, some of which could even save memory (e.g., split into parts and deduplicate), much of the `KeySet` internals rely on the fact that the name is a single buffer.
  Changing this would require major redesigns.

## Assumptions

- In most cases the escaped name is used for convenience and not because of actual requirements.
- The most common case for using the escaped name is UI: reading names from or displaying them in a user interface (e.g., `kdb` CLI).
- In the constraint about order comparisons above, we assume that "fast enough" means "comparable to a single `memcmp`".
  Profiling for previous implementations, not based on a single `memcmp` of unescaped names, showed the comparison as a bottleneck, while the current single-`memcmp` implementation does not show the bottleneck.
  That said, it may be possible to find a solution slower than the current one that is still fast enough to avoid the previous bottleneck.

## Considered Alternatives

### Only escaped name

Because the escaped name is a simple `\0`-terminated string, it can be represented as a single `char *`.

Storing the name as a single `char *` would be the most space efficient.
But resizing would require counting the length every time.
Therefore, for storage the better solution may be a `char *` and a `size_t`.

However, in the API the name could always be a single `char *`, making for a very easy to use API.

The biggest problem with this approach is that comparing two escaped names is not trivial.
The comparison needs to account for namespaces, parts and escaping.
Previous benchmarks showed that it is very hard or even impossible to make the comparison of escaped names fast enough for our use cases.

Similarly, iterating over the individual parts of a name (and/or manipulating them) is non-trivial, because it requires logic to handle escape sequences.

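The extra logic needed for escaped names can be sketched as follows. This is a deliberately simplified illustration, not Elektra's actual escaping rules: the hypothetical helper `countEscapedParts` only looks at the path after the namespace, assumes that a backslash escapes the following byte, and treats every other `/` as a part separator.

```c
#include <stddef.h>

/* Hypothetical helper (not an Elektra API): counts the parts in the path
 * portion of an escaped name. Simplifying assumption: a backslash escapes
 * the next byte, and any unescaped '/' separates parts. */
static size_t countEscapedParts (const char * path)
{
    size_t count = 0;
    int inPart = 0; /* non-zero while we are inside a non-empty part */
    for (const char * p = path; *p != '\0'; ++p)
    {
        if (*p == '\\' && p[1] != '\0')
        {
            ++p; /* the escaped byte can never act as a separator */
            inPart = 1;
        }
        else if (*p == '/')
        {
            if (inPart) ++count; /* a part just ended */
            inPart = 0;
        }
        else
        {
            inPart = 1;
        }
    }
    return inPart ? count + 1 : count;
}
```
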
### Only unescaped name

The unescaped name contains `\0` bytes.
It therefore must be represented as a pointer and a size.

This can make for a less convenient API, but there are mitigation strategies using additional types.
Using unescaped names in code can be inconvenient, especially regarding the namespace.
Without a namespace a name could be written as e.g., `"foo\0bar"`.
But with a namespace it would be something like `"\1\0foo\0bar"` and developers would need to remember what namespace `\1` is.
Using the `KEY_NS_*` constants like this is not easily possible.

Both order and hierarchy comparisons are very simple in this case and can be implemented with a single `memcmp` and a tiny amount of extra logic (e.g., to handle cascading names).
Iterating over the individual parts is also trivial, since all parts are separated by `\0` bytes.

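A minimal sketch of such comparisons, assuming each name is a buffer whose size includes the final `\0`. Both functions are hypothetical and not Elektra's implementation, and the extra logic for cascading names is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical order comparison: one memcmp over the common prefix,
 * with the shorter name sorting first when the prefix is equal. */
static int compareUnescaped (const char * a, size_t aSize, const char * b, size_t bSize)
{
    size_t common = aSize < bSize ? aSize : bSize;
    int cmp = memcmp (a, b, common);
    if (cmp != 0) return cmp;
    return (aSize > bSize) - (aSize < bSize);
}

/* Hypothetical hierarchy check: `name` is the same as or below `parent`
 * if it starts with all bytes of `parent`. Because the parent's final
 * '\0' is compared too, "foo" is not treated as a parent of "foobar". */
static int isBelowOrSame (const char * parent, size_t parentSize, const char * name, size_t nameSize)
{
    return nameSize >= parentSize && memcmp (parent, name, parentSize) == 0;
}
```
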
### Only unescaped name, with separate namespace

In the above solution, the entire unescaped name (including the namespace) would always be considered one unit.
As such, there would only be a single pointer and a size in an API that needs a name.
This can be inconvenient, because it makes using the `KEY_NS_*` constants more difficult.

This solution enhances the above by considering the namespace a separate thing.
Above, the namespace is intrinsically part of the name.
It is essentially just a restriction on the first part of the name and sometimes the namespace must be considered specially.
In this solution, we consider the namespace a separate entity from the start.
A key does not have a name that starts with a namespace.
Instead, a key has a namespace _and_ a name.

This is mostly a theoretical distinction, but it makes it easier to argue in favor of APIs that use separate arguments for the namespace.
It also makes it more obvious that sometimes the namespace on its own can have an influence on the behavior of a function.

In the API, the name could now be passed in two pieces: the namespace and the rest of the name.
Instead of taking a single pointer and size, which receive values like `"\1\0foo\0bar"` and `10`, the API would take a namespace, a pointer, and a size, with values like `KEY_NS_CASCADING`, `"foo\0bar"` and `8`.

Internally, we don't necessarily need to store this as separate fields.
The namespace could be combined into one buffer with the rest of the name, and stored as a single pointer and size.
However, depending on the API there can also be benefits to keeping the namespace as a separate field.

Even with a separate namespace field, most benefits of "Only unescaped name" are retained.
The memory consumption is near minimal (alignment padding can cause a difference).
Comparisons are exactly the same, just with an additional namespace byte comparison beforehand.

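The comparison with a separate namespace could look roughly like this. All names here (`SketchNamespace`, `SketchName`, `sketchNameCompare`) are hypothetical stand-ins, not the API this decision will produce; the enum values are taken from the byte values used in the examples of this document.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the KEY_NS_* constants (hypothetical names). */
typedef enum
{
    SKETCH_NS_CASCADING = 1,
    SKETCH_NS_USER = 6
} SketchNamespace;

/* Hypothetical name type: the namespace is kept apart from the parts. */
typedef struct
{
    SketchNamespace ns;
    const char * name; /* unescaped parts without namespace, e.g. "foo\0bar" */
    size_t size;       /* size of `name`, including the final '\0' */
} SketchName;

/* Compare the namespace first, then fall back to a single memcmp. */
static int sketchNameCompare (const SketchName * a, const SketchName * b)
{
    if (a->ns != b->ns) return a->ns < b->ns ? -1 : 1;
    size_t common = a->size < b->size ? a->size : b->size;
    int cmp = memcmp (a->name, b->name, common);
    if (cmp != 0) return cmp;
    return (a->size > b->size) - (a->size < b->size);
}
```
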
### Both escaped and unescaped name

The previous approach stored both forms to combine the advantages of escaped and unescaped names.

The API could largely rely on the escaped name, while e.g., comparisons can use the unescaped name.

The issue with this approach is the excessive memory consumption.
Keynames can already be quite long and `Key` is at the base of Elektra.
Storing every name twice in only slightly different forms essentially doubles the memory consumption.

### Both escaped and unescaped name, but only unescaped stored

Instead of storing both escaped and unescaped name, only the unescaped name could be stored.

APIs that use the escaped name would do conversion on the fly.

This approach has several downsides.
First, while the conversion may be optimized, it will never be free in terms of runtime.
But more importantly, if an escaped name should be returned by an API, it must be stored somewhere.
This means extra allocations and crucially somebody needs to do the cleanup.
In other words, it complicates the API.

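The ownership problem becomes visible in even a small conversion sketch. The function `escapeParts` below is hypothetical and uses simplified rules that are an assumption, not Elektra's real escaping: parts are joined with `/`, and only `/` and `\` inside a part are escaped with a backslash. The caller receives a fresh allocation and is responsible for calling `free`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch (not an Elektra API): converts unescaped parts such
 * as "foo\0bar" (size 8, including the final '\0') into a newly allocated
 * escaped path such as "/foo/bar". Returns NULL if the allocation fails.
 * The caller owns the result and must free() it. */
static char * escapeParts (const char * parts, size_t size)
{
    /* worst case: every byte needs an escape, plus leading '/' and '\0' */
    char * out = malloc (2 * size + 2);
    if (out == NULL) return NULL;

    size_t o = 0;
    out[o++] = '/';
    for (size_t i = 0; i + 1 < size; ++i) /* the final '\0' is not copied */
    {
        char c = parts[i];
        if (c == '\0')
        {
            out[o++] = '/'; /* part delimiter becomes a separator */
        }
        else
        {
            if (c == '/' || c == '\\') out[o++] = '\\';
            out[o++] = c;
        }
    }
    out[o] = '\0';
    return out;
}
```
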
### Escaped and unescaped name in single buffer

Another variant of the above.
The escaped and unescaped name are stored in a single buffer.
This avoids extra allocations and extra pointers and sizes in structs.

The escaped name could also be stored lazily only when needed.
This would solve the cleanup problem.

While this may seem like the ideal solution, there are still some downsides.
The biggest problem is the API design.
If the API uses escaped names a lot (because it is more convenient), then this essentially degrades into the "Both escaped and unescaped name" solution.
Even if APIs exist for both escaped and unescaped names, the convenience benefit will lead to more use of escaped names.
This means the escaped name will be stored for many keys and therefore the benefit of the lazy allocation is negated.

Without the lazy allocation benefit, the only difference from "Both escaped and unescaped name" is that we have fewer pointers and sizes in structs.
This saves some amount of memory and allocations, but makes internal code more difficult to write and understand.

## Decision

Go with "Only unescaped name, with separate namespace" from above:

- Store only unescaped name with size inside `struct _Key`
- API of `libelektra-core` will use unescaped name exclusively
- Convenience functions using escaped names will be provided via other libraries
- Where appropriate the API will take the namespace as a separate argument to allow using `KEY_NS_*` constants.
- Whether namespace is stored separately in `struct _Key` will be decided at a later point, when the scope of all API changes and changes to `struct _Key` is clear.

## Rationale

- Largest memory savings among the proposed options
- Option to use separate namespace argument leads to more convenient API (`KEY_NS_*` constants).
- Simple internal code
- Escaped name requirements can easily be solved by an additional library (e.g., `libelektra-ease`, `libelektra-extra` or new standalone library for names), because not every caller will need those functions.
- Full API and internal struct layout aren't designed yet, so deciding how to store namespace is difficult.

## Implications

- `keyNew` needs to change
- `keyName` returns unescaped name
- functions for escaped name move out of core

## Related Decisions

## Notes

### Printing unescaped name in GDB

In GDB (and probably other debuggers) the unescaped name of a `Key * key` can be printed with (assuming the name is in `key->ukey` and its size in `key->keyUSize`):

```
p *key->ukey@key->keyUSize
```

This prints `key->ukey` as a fixed-length string of length `key->keyUSize`, e.g., for `user:/abc` it prints:

```
$1 = "\006\000abc"
```

doc/news/_preparation_next_release.md

+2
@@ -483,6 +483,8 @@ This section keeps you up-to-date with the multi-language support provided by El
 - Add decision for [read-only keynames](../decisions/0_drafts/readonly_keynames.md) _(Maximilian Irlinger @atmaxinger)_
 - <<TODO>>
 - <<TODO>>
+- Revive [keyname decision](../decisions/3_decided/keyname.md) _(@kodebach)_
+- <<TODO>>
 - <<TODO>>
 - Add decision for [copy-on-write](../decisions/2_in_progress/copy_on_write.md) and provide implementation suggestions. _(Maximilian Irlinger @atmaxinger)_
 - <<TODO>>
