# Namespace and Name of Keys

## Problem

A `Key` in Elektra is identified by its name, which consists of a namespace and a number of name parts.
There are two common representations for the name: escaped and unescaped.

The unescaped form is essentially a single namespace byte plus an arbitrary-length sequence of arbitrary bytes, in which a `\0` byte separates parts.
The escaped name is a `\0`-terminated string that maps 1:1 onto unescaped names.
More details can be found in [the relevant docs](/doc/KEYNAMES.md).

There is a conflict between these two forms in terms of API convenience and efficient execution.
Generally, the escaped form is more convenient to use, since it is entirely human-readable and just a "normal" string.
Implementing many tasks (like order comparisons) is, however, much simpler when using the unescaped name.
Using the unescaped name also often results in more performant code.

A particularly common example that highlights the difficulties in handling escaped names is splitting the name into parts.
In the escaped name, this task requires correct handling of escape sequences, whereas in the unescaped name, parts are always delimited by a `\0` byte.

Before this decision, we stored both versions in every `Key`.
However, this used too much memory, so we need to find another solution.

The question now is which representations `libelektra-core` should use, and how.

## Constraints

- Because `KeySet` is ordered by name and stores `Key`s, the order comparison between the names of two `Key`s must be "fast enough"
  (see the assumptions below).
- We need a single pointer to a single buffer that contains the entire name of a `Key`.
  While there are other options, some of which could even save memory (e.g., splitting the name into parts and deduplicating them), much of the `KeySet` internals rely on the name being a single buffer.
  Changing this would require major redesigns.

## Assumptions

- In most cases, the escaped name is used for convenience and not because of actual requirements.
- The most common case for using the escaped name is UI: reading names from or displaying them in a user interface (e.g., the `kdb` CLI).
- In the constraint about order comparisons above, we assume that "fast enough" means "comparable to a single `memcmp`".
  Profiling of previous implementations, which were not based on a single `memcmp` of unescaped names, showed the comparison to be a bottleneck; the current single-`memcmp` implementation does not.
  That said, it may be possible to find a solution that is slower than the current one but still fast enough to avoid the previous bottleneck.

## Considered Alternatives

### Only escaped name

Because the escaped name is a simple `\0`-terminated string, it can be represented as a single `char *`.

Storing the name as a single `char *` would be the most space-efficient option.
However, resizing would require computing the length every time.
Therefore, for storage, the better solution may be a `char *` plus a `size_t`.

In the API, however, the name could always be a single `char *`, making for a very easy-to-use API.

The biggest problem with this approach is that comparing two escaped names is not trivial.
The comparison needs to account for namespaces, parts and escaping.
Previous benchmarks showed that it is very hard or even impossible to make the comparison of escaped names fast enough for our use cases.

Similarly, iterating over the individual parts of a name (and/or manipulating them) is non-trivial, because it requires logic to handle escape sequences.

### Only unescaped name

The unescaped name contains `\0` bytes.
It therefore must be represented as a pointer and a size.

This can make for a less convenient API, but there are mitigation strategies using additional types.
Writing unescaped names in code can be inconvenient, especially regarding the namespace.
Without a namespace, a name could be written as, e.g., `"foo\0bar"`.
But with a namespace, it would be something like `"\1\0foo\0bar"`, and developers would need to remember which namespace `\1` is.
The `KEY_NS_*` constants cannot easily be used in such a string literal.

Both order and hierarchy comparisons are very simple in this case and can be implemented with a single `memcmp` and a tiny amount of extra logic (e.g., to handle cascading names).
Iterating over the individual parts is also trivial, since all parts are separated by `\0` bytes.

### Only unescaped name, with separate namespace

In the previous solution, the entire unescaped name (including the namespace) would always be considered one unit.
As such, an API that needs a name would take only a single pointer and a size.
This can be inconvenient, because it makes using the `KEY_NS_*` constants more difficult.

This solution extends the previous one by treating the namespace as a separate entity.
There, the namespace is intrinsically part of the name.
It is essentially just a restriction on the first part of the name, and sometimes the namespace must be handled specially.
In this solution, we consider the namespace a separate entity from the start.
A key does not have a name that starts with a namespace.
Instead, a key has a namespace _and_ a name.

This is mostly a theoretical distinction, but it makes it easier to argue in favor of APIs that take the namespace as a separate argument.
It also makes it more obvious that sometimes the namespace on its own can influence the behavior of a function.

In the API, the name could now be given split into the namespace and the rest of the name.
Instead of taking a single pointer and size, which receive values like `"\1\0foo\0bar"` and `10`, the API would take a namespace, a pointer, and a size, with values like `KEY_NS_CASCADING`, `"foo\0bar"` and `8`.

Internally, we don't necessarily need to store these as separate fields.
The namespace could be combined with the rest of the name into one buffer and stored as a single pointer and size.
However, depending on the API, there can also be benefits to keeping the namespace as a separate field.

Even with a separate namespace field, most benefits of "Only unescaped name" are retained.
The memory consumption is near minimal (alignment padding can cause a small difference).
Comparisons are exactly the same, just with an additional comparison of the namespace byte beforehand.

### Both escaped and unescaped name

The previous approach stored both, to combine the advantages of the escaped and the unescaped name.

The API could largely rely on the escaped name, while, e.g., comparisons use the unescaped name.

The issue with this approach is the excessive memory consumption.
Key names can already be quite long, and `Key` is at the base of Elektra.
Storing every name twice, in only slightly different forms, essentially doubles the memory needed for names.

### Both escaped and unescaped name, but only unescaped stored

Instead of storing both the escaped and the unescaped name, only the unescaped name could be stored.

APIs that use the escaped name would convert it on the fly.

This approach has several downsides.
First, while the conversion may be optimized, it will never be free in terms of runtime.
But more importantly, if an API returns an escaped name, that name must be stored somewhere.
This means extra allocations and, crucially, somebody needs to clean them up.
In other words, it complicates the API.

### Escaped and unescaped name in single buffer

This is another variant of the above.
The escaped and unescaped name are stored in a single buffer.
This avoids extra allocations and extra pointers and sizes in structs.

The escaped name could also be stored lazily, only when needed.
This would solve the cleanup problem.

While this may seem like the ideal solution, there are still some downsides.
The biggest problem is the API design.
If the API uses escaped names a lot (because they are more convenient), then this essentially degrades into the "Both escaped and unescaped name" solution.
Even if APIs exist for both escaped and unescaped names, the convenience benefit will lead to more use of escaped names.
This means the escaped name will be stored for many keys, and therefore the benefit of the lazy allocation is negated.

Without the lazy allocation benefit, the only difference from "Both escaped and unescaped name" is that we have fewer pointers and sizes in structs.
This saves some memory and allocations, but makes internal code more difficult to write and understand.

## Decision

Go with "Only unescaped name, with separate namespace" from above:

- Store only the unescaped name and its size inside `struct _Key`.
- The API of `libelektra-core` will use unescaped names exclusively.
- Convenience functions using escaped names will be provided via other libraries.
- Where appropriate, the API will take the namespace as a separate argument to allow using `KEY_NS_*` constants.
- Whether the namespace is stored separately in `struct _Key` will be decided at a later point, when the scope of all changes to the API and to `struct _Key` is clear.

## Rationale

- Largest memory savings among the proposed options.
- Option to use a separate namespace argument leads to a more convenient API (`KEY_NS_*` constants).
- Simple internal code.
- Escaped name requirements can easily be solved by an additional library (e.g., `libelektra-ease`, `libelektra-extra` or a new standalone library for names), because not every caller will need those functions.
- The full API and internal struct layout aren't designed yet, so deciding how to store the namespace is difficult.

## Implications

- `keyNew` needs to change
- `keyName` returns the unescaped name
- functions for escaped names move out of core

## Related Decisions

## Notes

### Printing unescaped name in GDB

In GDB (and probably other debuggers), the unescaped name of a `Key * key` can be printed with the following command (assuming the name is in `key->ukey` and its size in `key->keyUSize`):

```
p *key->ukey@key->keyUSize
```

This prints `key->ukey` as a fixed-length string of length `key->keyUSize`, e.g., for `user:/abc` it prints:

```
$1 = "\006\000abc"
```