Make escape/unescape functions linear and not memory-storming. #4860
Signed-off-by: Andrea Cocito <[email protected]>
Clang-Tidy fails due to an update. I have not found the time to look into it - probably just some exclusions needed.
Oh, I see there are indeed some warnings related to the changed code. Please take a look to fix these.
0d06a06 should fix it.
Ok, it did not, more warnings around... will take a look in the evening.
if (esz == s.size())
{
    res = s;
}
Since you commented about this not being as clear, perhaps some use of algorithms would help
{
using CharT = typename StringType::value_type;
auto results = std::count_if(s.begin(), s.end(), [](CharT ch) { return (ch == CharT('~')) || (ch == CharT('/')); });
if (results == 0) {
return s;
}
StringType res;
res.reserve(s.size() + results);
// for loop here
return res;
}
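The suggestion above leaves the copy loop as a placeholder. A minimal sketch of how the whole function might look, assuming the RFC 6901 escaping rules ('~' becomes "~0", '/' becomes "~1") and using two push_back calls per escape; the name escape_sketch is hypothetical and is not the library's function:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper name; a sketch, not the code from the PR.
template<typename StringType>
StringType escape_sketch(const StringType& s)
{
    using CharT = typename StringType::value_type;
    using SizeT = typename StringType::size_type;

    // First pass: count the characters that expand to two characters.
    const auto extra = std::count_if(s.begin(), s.end(), [](CharT ch)
    {
        return ch == CharT('~') || ch == CharT('/');
    });
    if (extra == 0)
    {
        return s;  // nothing to escape
    }

    StringType res;
    res.reserve(s.size() + static_cast<SizeT>(extra));  // one allocation up front

    // Second pass: copy, expanding '~' to "~0" and '/' to "~1".
    for (const CharT ch : s)
    {
        if (ch == CharT('~'))
        {
            res.push_back(CharT('~'));
            res.push_back(CharT('0'));
        }
        else if (ch == CharT('/'))
        {
            res.push_back(CharT('~'));
            res.push_back(CharT('1'));
        }
        else
        {
            res.push_back(ch);
        }
    }
    return res;
}
```

Because the output size is known before the copy starts, the result string should never need to grow during the second pass.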
I did read somewhere that one of the goals was to remove the dependency on <algorithm> in order to reduce compile times.
I can't speak for @nlohmann here, but one person posted a PR to remove one of the 8 uses of <algorithm>. That PR hasn't been accepted and is now marked as stale. So it isn't necessarily a goal of the project as a whole. My personal opinion is that precompiled headers and/or using the standard library through modules are the way to go for compiler performance going forward, rather than not using library algorithms.
I'm fine with using <algorithm> to have readable code. If we can improve performance, we can do it later. I think removing <algorithm> was just a first shot at improving compilation speed, not necessarily the most important thing.
{
    ++j;
}
auto i = j;
Since you commented about this not being as clear, maybe something like this would help, with some explanatory comments:
using CharT = typename StringType::value_type;
auto start = s.find(CharT('~'));
if (start == StringType::npos) {
return;
}
for (auto read = s.begin() + start, write = read; read != s.end(); read++, write++) {
auto ch = *read;
if (ch == CharT('~')) {
auto next = read + 1;
if (next != s.end()) {
switch (*next) {
case CharT('0'):
ch = CharT('~');
read++;
break;
case CharT('1'):
ch = CharT('/');
read++;
break;
}
}
}
*write = ch;
}
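The loop above compacts the string but, as posted, never shortens it afterwards. A minimal sketch of a complete in-place version, assuming the leftover tail is dropped with erase once the read cursor reaches the end; the name unescape_in_place_sketch is hypothetical and this is not the code from the PR:

```cpp
#include <string>

// Hypothetical helper name; a sketch of the read/write-cursor idea.
template<typename StringType>
void unescape_in_place_sketch(StringType& s)
{
    using CharT = typename StringType::value_type;
    using DiffT = typename StringType::difference_type;

    const auto start = s.find(CharT('~'));
    if (start == StringType::npos)
    {
        return;  // no escape sequences, nothing to do
    }

    // Everything before the first '~' is already in its final position.
    auto read = s.begin() + static_cast<DiffT>(start);
    auto write = read;
    while (read != s.end())
    {
        CharT ch = *read;
        ++read;
        if (ch == CharT('~') && read != s.end())
        {
            if (*read == CharT('0'))
            {
                ++read;             // "~0" -> '~' (ch already holds '~')
            }
            else if (*read == CharT('1'))
            {
                ch = CharT('/');    // "~1" -> '/'
                ++read;
            }
        }
        *write = ch;                // write never overtakes read
        ++write;
    }
    s.erase(write, s.end());        // drop the now-unused tail
}
```

Handling both sequences in a single left-to-right pass also keeps "~01" decoding to "~1" rather than "/", which is the ordering concern RFC 6901 describes for the two-step replacement approach.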
{
    if (ch == CharT('~'))
    {
        res.append(StringType{"~0"});
This constructs and then destroys a temporary string; two calls to push_back would have better performance.
Should be optimized out by any decent compiler, but I see the point; going to fix it.
}

/*!
 * @brief string unescaping as described in RFC 6901 (Sect. 4)
 * @brief In Place string unescaping as described in RFC 6901 (Sect. 4)
in-place
}

// Left out, so far we don't use it, so it just lowers test coverage
// /*!
//  * @brief Out Of Place string unescaping as described in RFC 6901 (Sect. 4)
returns a copy of a string unescaped as described...
🔴 CI is red due to unrelated issues, see #4869.
🔴 Amalgamation check failed! 🔴 The source code has not been amalgamated. @puffetto
Please rebase to the latest develop branch as I just merged #4871.
This pull request has been marked as stale because it has had no activity for 30 days. While we won’t close it automatically, we encourage you to update or comment if it is still relevant. Keeping pull requests active and up-to-date helps us review and merge changes more efficiently. Thank you for your contributions!
Hello,
I am working on a project that makes heavy use of JSON pointers (see #4859). While digging into the code I saw that the escaping/unescaping of JSON pointers was not linear: for example, escaping "~~~~~~~~~~" requires 10 reallocations and 200 byte copies, and escaping "~~~~~~~~~~~~~~~~~~~~" requires 20 reallocations and 800 byte copies.
I rewrote these two functions: one allocation and exactly two passes over the key for escaping, no allocations and at most two loops for unescaping. I understand they are much less readable in this form, but they are also much more efficient.
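For contrast, here is a rough sketch of the kind of find-and-replace loop that produces this non-linear behaviour. It is an assumption about the general shape of the previous approach, not a copy of the library's code, and the names replace_all_sketch and escape_by_replace_sketch are hypothetical:

```cpp
#include <string>

// Hypothetical reconstruction, for illustration only.
inline void replace_all_sketch(std::string& s, const std::string& from, const std::string& to)
{
    for (auto pos = s.find(from);
         pos != std::string::npos;
         pos = s.find(from, pos + to.size()))
    {
        // Each replace that grows the string shifts every character after
        // pos, and may reallocate the whole buffer when capacity runs out.
        s.replace(pos, from.size(), to);
    }
}

inline std::string escape_by_replace_sketch(std::string s)
{
    // '~' must be handled first so the '~' introduced by "~1" is not re-escaped.
    replace_all_sketch(s, "~", "~0");
    replace_all_sketch(s, "/", "~1");
    return s;
}
```

With n characters to escape, the tail is moved once per replacement, so the total number of character moves grows roughly with n squared, whereas counting first and copying once touches each character a fixed number of times.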
Existing tests already cover all the new code; I had to add iterators to the "alt_string" class in unit-alt-string.
I tried to follow the CONTRIBUTING.md document thoroughly, but it's my first PR to this project, so forgive any mistakes.
A.