Skip to content

Conversation

puffetto
Copy link
Contributor

Hello,

I am working on a project which does heavy use of JSON pointers (see #4859 ); while digging into the code I saw that the escape/unescape of JSON pointers was not linear, in example escaping "~~~~~~~~~~" requires 10 reallocations and 200 byte copies, escaping ; "~~~~~~~~~~~~~~~~~~~~" requires 20 reallocations and 800 byte copies.

I rewrote these two functions: one allocation and exactly two cycles on the key for escaping, no allocations and at most two loops for unescaping; I understand they are way less readable in this flavour but also way more efficient,

Existing tests already cover all the new code, I had to add iterators to the unit-alt-string "alt_string" class.

I tried to follow the CONTRIBUTING.md document throughly, but it's my first PR to this project, so forgive any mistake.

A.

@coveralls
Copy link

coveralls commented Jul 28, 2025

Coverage Status

coverage: 99.177% (-0.01%) from 99.19%
when pulling 6873f63 on puffetto:develop
into d33ecd3 on nlohmann:develop.

@nlohmann
Copy link
Owner

Clang-Tidy fails due to an update. I have not found the time to look into it - probably just some exclusions needed.

@nlohmann
Copy link
Owner

Clang-Tidy fails due to an update. I have not found the time to look into it - probably just some exclusions needed.

Oh, I see there are indeed some warnings related to the changed code. Please take a look to fix these.

@puffetto
Copy link
Contributor Author

Clang-Tidy fails due to an update. I have not found the time to look into it - probably just some exclusions needed.

Oh, I see there are indeed some warnings related to the changed code. Please take a look to fix these.

0d06a06 Should fix it

@puffetto
Copy link
Contributor Author

0d06a06 Should fix it

Ok, it did not, more warnings around... will take a look in the evening.

if (esz == s.size())
{
res = s;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you commented about this not being as clear, perhaps some use of algorithms would help

{
  using CharT = typename StringType::value_type;
  auto results = std::count_if(s.begin(), s.end(), [](CharT ch) { return (ch == CharT('~')) || (ch == CharT('/')); });

  if (results == 0) {
    return s;
  }

  StringType res;
  res.reserve(s.size() + results);

  // for loop here

  return res;
}

Copy link
Contributor Author

@puffetto puffetto Jul 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did read somewhere that one of the goals was to remove the dependency on <algorithm> in order to reduce compile times.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't speak for @nlohmann here, but one person posted a PR to remove one of the 8 uses of <algorithm>. That PR hasn't been accepted and is now marked as stale. So it isn't necessarily a goal of the project as a whole. My personal opinion is that precompiled headers and/or using the standard library through modules are the way to go for compiler performance going forward, rather than not using library algorithms.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with using <algorithm> to have readable code. If we can improve performance, we can do it later, but I think removing <algorithm> was just a first shot at improving compilation speed, but not necessarily the most important thing.

{
++j;
}
auto i = j;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you commented about this not being as clear, maybe something like this would help, with some explanatory comments:

  using CharT = typename StringType::value_type;
  auto start = s.find(CharT('~');

  if (start == StringType::npos) {
    return;
  }

  for (auto read = s.begin() + start, write = read; read != s.end(); read++, write++) {
    auto ch = *read;

    if (ch == CharT('~')) {
      auto next = read + 1;

      if (next != s.end()) {
        switch (*next) {
          case CharT('0'):
            ch = CharT('~');
            read++;
            break;
          case CharT('1'):
            ch = CharT('/');
            read++;
            break;
        }
      }
    }

    *write = ch;
  } 

{
if (ch == CharT('~'))
{
res.append(StringType{"~0"});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructs and then destroys a temporary string, two calls to push_back would have better performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHould be optimized out by any decent compiler, but I see the point, going to fix it.

}

/*!
* @brief string unescaping as described in RFC 6901 (Sect. 4)
* @brief In Place string unescaping as described in RFC 6901 (Sect. 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in-place

}

// Left out, so far we don't use it, so it just lowers test coverage
// /*!
// * @brief Out Of Place string unescaping as described in RFC 6901 (Sect. 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns a copy of a string unescaped as described...

@nlohmann
Copy link
Owner

🔴 CI is red due to unrelated issues, see #4869.

Copy link

🔴 Amalgamation check failed! 🔴

The source code has not been amalgamated. @puffetto
Please read and follow the Contribution Guidelines.

@nlohmann
Copy link
Owner

Please rebase to the latest develop branch as I just merged #4871.

@nlohmann nlohmann added the please rebase Please rebase your branch to origin/develop label Jul 31, 2025
Copy link

This pull request has been marked as stale because it has had no activity for 30 days. While we won’t close it automatically, we encourage you to update or comment if it is still relevant. Keeping pull requests active and up-to-date helps us review and merge changes more efficiently. Thank you for your contributions!

@github-actions github-actions bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Aug 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L please rebase Please rebase your branch to origin/develop state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants