Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse URLs using boost::Url #1417

Merged
merged 17 commits into from
Sep 26, 2024

Conversation

Qup42
Copy link
Member

@Qup42 Qup42 commented Jul 25, 2024

This PR cleans up the parsing of HTTP Requests and switches to boost::Url for URL parsing.
This should fix issues with the following types of HTTP requests:

  • POST requests that additionally have query parameters as part of the URL
  • POST requests that store the unescaped query in the query body (there previously was a bug in this case that tried to unescape and already unescaped query.

The only (known) limitation that remains for now is that duplicate HTTP parameters throw an error, because we are not yet sure about their semantics.

Copy link

codecov bot commented Jul 25, 2024

Codecov Report

Attention: Patch coverage is 70.58824% with 15 lines in your changes missing coverage. Please review.

Project coverage is 92.51%. Comparing base (370cb7d) to head (3ddc490).
Report is 14 commits behind head on master.

Files with missing lines Patch % Lines
src/engine/Server.cpp 58.33% 15 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1417      +/-   ##
==========================================
- Coverage   94.13%   92.51%   -1.62%     
==========================================
  Files         347      353       +6     
  Lines       25636    26487     +851     
  Branches     3450     3551     +101     
==========================================
+ Hits        24132    24504     +372     
- Misses       1462     1941     +479     
  Partials       42       42              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An opportunity to also fix another bug.

The real solution is to completely handle the URL parsing using Boos::Url which wasn't yet released at the time we implemented this rather hacky URL parser.

src/engine/Server.cpp Outdated Show resolved Hide resolved
src/util/http/UrlParser.h Outdated Show resolved Hide resolved
@Qup42 Qup42 marked this pull request as draft July 29, 2024 17:53
src/util/http/UrlParser.h Outdated Show resolved Hide resolved
src/engine/Server.cpp Outdated Show resolved Hide resolved
@Qup42 Qup42 marked this pull request as ready for review July 31, 2024 13:23
Copy link

sonarcloud bot commented Jul 31, 2024

@Qup42 Qup42 changed the title Ignore HTTP Query parameters for POST Requests Parse URLs using boost::Url Sep 15, 2024
@Qup42 Qup42 requested a review from joka921 September 17, 2024 18:30
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this should fix some longstanding bugs and makes the code much easier to read.

One major thing:
We should have unit tests for this code
(for the UrlParser, as well as for the (is it static or can it be static) method in Server.cpp that does the dispatch between GET and the various types of POST.

src/util/http/UrlParser.h Outdated Show resolved Hide resolved
src/util/http/UrlParser.h Outdated Show resolved Hide resolved
src/util/http/UrlParser.cpp Outdated Show resolved Hide resolved
src/util/http/UrlParser.cpp Outdated Show resolved Hide resolved
src/util/http/UrlParser.cpp Outdated Show resolved Hide resolved
src/engine/Server.cpp Outdated Show resolved Hide resolved
src/engine/Server.cpp Outdated Show resolved Hide resolved
@Qup42
Copy link
Member Author

Qup42 commented Sep 19, 2024

Some basic tests are now present. They will be extended a little bit more.

  • A thought with Update in mind: I would advocate for having a type level distinction for Queries and Updates (be it with a variant or different fields). With this distinction we can make sure that Updates and Queries are not confused. E.g. passing something in via /?query=... which actually is an Update. This might be helpful if one wants to restrict updates later on. It can also be done after parsing the queries, but I think doing it earlier is less error prone. The standard also don't specify this confusion of queries and updates.
  • To my knowledge multiple graphs generally work but named graphs are not finished yet. default-graph-uri and named-graph-uri can be passed in via the HTTP request itself. Imo this lends itself to a follow-up PR now or some time later. The two parameters can be specified multiple times. Currently parameters can only be specified once due to the usage of a set. Then the parameters should of course also be processed.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look really nice.
I still have some suggestions and some questions, but we are definitely getting there.

src/engine/Server.cpp Outdated Show resolved Hide resolved
src/util/http/UrlParser.h Outdated Show resolved Hide resolved
src/util/http/UrlParser.h Outdated Show resolved Hide resolved
test/UrlParserTest.cpp Outdated Show resolved Hide resolved
test/UrlParserTest.cpp Show resolved Hide resolved
test/UrlParserTest.cpp Outdated Show resolved Hide resolved
src/util/http/UrlParser.h Outdated Show resolved Hide resolved
test/ServerTest.cpp Outdated Show resolved Hide resolved
test/ServerTest.cpp Show resolved Hide resolved
Comment on lines 49 to 50
// TODO `default-graph-uri` and `named-graph-uri` can appear multiple times.
// This currently throws an exception and is not considered.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, So this means that in HTTP there can be duplicate parameters in general, and the semantics depend on whatever the service does.
So probably, the value type of the parameters should not be string but HashSet<string> or something like this.
Then the consuming code (in this case the Server.cpp has to check, whiich of these can and cannot appear multiple times. At least add a comment in the respective code in the Parser, s.t. we can revisit it at a later time (I currently am working on the named graph support).

Copy link
Member Author

@Qup42 Qup42 Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specs (RFC and similar) actually don't specify semantics for multiple values for a query parameter.
Probably a topic for the followup PR, but anyways: What would be the reason against simply using a multiset instead of a set here? Is it that abseil does not provide a multiset and we'd have to resort to std::multiset?
This comment is just a temporary note for me. For now I'd add some TODOs in appropriate places and warnings to the result if these parameters are used.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a small single comment left, address it and we are ready to merge.

test/ServerTest.cpp Outdated Show resolved Hide resolved
Copy link

sonarcloud bot commented Sep 20, 2024

@Qup42 Qup42 requested a review from joka921 September 24, 2024 15:37
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for improving the URL parser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants