Skip to content
This repository has been archived by the owner on Dec 25, 2024. It is now read-only.

Prevent cleanUrl from adding equal sign to query parameters #612

Closed
wants to merge 1 commit into from

Conversation

StefRe
Copy link
Contributor

@StefRe StefRe commented Feb 28, 2021

According to RFE3986 the query component may be any form, not necessarily
in the form of key=value pairs. The existing implementation of cleanUrl
added an equal sign (=) to data not in the form of key=value pairs. This
lead to infinite redirects in a HEAD request in the SLUB API and is fixed
now to leave query data without an equal sign as is.

Move tests for cleanUrl from ApacheBaseApiTest to BaseApiTest as it's a
method of the BaseApi.

see also #595 (comment)

According to RFE3986 the query component may be any form, not necessarily
in the form of key=value pairs. The existing implementation of cleanUrl
added an equal sign (=) to data not in the form of key=value pairs. This
lead to infinite redirects in a HEAD request in the SLUB API and is fixed
now to leave query data without an equal sign as is.

Move tests for cleanUrl from ApacheBaseApiTest to BaseApiTest as it's a
method of the BaseApi.
@StefRe
Copy link
Contributor Author

StefRe commented Feb 28, 2021

The problem is much more serious than I thought. While the above fix is valid and still should be merged it doesn't resolve my problem.

Everything works fine for unit tests on the JVM. But as soon as I use android the following line again adds an = sign to the end of say http://slubdd.de/katalog?libero_mab21364775:

The reason is that android uses an old implementation of URLEncodedUtils that always appends an = sign after the key and replaces null values by "":

Android version:

public static String format (
		final List <? extends NameValuePair> parameters, 
		final String encoding) {
	final StringBuilder result = new StringBuilder();
	for (final NameValuePair parameter : parameters) {
		final String encodedName = encode(parameter.getName(), encoding);
		final String value = parameter.getValue();
		final String encodedValue = value != null ? encode(value, encoding) : "";
		if (result.length() > 0)
			result.append(PARAMETER_SEPARATOR);
		result.append(encodedName);
		result.append(NAME_VALUE_SEPARATOR);
		result.append(encodedValue);
	}
	return result.toString();
}

Apache version 4.5.12 used by JVM in unit tests:

public static String format(
		final List <? extends NameValuePair> parameters,
		final char parameterSeparator,
		final String charset) {
	final StringBuilder result = new StringBuilder();
	for (final NameValuePair parameter : parameters) {
		final String encodedName = encodeFormFields(parameter.getName(), charset);
		final String encodedValue = encodeFormFields(parameter.getValue(), charset);
		if (result.length() > 0) {
			result.append(parameterSeparator);
		}
		result.append(encodedName);
		if (encodedValue != null) {
			result.append(NAME_VALUE_SEPARATOR);
			result.append(encodedValue);
		}
	}
	return result.toString();
}

I experimented a bit with useLibrary 'org.apache.http.legacy' (see this SO answer) but it didn't help. Not sure if "downloading and including HttpClient jar directly into your project" as recommended here is a good idea.

So the only workaround I could find is to not clean the url in calls to httpHead, i.e. use .url(url) instead of .url(cleanUrl(url)). This is not satisfactory at all but works as SLUB getResultById is (currently) the only code to use httpHead. Can you think of any better solution?

StefRe added a commit to StefRe/opacclient that referenced this pull request Feb 28, 2021
This is a workaround as the only possibility to add links to other detailed
items from a detailed item. It's possible as we can display volumes and
copies at the same time. References are some kind of details which is why
the display order of copies and volumes was swapped: first show volumes
(which may be references) right after the details and only then copies.
Increase volumes text view to maximum two lines high to be able to put
the title of the reference and the reference itself on different lines
for better readability.

Change resolution of bc or rsn and the new libero identifiers to following
head redirects and retrieving the final url to get the id identifier. This
requires one request more than the old method (get location parameter from
non-redirecting head request) but is much shorter to implement.

IDs of references have a single query component (e.g. ?libero_mab21364775)
and no key=value pairs. The URLEncodedUtils.format implementation used by
Android has a bug and adds a trailing equal sign (=) to the query component
which leads to infinite redirects, see opacapp#612 for details. The prevent this
we use cleanUrl only if there are any = signs in the url, otherwise we use
the raw url in head requests.
@raphaelm
Copy link
Member

raphaelm commented Mar 1, 2021

I'm a little bit worried this breaks things for other APIs… Maybe it makes sense to refactor cleanUrl into a non-static function so your API class can override it (to something else or to a no-op)?

@StefRe
Copy link
Contributor Author

StefRe commented Mar 1, 2021

Maybe it makes sense to refactor cleanUrl into a non-static function so your API class can override it (to something else or to a no-op)?

This is certainly an option, although I think a more fundamental solution would be to:

  • make cleanUrl in BaseApi just a function stub that must be implemented in derived classes
  • move the current implementation from BaseApi to ApacheBaseApi and remove all imports of org.apache.http.* from BaseApi
  • provide an implementation of cleanUrl in OkHttpBaseApi using OkHttp's HttpUrl class.

Still we don't know if some API relies on key-only parameters being in the form of ?param=&... instead of ?param&... (all existing unit tests passed but test coverage is a bit sparse for a number of classes).

So making this change in SLUBAPi instead of OkHttpBaseApi is the safest solution. This also has the additional advantage of avoiding unintentional double encoding of the query: I had to change addQueryParameter into addEncodedQueryParameter to avoid issues with spaces in the query value because httpGet once more encodes the query part using cleanUrl - if I remember correctly addQueryParameter converted spaces to + and cleanUrl then converted this + to %2B which resulted in incorrect query results (a similar problem seemed to have been here). Thats's why I'd prefer building the query with addQueryParameter in the first place and then not do any encoding behind the scenes, i.e. using a no-op implementation of cleanUrl.

After all these considerations I agree with you that it's best to make cleanUrl in BaseApi a non-static function and to use my own implementation in SLUBApi. So I'll close this PR and make a new one for just removing the static keyword. Is this OK with you?

@raphaelm
Copy link
Member

raphaelm commented Mar 1, 2021

This is certainly an option, although I think a more fundamental solution would be to:

Agreed

Still we don't know if some API relies on key-only parameters being in the form of ?param=&... instead of ?param&... (all existing unit tests passed but test coverage is a bit sparse for a number of classes).

My gut feeling says that no API relies on this, but it's really just a gut feeling and hard to tell. I'd be more catious of any API relying on some behavior that might be different with OkHttp's HttpUrl. As you mentioned, it's unfortunately really hard to test…

After all these considerations I agree with you that it's best to make cleanUrl in BaseApi a non-static function and to use my own implementation in SLUBApi. So I'll close this PR and make a new one for just removing the static keyword. Is this OK with you?

Yup!

@StefRe StefRe closed this Mar 1, 2021
@StefRe StefRe deleted the fix/cleanUrl branch March 15, 2021 08:50
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants