hathitrust error #56

Open
jwang40 opened this issue May 20, 2016 · 28 comments
Comments

@jwang40

jwang40 commented May 20, 2016

We noticed recently that 98% of the failed_fatal service responses come from HathiTrust, with exceptions like the one listed below:

:class_name: RuntimeError
:message: 'redirection forbidden: http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973
-> https://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440%3blccn:2007042973'
:backtrace:

  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:224:in `open_loop'"
  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:150:in `open_uri'"
  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:716:in `open'"
  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:34:in `open'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/service_adaptors/hathi_trust.rb:144:in
    `do_query'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/service_adaptors/hathi_trust.rb:73:in
    `handle'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/service_adaptors/service.rb:92:in
    `handle_wrapper'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/models/service_wave.rb:88:in
    `block (2 levels) in handle'"
@jrochkind
Member

jrochkind commented May 20, 2016

Can you say what you mean by 98%? Do you mean 98% of all HathiTrust requests are failing? Or just that of the ones that are failing, 98% have that message? Or that of all the failed responses you get, 98% of them are HathiTrust?

Perhaps HathiTrust changed their API in some way, intentionally or unintentionally. @billdueber any thoughts?

I am no longer employed in a position where I work on Umlaut, so have little time to spend on it. (Oh, hi Jing!) Not sure if @kevinreiss has much time?

But we'll definitely review and merge pull requests if you want to submit one!

@jwang40
Author

jwang40 commented May 20, 2016

Hi, Jonathan,
Both. 98% of all HathiTrust requests are failing, and 98% of the failures across all services come from HathiTrust with a runtime error.
What is confusing is that there are 2% successful ones. For example:
https://catalog.hathitrust.org/api/volumes/brief/json/lccn:75647497
does not return bibliographic data for "British Library Journal".
However, https://catalog.hathitrust.org/api/volumes/brief/json/issn:03055167
does return bibliographic data.
I wonder whether this has anything to do with the xID service. Does Umlaut use the xID service?

@billdueber

None of that seems right. I'll look into it.

Bill Dueber
Library Systems Programmer
University of Michigan Library

@billdueber

If you have more examples, could you send a solid handful to me? Is it a
particular type of identifier?

@jrochkind
Member

As far as I can remember, Umlaut does not use the xID service.

Oddly, if I click on the URL you pasted in the error message, I don't get any error. @billdueber , is it possible it's rate limiting us or something?

Umlaut may indeed have a bug, although obviously one that wasn't triggered until recently. Or HathiTrust may have one that is only triggered by a few clients like Umlaut.

The HathiTrust plugin code is here:
https://github.com/team-umlaut/umlaut/blob/master/app/service_adaptors/hathi_trust.rb

No xid involved.

The line in the stack trace you posted is here. @billdueber , it looks like it's making a /brief/json request, with certain search params.

Probably this one, from the error @jwang40 pasted: http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973

But like I said, I don't get an error myself (or a redirect, I don't think?) on that URL, so that's odd.

@billdueber

There are at least two things going on here, I think. The first is that HT recently finished moving everything to https -- http urls automatically redirect. My guess is that's what the "redirection forbidden" is and accounts for most of your errors.

But that LCCN link you posted should totally find the right record -- it's right there in the 010:

https://catalog.hathitrust.org/Record/000544346.marc

I'll try to track it down.

@jrochkind
Member

Aha, wait.

@billdueber , does the ; need to be URI-escaped, when it didn't previously (or even couldn't be previously)?

$ curl 'http://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440;lccn:2007042973'
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://catalog.hathitrust.org/api/volumes/brief/json/oclc:177062440%3blccn:2007042973">here</a>.</p>
<hr>
<address>Apache/2.4.10 (Debian) Server at catalog.hathitrust.org Port 80</address>
</body></html>

Looks like it's redirecting to an escaped version, but my client code won't follow the redirect.
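
As a hedged illustration of why the client stops at this redirect: as far as I know, the open-uri that ships with Ruby 2.2 (the version in the backtrace) only follows a redirect when both URLs use the same scheme, or when both are plain http/ftp. The function below is a simplified stand-in for that rule, not the library's own code, and `redirect_allowed?` is a hypothetical name.

```ruby
require 'uri'

# Simplified model (an assumption, not open-uri's source) of the redirect
# rule in Ruby 2.2's open-uri: a same-scheme redirect is followed, http and
# ftp may redirect to each other, but http -> https is refused.
def redirect_allowed?(from, to)
  from_scheme = URI.parse(from).scheme.downcase
  to_scheme   = URI.parse(to).scheme.downcase
  return true if from_scheme == to_scheme

  plain = %w[http ftp]
  plain.include?(from_scheme) && plain.include?(to_scheme)
end

# The pair of URLs from one of the reported error messages.
puts redirect_allowed?(
  'http://catalog.hathitrust.org/api/volumes/brief/json/oclc:49817552',
  'https://catalog.hathitrust.org/api/volumes/brief/json/oclc:49817552'
)
```

Under this model any http request to the catalog would fail once the server redirects to https, whether or not the URL contains a semicolon.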

If that's what it is, it's an easy fix. Just change this line to join('%3B').
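
A minimal sketch of what that change might look like, assuming the adapter builds the request path by joining `type:value` pairs. `brief_json_url` and its arguments are hypothetical names used for illustration, not the adapter's actual method.

```ruby
# Hypothetical helper: builds a brief/json lookup URL from identifier pairs.
# The separator is a parameter so the escaped ('%3B') and unescaped (';')
# forms can be compared side by side.
def brief_json_url(api_url, ids, separator = '%3B')
  query = ids.map { |type, value| "#{type}:#{value}" }.join(separator)
  "#{api_url}/brief/json/#{query}"
end

ids = { 'oclc' => '177062440', 'lccn' => '2007042973' }
puts brief_json_url('http://catalog.hathitrust.org/api/volumes', ids)
puts brief_json_url('http://catalog.hathitrust.org/api/volumes', ids, ';')
```

The first line printed matches the escaped form the server redirects to (apart from the case of the hex digits); the second is the unescaped form from the error message.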

@jwang40 , interested in submitting a pull request?

We perhaps also should change it to https instead of http, yes @billdueber?

@jwang40
Author

jwang40 commented May 20, 2016

Here are more examples, some with oclc#:

:class_name: RuntimeError
:message: 'redirection forbidden: http://catalog.hathitrust.org/api/volumes/brief/json/oclc:49817552
-> https://catalog.hathitrust.org/api/volumes/brief/json/oclc:49817552'
:backtrace:

  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:224:in `open_loop'"
  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:150:in `open_uri'"
  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:716:in `open'"
  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:34:in `open'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/service_adaptors/hathi_trust.rb:144:in
    `do_query'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/service_adaptors/hathi_trust.rb:73:in
    `handle'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/service_adaptors/service.rb:92:in
    `handle_wrapper'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/models/service_wave.rb:88:in
    `block (2 levels) in handle'"

@jrochkind
Member

Ah, then there's the fact it's finding 0 hits when it should be finding one. That's not even the bug being reported here, but that's bad too. I wonder if multiple-field searching is broken?

@jwang40
Author

jwang40 commented May 20, 2016

another example with isbn

:class_name: RuntimeError
:message: 'redirection forbidden: http://catalog.hathitrust.org/api/volumes/brief/json/isbn:1852789093
-> https://catalog.hathitrust.org/api/volumes/brief/json/isbn:1852789093'
:backtrace:

  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:224:in `open_loop'"
  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:150:in `open_uri'"
  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:716:in `open'"
  • "/usr/local/rbenv/versions/2.2.2/lib/ruby/2.2.0/open-uri.rb:34:in `open'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/service_adaptors/hathi_trust.rb:144:in
    `do_query'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/service_adaptors/hathi_trust.rb:73:in
    `handle'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/service_adaptors/service.rb:92:in
    `handle_wrapper'"
  • "/opt/umlaut_jh/shared/bundle/ruby/2.2.0/gems/umlaut-4.1.4/app/models/service_wave.rb:88:in
    `block (2 levels) in handle'"

@billdueber

Clearly something on this end -- getting " Symbolic link not allowed or link target not accessible" in the error logs.

@jwang40
Author

jwang40 commented May 20, 2016

let me know if you need more examples

@billdueber

Hell, that was only on the dev server. So now I gotta get that fixed so I can see what's going on.

Working on it...

@billdueber

OK. So here's what I think is happening:

  • It looks like the redirection is failing, probably because of the unescaped semicolon. I just asked, and they recently upgraded the Apache server to 2.something, so that might be related. It appears you can fix that on your end.
  • Some of these examples legitimately don't have records in the HT catalog (e.g., oclc:49817552).
  • I've got a ton of LCCNs with leading spaces. Since this hasn't been a problem in the past, I'm guessing the feed I'm getting from California changed. I'll change my indexing and reindex, and I'll take a look and see if I can do a quick fix of any kind.

@billdueber

OK, can anyone provide any more examples where a search fails with zero hits but you're sure it should find something (e.g., not a redirect problem with the client, but an actual API problem on my server)?

@billdueber

billdueber commented May 20, 2016

OK, I've pushed out a workaround where I just space-expand every lccn:val to include lccn:"_val", lccn:" __val", lccn:"___val", etc. out to five spaces. That should find everything until I get things reindexed.
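
The idea behind the workaround can be sketched as follows, assuming it simply tries the value with zero to five leading spaces. This is an illustration only, not the HathiTrust server code, and `space_expanded_lccns` is a hypothetical name.

```ruby
# Illustration only: expands one LCCN value into the bare form plus
# variants padded with 1..max_spaces leading spaces, so a lookup can
# match index entries that were stored with stray leading whitespace.
def space_expanded_lccns(val, max_spaces = 5)
  (0..max_spaces).map { |n| (' ' * n) + val }
end

# Print each candidate in quotes so the padding is visible.
space_expanded_lccns('75647497').each { |candidate| puts candidate.inspect }
```

Once the index is rebuilt with the spaces stripped, only the bare value would be needed.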

@jwang40
Author

jwang40 commented May 20, 2016

Bill,
Thanks.

@jwang40
Author

jwang40 commented May 20, 2016

I will find more examples after we fix the redirect problem, which will exclude lots of examples that legitimately don't have records in the HT catalog.

@jrochkind
Member

jrochkind commented May 21, 2016

The problem on Umlaut's side is mainly just that it's using http when it should be using https. If it starts with https, then the request is served without a redirect, even with un-escaped ;. (I believe by standards, you aren't actually supposed to escape a ; in a query string, meant as a separator).

@jwang40 , could you try in your local app, in the umlaut_services.yml, set a key for the hathi trust adapter:

api_url: 'https://catalog.hathitrust.org/api/volumes'

If that works -- to get rid of the errors -- we can change the default in the Umlaut source and release a patch version. A pull request would be welcome; it's a very simple one-line (one-letter!) change, so if you've never done a pull request before, it would be a good way to get familiar with git and GitHub PRs.

Cases where there is no error (which after this change there shouldn't ever be), but HT reports no results when it should report some, are a different problem, that can only be fixed on HT's side, and it sounds like @billdueber is on it, thanks bill!

@jwang40
Author

jwang40 commented May 22, 2016

@jrochkind we did try what you have suggested. However, the change didn't make any difference. We changed the configuration with caching in ./config/environments/demo.rb, but still no use:
config.consider_all_requests_local = true
config.action_controller.perform_caching = false

@jrochkind
Member

@jwang40 you made the change to umlaut_services.yml in production, but still got the exact same error message? Can you post an example of an error message you're getting after the change? Are you sure you restarted the app in production?

You should never set consider_all_requests_local = true or perform_caching = false in production; those can both cause problems.

@jwang40
Author

jwang40 commented May 23, 2016

@jrochkind No. all the changes were made in umlaut_demo.

@jrochkind
Member

I'm sorry, no what? You did make the change, but still saw errors? If so, can you post an example of an error message you're getting after you made the change?

Are you sure you restarted the app after making the change to the config file? Normally it would be best to make the change on a dev machine, commit to git, and redeploy the app. If you are making changes to config files directly on the deployed machine instead, you will need to restart the app after making changes.

@jwang40
Author

jwang40 commented May 23, 2016

@jrochkind Sorry. I meant that all the config changes, including the caching parameters, were made in umlaut_demo, not in production. We will do more testing today.

@jrochkind
Member

Okay, as I posted, I believe the only change you should need to make is to config/umlaut_services.yml: find the block for the hathi trust adapter, and set an api_url value (was probably not set before), as:

 api_url: 'https://catalog.hathitrust.org/api/volumes'

If you are editing the file directly on your deployment machine (not recommended), then you'll need to restart the app after the change.

Based on my current understanding, that is the only change you should need to get rid of the 'redirection forbidden' errors.
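
One way to sanity-check the edited file before restarting is to load it and confirm the adapter's api_url is an https URL. The sketch below is only a rough guide: the YAML layout (`default` / `services` / `HathiTrust`) is an assumption about how a local umlaut_services.yml might be organized, and `https_api_url?` is a hypothetical helper, so adjust the keys to match your own file.

```ruby
require 'yaml'
require 'uri'

# Hypothetical excerpt of config/umlaut_services.yml. Only the api_url line
# comes from this thread; the surrounding keys are assumed for illustration.
SAMPLE_CONFIG = <<~YAML
  default:
    services:
      HathiTrust:
        type: HathiTrust
        api_url: 'https://catalog.hathitrust.org/api/volumes'
YAML

# Returns true when the named service has an api_url that uses https.
def https_api_url?(yaml_text, service_id)
  config  = YAML.safe_load(yaml_text)
  api_url = config.fetch('default').fetch('services').fetch(service_id)['api_url']
  !api_url.nil? && URI.parse(api_url).scheme == 'https'
end

puts https_api_url?(SAMPLE_CONFIG, 'HathiTrust')
```

To check a real file, pass `File.read('config/umlaut_services.yml')` in place of `SAMPLE_CONFIG`.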

@jrochkind
Member

If that does work, also like I said, if you wanted to send a pull request for changing the default in Umlaut, that would be welcome, and a very very simple thing to use to get familiar with git and pull requests.

@farooqsadiq

Adding the api_url value to the config worked.
api_url: 'https://catalog.hathitrust.org/api/volumes'
I will submit a pull request

@kevinreiss
Contributor

Sorry to come late to this discussion but I'm also confirming @jrochkind's suggested fix to update the api_url value to https solves the issue.
