GATEWAY_TIMEOUT encountered when building full GFE for release 3470 #153

Open
chrisammon3000 opened this issue Feb 1, 2022 · 6 comments

Comments

@chrisammon3000

seq-ann 1.1.0 threw an error during the gfe-db build job with the following stacktrace:

2022-02-01 02:36:08 - Logger.seqann.gfe - INFO - GFE = HLA-DPB1w3-2-160-462-223-334-814-127-41-1-25
--
2022-02-01 02:36:08 - root - INFO - Getting GFE data for allele HLA18552.2...
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/seqann/gfe.py", line 244, in get_gfe
    feature = self.api.create_feature(body=request)
  File "/usr/local/lib/python3.8/site-packages/seqann/feature_client/apis/features_api.py", line 77, in create_feature
    (data) = self.create_feature_with_http_info(**kwargs)
  File "/usr/local/lib/python3.8/site-packages/seqann/feature_client/apis/features_api.py", line 142, in create_feature_with_http_info
    return self.api_client.call_api(resource_path, 'POST',
  File "/usr/local/lib/python3.8/site-packages/seqann/feature_client/api_client.py", line 330, in call_api
    return self.__call_api(resource_path, method,
  File "/usr/local/lib/python3.8/site-packages/seqann/feature_client/api_client.py", line 154, in __call_api
    response_data = self.request(method, url,
  File "/usr/local/lib/python3.8/site-packages/seqann/feature_client/api_client.py", line 365, in request
    return self.rest_client.POST(url,
  File "/usr/local/lib/python3.8/site-packages/seqann/feature_client/rest.py", line 212, in POST
    return self.request("POST", url,
  File "/usr/local/lib/python3.8/site-packages/seqann/feature_client/rest.py", line 184, in request
    raise ApiException(http_resp=r)
seqann.feature_client.rest.ApiException: (504)
Reason: GATEWAY_TIMEOUT
HTTP response headers: HTTPHeaderDict({'Content-Length': '0', 'Connection': 'keep-alive'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/app.py", line 580, in <module>
    gfe = gfe_from_allele(
  File "./src/app.py", line 371, in gfe_from_allele
    features, gfe = gfe_maker.get_gfe(ann, locus)
  File "/usr/local/lib/python3.8/site-packages/seqann/gfe.py", line 248, in get_gfe
    self.logger.error(self.logname + "Exception when calling DefaultApi->create_feature %e" + e)
TypeError: can only concatenate str (not "ApiException") to str

This occurred in the context of a full build for all alleles for version 3470.

Here is the call in the script that failed:
https://github.com/abk7777/gfe-db/blob/135d179e1f9295c5b8b72bfbd61d8789db30f0e2/gfe-db/pipeline/jobs/build/src/app.py#L580-L582

Using these dependencies:

py-ard==0.6.11
py-gfe==1.1.5
@chrisammon3000
Author

@pbashyal-nmdp
I recently updated to the latest py-ard and py-gfe but I'm not sure if that is what caused the issue.

@pbashyal-nmdp
Contributor

Looks like it timed out with GATEWAY_TIMEOUT when calling the feature service. The libraries shouldn't affect feature service calls.

Does retrying work?

@chrisammon3000
Author

Retrying has the same effect. I ran it in a debugger and it gave me this for the exact same allele, HLA18552.2:

Exception has occurred: TypeError
can only concatenate str (not "ApiException") to str

During handling of the above exception, another exception occurred:

  File "/Users/ammon/Documents/00-Projects/nmdp-bioinformatics/02-Repositories/gfe-db/gfe-db/pipeline/jobs/build/src/app.py", line 372, in gfe_from_allele
    features, gfe = gfe_maker.get_gfe(ann, locus)
  File "/Users/ammon/Documents/00-Projects/nmdp-bioinformatics/02-Repositories/gfe-db/gfe-db/pipeline/jobs/build/src/app.py", line 581, in <module>
    gfe = gfe_from_allele(

The same error is also in the original error message. I'll try working with seq-ann and making individual API calls to the feature service with this allele to see if I can get to the bottom of it.
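For what it's worth, the TypeError itself looks like a separate logging bug: line 248 of gfe.py concatenates the exception object onto a string, so the 504 gets masked by a TypeError. Here is a minimal sketch of the failure and one possible fix. The `ApiException` class, `logname` value, and function names below are stand-ins for illustration, not seqann's actual code.

```python
import logging


class ApiException(Exception):
    """Stand-in for seqann.feature_client.rest.ApiException (hypothetical)."""

    def __init__(self, status, reason):
        super().__init__(f"({status}) Reason: {reason}")
        self.status = status
        self.reason = reason


logger = logging.getLogger("demo")
logname = "Logger.seqann.gfe - "  # assumed prefix, for illustration only


def log_broken(e):
    # Same pattern as in the traceback: str + exception object.
    # The concatenation raises TypeError before logger.error is even called.
    logger.error(logname + "Exception when calling DefaultApi->create_feature %e" + e)


def log_fixed(e):
    # Pass the exception as a logging argument; %s formats it with str(e).
    logger.error("%sException when calling DefaultApi->create_feature: %s", logname, e)
```

With something like `log_fixed`, the original 504 would be logged instead of being replaced by the TypeError, although the caller would still need to decide how to handle the failed allele.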

@chrisammon3000
Author

When I built this one allele on its own it worked fine, with no timeout. I wonder if gfe-db is overloading the feature service API during the build. I wouldn't be surprised, because the build proceeds extremely rapidly, processing ~30,000 alleles in 15-20 minutes, which works out to roughly 25-33 alleles per second. That places a constant, very high load on the API for the duration of the build.

For alleles that encounter timeouts, I think a good approach might be to decouple the retry mechanism from the build in gfe-db. I did some math, and even minimal inline retrying could drastically increase the build time and cost if as few as 20 alleles out of ~30,000 fail. Decoupling the retry logic would also make it easier to set an alarm threshold if lots of alleles are failing. I'll follow up on this in a separate issue for gfe-db.
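Roughly what I have in mind, as a sketch only: the first pass never retries, it just records failures, and a separate pass retries those alleles with backoff. All names here (`build_all`, `retry_failed`, `failure_rate_exceeds`, the `build_one` callable) and the default threshold are hypothetical, not existing gfe-db code.

```python
import time


def build_all(alleles, build_one):
    """First pass: build each allele once and collect failures instead of retrying inline."""
    results, failed = {}, []
    for allele in alleles:
        try:
            results[allele] = build_one(allele)
        except Exception as exc:
            failed.append((allele, repr(exc)))
    return results, failed


def retry_failed(failed, build_one, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Second pass, run separately from the build: retry only the failed alleles."""
    recovered, still_failed = {}, []
    for allele, _reason in failed:
        for attempt in range(max_attempts):
            try:
                recovered[allele] = build_one(allele)
                break
            except Exception:
                # Exponential backoff between attempts, none after the last one.
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)
        else:
            still_failed.append(allele)
    return recovered, still_failed


def failure_rate_exceeds(failed, total, threshold=0.001):
    """Alarm check: True when the share of failed alleles is above the threshold."""
    return total > 0 and len(failed) / total > threshold
```

The point of the split is that the main build keeps its current speed, and the failed list doubles as the input for both the retry job and the alarm.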

@pbashyal-nmdp
Contributor

Yes, feature service is getting overloaded. I'll look into adding a caching layer for the service. That should help with other uses as well.
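As a rough illustration of the idea (not the actual service design), a cache only helps if the same feature is requested more than once, for example when several alleles share an identical feature sequence. The sketch below memoizes a lookup in memory; the `lookup` callable and its `(locus, term, rank, sequence)` arguments are assumptions made for the example.

```python
from functools import lru_cache


def make_cached_lookup(lookup, maxsize=100_000):
    """Wrap a (locus, term, rank, sequence) -> accession lookup with an in-memory LRU cache."""

    @lru_cache(maxsize=maxsize)
    def cached(locus, term, rank, sequence):
        # Only reached on a cache miss; repeated requests are served from memory.
        return lookup(locus, term, rank, sequence)

    return cached
```

`cached.cache_info()` reports hits and misses, which would show whether the repeat rate is high enough for caching to take meaningful load off the service.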

@chrisammon3000
Author

I guess another option would be to rate limit requests, but this would increase the build time. The current build runs on a c5d.2xlarge at $0.384 per hour and takes around 20 minutes, so it could run a lot longer before cost becomes an issue.

Honestly I think rate-limiting might be easier to implement than caching on the API side.
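A client-side limiter could be as small as the sketch below, which spaces calls evenly and estimates the resulting instance cost. `RateLimiter` and `build_cost` are hypothetical names, and the 10 requests per second used in the note afterwards is an arbitrary example rate, not a measured limit of the feature service.

```python
import time


class RateLimiter:
    """Allow at most `rate` calls per second by spacing calls evenly."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


def build_cost(n_alleles, rate, price_per_hour=0.384):
    """Estimated instance cost in dollars if the build is bounded by the request rate."""
    hours = n_alleles / rate / 3600
    return hours * price_per_hour
```

Calling `limiter.wait()` before each feature service request would cap the load. At an assumed 10 requests per second, ~30,000 alleles would take about 50 minutes, and `build_cost(30000, 10)` comes to about $0.32, compared with roughly $0.13 for the current 20-minute build.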

2 participants