app-manager adding 'Connection: close' to response headers of client simulation requests #4

Closed
flawmop opened this issue May 5, 2023 · 3 comments · Fixed by #5
Labels
bug Something isn't working

Comments

@flawmop
Member

flawmop commented May 5, 2023

Currently, app-manager is instructing the client to close the HTTP connection each time a simulation is invoked, rather than keeping it alive, because that was how it was designed(*) prior to CellML and PKPD being allowed in ap-nimbus. This behaviour is described in server.js, and can be overridden "on-demand" by calling app-manager with the keep_alive URL param, e.g. by changing the env file property AP_PREDICT_ENDPOINT=http://name-app-manager:8080/keep_alive.

I've been running some tests locally and it looks like Django, perhaps unsurprisingly, has a problem with this disconnecting when passing CellML files to app-manager (although it may also happen with PKPD files). What seems to be happening is that one or two simulations may run successfully, but thereafter they start to fail after first displaying a status of "Converting CellML...". I'd be interested to know, @MauriceHendrix, if you've experienced something like this while you were testing, as Django doesn't appear to behave consistently, i.e. sometimes it may fail on the first simulation, but usually it's within the first five simulations!

I've tried to get some Django trace-level logging set up (to add to the other logging task) to see exactly what Django's doing and how it responds (because in theory it should complain if it can't send the complete HTTP request), but it appears that trace-level debugging is not an option; only some basic debug-level request info appears in the logs, which doesn't tell us much.

As we are not going down the route of making multiple app-managers available, I think we may as well just comment out

if (!keep_alive) {
console.log('DEBUG : Adding \'Connection: close\' to response headers');
response_headers.Connection = 'close';
}
and add a comment referring to this issue.
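
For concreteness, something like the following (a sketch only; exact placement and wording in server.js to taste):

// See issue #4: always leave the connection open now. The original
// 'Connection: close' behaviour pre-dates CellML/PKPD uploads and appears
// to break the Django client's simulation requests.
// if (!keep_alive) {
//   console.log('DEBUG : Adding \'Connection: close\' to response headers');
//   response_headers.Connection = 'close';
// }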

(*) Originally, it was expected that the client would call app-manager with an ApPredict invocation instruction and app-manager would respond to the client with a unique identifier and the HTTP header instruction to close the connection (because the expectation was that there might be many app-managers available to service requests and we didn't want sticky connections between a client and a single app-manager). Thereafter the client and app-manager would communicate only with an independent datastore, to retrieve and send results respectively, using the unique identifier.
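
For illustration only (hypothetical names, not the actual server.js), that original pattern amounts to something like this: respond straight away with a unique identifier and a 'Connection: close' header, then let results flow through the shared datastore.

const express = require('express');
const crypto = require('crypto');
const app = express();

app.post('/', (request, response) => {
  // Hand the client an identifier it can later use against the datastore.
  const id = crypto.randomUUID();
  // Original design: no sticky connection to this particular app-manager.
  response.set('Connection', 'close');
  response.json({ success: { id } });
  // runApPredict(id, request.body); // hypothetical: kick off the simulation asynchronously
});

app.listen(8080);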

@flawmop added the bug label May 5, 2023
@MauriceHendrix
Contributor

Although the portal has been tested by a good number of people, I don't think many upload CellML files. I haven't seen it myself. I didn't realise that the app-manager doesn't close its connections; I don't see how it makes sense to keep them open.

@flawmop
Member Author

flawmop commented May 9, 2023

Currently the app-manager is instructing the client to close the connection in the response to the simulation request (my understanding is that, generally, opening connections takes a bit of resource, so in HTTP they are kept open/"alive" for a short period (e.g. 90 seconds?) in anticipation of another request arriving from the same client).
What I think may cause a problem is the case of sending a very large file in a request, which a client will break down into chunks, and so it may get a bit huffy if it's told to close the connection after sending the first chunk.
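
Purely to illustrate that keep-alive behaviour (the actual client here is Django, not Node), a Node client opts in to connection reuse roughly like this; a 'Connection: close' in the response defeats the reuse and forces a fresh TCP connection for the next request:

const http = require('http');

// keepAlive asks the agent to hold idle sockets open and reuse them for
// subsequent requests to the same host, instead of reconnecting each time.
const agent = new http.Agent({ keepAlive: true });

http.get({ host: 'name-app-manager', port: 8080, path: '/', agent }, (res) => {
  res.resume(); // drain the response so the socket can be reused
});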

@MauriceHendrix
Contributor

I found a page about tuning keepalive.
It reads like we'd need to set both the keep-alive and header timeouts, just to make sure the header timeout is the greater of the two (so a reused socket isn't closed while a follow-up request's headers are still arriving).

If you're looking to use longer timeouts, or you're using a version of agentkeepalive earlier than 4.2.0, you can edit your Node.js app's default timeouts to be larger as follows:

const express = require("express");
const http = require("http");
const app = express();

const server = http.createServer({}, app).listen(3000);

// The important part: keepAliveTimeout is how long an idle kept-alive socket
// stays open; headersTimeout must be larger, so a follow-up request's headers
// can still arrive on a reused socket before it is closed.
server.keepAliveTimeout = (60 * 1000) + 1000;
server.headersTimeout = (60 * 1000) + 2000;
