Currently, app-manager instructs the client to close the HTTP connection each time a simulation is invoked, rather than keeping it alive, because that was how it was designed (*) before CellML and PKPD were allowed in ap-nimbus. This behaviour is implemented in server.js, and can be overridden on demand by calling app-manager with the keep_alive URL param, e.g. by changing the env file property to AP_PREDICT_ENDPOINT=http://name-app-manager:8080/keep_alive.
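As a rough sketch of the behaviour described above (the function and names below are illustrative only; the actual logic lives in server.js and may differ), the keep_alive override amounts to branching on the request path when building the response headers:

```javascript
// Illustrative sketch only: not the actual server.js implementation.
// Decide whether to instruct the client to close the HTTP connection,
// based on whether the request arrived via the keep_alive endpoint.
function buildResponseHeaders(urlPath) {
  const headers = { 'Content-Type': 'application/json' };
  if (!urlPath.startsWith('/keep_alive')) {
    // Default behaviour: tell the client to drop the connection
    headers.Connection = 'close';
  }
  return headers;
}

console.log(buildResponseHeaders('/'));           // includes Connection: 'close'
console.log(buildResponseHeaders('/keep_alive')); // no Connection header
```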
I've been running some tests locally and it looks like Django, perhaps unsurprisingly, has a problem with the disconnection when passing CellML files to app-manager (although it may also happen with PKPD files). What seems to happen is that one or two simulations run successfully, but thereafter they start to fail after first displaying a status of "Converting CellML...". I'd be interested to know, @MauriceHendrix, whether you experienced anything like this while testing, as Django doesn't behave consistently: sometimes it fails on the first simulation, but it's usually within the first five!
I've tried to set up some Django trace-level logging (to add to the other logging task) to see exactly what Django is doing and how it responds (in theory it should complain if it can't send the complete HTTP request), but trace-level debugging doesn't appear to be an option; only some basic debug-level request info appears in the logs, which doesn't tell us much.
As we are not going down the route of multiple app-manager availability, I think we may as well just comment out the following in ap-nimbus-app-manager/server.js (lines 951 to 954 at 3be2ea2):

```javascript
    console.log('DEBUG : Adding \'Connection: close\' to response headers');
    response_headers.Connection = 'close';
  }
```

and add a comment referring to this issue.
(*) Originally, the expectation was that the client would call app-manager with an ApPredict invocation instruction, and app-manager would respond with a unique identifier plus the HTTP header instruction to close the connection (because there might be many app-managers available to service requests and we didn't want sticky connections between a client and a single app-manager). Thereafter, client and app-manager would communicate only with an independent datastore, using the unique identifier to retrieve and send results respectively.
Although the portal has been tested by a good number of people, I don't think many upload CellML files; I haven't seen this myself. I didn't realise that app-manager doesn't close its connections; I don't see how it makes any sense to keep them open.
Currently app-manager instructs the client to close the connection in the response to the simulation request (my understanding is that opening connections takes a bit of resource, so in HTTP they are generally kept open/"alive" for a short period (e.g. 90 seconds?) in anticipation of another request arriving from the same client).
What I think may cause a problem is sending a very large file in a request, which a client will break down into chunks; it may get a bit huffy if it's told to close the connection after sending the first chunk.
I found a page about tuning keepalive. It reads like we'd need to set both the keep-alive and header timeouts, making sure the header timeout is the greater of the two.
If you're looking to use longer timeouts, or you're using a version of agentkeepalive earlier than 4.2.0, you can edit your Node.js app's default timeouts to be larger as follows:

```javascript
const express = require("express");
const http = require("http");

const app = express();
const server = http.createServer({}, app).listen(3000);

// This is the important stuff: headersTimeout must exceed keepAliveTimeout
server.keepAliveTimeout = (60 * 1000) + 1000;
server.headersTimeout = (60 * 1000) + 2000;
```