Currently, app-manager instructs the client to close the HTTP connection each time a simulation is invoked, rather than keeping it alive, because that was how it was designed (*) before CellML and PKPD were allowed in ap-nimbus. This behaviour is implemented in server.js, and can be overridden on demand by calling app-manager with the keep_alive URL param, e.g. by changing the env file property to AP_PREDICT_ENDPOINT=http://name-app-manager:8080/keep_alive.
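As a rough sketch of the behaviour described above (the function and names below are illustrative only; the actual logic lives in server.js and may differ), the keep_alive override amounts to branching on the request path when building the response headers:

```javascript
// Illustrative sketch only: not the actual server.js implementation.
// Decide whether to instruct the client to close the HTTP connection,
// based on whether the request arrived via the keep_alive endpoint.
function buildResponseHeaders(urlPath) {
  const headers = { 'Content-Type': 'application/json' };
  if (!urlPath.startsWith('/keep_alive')) {
    // Default behaviour: tell the client to drop the connection
    headers.Connection = 'close';
  }
  return headers;
}

console.log(buildResponseHeaders('/'));           // includes Connection: 'close'
console.log(buildResponseHeaders('/keep_alive')); // no Connection header
```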
I've been running some tests locally and it looks like Django, perhaps unsurprisingly, has a problem with the disconnection when passing CellML files to app-manager (although it may also happen with PKPD files). What seems to happen is that one or two simulations run successfully, but thereafter they start to fail after first displaying a status of "Converting CellML...". I'd be interested to know, @MauriceHendrix, whether you experienced anything like this while testing, as Django doesn't behave consistently: sometimes it fails on the first simulation, but it's usually within the first five!
I've tried to set up some Django trace-level logging (to add to the other logging task) to see exactly what Django is doing and how it responds (in theory it should complain if it can't send the complete HTTP request), but trace-level debugging doesn't appear to be an option; only some basic debug-level request info appears in the logs, which doesn't tell us much.
As we are not going down the route of multiple app-manager availability, I think we may as well just comment out the following in ap-nimbus-app-manager/server.js (lines 951 to 954 at 3be2ea2):

```javascript
    console.log('DEBUG : Adding \'Connection: close\' to response headers');
    response_headers.Connection = 'close';
  }
```

and add a comment referring to this issue.
(*) Originally, the expectation was that the client would call app-manager with an ApPredict invocation instruction, and app-manager would respond with a unique identifier plus the HTTP header instruction to close the connection (because there might be many app-managers available to service requests and we didn't want sticky connections between a client and a single app-manager). Thereafter, client and app-manager would communicate only with an independent datastore, using the unique identifier to retrieve and send results respectively.
Although the portal has been tested by a good number of people, I don't think many upload CellML files; I haven't seen this myself. I didn't realise that app-manager doesn't close its connections; I don't see how it makes any sense to keep them open.
Currently app-manager instructs the client to close the connection in the response to the simulation request (my understanding is that opening connections takes a bit of resource, so in HTTP they are generally kept open/"alive" for a short period (e.g. 90 seconds?) in anticipation of another request arriving from the same client).
What I think may cause a problem is sending a very large file in a request, which a client will break down into chunks; it may get a bit huffy if it's told to close the connection after sending the first chunk.
I found a page about tuning keepalive. It reads like we'd need to set both the keep-alive and header timeouts, making sure the header timeout is the greater of the two.
If you're looking to use longer timeouts, or you're using a version of agentkeepalive earlier than 4.2.0, you can edit your Node.js app's default timeouts to be larger as follows:

```javascript
const express = require("express");
const http = require("http");

const app = express();
const server = http.createServer({}, app).listen(3000);

// This is the important stuff: headersTimeout must exceed keepAliveTimeout
server.keepAliveTimeout = (60 * 1000) + 1000;
server.headersTimeout = (60 * 1000) + 2000;
```