-
Notifications
You must be signed in to change notification settings - Fork 7
CS Performance Optimization research
We've discovered a number of areas for improving performance in CommitStream. We quickly saw improvements, after we did two patches for the first issue we found, eventStoreClient.js
was firing exceptions off into the ether as a result of how we were calling statusCodeValidator
. But, after we looked deeper and spoke with Prashanth Madi of Microsoft, we realized there was more we can and should do right now.
These discoveries are not in order of discovery, but in order of ease of implementation and incremental benefit, with emphasis on mitigating risk and maximizing opportunity to observe under real-world conditions before applying successive updates.
In csErrorHandler.js
we have this code:
exception: util.inspect(err, {
showHidden: true,
depth: null
}).substr(0, 5000),
Unfortunately, since err is not a simple, limited-depth object, this is a very expensive call that builds up a gigantic string, hence the .substr(0, 5000)
to take just a small portion of it.
This test runs N processes that posts a Bitbucket pull-request M times. Note that Bitbucket pull-request generates a thrown exception, since we do not process such payloads. So, this is testing the ability to process errors, which has been the source of latency and occassional process terminiation in production.
N | M | With util.inspect(err, ...) ? |
Elapsed Time |
---|---|---|---|
1 | 50 | Y | 0m13.832s |
5 | 50 | Y | 10m36.091s |
1 | 50 | N | 0m3.744s |
5 | 50 | N | 0m6.321s |
15 | 50 | N | 0m13.459s |
40 | 50 | N | 0m33.717s |
80 | 50 | N | 1m6.550s |
20 | 150 | N | 0m53.930s |
Elapsed time results for running different types of commands in parallel:
time ./runner.sh 20 "./query-real-instance.sh 150"
-> 1m27.087s
time ./runner.sh 20 "./post-bitbucket-pull-request.sh 150"
-> 1m40.065s
time ./runner.sh 20 "./query-non-existent-instance.sh 150"
-> 8m16.150s
time ./runner.sh 20 "./post-new-digest.sh 150"
-> 3m29.306s
However, a little counter in csErrorHandler
revealed that it was only called 5947 times, when we would have expected an even 6000 times because of the pull-request errors and the non-existent instance errors.
With csErrorHandler change plus catchAsyncErrors
around the middleware validators and commitsCreate
and digestCreate
, without domain-middleware
, without Promise.onPossiblyUnresolveRejection
We didn't discover the util.inspect
bottleneck until 5/16. Prior to that, we thought the other changes in this heading were what we needed. (TODO: explain these other changes above or below)
So, combining all of those plus the util.inspect
change produces this:
time ./runner.sh 20 "./query-real-instance.sh 150"
-> 2m20.092s
time ./runner.sh 20 "./post-bitbucket-pull-request.sh 150"
-> 2m30.510s
time ./runner.sh 20 "./query-non-existent-instance.sh 150"
-> 2m22.575s
time ./runner.sh 20 "./post-new-digest.sh 150"
-> 3m56.215s
While some of those took longer, there is no inordinate slow down for the query-non-existent-instance.sh
script like above. Most importantly, the counter in csErrorHandler
showed an even 6000 calls!
A second trial:
time ./runner.sh 20 "./query-real-instance.sh 150"
-> 2m15.562s
time ./runner.sh 20 "./post-bitbucket-pull-request.sh 150"
-> 2m27.745s
time ./runner.sh 20 "./query-non-existent-instance.sh 150"
-> 2m17.538s
time ./runner.sh 20 "./post-new-digest.sh 150"
-> 3m53.232s
This shows consistency, and as before, the csErrorHandler
was called exactly 6000 times.
- Use catchAsyncErrors around all validator middleware called in
server.js
- Return the eventStoreClient promise from each of the validator middlewares
- Use catchAsyncErrors around
digestCreate
module indigestsController.js
These can be done at any time, independent of the deeper technical work listed below. They are in order of most simple to least simple.
- CS - Optimization: Remove outdated iisnode.yaml settings
- CS Optimization: Serve static files from IIS CS Optimization: Serve static files from IIS
- S-62662: Remove the offensive
util.inspect
line fromcsErrorHandler
-- This gives us the biggest bang for the least work, dramatically reducing the amount of time the Node process needs to create logs incsErrorHandler
- S-62662: Make
instanceAuthenticator
,instanceToDigestValidator
, andinstanceToInboxValidator
return Promises, and wrap them in server.js withcatchAsyncErrors
- TODO (in S-62662): Make
commitsCreate
return its Promise and utilizecatchAsyncErrors
around it, since this is where the vast majority of production-errors take place due to unrecognized payloads, such as Bitbucket Pull-Request - Upgrade Node and NPM version together -- may require some fancy manual removal of
node_modules
folder at the Kudu Console level in order for Azure to re-fetch the modules for the appropriate Node and NPM levels. Crazy, I know.- TODO: combine the two existing stories into one
- Discovery: Will
csErrorHandler
get called when exceptions thrown in non-async code from route handlers, or must we wrap the entire handler function in a Promise?- Can determine by forcing exceptions in code prior to making calls to
eventStoreClient.blah().then(...)
- Can determine by forcing exceptions in code prior to making calls to
- Discovery: Which handlers in
api/instances
can/should return Promises?- Create Stories for returning Promises and wrapping with
catchAsyncErrors
for each
- Create Stories for returning Promises and wrapping with
- Discovery: Which handlers in
api/digests
can/should return Promises?- Create Stories for returning Promises and wrapping with
catchAsyncErrors
for each
- Create Stories for returning Promises and wrapping with
- Discovery: Which additional handlers in
api/inboxes
can/should return Promises?- Create Stories for returning Promises and wrapping with
catchAsyncErrors
for each
- Create Stories for returning Promises and wrapping with
- Discovery: Which handlers in
api/health
can/should return Promises?- Create Stories for returning Promises and wrapping with
catchAsyncErrors
for each
- Create Stories for returning Promises and wrapping with
- Discovery: Which modules in
api/helpers
can/should return Promises?- Create Stories for returning Promises and wrapping with
catchAsyncErrors
for each
- Create Stories for returning Promises and wrapping with
- Discovery: Which modules in
api/translators
can/should return Promises?- Create Stories for returning Promises and wrapping with
catchAsyncErrors
for each
- Create Stories for returning Promises and wrapping with
- Discovery: Is there anywhere else in the app that would benefit from Promisizing or not?
- You know the drill by now...
Once we've discovered each of the appropriately Promisizable handlers and modules, each one can be played in isolation of other changes and released as a patch after appropriate regression testing and performance testing.
Each of these could be done as a Tech Chore item or just done in the line of ordinary delivery one by one. Each one should be small enough that it does not take a whole day to complete. (But, I'll defer estimation promises here!)
All of the changes above can be done without the removal of express-domain-middleware
. Indeed the first table of results where the only change was removing util.inspect(err, ...)
was calculated with express-domain-middleware
still in place. It's important to do it this way so that we do not have to do a big-bang-all-at-once upgrade of all the route handlers and modules. Thus, if we do a push that contains Promise improvements and catchAsyncErrors for routes in /api/instances
only, then errors generated on those routes will be caught by the wrapper, and will be delivered directly to csErrorModule
without going through express-domain-middleware
.
Once we've thoroughly discovered and Promisized / wrapped al the appropriate handlers and modules, then removing this will lay it to rest safely.