new production system is twitchy about running the monitor status job #27

elrayle opened this issue May 13, 2022 · 1 comment

elrayle commented May 13, 2022

Normal process

See Monitoring Connections to Authorities for a description of the normal processing.

Current problem

This process has been twitchy: it runs sometimes and not others, and we were not able to pin down why. It worked reliably in the previous production system for years.

Potential future work

  • Production is currently configured to run jobs with the :async adapter. If you set up Sidekiq or some other job system, you can update the production environment to run jobs in the background (see the sketch after this list).
  • If you switch to background jobs, the Pingdom process should still work. Instead of timing out, the page will return an error on the 4:10 am load if one of the authorities is failing. The slight change is that you will no longer see the down/up pattern; you will only see a down when an authority is failing.
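
A minimal sketch of that configuration change, assuming the app queues work through Active Job and that the sidekiq gem has been added to the Gemfile with a worker process deployed alongside the web tier:

```ruby
# config/environments/production.rb
Rails.application.configure do
  # Hand Active Job work to Sidekiq instead of the in-process :async
  # adapter, so queued jobs survive restarts and don't compete with
  # web requests for threads.
  config.active_job.queue_adapter = :sidekiq
end
```

With a real queue backend, a job that dies mid-run is at least visible in the Sidekiq Web UI, which may also help pin down why the monitor status job runs only intermittently.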

elrayle commented May 13, 2022

I tried the following while debugging:

  • Redeploy the system. This seemed to work sometimes and have no effect other times. Redeploying also has its own issues: sometimes the system doesn't come back up.

  • Turn off the performance calculations by disabling the performance-related displays in the production env file in S3. The calculations are very processor-intensive.

```
DISPLAY_PERFORMANCE_GRAPH=false
DISPLAY_PERFORMANCE_DATATABLE=false
```

NOTE: I've never gotten graph generation to work in Elastic Beanstalk, even though it runs fine on my laptop. I wasn't able to test it in the new setup because of this issue.

  • I reset the preferred time zone so that the cache expired during the day, when I could watch it fail. Only it never failed when I did that, which could mean something unique is happening during the night that causes the failure.
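
The mechanism under test in that last step is a cache entry that expires once a day at an hour tied to the preferred time zone. A minimal sketch of that pattern using ActiveSupport, where the expiration hour, zone name, and cache key are illustrative assumptions rather than qa_server's actual settings:

```ruby
require "active_support/all"

# Hypothetical sketch of a daily-expiring status cache; the hour,
# zone, and key below are assumptions for illustration only.
EXPIRATION_HOUR = 3
zone = ActiveSupport::TimeZone["Eastern Time (US & Canada)"]

now = zone.now
next_expiration = now.change(hour: EXPIRATION_HOUR) # today at 3:00 am
next_expiration += 1.day if next_expiration <= now  # already past? roll to tomorrow

cache = ActiveSupport::Cache::MemoryStore.new
cache.fetch("monitor_status", expires_in: next_expiration - now) do
  "result of the expensive authority checks" # placeholder payload
end
```

Shifting EXPIRATION_HOUR into the daytime moves the expensive refresh to a window where it can be watched, without changing anything else about the job.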
