Events collected, but not triggering next process #23

franTarkenton · 2023-04-24T17:50:26Z

The listener has been deployed to OpenShift and is listening to events successfully.

I downloaded the database from the server and can see the events being recorded in the db cache.

Need to figure out why the next process is not being triggered properly, either by a restarted process or by the long-running process once all the data becomes available.

What is happening:

Events are streaming in and being recorded to the SQLite db.

What should be happening:

When all the expected events are available, the listener emits a message.

TODO:

use the db pulled from OpenShift to diagnose whether all the data is actually available
figure out if all the data is available
if not, diagnose the listener config
if yes, diagnose the method that detects completion
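
The TODO steps above could be sketched roughly as follows. This is only a sketch under assumptions: the table name `events`, the column `event_key`, and the helper names `missing_events` and `diagnose` are hypothetical and are not taken from the listener's actual schema or code.

```python
import sqlite3


def missing_events(conn, expected_keys):
    """Return the expected keys that are absent from the cache, sorted."""
    # Hypothetical schema: one row per received event, keyed by event_key.
    rows = conn.execute("SELECT event_key FROM events").fetchall()
    cached = {row[0] for row in rows}
    return sorted(set(expected_keys) - cached)


def diagnose(conn, expected_keys):
    """Pick which branch of the TODO list to follow."""
    if missing_events(conn, expected_keys):
        # Data never arrived in the cache: look at the listener config.
        return "incomplete: check listener config"
    # Everything is cached: the completion detection is the suspect.
    return "complete: check completion detection"
```

Run against the db pulled from OpenShift with `sqlite3.connect(<path to db>)`, the list returned by `missing_events` would show exactly which expected events never made it into the cache.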

Def of Done:

events are collected
events are cached
events are acknowledged to message queue
detects when all data is available
triggers the next step in the pipeline
crash recovery process checks cached events and recovers correctly
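
One possible shape for the cache, completion check, and crash recovery in the list above, as a minimal sketch. Everything named here is an assumption for illustration: the `EventCache` class, the `events` and `triggers` tables, the `run_id` argument, and the `on_complete` callback are hypothetical, and event keys are assumed to be unique per run.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (event_key TEXT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS triggers (run_id TEXT PRIMARY KEY);
"""


class EventCache:
    def __init__(self, path, expected_keys, on_complete):
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)
        self.expected = set(expected_keys)
        self.on_complete = on_complete  # hypothetical "trigger next step" hook

    def record(self, event_key, run_id):
        """Cache one event, then check for completion."""
        # INSERT OR IGNORE makes a redelivered message harmless.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO events VALUES (?)", (event_key,)
            )
        return self.check(run_id)

    def check(self, run_id):
        """Fire on_complete once per run when every expected key is cached."""
        cached = {r[0] for r in self.conn.execute("SELECT event_key FROM events")}
        if not self.expected <= cached:
            return False
        fired = self.conn.execute(
            "SELECT 1 FROM triggers WHERE run_id = ?", (run_id,)
        ).fetchone()
        if fired is None:
            # Trigger first, record afterwards: a crash in between causes a
            # repeat trigger on restart rather than a lost one.
            self.on_complete(run_id)
            with self.conn:
                self.conn.execute(
                    "INSERT OR IGNORE INTO triggers VALUES (?)", (run_id,)
                )
        return True
```

Calling `check(run_id)` once at startup would cover the crash recovery item: if the pod died after the last event was cached but before the next step was triggered, the restart would pick that up from the db. The message would only be acknowledged to the queue after `record` returns, so an event is never acked before it is cached.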

franTarkenton · 2023-05-31T17:21:48Z

The current process is set up to start the listener, and then it just monitors and logs the messages it receives. The process is getting rebooted every 45 minutes because it does not have a health check or a liveness probe configured. Working on adding a FastAPI endpoint that services the health and liveness probes. Once this is implemented, all the message events should show up in the logs, and I can then start debugging why some messages don't seem to be received. ATM the message events are lost when the pod dies (every 45 minutes).
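
For illustration, a probe endpoint of this kind could look something like the sketch below. To keep it dependency-free it uses the standard library's `http.server` in place of FastAPI, so it is not the implementation described above. The paths `/healthz` and `/readyz`, the port, and the names `ListenerState`, `make_probe_server`, and `start_probe_server` are all assumptions.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ListenerState:
    """Shared state that the listener loop updates and the probes read."""

    def __init__(self, stale_after=300.0):
        self.stale_after = stale_after
        self.connected = False
        self.last_seen = time.monotonic()

    def heartbeat(self):
        # Called by the listener on every loop iteration, not only on messages.
        self.connected = True
        self.last_seen = time.monotonic()

    def alive(self):
        return time.monotonic() - self.last_seen < self.stale_after


def make_probe_server(state, port=8080):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            checks = {"/healthz": state.alive, "/readyz": lambda: state.connected}
            check = checks.get(self.path)
            status = 404 if check is None else (200 if check() else 503)
            body = json.dumps({"ok": status == 200}).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep probe traffic out of the message logs

    return ThreadingHTTPServer(("0.0.0.0", port), Handler)


def start_probe_server(state, port=8080):
    """Serve the probes from a daemon thread next to the listener."""
    server = make_probe_server(state, port)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The liveness and readiness probes in the deployment config would then point at these two paths. Liveness fails only when the loop stops calling `heartbeat()`, while readiness stays at 503 until the first heartbeat.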

franTarkenton · 2023-06-06T00:21:37Z

The listener now runs with readiness and liveness checks, which should eliminate the pod reboot every 45 minutes. Hoping this means the events we expect to see in the queue will now show up. The specific events that are missing from the database are the ones for the datasets in this directory:

https://hpfx.collab.science.gc.ca/20230529/WXO-DD/model_gem_global/15km/grib2/lat_lon/00/090/

franTarkenton added the bug ("Something isn't working") label Apr 24, 2023

franTarkenton self-assigned this Apr 24, 2023

franTarkenton mentioned this issue Dec 5, 2023

Create CMC data Pipeline #12

Open

13 tasks

franTarkenton added zen_reorg and removed zen_reorg labels Dec 5, 2023
