Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Cache status of dynamic (parent) tasks is not updated properly #3096

Closed
2 tasks done
MorpheusXAUT opened this issue Nov 23, 2022 · 1 comment · Fixed by flyteorg/flytepropeller#501
Closed
2 tasks done
Labels
bug Something isn't working flyteadmin Issue for FlyteAdmin Service flytepropeller untriaged This issues has not yet been looked at by the Maintainers

Comments

@MorpheusXAUT
Copy link
Contributor

Describe the bug

When executing a dynamic task, the cache status in flyteadmin's database is not updated after all child tasks execute successfully even though datacatalog contains stored data.
Relaunching the same workflow/task with identical inputs leads to flytepropeller immediately recognising the stored data in datacatalog and skipping execution in favor of returning cached outputs.

Whilst this is most likely only a "visual" issue (the cache status is not displayed correctly for the first execution), the cache status should still be propagated properly for consistency reasons (as well as the current in-progress cache eviction implementation).

The issue is caused by flytepropeller's dynamicNodeTaskNodeHandler not setting the TaskNodeInfo for the handler.Transition properly here (either assignment or return is missing). Additionally, this assignment drops the OutputInfo already stored in the handler.ExecutionInfo from the TaskNodeInfo as the whole struct is just overwritten with the newly provided data.


Output of node_executions flyteadmin API (http://localhost:8088/api/v1/node_executions/flytesnacks/development/aglb9tc88qtwh22qfw7t?filters=&limit=10000&sort_by.direction=ASCENDING&sort_by.key=created_at) after first execution - note the lack of cache_status in task_node_metadata:

{"node_executions":[{"id":{"node_id":"start-node","execution_id":{"project":"flytesnacks","domain":"development","name":"aglb9tc88qtwh22qfw7t"}},"closure":{"output_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-aglb9tc88qtwh22qfw7t/start-node/data/0/outputs.pb","phase":"SUCCEEDED","created_at":"2022-11-23T09:21:33.408090713Z","updated_at":"2022-11-23T09:21:33.408090713Z"},"metadata":{"spec_node_id":"start-node"}},{"id":{"node_id":"n0","execution_id":{"project":"flytesnacks","domain":"development","name":"aglb9tc88qtwh22qfw7t"}},"input_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-aglb9tc88qtwh22qfw7t/n0/data/inputs.pb","closure":{"output_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-aglb9tc88qtwh22qfw7t/n0/data/0/outputs.pb","phase":"SUCCEEDED","started_at":"2022-11-23T09:21:33.495433360Z","duration":"103.799682193s","created_at":"2022-11-23T09:21:33.440014294Z","updated_at":"2022-11-23T09:23:17.295115193Z","task_node_metadata":{}},"metadata":{"is_parent_node":true,"spec_node_id":"n0","is_dynamic":true}},{"id":{"node_id":"n1","execution_id":{"project":"flytesnacks","domain":"development","name":"aglb9tc88qtwh22qfw7t"}},"input_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-aglb9tc88qtwh22qfw7t/n1/data/inputs.pb","closure":{"output_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-aglb9tc88qtwh22qfw7t/n1/data/0/outputs.pb","phase":"SUCCEEDED","started_at":"2022-11-23T09:23:17.467513162Z","duration":"428.315427327s","created_at":"2022-11-23T09:23:17.427310874Z","updated_at":"2022-11-23T09:30:25.782940327Z","task_node_metadata":{}},"metadata":{"is_parent_node":true,"spec_node_id":"n1","is_dynamic":true}},{"id":{"node_id":"end-node","execution_id":{"project":"flytesnacks","domain":"development","name":"aglb9tc88qtwh22qfw7t"}},"input_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-aglb9tc88qtwh22qfw7t/end-node/data/inputs.pb","closure":{"phase":"SUCCEEDED","created_at":"2022-11-23T09:30:26.029542815Z","updated_at":"2022-11-23T09:30:26.095959742Z"},"metadata":{"spec_node_id":"end-node"}}]}

Output of node_executions flyteadmin API (http://localhost:8088/api/v1/node_executions/flytesnacks/development/abj76fsjzd7dw6d42bjm?filters=&limit=10000&sort_by.direction=ASCENDING&sort_by.key=created_at) after second execution - note the cache_status in task_node_metadata is now set:

{"node_executions":[{"id":{"node_id":"start-node","execution_id":{"project":"flytesnacks","domain":"development","name":"abj76fsjzd7dw6d42bjm"}},"closure":{"output_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-abj76fsjzd7dw6d42bjm/start-node/data/0/outputs.pb","phase":"SUCCEEDED","created_at":"2022-11-23T09:32:53.009699805Z","updated_at":"2022-11-23T09:32:53.009699805Z"},"metadata":{"spec_node_id":"start-node"}},{"id":{"node_id":"n0","execution_id":{"project":"flytesnacks","domain":"development","name":"abj76fsjzd7dw6d42bjm"}},"input_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-abj76fsjzd7dw6d42bjm/n0/data/inputs.pb","closure":{"output_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-abj76fsjzd7dw6d42bjm/n0/data/0/outputs.pb","phase":"SUCCEEDED","created_at":"2022-11-23T09:32:53.045188298Z","updated_at":"2022-11-23T09:32:53.114586476Z","task_node_metadata":{"cache_status":"CACHE_HIT","catalog_key":{"dataset_id":{"resource_type":"DATASET","project":"flytesnacks","domain":"development","name":"flyte_task-house_price_prediction.multiregion_house_price_predictor.generate_and_split_data_multiloc","version":"0.1-mrXwKV5z-qfQrdt0B"},"artifact_tag":{"artifact_id":"a069d005-9e5c-472c-a421-062a8500043c","name":"flyte_cached-V4OtgE1CaV1xyj1Y-cLAQlff4baY3VbAHXKhKeIuhVU"},"source_task_execution":{"task_id":{"resource_type":"TASK","project":"flytesnacks","domain":"development","name":"house_price_prediction.multiregion_house_price_predictor.generate_and_split_data_multiloc","version":"v0.3.153"},"node_execution_id":{"node_id":"n0","execution_id":{"project":"flytesnacks","domain":"development","name":"aglb9tc88qtwh22qfw7t"}}}}}},"metadata":{"spec_node_id":"n0"}},{"id":{"node_id":"n1","execution_id":{"project":"flytesnacks","domain":"development","name":"abj76fsjzd7dw6d42bjm"}},"input_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-abj76fsjzd7dw6d42bjm/n1/data/inputs.pb","closure":{"output_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-abj76fsjzd7dw6d42bjm/n1/data/0/outputs.pb","phase":"SUCCEEDED","created_at":"2022-11-23T09:32:53.168870591Z","updated_at":"2022-11-23T09:32:53.322696432Z","task_node_metadata":{"cache_status":"CACHE_HIT","catalog_key":{"dataset_id":{"resource_type":"DATASET","project":"flytesnacks","domain":"development","name":"flyte_task-house_price_prediction.multiregion_house_price_predictor.parallel_fit_predict","version":"0.1-PUQpW58G-y3c5pfDL"},"artifact_tag":{"artifact_id":"baea0d77-a64f-474a-aeb2-44816e40f183","name":"flyte_cached-1eiNn-O9L5uzp2cMNTkTeAN2vMukfrcQGJy9syBsOPM"},"source_task_execution":{"task_id":{"resource_type":"TASK","project":"flytesnacks","domain":"development","name":"house_price_prediction.multiregion_house_price_predictor.parallel_fit_predict","version":"v0.3.153"},"node_execution_id":{"node_id":"n1","execution_id":{"project":"flytesnacks","domain":"development","name":"aglb9tc88qtwh22qfw7t"}}}}}},"metadata":{"spec_node_id":"n1"}},{"id":{"node_id":"end-node","execution_id":{"project":"flytesnacks","domain":"development","name":"abj76fsjzd7dw6d42bjm"}},"input_uri":"s3://my-s3-bucket/metadata/propeller/flytesnacks-development-abj76fsjzd7dw6d42bjm/end-node/data/inputs.pb","closure":{"phase":"SUCCEEDED","created_at":"2022-11-23T09:32:58.430004159Z","updated_at":"2022-11-23T09:32:58.486528483Z"},"metadata":{"spec_node_id":"end-node"}}]}

Expected behavior

Running a cacheable dynamic task should update the parent node's cache status in flyteadmin accordingly.

Additional context to reproduce

Using latest versions of flyte:

  1. Run cacheable dynamic task/workflow containing cacheable dynamic tasks (e.g. house_price_prediction.multiregion_house_price_predictor.multi_region_house_price_prediction_model_trainer from flytesnacks examples)
  2. Observe child nodes reporting cached state properly (either via flyteconsole or flyteadmin node_executions API)
  3. Observe parent node reporting no cached data
  4. Launch new execution of same dynamic task/workflow
  5. Observe parent nodes reporting cached data being returned immediately

Screenshots

First execution, parent nodes not cached
First execution, parent nodes not cached

First execution, child nodes cached
First execution, child nodes cached

Second execution, parent nodes cached
Second execution, parent nodes cached

Second execution, child nodes unknown
Second execution, child nodes unknown

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@MorpheusXAUT MorpheusXAUT added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers flyteadmin Issue for FlyteAdmin Service flytepropeller labels Nov 23, 2022
@MorpheusXAUT
Copy link
Contributor Author

I've encountered this bug while implementing the second part of the proposed cache eviction - the simple fix is already implemented in my caching branch, however I figured it might be worth to mention as a separate issue/fix independently before the rest of caching is completed. Will open a small propeller PR for this shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working flyteadmin Issue for FlyteAdmin Service flytepropeller untriaged This issues has not yet been looked at by the Maintainers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant