The Problem:
The data transformer currently faces performance issues because every time we reopen it, all code cells are re-executed to restore the previous state. As the number of code cells in a notebook increases, this issue compounds, leading to longer loading times. Additionally, re-running all cells may not always replicate the exact state from when the notebook was last closed. For instance, if a random function was used to generate a column in a DataFrame, reopening the notebook would yield different results each time. Although we save the notebook's history, the kernel is shut down when we leave the node, necessitating the rerun of the entire history upon reopening. Checkpointing is not a viable solution, as the state is lost when the kernel session is terminated.
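The reproducibility issue can be shown with a minimal sketch (the `run_cell` helper and the column names are illustrative, not part of the actual transformer): a cell that builds a DataFrame column from unseeded random values produces a different state every time the history is replayed.

```python
import numpy as np
import pandas as pd

# Simulate a notebook cell that uses randomness. Re-executing it on
# reopen does not restore the state that existed when the notebook
# was closed -- it produces a fresh, different state.
def run_cell():
    df = pd.DataFrame({"id": range(3)})
    df["noise"] = np.random.rand(3)  # random column, no fixed seed
    return df

first_open = run_cell()  # state when the notebook was last used
reopen = run_cell()      # state after replaying history on reopen

# The replayed column almost never matches the original one.
print(first_open["noise"].equals(reopen["noise"]))
```

Seeding the random state in every cell would mask this particular symptom, but it cannot be enforced on arbitrary user code, which is why replaying history is unreliable in general.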
A Solution:
To address this, we can save the notebook history (as we already do) and, upon reopening the node, display the history without re-executing it. The Beaker kernel would be initialized with the input dataset(s) and their respective output dataset(s). For example:
1. A dataset (dataset1) is attached to the data transformer.
2. The transformer is opened, and the Beaker kernel is initialized with dataset1.
3. The agent modifies dataset1 (e.g., adds a column), resulting in a new dataset (dataset2).
4. dataset2 is saved, the node is closed, and the session is terminated.
5. Upon reopening the node, the history is displayed but not re-executed.
6. A new session is initialized with dataset1 and dataset2 already defined.
This approach preserves the notebook's state without the need to rerun all cells, enhancing performance and ensuring consistency.
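The reopening flow above could be sketched as follows. This is a rough illustration only: `load_dataset` and `reopen_node` are hypothetical names standing in for however the persisted datasets are fetched and the kernel namespace is seeded; they are not part of Beaker's actual API.

```python
import pandas as pd

# Hypothetical stand-in for fetching a persisted dataset by name.
def load_dataset(name):
    # In practice this would read the dataset saved when the node closed.
    return pd.DataFrame({"value": [1, 2, 3]})

def reopen_node(history, dataset_names):
    # 1. Display the saved history without executing any of it.
    for cell in history:
        print(f"[saved cell] {cell}")
    # 2. Initialize the new kernel session with the saved input and
    #    output datasets already defined, instead of replaying cells.
    namespace = {name: load_dataset(name) for name in dataset_names}
    return namespace

session = reopen_node(
    history=["dataset2 = dataset1.assign(flag=1)"],
    dataset_names=["dataset1", "dataset2"],
)
print(sorted(session))  # ['dataset1', 'dataset2']
```

The key design point is that restoring state becomes a data-loading problem (proportional to the number of saved datasets) rather than a re-execution problem (proportional to the length of the notebook history).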
@YohannParis @mwdchang Any thoughts on the above would be great 😄. This is based on my understanding of what's happening at the moment, along with a potential solution.
Thoughts on Data Transformer Performance