Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a view to hold the latest variable timestamps #171

Closed
wants to merge 1 commit into from

Conversation

JamesWrigley
Copy link
Member

Part of #151. Example:

image

@JamesWrigley JamesWrigley added the enhancement New feature or request label Jan 17, 2024
@JamesWrigley JamesWrigley self-assigned this Jan 17, 2024
@takluyver
Copy link
Member

I've been assuming that these views are a temporary measure to limit how much code we needed to change in one go, and we're aiming to get rid of them. So I'm a bit surprised to see another one added. 👀

I have a few (admittedly vague) concerns about relying on making views like this:

  • Modifying the database schema whenever a new variable was added has worked pretty well for the v0 design, but it's undeniably an unusual choice - I spent some of our 'innovation tokens' on that. With v1, I thought we were aiming to have a more conventional database setup, where schema modifications are rare.
  • It means more mutable state which can get out of sync - if there's a bug, you might end up with a variable in the data but not in the views or vice versa. E.g. I think that could happen in .add_user_variable() if you got really unlucky with timing.
  • IDK what the performance is like as the data in use grows, especially once we store multiple versions. I could easily imagine it hits some $n^2$ or $n^3$ case repeatedly iterating through the same data (I can just as easily imagine it's fine, but I don't know).
  • Looking at the SQL again, I'm also not sure it actually does the right thing once we store multiple versions - would a clause like max(CASE WHEN name = 'pulses' THEN value END) AS pulses give us the latest version, or the maximum value?

What I might do instead is arrange the data like this on demand in Python code. We could also define a view in the database called something like run_variables_latest with the same columns as run_latest but only including the latest version of each variable - this wouldn't need to be recreated all the time. I haven't thought through exactly how these things would work, though.

All that said, I think the definition of this new view looks correct, and in line with the ones we already created, so if you feel it's needed, LGTM.

@JamesWrigley
Copy link
Member Author

JamesWrigley commented Jan 18, 2024

Yeah the existing views were added purely for compatibility, though I will say that I would like to keep the runs view anyway because it's very useful for getting an overview of what's in the database if you don't have the GUI handy (or it's not working). I don't have a strong opinion on whether we should add more or not. @CammilleCC, would you be ok with relying on run_variables? From what I remember of the web interface the timestamps are only displayed when you click on a variable to look at its metadata, so maybe it'd be cleaner to just query for timestamps as needed?

  • Looking at the SQL again, I'm also not sure it actually does the right thing once we store multiple versions - would a clause like max(CASE WHEN name = 'pulses' THEN value END) AS pulses give us the latest version, or the maximum value?

Oh yeah, this will definitely break with multiple versions 😀 I was going to cross that bridge when we get to it.

@CammilleCC
Copy link

I was thinking to also use the timestamps to watch updates on the variable as a whole: we'd ideally like to replot the saved data plot once there are changes on the variable, which can be done by watching timestamps.

I don't mind using run_variables to query for the latest timestamp, just thought that we could sneak it in like max_diffs. It's alright to not add it if that's not really wanted.

@JamesWrigley
Copy link
Member Author

By watching the timestamps, do you mean querying the database for them in a loop?

@JamesWrigley
Copy link
Member Author

Alrighty, I'll close this then. I think Thomas' idea of having a view of only the latest variables is probably the best solution if we still want a simple view to query in the future, that way there's no need to modify the schema.

@JamesWrigley JamesWrigley deleted the variable-timestamps branch January 19, 2024 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants