Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusing inputs for the time between steps in the file #66

Open
arm61 opened this issue Jul 4, 2024 · 29 comments
Open

Confusing inputs for the time between steps in the file #66

arm61 opened this issue Jul 4, 2024 · 29 comments
Labels
question Further information is requested

Comments

@arm61
Copy link
Collaborator

arm61 commented Jul 4, 2024

kinisi uses the same terminology as pymatgen to define the time between the steps in an input.
That is a time_step (the simulation time step for the integrator) and the step_skip (the output frequency to the file).

This is confusing and a stumbling block when new to kinisi.

Therefore, I propose that the meaning of time_step is changed to the time step between trajectory steps in the file and that step_skip is removed entirely.
I welcome any thoughts on this.

@arm61 arm61 changed the title Confusion inputs for the time between steps in the file Confusing inputs for the time between steps in the file Jul 4, 2024
@user200000
Copy link

Yeah, this can be a source of confusion, and the change is a good move. Still, I would introduce a new terminology perhaps time_gap or similar to prevent confusion with people working from notebooks given to them by others as examples without anyone being aware of the change!

@jd15489
Copy link
Contributor

jd15489 commented Jul 4, 2024

Yeah, this can be a source of confusion, and the change is a good move. Still, I would introduce a new terminology perhaps time_gap or similar to prevent confusion with people working from notebooks given to them by others as examples without anyone being aware of the change!

I agree. I think it is best to only take one value from the user, that being the time between frames in the trajectory. This may be future proofing for introducing something like pint.

How will this play with subsampling frames?

@arm61
Copy link
Collaborator Author

arm61 commented Jul 4, 2024

Still, I would introduce a new terminology perhaps time_gap or similar to prevent confusion

My preference would be to keep the name time_step but if the user provides a step_skip then an error will be raised. Hopefully, removing the confusion.

How will this play with subsampling frames?

I think this would not effect the operation of the subsampling frames.

@user200000
Copy link

I think the error plan would help a lot, but I worry that time_step will have a particular MD-based meaning for the user base, and may cause confusion. I could see less experienced scientists getting tangled up with this and not realising it (some trajectory formats include a timestep number and they could assume this is read somehow). There are edge cases with aimd where people could end up publishing incorrect MSDs and diffusion coefficients because they're not so far off as to be preposterous and so the error doesn't get caught.

It wouldn't be Kinisi's fault in this circumstance but I think we owe it to the broader community to guard against such errors.

@bjmorgan
Copy link
Owner

bjmorgan commented Jul 4, 2024

time_step and step_skip are good names IMO. They both mean what they say. And they have the advantage of being the same as in pymatgen.

@arm61
Copy link
Collaborator Author

arm61 commented Jul 4, 2024

And they have the advantage of being the same as in pymatgen.

Not everyone is using pymatgen, and hopefully eventually everyone is using kinisi before using pymatgen.

@jd15489
Copy link
Contributor

jd15489 commented Jul 4, 2024

I'm not sure having the same names as another package is advantageous. It only helps a subset of the user base (those familiar with pymatgen), and if pymatgen changes the names in the future this will be moot. Further, I agree that time_step could be confused for the MD integration timestep. timestep is also used by some packages (pymatgen included) to refer to the current timestep, this could cause further confusion.

@arm61
Copy link
Collaborator Author

arm61 commented Jul 4, 2024

I am coming round to time_step not being a good choice. I don’t really like time_gap, any other ideas?

@user200000
Copy link

time_spacing
temporal_spacing
stride_length (inspired by i-pi)

@osoulas
Copy link

osoulas commented Jul 4, 2024

Potentially step_write_frequency? It's a little long winded, but i think it gets the point across.

@jd15489
Copy link
Contributor

jd15489 commented Jul 4, 2024

dt
frame_step
frame_time
frame_time_step
frame_gap

@arm61
Copy link
Collaborator Author

arm61 commented Jul 4, 2024

frame_step is my favourite so far. Followed by stride_length and stride_step.

@osoulas
Copy link

osoulas commented Jul 4, 2024

frame_interval

@arm61 arm61 added the question Further information is requested label Jul 4, 2024
@arm61
Copy link
Collaborator Author

arm61 commented Jul 4, 2024

I am not a fan of the *_interval options. This is because I often talk about $\Delta t$ as the "time interval", and in the past, some people have been confused and think that the ballistic regime is times before the simulation has equilibrated (this is not the case). So, I want to keep simulation time and MSD time descriptors separate.

@osoulas
Copy link

osoulas commented Jul 4, 2024

My issue with frame_step is that, to me, it reads as a number of steps (basically step_skip) rather than a timeframe. Maybe @jd15489's suggestion of frame_time would make more sense?

@bjmorgan
Copy link
Owner

bjmorgan commented Jul 4, 2024

  1. timestep meaning time per integration step is standard MD terminology.
  2. step skip (number of steps skipped per output frame) seems clear.

Maybe we are rediscovering why pymatgen splits these into two arguments?

@bjmorgan
Copy link
Owner

bjmorgan commented Jul 4, 2024

I would argue for keeping these separate, but potentially renaming step_skip. If you run a MD simulation, these are separate inputs with different units. As a user, I always think about my output frequency in units of timesteps, not in units of simulation time ( = timestep × output frequency)

@arm61
Copy link
Collaborator Author

arm61 commented Jul 4, 2024

Based on the number of confused user questions about this problem (95 % of user problems have been around this), I think you might be alone in that @bjmorgan.

@arm61
Copy link
Collaborator Author

arm61 commented Jul 4, 2024

From @bjmorgan:

I agree with the principle that we just multiply these numbers together, but, this means the user needs to remember to multiply together two numbers from their MD input (which they have already shown they don’t understand) and put that number into kinisi. So the current approach you just use the numbers straight from the MD setup instead of doing the multiplication yourself as an extra step.

I think this is a good argument.

@jd15489
Copy link
Contributor

jd15489 commented Jul 4, 2024

I agree that this is a good argument.

What do MD packages call these things?
LAMMPS calls these: timestep and N.
DLPOLY calls them: timestep and timestep interval.
VASP calls them: POTIM (or time step) and NBLOCK.

@bjmorgan
Copy link
Owner

Returning to this … an additional comment would be that it is not uncommon to have datasets where the MD timestep is consistent, but the output frequency changes (maybe you started with a shorter output frequency, increased it later, and do not want to rerun or post-process your original trajectory data). With the existing separation of these two inputs, it is transparent how changes in MD set-up map to changes in kinisi input parameters. Combining these parameters obfuscates how the combined parameter is linked to the MD data. (output files may not report output frequency in time units, but instead will report step frequency, or leave the user to pull this information from the MD input).

@arm61
Copy link
Collaborator Author

arm61 commented Jul 30, 2024

I have an implementation of kinisi that uses scipp and therefore requires the user to set the values as:

time_step = 2.0 * sc.Unit('fs')
step_skip = 50 * sc.Unit('dimensionless')

I think that this would help with this confusion. But this won't be ready until kinisi version 2.

@arm61
Copy link
Collaborator Author

arm61 commented Jul 30, 2024

Documentation for scipp: https://scipp.github.io (xarray with units and variances).

@bjmorgan
Copy link
Owner

In my opinion, requiring scipp makes usage more complicated, and only users who already understand the difference between MD timestep and write frequency will understand what this is doing.

@arm61
Copy link
Collaborator Author

arm61 commented Jul 30, 2024

The logic is that declaring the units forces the user to think more about what they are doing and, therefore, understand it.

@arm61
Copy link
Collaborator Author

arm61 commented Jul 30, 2024

We want everyone to understand the difference between MD timestep and write frequency.

@bjmorgan
Copy link
Owner

And forcing units is aligned with the general principle for scientific code that you want your code to be explicit rather than implicit.

@arm61
Copy link
Collaborator Author

arm61 commented Jul 30, 2024

Yup. There are additional benefits to the use of scipp (e.g., generally explicit unit handling, objects that hold variances alongside the values, etc.)

@arm61
Copy link
Collaborator Author

arm61 commented Sep 21, 2024

This is being fixed in the scipp branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants