Remove phys prop #164

Draft
wants to merge 27 commits into
base: develop

Conversation

ndaelman-hu
Collaborator

Make data solely reliant on sections and quantities, reserving the use of Dataframe (new PhysicalProperty) for results.
Re-group properties with a focus on shared normalized quantities, as well as visualization.

Closes #143 #85 #80

ndaelman-hu added the "improvement/fix" label (Improvement or fix of a previous feature) on Feb 4, 2025
ndaelman-hu self-assigned this on Feb 4, 2025
ndaelman-hu marked this pull request as draft on February 4, 2025 09:44
@ndaelman-hu
Collaborator Author

ndaelman-hu commented Feb 4, 2025

Regarding re-grouping, I'm undecided on the final interface. Using electronic properties (e.g. eigenvalues, DOS, band structure), I'll illustrate the 2 options I see.

Generically, any of these electronic properties can be identified using any of the following attributes: particle kind*, semantic group**, k-point, spin. This list is not necessarily exhaustive.
*: different kinds of particle models computed by the various diagonalization routines, e.g. Kohn-Sham, self-energy exchange or correlation.
**: groupings typically used to project / decompose the electronic states. These are quite varied, but examples include: element, ion, (atomic) orbital like Ce(4f).

Overall, we'd like to present any user with an interface that is as similar as possible across properties (similar idea to the old PhysicalProperty, just differently executed). As you can see in the hierarchies below, these setups mostly follow the same ordering, though they may omit attributes if these are nonsensical:

  • eigenvalues: particle kind -> (semantic group?) -> k-point -> spin
  • DOS: semantic group -> spin
  • band structure: semantic group -> spin -> particle life time (k-point is on the x-axis)
  • Fermi surface: (semantic group?) -> band -> spin (mapped into k-space)

(note: will add example figures later on)

Hierarchies like these are easy to express in JSON key-value pairs, but they mean deeper and differing paths to traverse.
`total` would then stand out as a privileged group with a shortcut. This is most similar in setup to the current state of the schema (with the old PhysicalProperty).

{
  'band_structure': {
    'kpoints': [{...}],
    'total': [
      {
        'spin_channel': [
          {
            'spin': {...},
            'energies': [...],
            'occupations': [...]
          }
        ]
      }
    ],
    'groups': [
      {
        'group_label': {...},
        'spin_channel': [
          {
            'spin': {...},
            'energies': [...],
            'occupations': [...]
          }
        ]
      }
    ]
  }
}

Alternatively, we could have a container section with repeating subsections that annotate all identifying attributes together, at the same level. This means more homogeneity in structure. This also mimics the plots (which are just overlays) better.

It does make data traversal harder (especially without the metainfo browser to show names), as you have to scan the identifiers to know what you're dealing with. This can be mitigated if we ensure that `groupings` is properly sorted by the end of the normalization.

{
  'band_structure': {
    'kpoints': [{...}],
    'groupings': [
      {
        'identifiers': {...},
        'energies': [...],
        'occupations': [...]
      }
    ]
  }
}
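
To make the trade-off between the two layouts concrete, here is a minimal Python sketch that rewrites a nested structure (first layout) as a flat, sorted list of groupings (second layout). It is only an illustration: the helper names `_channels` and `flatten`, the toy values, and the use of plain strings for `spin` and `group_label` (left elided as `{...}` in the snippets) are all assumptions, not part of the schema.

```python
from typing import Any, Iterator

# Toy data in the first (nested) layout. The keys follow the snippets in this
# comment; every value is made up, and spin / group_label are assumed to be
# plain strings purely for illustration.
nested: dict[str, Any] = {
    "kpoints": [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]],
    "total": [
        {
            "spin_channel": [
                {"spin": "up", "energies": [-1.0, 0.5], "occupations": [1.0, 0.0]},
                {"spin": "down", "energies": [-0.8, 0.7], "occupations": [1.0, 0.0]},
            ]
        }
    ],
    "groups": [
        {
            "group_label": "Ce(4f)",
            "spin_channel": [
                {"spin": "up", "energies": [-0.2, 0.3], "occupations": [1.0, 0.0]},
            ],
        }
    ],
}


def _channels(group: str, entry: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat grouping per spin channel of a nested entry."""
    for channel in entry["spin_channel"]:
        yield {
            "identifiers": {"group": group, "spin": channel["spin"]},
            "energies": channel["energies"],
            "occupations": channel["occupations"],
        }


def flatten(band_structure: dict[str, Any]) -> dict[str, Any]:
    """Rewrite the nested layout as a flat list of groupings (second layout)."""
    groupings: list[dict[str, Any]] = []
    for entry in band_structure.get("total", []):
        groupings.extend(_channels("total", entry))
    for entry in band_structure.get("groups", []):
        groupings.extend(_channels(entry["group_label"], entry))
    # Deterministic order: "total" first, then by group label, then by spin.
    groupings.sort(
        key=lambda g: (
            g["identifiers"]["group"] != "total",
            g["identifiers"]["group"],
            g["identifiers"]["spin"],
        )
    )
    return {"kpoints": band_structure["kpoints"], "groupings": groupings}


flat = flatten(nested)
for grouping in flat["groupings"]:
    print(grouping["identifiers"])
```

With a fixed sort order, a reader can rely on position as well as on the identifiers, which is the mitigation suggested for the harder traversal.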

@JosePizarro3
Collaborator

> Regarding re-grouping, I'm undecided on the final interface. Using electronic properties (e.g. eigenvalues, DOS, band structure), I'll illustrate the 2 options I see.
>
> Generically, any of these electronic properties can be identified using any of the following attributes: particle kind*, semantic group**, k-point, spin. This list is not necessarily exhaustive. *: different kinds of particle models computed by the various diagonalization routines, e.g. Kohn-Sham, self-energy exchange or correlation. **: groupings typically used to project / decompose the electronic states. These are quite varied, but examples include: element, ion, (atomic) orbital like Ce(4f).
>
> Overall, we'd like to present any user with an interface that is as similar as possible across properties (similar idea to the old PhysicalProperty, just differently executed). As you can see in the hierarchies below, these setups mostly follow the same ordering, though they may omit attributes if these are nonsensical:
>
>   • eigenvalues: particle kind -> (semantic group?) -> k-point -> spin
>   • DOS: semantic group -> spin
>   • band structure: semantic group -> spin -> particle life time (k-point is on the x-axis)
>   • Fermi surface: (semantic group?) -> band -> spin (mapped into k-space)

I am not sure I understand what "particle kind" and "semantic group" mean. I think particle kind might be info stored in ModelMethod, is that so? Can you give an example?

And the same goes for semantic group: is this the degrees of freedom? Does it only correspond to some orbital or plane-wave index, depending on the basis?

Furthermore, I don't follow your hierarchy for the different properties. The way I see it, purely from the perspective of the property itself, we have $E_{k \sigma m}$ for the eigenvalues, where $k$ is the k-point dof, $\sigma$ is the spin dof, and $m$ is the orbital/plane-wave dof. All the properties can then be derived from this information:

  1. The DOS is the integral of that over $k$
  2. The band structure is that over a specific $k$ path
  3. The Fermi surface is those eigenvalues close to $E_F$
  4. The spectral function (which you didn't add, though you pointed to something with the life time) is an intensity for each of the eigenvalues $I(E_{k \sigma m})$
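
As a rough illustration of items 1 to 3, the following pure-Python sketch derives each property from a single toy array indexed as `eigenvalues[k][spin][m]`. Everything here is an assumption made for the example: the numbers, the k-point weights, the histogram binning used in place of a proper integral, and the function names. The spectral function is left out because it needs intensities that are not part of this toy data.

```python
# Toy eigenvalues E[k][spin][m] in eV; all values are made up.
eigenvalues = [
    [[-1.0, 0.4], [-0.9, 0.6]],  # k-point 0: [spin up bands], [spin down bands]
    [[-0.5, 0.1], [-0.4, 0.2]],  # k-point 1
    [[-0.2, 0.9], [-0.1, 1.0]],  # k-point 2
]
k_weights = [0.25, 0.5, 0.25]  # assumed integration weights, summing to 1
fermi_energy = 0.0


def density_of_states(eigs, weights, spin, e_min, e_max, n_bins):
    """Histogram approximation to the k-integrated DOS of one spin channel."""
    width = (e_max - e_min) / n_bins
    dos = [0.0] * n_bins
    for e_k, weight in zip(eigs, weights):
        for energy in e_k[spin]:
            if e_min <= energy < e_max:
                dos[int((energy - e_min) / width)] += weight / width
    return dos


def band_structure(eigs, path):
    """Eigenvalues restricted to an ordered list of k-point indices."""
    return [eigs[index] for index in path]


def fermi_surface(eigs, e_fermi, tolerance):
    """Indices (k, spin, m) of the states within `tolerance` of the Fermi energy."""
    return [
        (k, spin, m)
        for k, e_k in enumerate(eigs)
        for spin, e_spin in enumerate(e_k)
        for m, energy in enumerate(e_spin)
        if abs(energy - e_fermi) <= tolerance
    ]


dos_up = density_of_states(eigenvalues, k_weights, spin=0, e_min=-1.0, e_max=1.0, n_bins=4)
bands = band_structure(eigenvalues, path=[0, 1])
near_fermi = fermi_surface(eigenvalues, fermi_energy, tolerance=0.15)
print(dos_up, bands, near_fermi)
```

The point of the sketch is only that all three outputs share the same three indices, which is what either layout has to encode.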

The first JSON looks good.


Now, also for @JFRudzinski: is the Variables idea of PhysicalProperty deprecated?

@EBB2675
Collaborator

EBB2675 commented Feb 4, 2025

I find it more intuitive and easier to follow when there is a structured tree where I can conceptually drill down.

@ndaelman-hu
Collaborator Author

ndaelman-hu commented Feb 4, 2025

> I find it more intuitive and easier to follow when there is a structured tree where I can conceptually drill down.

Sure, but imagine now having 5 of these trees side-by-side that have mostly similar, but not identical structures.
The advantage of the 2nd approach is that the core structure remains identical, making some common normalization / plotting easier.
But I admit, both come with issues, hence why I'm collecting opinions.

@ndaelman-hu
Collaborator Author

ndaelman-hu commented Feb 4, 2025

> > Regarding re-grouping, I'm undecided on the final interface. Using electronic properties (e.g. eigenvalues, DOS, band structure), I'll illustrate the 2 options I see.
> > Generically, any of these electronic properties can be identified using any of the following attributes: particle kind*, semantic group**, k-point, spin. This list is not necessarily exhaustive. *: different kinds of particle models computed by the various diagonalization routines, e.g. Kohn-Sham, self-energy exchange or correlation. **: groupings typically used to project / decompose the electronic states. These are quite varied, but examples include: element, ion, (atomic) orbital like Ce(4f).
> > Overall, we'd like to present any user with an interface that is as similar as possible across properties (similar idea to the old PhysicalProperty, just differently executed). As you can see in the hierarchies below, these setups mostly follow the same ordering, though they may omit attributes if these are nonsensical:
> >
> >   • eigenvalues: particle kind -> (semantic group?) -> k-point -> spin
> >   • DOS: semantic group -> spin
> >   • band structure: semantic group -> spin -> particle life time (k-point is on the x-axis)
> >   • Fermi surface: (semantic group?) -> band -> spin (mapped into k-space)
>
> I am not sure I understand what "particle kind" and "semantic group" mean. I think particle kind might be info stored in ModelMethod, is that so? Can you give an example?
>
> And the same goes for semantic group: is this the degrees of freedom? Does it only correspond to some orbital or plane-wave index, depending on the basis?
>
> Furthermore, I don't follow your hierarchy for the different properties. The way I see it, purely from the perspective of the property itself, we have $E_{k \sigma m}$ for the eigenvalues, where $k$ is the k-point dof, $\sigma$ is the spin dof, and $m$ is the orbital/plane-wave dof. All the properties can then be derived from this information:
>
>   1. The DOS is the integral of that over $k$
>   2. The band structure is that over a specific $k$ path
>   3. The Fermi surface is those eigenvalues close to $E_F$
>   4. The spectral function (which you didn't add, though you pointed to something with the life time) is an intensity for each of the eigenvalues $I(E_{k \sigma m})$
>
> The first JSON looks good.
>
> Now, also for @JFRudzinski: is the Variables idea of PhysicalProperty deprecated?

You're pointing out the same observations as I did:

  • $E_{k \sigma m}$: there are several indices / identifiers / variables for a single property. Note that both spin and orbital (or rather, band) can be much more complex structures. Depending on the context (you know which; I won't enumerate them here), they may become sections in their own right.
  • The relevant indices will change between properties, as you (and I) showed in our listing.

The 1st approach puts up a preferential structure / order to run over these indices. The 2nd approach groups them all together at the same level. This helps consistency and visualization. That's their main difference.
Approach 2 is in that respect similar to the PhysicalProperty concept, but does not allow for open-ended contexts (such as potentially additional variables). Context is relevant both for normalization and visualization. Here, it is extended via inheritance.
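
One possible reading of "extended via inheritance" is sketched below with plain Python dataclasses rather than the actual metainfo classes. The class names `Identifiers`, `SpinResolvedIdentifiers`, and `Grouping`, as well as the `legend_label` helper, are hypothetical and only serve to show how a fixed set of identifiers can be widened per property without an open-ended variables list.

```python
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class Identifiers:
    """Hypothetical base set of identifiers shared by every grouping."""

    group: str = "total"


@dataclass(frozen=True)
class SpinResolvedIdentifiers(Identifiers):
    """Hypothetical extension for properties that are also resolved by spin."""

    spin: str = "none"


@dataclass
class Grouping:
    """One overlay in a plot: its identifiers plus the data it carries."""

    identifiers: Identifiers
    energies: list[float] = field(default_factory=list)


def legend_label(grouping: Grouping) -> str:
    """Build a plot legend entry from whichever identifiers are defined."""
    ids = grouping.identifiers
    return ", ".join(f"{f.name}={getattr(ids, f.name)}" for f in fields(ids))


dos_total = Grouping(Identifiers())
dos_ce4f_up = Grouping(SpinResolvedIdentifiers(group="Ce(4f)", spin="up"), [-0.2, 0.3])
print(legend_label(dos_total))
print(legend_label(dos_ce4f_up))
```

Because the set of identifiers is fixed by the class, shared normalization or plotting code can iterate over the fields without knowing which property it is handling.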

PhysicalProperty is being reworked by Area D with feedback from us. For both technical reasons and, again, context, it will be used in results and workflow2, rather than data.

ndaelman added 2 commits February 4, 2025 14:32
- Apply standard template to `DensityOfStates`
- Add naming convention to `OrbitalsState`
- Add few comments + correct typos
@JFRudzinski
Collaborator

@JosePizarro3 Thank you for keeping an eye out and giving feedback.

Sorry for the delay in updating you about PhysicalProperty, I hope this MR did not come as too much of a shock. There has been much movement in recent weeks and we are trying to make quick movement now in terms of schema dev and parser migration.

I want to just slightly expand upon what Nathan shared: Starting from your prototype, Markus made his own implementation of PhysicalProperty, a bit more in a dataframe-like style. If you are interested we will be able to share more information in the near future and there will also be a cafe about it.

Markus, Laurie, Hampus, and Nathan have been working to test and improve this implementation, and it is pretty close to usable now. Nathan did many tests of the new implementation into our schema. In the end, we came to the conclusion that applying this structure to every single property in our detailed schema was both tedious and probably not practical in the long run.

Instead we focus for now on applying this new structure/tool to try to improve interoperability at a higher level, hence Nathan's mention of results and workflow2. In principle, it could also be used in data if an appropriate use case comes along. It's just that we don't implement it everywhere as default. This also helps us to make more progress on our schema immediately, which is our top priority.

I'm happy to discuss further with you, also to see if there are specific use cases you have in mind that could be useful for further testing.

Otherwise, we plan to tag you on any relevant MRs for potential input as we go through here. We appreciate your continued input!

@JosePizarro3
Collaborator

Sure, I just asked if PhysicalProperty was being reworked, and Variables deprecated. Whatever you and the others decide is OK with me.

So the idea is to just do whatever in data in order to push forward the schema development? And then results and workflow2.results are the ones responsible for interoperability? This sounds good to me anyway, just making sure I understand.

- Correct eigenvalues and DOS
@JFRudzinski
Collaborator

> Sure, I just asked if PhysicalProperty was being reworked, and Variables deprecated. Whatever you and the others decide is OK with me.
>
> So the idea is to just do whatever in data in order to push forward the schema development? And then results and workflow2.results are the ones responsible for interoperability? This sounds good to me anyway, just making sure I understand.

Of course, I want to keep you updated 👍

I wouldn't exactly say that the idea is to do "whatever" in data. We are trying to develop some standardization within the schema and also some quasi-templates for people to easily use when extending later. You will see here that Nathan kept several aspects of your PP implementation. It's just that we don't allow for the same flexibility in data, at least not by default for all properties. If a use case arises, PP can also be used in data.

And yes, exactly, results and workflow2.results are responsible for interoperability. There are still some decisions to be made about exactly what is done with the results section.

@JFRudzinski
Collaborator

From my side, the second structure that Nathan suggested has some advantages for standardization and plotting. I think this will become more clear with some concrete examples...

@coveralls

coveralls commented Feb 5, 2025

Pull Request Test Coverage Report for Build 13209227390

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-81.0%) to 0.0%

Totals Coverage Status
Change from base Build 13008515978: -81.0%
Covered Lines: 0
Relevant Lines: 0

💛 - Coveralls

@ndaelman-hu
Collaborator Author

Okay, feel free to give some preliminary feedback.
The main guiding approach is producing accompanying plots. This means grouping together the necessary data as well as context (i.e. metadata) to understand / retrieve the plot. For example:

  • overlaying some kind of labelled plot several times is very common, and the base template is in SemanticGroup and SemanticGroupContainer.
  • most ground state electronic structures require a common alignment, so they are grouped together. For testing purposes, I currently directly use this section under outputs. I also still have to fine-tune the normalization order there (actually, via deactivation + explicit calling, rather than levels).
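
To illustrate why properties that need a common alignment are grouped together, here is a small sketch in plain Python. It assumes the shared reference is the highest occupied eigenvalue, which is only one possible choice, and the function names, property names, and numbers are all hypothetical.

```python
def highest_occupied(energies, occupations, threshold=0.5):
    """Return the highest energy whose occupation is at least `threshold`."""
    occupied = [e for e, occ in zip(energies, occupations) if occ >= threshold]
    return max(occupied) if occupied else None


def align(properties, reference):
    """Shift the energy axis of every grouped property by the same reference."""
    return {
        name: [energy - reference for energy in energies]
        for name, energies in properties.items()
    }


# Toy energies (eV) for two properties that should share one zero of energy.
grouped = {
    "eigenvalues": [-1.0, -0.25, 0.5],
    "dos_energies": [-0.25, 0.75],
}
reference = highest_occupied(grouped["eigenvalues"], occupations=[1.0, 1.0, 0.0])
aligned = align(grouped, reference)
print(reference, aligned)
```

Keeping the properties in one parent makes it straightforward to compute the reference once and apply it to all of them in a single normalization step.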

The most fleshed-out example is the DensityOfStates under properties/solid_state_electronics.py; you can trace the other changes back from there. There is a script for generating an example under tests/properties/visualize_electronic.py. If I upload it to a NOMAD server, I get the result below. Some corrections are still in order here (toggle legend, single plot, show on overview), but the skeleton stands.

Screenshot from 2025-02-07 23-50-03

from nomad.metainfo import Quantity, Reference


class ModelBaseSection(ArchiveSection):
Collaborator

just a small note: the terminology "model" here is a bit confusing to me considering the other uses of model within our schema

@JFRudzinski
Collaborator

I'm starting to set this up for testing but already having some import problems.

../nomad-simulations/src/nomad_simulations/schema_packages/properties/decomposable.py:1: in <module>
    from nomad.metainfo import placeholder, Quantity, SubSection, Reference

I can only find placeholder defined in the JavaScript part of nomad-FAIR. Perhaps it's due to the older mapping annotation branch. @ladinesa, could you let me know when you rebase your nomad-FAIR branch? I tried myself, but the conflicts are too complicated/unknown to me. Also, I'd welcome any alternative insight you have into the placeholder import in general 🙏

Development

Successfully merging this pull request may close these issues.

PhysicalProperty rank checks broken
5 participants