-
Notifications
You must be signed in to change notification settings - Fork 95
Updated raw parser with newer format for Hubbard U standard output for pw.x calculation (QE> 7.2) #1096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks @alberto-carta! @bastonero as our resident Hubbard expert, can you have a look at this? @alberto-carta I'd also like to add some tests for the new parser functionality. The setup is a bit tricky, so I will take care of it, but it would be very helpful to have a minimal example, i.e. a run for a small structure with few k-points etc, so we don't have to add a file with 100k lines. ^^ |
Thanks @alberto-carta for the contribution! Some time ago we decided with @sphuber to NOT store the occupation matrices, as these are already saved in the XML file. This to avoid to duplicate data. A better solution I think it would be to add an extra function to the A dedicated function to
def get_occupations(node) -> list:
"""Return the occupations for a PwCalculation/PwBaseWorkChain node."""
import numpy as np
from xmlschema import XMLSchema
from xml.etree import ElementTree
from aiida_quantumespresso.parsers.parse_xml.versions import get_schema_filepath
with node.outputs.retrieved.base.repository.open('data-file-schema.xml') as xml_file:
xml_parsed = ElementTree.parse(xml_file)
schema_filepath = get_schema_filepath(xml_parsed)
# schema_filepath = "/home/bastonero/.conda/envs/aiida/lib/python3.9/site-packages/aiida_quantumespresso/parsers/parse_xml/schemas/qes_220603.xsd"
xsd = XMLSchema(schema_filepath)
xml_dictionary = xsd.to_dict(xml_parsed, validation='skip')
# --- Play around here and re-arrange data as you wish
ns_matrices, species, indices, spins = [], [], [], []
for hubbard_ns in xml_dictionary['output']['dft']['dftU']['Hubbard_ns']:
shape = hubbard_ns['@dims']
order = hubbard_ns['@order']
ns = np.array(hubbard_ns['$'], order=order).reshape(shape)
ns_matrices.append(ns)
species.append(hubbard_ns['@specie'])
indices.append(hubbard_ns['@index']-1)
spins.append(hubbard_ns['@spin'])
return np.array(ns_matrices) # we should agree here how to give it in output So ideally we would like to implement this in the pw_calc_node.tools.get_occupations()
pw_calc_node.tools.get_occupations(kind_name='Fe1')
pw_calc_node.tools.get_occupations(symbol='Fe')
pw_calc_node.tools.get_occupations(atom_index=0)
pw_calc_node.tools.get_occupations(atom_index=0, spin=1) Eigenvectors can be obtained simply by diagonalizing "on-the-fly" the small matrices, using e.g. numpy: import numpy as np
occupations = pw_calc_node.tools.get_occupations(atom_index=0, spin=1)
eigenvectors = np.linalg.diag(occupations) We should also agree on how to return the matrices. Probably it would be great to devise a new class to facilitate the usage and put it in I won't have time for several months to work on this, but I strongly recommend that we use this approach rather than parsing the stdout. Happy if somebody wants to go ahead with my recommendations, and happy to discuss! |
Hey @bastonero, thanks for the thorough review and the proposed alternative! I understand the design philosophy behind avoiding data duplication and the benefits of using a PwCalculationTools function for a more robust and flexible solution. However, I'd like to highlight a couple of points regarding the current implementation and user expectations. My PR extends an existing feature (which for me was broken? maybe because it was based on the previous DFT+U output) that requires users to explicitly enable the parsing of these occupation matrices. Given this explicit user action, it's reasonable for them to expect these values to be readily available in the output node, just as they were before. These occupation matrices are a crucial observable in Hubbard calculations. Directly exposing them in the output node, perhaps under output_atomic_occupations (as is done now) or nested within output_parameters (I'd personally lean towards the former for clarity), to me aligns with the current workflow for users who specifically request this data. I totally agree about the part of moving this parser to the XML reader instead of the raw file. I am basing a plugin of mine on my fork's modifications, as soon as I bring it out of the testing phase I can rework it with that in mind. Once we agree on the optimal format for these occupation matrices/dictionaries, I'd still advocate for them to be exposed directly in the output node (on request), making them easily accessible and discoverable.
I appreciate the proposal for PwCalculationTools.get_occupations(), and I see its potential for flexibility. However, as a user, this approach for accessing fundamental outputs doesn't seem simple at all to me. Will other common observables, like total energy, site magnetization, or site charge, also be retrieved exclusively through similar tools functions in the future? To me, key results, especially those explicitly requested by the user, should be readily accessible directly from the calculation's output nodes for simplicity and consistency. We can avoid to store redundant stuff like eigenvectors and eigenvalues if you really want, but the latter for instance are crucial at a quick glance to see if one has orbital order in a system. Also the trace of the matrix is an important quantity. I'm happy to discuss this further. |
Here's the comment you asked for, formatted for GitHub Markdown: So for instance, this now returns a big dictionary where each entry looks like this: { This mimics the QE structure of the std.out, where "2" at the beginning refers to "ATOM 2". Can you specify how you want it? For instance, if the atom is Nickel, should it be something like this? { In the case of SOC (Spin-Orbit Coupling), you will also have an off-diagonal down-up part. |
Many thanks for the comments! I agree that the functionality is broken and we should totally restore it. The problem with extending it is that the format of the outputs is a In passing, indices should be referred to python indices (i.e. related to StructureData), not fortran indices. I don't know if the old implementation was keeping fortran indices, but that would be bad as we should keep consistency with python and AiiDA, not with QE (see e.g. all the Hubbard implementation).
Could you be more specific? Maybe you mean "intuitive"? Length wise is as long as accessing a usual output. Ideally, this will be more powerfull and useful than returning a dictionary that than one needs to understand how to use with respect to a simple function (or more) that already does the job. Of course, this should be accompanied with suitable documentation. On the one hand, I understand your point on having the outputs ready where one should expect them. On the other hand, I think many other information that QE spits out are not "properly" stored and showed in the Probably yes, but i would still avoid the way it is extended at the moment in this PR and devise e.g. a new pw_calc_node.outputs.occupations.asdict()
pw_calc_node.outputs.occupations.get_occupations(kind_name='Fe')
pw_calc_node.outputs.occupations.get_eigenvalues(kind_name='Fe')
pw_calc_node.outputs.occupations.get_eigenvectors(kind_name='Fe')
... |
@bastonero @alberto-carta what's the status on this? Any way I can help get this merged? |
No description provided.