Skip to content

Use calcfunction for getitem in For loop workgraph#283

Closed
agoscinski wants to merge 7 commits intomainfrom
for-loop-provenance
Closed

Use calcfunction for getitem in For loop workgraph#283
agoscinski wants to merge 7 commits intomainfrom
for-loop-provenance

Conversation

@agoscinski
Copy link
Copy Markdown
Contributor

@agoscinski agoscinski commented Aug 28, 2024

To track the provenance in the for loop we now use a calcfunction to track provenance for the For loop workgraph. I would like to allow more types than integers for the sequence, but I am not sure how to generically test for types that are supported by aiida core. Some orm types have a _type class attribute, but not all of them.

TODO:

  • add notebook reason why to choose this over a normal for loop (less tasks created), or at least move this to an issue

@superstar54
Copy link
Copy Markdown
Member

Hi @agoscinski , thanks for the work.

There is a function orm.to_aiida_type which can serialize the data automatically, and you can check which Python data types it support.

It's also would be good if the user can supply a list of AiiDA data as the input for sequence, e.g.,

[structure0, structure1, ...]

Because AiiDA does not support a list of AiiDA data as input, internally, we can change it to a dict with keys, the keys keep the order information.

{"sequence_0": structure0, ...}

and then, in the WorkGraph engine, if the sequence is a dict, we get the AiiDA data directly without serialization.

@agoscinski
Copy link
Copy Markdown
Contributor Author

Thanks @superstar54 applied your suggestion. The test pass, but it does not test the other use cases you mentioned. I tried to implement a dictionary, but it is a bit opaque to me how to change the iterator to be compatible with dictionaries that use hashable keys. It seems the iterator is hard coded to only work for collections that can be accessed by incremental increasing numbers like lists. Maybe we should generalize this for arbitrary generators. Not sure if this PR is the right place.

@agoscinski agoscinski marked this pull request as ready for review September 2, 2024 10:09
Copy link
Copy Markdown
Member

@superstar54 superstar54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @agoscinski , thanks for the work!

Please test for the changes.

As mentioned in my previous comment, if you use a list of orm.data as the sequence, it will raise an error, because we need to change the list to a dict.

Therefore, you need also modify the aiida_workgraph.workgraph.WorkGraph to convert it into a dict. Then inside the engine, the sequence is a dict.

Comment thread aiida_workgraph/engine/workgraph.py Outdated
@calcfunction
def __getitem__(sequence, count):
value = sequence[count.value]
# only convert value f not already orm type because sequence
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# only convert value f not already orm type because sequence
# only convert value if not already orm type because sequence

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Sep 17, 2024

Codecov Report

❌ Patch coverage is 78.37838% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.70%. Comparing base (5937b88) to head (81b5bac).
⚠️ Report is 270 commits behind head on main.

Files with missing lines Patch % Lines
aiida_workgraph/workgraph.py 65.21% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #283      +/-   ##
==========================================
+ Coverage   75.75%   80.70%   +4.95%     
==========================================
  Files          70       66       -4     
  Lines        4615     5157     +542     
==========================================
+ Hits         3496     4162     +666     
+ Misses       1119      995     -124     
Flag Coverage Δ
python-3.11 80.62% <78.37%> (+4.96%) ⬆️
python-3.9 80.66% <78.37%> (+4.92%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@agoscinski
Copy link
Copy Markdown
Contributor Author

agoscinski commented Sep 17, 2024

As mentioned in my previous comment, if you use a list of orm.data as the sequence, it will raise an error, because we need to change the list to a dict.

That is for non json serializable data types like orm.StructureData for others it will automatically convert it and is not a problem. My problem with the dynamic workaround is that in this case it is shown differently in the provenance graph and also will create a lot of links.

That was the initial idea

@calcfunction
def __getitem__(seq, idx):
    return orm.to_aiida_type(seq[idx.value])

node = __getitem__([1,2,3], 0)
generate_node_graph(node.pk)
Screenshot 2024-09-17 at 11 34 09

but the dynamic input ports approach gives

import numpy as np

@calcfunctione
def __getitem__(**kwargs):
    idx = kwargs.pop('idx')
    list_of_orms = list(kwargs.values())
    return list_of_orms[idx.value].clone()

inp = {f'list_of_orms_{i}': orm.StructureData(cell=np.random.rand(3,3)) for i in range(3)}
inp['idx'] = orm.Int(0)
node = __getitem__(**inp)
generate_node_graph(node.pk)
Screenshot 2024-09-17 at 11 35 36

There is also the new StructureData object that might be json serializable because of the BaseModel that resolves this issue (EDIT: no Miki tried). I am also concerned by performance, since we now have to convert a Generator to a list which can be quite time consuming. I feel more and more that we should drop this PR. With the For workgraph we can maybe go more into a similar direction as with the while loop and just focus and a nice GUI representation.

@superstar54
Copy link
Copy Markdown
Member

The For and While WorkGraph types are dropped and not used anymore. Instead, the Map and While context managers are used. Hence, I am closing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants