Merging empty partitions branch #256

Open
gnikit opened this issue Feb 3, 2020 · 2 comments
Comments

gnikit (Member) commented Feb 3, 2020

Hi, I was wondering whether you could merge #162 into master. I did a merge locally and it appears to be fine: no conflicts and no test or unit test failures. I think this would be a great feature to have, although I am a bit biased, since my whole PhD has to do with adaptivity and empty partitions are a real pain to deal with.

I also have a question about this implementation, which I was hoping someone could answer. If you load balance and Zoltan returns at least one empty partition, does that still cause Fluidity to abort?

jrper (Contributor) commented Feb 5, 2020

As far as I remember, this was left awaiting code review. I suspect that @stephankramer may still have views on the topic.

The intention (and certainly what I originally coded up the change set to do) is that this version prints a warning when load balancing generates an empty partition, but then carries on with the empty process essentially doing nothing, in hopes that this will eventually be fixed on a subsequent load balancing event. Obviously, this is suboptimal (and somewhat antisocial) on large shared systems.
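The warn-and-continue behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python, not Fluidity's Fortran code and not the Zoltan API; every name here (`find_empty_partitions`, `check_partitions`, `elements_to_process`, `abort_on_empty`) is hypothetical, and the partition is assumed to be a plain list mapping each element to its owning rank.

```python
import warnings
from collections import Counter


def find_empty_partitions(owner_of_element, n_parts):
    """Return the sorted list of ranks that own no elements.

    owner_of_element[i] is the (hypothetical) rank that owns element i.
    """
    counts = Counter(owner_of_element)
    return [rank for rank in range(n_parts) if counts[rank] == 0]


def check_partitions(owner_of_element, n_parts, abort_on_empty=False):
    """Warn (or, optionally, abort) when a load balance leaves a rank empty."""
    empty = find_empty_partitions(owner_of_element, n_parts)
    if empty:
        message = f"load balancing produced empty partition(s): {empty}"
        if abort_on_empty:
            # The alternative behaviour: stop the whole run.
            raise RuntimeError(message)
        # Warn-and-continue: report the problem and carry on.
        warnings.warn(message)
    return empty


def elements_to_process(rank, owner_of_element):
    """Elements a rank works on; an empty rank gets an empty list and idles."""
    return [e for e, owner in enumerate(owner_of_element) if owner == rank]
```

With `owner_of_element = [0, 0, 2, 2]` and three ranks, rank 1 owns nothing: `check_partitions` issues a warning and `elements_to_process(1, ...)` returns an empty list, so that rank simply has no work until a later rebalance assigns it some.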

gnikit (Member, Author) commented Feb 5, 2020

It would be great if you would consider merging it into master. I am certainly willing to help with any additional work that might be required for this merge to happen.

With regard to Zoltan and load balancing, I think that having an idle process is orders of magnitude better than the alternative of your run randomly aborting after an hour on 10k cores (which was my luck...).
