
LocalTaskQueue Hanging #20

Open
william-silversmith opened this issue Sep 20, 2019 · 4 comments
@william-silversmith (Contributor) commented:

Some people are reporting problems with parallel operation hanging. We'll have to figure out how to reproduce.

@william-silversmith (Contributor, Author) commented:

It seems one cause of this can be mixing threads with forked processes: a lock held by a thread in the parent gets copied into the child's memory in the acquired state, and no thread exists in the child to ever release it.
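
Here is a minimal sketch of that failure mode (illustrative only, not TaskQueue code): a thread in the parent holds an ordinary threading.Lock while the process forks, and the child then blocks forever trying to acquire the copied lock.

```python
# fork_lock_hang.py -- minimal demo of a lock copied into a forked child
# in the acquired state; illustrative only, not TaskQueue code. Unix only.
import multiprocessing as mp
import threading
import time

lock = threading.Lock()

def hold_lock_briefly():
    # A parent-side thread holds the lock while the fork happens.
    with lock:
        time.sleep(2)

def child_work():
    # Only the forking thread survives in the child, so the copied lock
    # stays acquired forever and this acquire never returns.
    with lock:
        print("child acquired the lock")  # unreachable if forked mid-hold

if __name__ == "__main__":
    t = threading.Thread(target=hold_lock_briefly)
    t.start()
    time.sleep(0.1)  # make sure the thread is inside the critical section

    ctx = mp.get_context("fork")  # the start method that copies lock state
    p = ctx.Process(target=child_work)
    p.start()
    p.join(timeout=5)
    print("child still alive after 5 s (hung):", p.is_alive())
    p.terminate()
    t.join()
```

Using the "spawn" or "forkserver" start method, or making sure no other threads are running at the moment the worker processes fork, avoids inheriting the locked state.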

@j6k4m8 commented Feb 9, 2022:

I think this is perhaps related: on a run of about ~1.1M tasks on a local fq, RAM gradually fills up over time. I don't think this is a practical issue (after over a million task completions I was only seeing about ~140 GB of RAM used up, so this is a VERY gradual leak, if it is indeed a leak). I don't think it has anything to do with the Igneous tasks that spawned the queue, since the per-task memory there would have filled up RAM much more rapidly if it hadn't been deallocating, so I suspect this is something queue-side. Wish I had more details for you, but the execution isn't running anymore; I still have the queue filesystem and am happy to do some digging there if helpful!

If nothing else, hopefully this gives you a feel for the rate of the memory growth. Anyhow, feel free to ignore this! Just wanted to give you an extra data point.

@william-silversmith (Contributor, Author) commented:

Thanks Jordan! That's still 121 kB per task, not just a few bytes. I can look into this myself at some point soon, but if you have time, would you mind running a taskqueue with an empty task under mprof (memory_profiler) and posting the .dat file? That's usually where I would start.
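
Something like the following is what I have in mind. It's only a sketch: the NoOpTask class is made up for illustration, and the RegisteredTask / LocalTaskQueue usage follows the README pattern, so it may need tweaking for whatever taskqueue version you have installed.

```python
# empty_task_bench.py -- illustrative sketch only; the RegisteredTask /
# LocalTaskQueue usage follows the README pattern and may need small
# adjustments depending on the installed taskqueue version.
from taskqueue import LocalTaskQueue, RegisteredTask

class NoOpTask(RegisteredTask):
    """Does nothing, so any memory growth is queue-side rather than task-side."""
    def __init__(self):
        super().__init__()

    def execute(self):
        pass

if __name__ == "__main__":
    tq = LocalTaskQueue(parallel=1)  # one worker keeps the profile easy to read
    tq.insert(NoOpTask() for _ in range(200_000))
    tq.execute()
```

Then running it as `mprof run python empty_task_bench.py` and posting the resulting .dat (or the `mprof plot` figure) would be plenty.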

Someone once reported a similar but critical issue with Igneous that I wasn't able to reproduce. I suspected it had something to do with their system configuration. They did say that using a different filesystem didn't help though.

seung-lab/igneous#79

@j6k4m8 commented Feb 10, 2022:

Unfortunately it doesn't look like the mprof output has much to say. Here's a profile from another set of workers I spawned on the same job where I saw the issue (i.e., a younger set of workers, but otherwise the exact same conditions). It's mostly just a monotonic increase in memory.

mprof.dat.txt

I will run mprof on a completely empty task on the same machine under the same conditions the next time I'm on it (probably later this week!).
