Problem with running FoF Halo Finder in parallel #161
Comments
Hi @anirban1196, this is interesting; I've not seen behavior like this before. My first thought is that you might want to be careful using 9, 11, or 12 processors, because these result in odd partitionings of the domain. For example, 11, being prime, will result in 11 divisions along a single axis. It's better to go with factors of 2, or even better powers of 2, to get simpler geometries for the domain decomposition. This doesn't explain the behavior you're seeing, just something to keep in mind. As you point out, the total number of particles filled in is telling. The place I would check first is in the yt source.
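As a rough illustration of why prime processor counts give awkward decompositions, here is a small standalone sketch (not yt's actual decomposition code) that splits a processor count into a 3D grid of subdomains the way a simple splitter might:

```python
# Hypothetical illustration only: a naive way to split nprocs into an
# nx x ny x nz grid of subdomains. yt's real decomposition differs in
# detail, but the point about prime counts holds either way.
def naive_decomposition(nprocs):
    """Return the most cube-like (nx, ny, nz) with nx * ny * nz == nprocs."""
    best = (nprocs, 1, 1)
    for nx in range(1, nprocs + 1):
        if nprocs % nx:
            continue
        for ny in range(1, nprocs // nx + 1):
            if (nprocs // nx) % ny:
                continue
            nz = nprocs // (nx * ny)
            # prefer the split whose longest dimension is shortest
            if max(nx, ny, nz) < max(best):
                best = (nx, ny, nz)
    return best

for n in (8, 9, 11, 12):
    print(n, naive_decomposition(n))
# e.g. 8 -> (2, 2, 2) and 12 -> (2, 2, 3), but 11 -> (11, 1, 1):
# one long, thin slab per rank.
```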
Hi @brittonsmith, another thing I realized is that changing the value of the "padding" variable also changes the number of particles being filled into each processor. Earlier I was using the yt default of 0.02, which may have been too small for halo-finding purposes inside the 160 cMpc/h box [= 160 * 1e3 in code units (comoving kpc/h)]. I have now made new runs with larger values of "padding" (~10 * 1e3), parallelized over 16 processors. I will share that output once those runs are completed.
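To put those numbers in perspective, here is a two-line check of the padding scale relative to the box width (assuming, as in the comment above, that the padding is interpreted in code units, i.e. comoving kpc/h):

```python
# Quick sanity check on the padding scale (all lengths in code units,
# i.e. comoving kpc/h for this box; the interpretation of the padding
# value as a code-unit length is the assumption made in the comment above).
box_size = 160.0e3          # 160 cMpc/h expressed in ckpc/h
default_padding = 0.02      # yt's default value
new_padding = 10.0e3        # the larger value tried above

print(default_padding / box_size)  # ~1.25e-07 of the box width: effectively no overlap
print(new_padding / box_size)      # 0.0625 of the box width per side
```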
Hi @brittonsmith, as I had mentioned above, I inserted print log statements after line 954 and line 985 in halo_objects.py and made new runs with large values passed to "padding".

I found that the value passed to "padding" changes the total number of particles being filled in. Hence, I wonder whether one should even expect the total number of particles being filled in (summed over all used procs) to match 1024^3, because the particles filled into a particular processor are the particles belonging to that corresponding "padded" subregion (right?). Therefore, simply summing these values over the processors will over-count particles lying in the overlapping padded sub-regions. This possibly explains why changing the padding value also changes the number of particles being filled in. Unfortunately, it also changes the number of halos identified, which is also worrying.

Another thing I missed mentioning in my very first comment is that for the parallel runs, I was overriding the internal calculation of the linking length and passing it directly during the function call. The reason for doing this was that the internally calculated linking length depended on the number of procs used: for nprocs=16 it was calculated as 2.503e-04, for nprocs=8 it was 2.064e-04, and so on. Only in the serial run was the internally calculated linking length correct (i.e. 1.953e-04 = 0.2 * 1/1024). In the new runs with increased padding, I reverted to letting the code calculate the linking length on its own.

Additionally, I inserted a print log statement after line 965 in halo_objects.py to display the total number of particles over all processors (hereafter called "npart") used in the linking length calculation. It turns out that this number "npart" does not depend on the padding value used (unlike the earlier count) and probably better reflects the fact that not all 1024^3 particles are being sent out during the parallelization step. However, "npart" does depend on the number of processors used.
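For reference, the arithmetic behind the expected serial linking length quoted above (expressed as a fraction of the box width) is simply:

```python
# Expected FoF linking length: b times the mean inter-particle separation,
# with b = 0.2 and 1024^3 particles, box width normalised to 1.
n_particles = 1024 ** 3
b = 0.2
mean_separation = 1.0 / n_particles ** (1.0 / 3.0)
print(b * mean_separation)  # 1.953125e-04, matching the serial-run value quoted above
# The parallel runs reported 2.064e-04 (8 procs) and 2.503e-04 (16 procs),
# i.e. larger values, consistent with "npart" being smaller than 1024^3 there.
```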
The detailed terminal outputs for these runs (displaying the left and right edges of each subregion, npart, halo counts, etc.) can be found in this folder. @brittonsmith, I would very much appreciate your suggestions/clarifications on the last two comments posted above (by me). Thanks!
I can see several possibilities for potential issues, but I am not certain of any of them. The problem may be in the I/O, or possibly in how the Python wrapper to the halo finder converts positions into the correct units for the internal C code, or perhaps even an issue with periodicity. I doubt the issue is with the core halo-finding code, as that has not been altered in many years and has never shown issues like this before.

We can start with the I/O by confirming that reading data from a series of subvolumes, similar to what the halo finder would do, results in reading all the particles. Can you put together a code that makes 8 region objects (8 sub-cubes half the size of the box, together spanning the full domain), reads the particles in each, counts them, and sums them? With no padding, the sum should be exactly the total number of particles in the box. If not, then we have a problem. Then you might try padding the regions in the same way the halo finder does and confirm that each region still reads the same number as it does within the halo finder.

Please let me know what you find. Also, please note that I am currently quite unwell and may take some time to respond. I will also be traveling and on leave from the middle of next week.
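A minimal sketch of that check might look like the following (the snapshot path and the use of the "all" particle type are assumptions; restrict to the dark-matter particle type if that is what the halo finder actually reads):

```python
# Sketch of the subvolume-read check described above.
import itertools
import yt

ds = yt.load("snapshot_XXX.0")  # placeholder path

counts = []
width = (ds.domain_right_edge - ds.domain_left_edge) / 2
for i, j, k in itertools.product(range(2), repeat=3):
    # one of the 8 half-size sub-cubes, together spanning the full domain
    left = ds.domain_left_edge + width * [i, j, k]
    right = left + width
    reg = ds.region((left + right) / 2, left, right)
    counts.append(reg["all", "particle_position_x"].size)

print(counts, sum(counts))  # with no padding, the sum should be exactly 1024**3
# A padded variant would widen each left/right edge by the halo finder's
# padding (in code units), taking care to wrap periodically at the box edges.
```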
I have Gadget-2 snapshots (each snapshot is saved over 16 files, snapshot_XXX.0 to snapshot_XXX.15) containing 1024^3 particles inside a periodic box of side length 160 cMpc/h.
I want to run the FoF halo finder on these snaps with a criterion of a minimum of 10 DM particles to identify a halo; hence, I modified the “nMembers” variable from 8 to 10 (in line 34 of EnzoFOF.c and of fof_main.c) while installing from source.
I ran the yt FoF halo finder on my snaps: first in serial mode (i.e. without yt.enable_parallelism()) and then in parallel mode (by adding the line yt.enable_parallelism()), with a different number of processors each time (nprocs = 8, 9, 11, 12).
The Python code used for the serial/parallel runs is given below.
SERIAL yt-FOF Code:
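(The original script did not survive in this copy of the issue; the block below is a minimal reconstruction of what the serial run might have looked like, assuming the HaloCatalog interface from yt_astro_analysis and placeholder paths.)

```python
# Hypothetical reconstruction of the serial run, not the original script.
import yt
from yt.extensions.astro_analysis.halo_analysis import HaloCatalog

ds = yt.load("snapshot_XXX.0")  # yt picks up the remaining files of the snapshot
hc = HaloCatalog(data_ds=ds, finder_method="fof",
                 output_dir="halo_catalogs/serial")  # placeholder output path
hc.create()
```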
PARALLEL yt-FOF Code:
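(Again a reconstruction rather than the original script: the same call with parallelism enabled, launched under MPI, e.g. mpirun -np 8 python run_fof_parallel.py, where the script name is a placeholder.)

```python
# Hypothetical reconstruction of the parallel run, not the original script:
# identical to the serial version, with parallelism enabled at the top.
import yt
yt.enable_parallelism()

from yt.extensions.astro_analysis.halo_analysis import HaloCatalog

ds = yt.load("snapshot_XXX.0")
hc = HaloCatalog(data_ds=ds, finder_method="fof",
                 output_dir="halo_catalogs/parallel")  # placeholder output path
hc.create()
```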
The number of halos identified in the serial-mode run is different from the total number of halos identified in the parallel-mode runs.
In fact, among the various parallel mode runs, the number of identified halos is dependent on the number of processors used for the parallelisation.
The number of halos identified in the yt-FOF serial-mode halo catalogs and the halo statistics (halo mass function) match very accurately with those returned by other FoF halo finders (like P-FoF) working on the same snapshot file. So the serial runs are fine; the problem lies in the parallel-mode runs.
From the terminal outputs (see attached files here), I noticed that the total number of particles being “filled in” (summed over all processors used) is not the same as the original particle count (= 1024^3 = 1073741824), and this may be causing the discrepancy in the halo counts.
I am unable to understand how to fix this issue or where exactly I am going wrong in the parallelism implementation (if that is the case). Any suggestions / help in this regard will be highly appreciated.
Thanks and Regards,
Anirban