This repository has been archived by the owner on Mar 2, 2022. It is now read-only.

The second memory node not working when trying 1P-2M-1S with a GMM #10

Closed
fyc1007261 opened this issue Aug 29, 2019 · 10 comments

Comments

@fyc1007261

Hi @lastweek ,

We have successfully deployed 1P-1M-1S on CloudLab and are now experimenting with multiple processor/memory nodes. We tried five nodes, with #0 as processor; #1 and #4 as memory; #2 as storage; and #3 as the global resource monitor. We also configured linux-modules/monitor/include/monitor_config.h so the GMM knows the IDs of the memory nodes. After rebooting the processor and memory nodes, we ran make fit_install on the storage and GMM nodes, then make monitor_install on the GMM node and make storage_install on the storage node. However, when we ran an application that required a large amount of memory, node #1 (the default memory node configured on all machines) used up all its memory and panicked, while node #4 did not seem to be doing any work. Is there anything we left unconfigured, or did we do something wrong?

Thanks very much for your help!

@fyc1007261
Author

By the way, when we tried multiple processor nodes with GPM configured as 'Y', LegoOS failed to compile. It seems that some code uses the ibapi_reply_message function, which is declared in fit/ibapi.h, but the declaration is only visible when CONFIG_COMP_MEMORY is set (in line 88). Is there any way to solve this problem? Thanks so much!

@lastweek
Contributor

@fyc1007261, I'm traveling this week and will get back to this next Wed/Thu. Sorry for the delay. Meanwhile, @hythzz, will you have time to help out?

@fyc1007261
Author

Hi @lastweek ,

We are still struggling with the multiple-P/M problem. Could you please help check where our configuration might be wrong, or provide some instructions on this?

@lastweek
Contributor

Hi @fyc1007261,

Sorry for the delay; I have been moving around a lot recently. From your first post, it seems that at least all the machines are connected. You mentioned that "the #1 node (the default memory node configured on all machines) used up all its memory and panicked"; did you see an OOM message? I may have a clue about where the issue is, but I need to take a look at your .config files.

Could you share your P and M's .config files with me? Thank you.

@fyc1007261
Author

Hi @lastweek ,

I have put all the config files and logs that I consider important at the link below. We are running the program test.cc, which tries to allocate 20 GB of memory. Since we have two memory nodes with 16 GB each, the allocation should have succeeded, but it failed. If there are any additional files I should provide, please let me know.

Thanks a lot for your help!!

Config files and logs:
https://1drv.ms/u/s!ApeLgKxbjBKilr544vQG_9UQKxctfA?e=iCWf4I

@dothyt
Member

dothyt commented Sep 16, 2019

Hi @fyc1007261 ,

I just checked your config files. It looks like you didn't enable CONFIG_DISTRIBUTED_VMA on either the processor node or the memory nodes, which is necessary for running multiple Ms. Here are sample configs needed for two Ms to work:

Processor node side:

CONFIG_DISTRIBUTED_VMA=y
CONFIG_DISTRIBUTED_VMA_PROCESSOR=y
CONFIG_VM_GRANULARITY_ORDER=30
CONFIG_MEM_NR_NODES=2

Memory node side:

CONFIG_DISTRIBUTED_VMA=y
CONFIG_DISTRIBUTED_VMA_MEMORY=y
CONFIG_VM_GRANULARITY_ORDER=30
CONFIG_MEM_NR_NODES=2
CONFIG_VMA_CACHE_AWARENESS=n

Please keep CONFIG_VM_GRANULARITY_ORDER consistent across all nodes. 30 means 2^30 bytes (1 GB), which is the default setting.

CONFIG_VMA_CACHE_AWARENESS is an optional config that makes VMA allocation cache-aware but increases virtual address fragmentation.

CONFIG_MEM_NR_NODES tells each memory node how many memory nodes exist in the cluster.

@fyc1007261
Author

Hi @lastweek @hythzz ,

Thanks for your help! It works now!

@lastweek
Contributor

Hi @fyc1007261, we are back on schedule and will update the repo more frequently.

@dothyt
Member

dothyt commented Sep 17, 2019

Hi @fyc1007261, if your issue has been solved, please close this thread.

@fyc1007261
Author

Hi @hythzz,
Sorry, I forgot. I'll close it now.
