Issue changing the memory value #97
Comments
Looking at the actor logs I don't see anything suspicious. @vlimant any idea where I should try to track down the issue? https://cms-unified.web.cern.ch/cms-unified//logs/actor/2018-03-12_14:00:48.log |
I think, and @amaltaro will confirm, that for memory to have an effect on ACDC, it has to be set at assignment time. |
It used to "work" because MaxRSS was updated at assignment time (using "Memory": "8000"), while now it is tied to the value in the spec, "memoryRequirement = 4695.0". |
It works during creation as well, but mind this small detail: which means that if you're ACDCing a TaskChain workflow, the Memory argument has to be a dictionary. |
Wait. Do you mean that the values in the nested TaskX do not matter, but the base Memory parameter has to be a dict of Task:Memory? |
For Resubmission, yes, that's correct! We don't re-evaluate all the parameters and call the setters, ACDC simply truncates the original workload (so there are no attributes changed). |
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/Resubmission.py#L44 and the following lines make no sense whatsoever, except that they are a copy-paste from the assignment code. Because creation time and assignment time are decorrelated, a change at creation should be allowed and supported without having to do unnatural conversions. Can you please justify why it should only be done at assignment time (other than for the practical reason of how this is coded in WMCore)? Bo, at least we know why the ACDCs are failing now: we dropped the MaxRSS override by Unified. @areinsvo, can you please go ahead and change actor so that it creates a dictionary TaskName:Memory and sets payload['Memory'] = that_dictionary. |
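For illustration, here is a minimal sketch of the TaskName:Memory dictionary requested above. It is not the actual actor code. It assumes the original request is available as a plain dict in the usual TaskChain layout (a "TaskChain" count plus "Task1", "Task2", ... entries, each with a "TaskName"). The helper name `build_memory_override` and the task names in the example are hypothetical, and casting to `int` is an assumption about what the override should look like.

```python
def build_memory_override(request, new_memory_mb):
    """Return a {TaskName: memory} dict for a TaskChain-style request dict.

    Hypothetical helper: the request layout (TaskChain, Task1, Task2, ...)
    is an assumption, not something read from reqmgr2 here.
    """
    n_tasks = int(request.get("TaskChain", 0))
    memory = {}
    for i in range(1, n_tasks + 1):
        task = request["Task%d" % i]
        # Cast to int so a value passed as a string (e.g. "8000")
        # ends up as a number in the payload.
        memory[task["TaskName"]] = int(new_memory_mb)
    return memory


# Example with made-up task names.
original = {
    "RequestType": "TaskChain",
    "TaskChain": 2,
    "Task1": {"TaskName": "GenSim", "Memory": 4695.0},
    "Task2": {"TaskName": "Digi", "Memory": 4695.0},
}

payload = {"RequestType": "Resubmission"}
payload["Memory"] = build_memory_override(original, "8000")
```

With this, `payload['Memory']` is a dict keyed by task name rather than a single scalar, which is the shape described above for Resubmission of a TaskChain workflow.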
I'm pretty sure there was a reason to make it that complicated; unfortunately I don't remember it and can't find what it was. I'll look at it again and see if we can remove this over-complication. The reason it shouldn't be supported during creation time is:
I think these are pretty good reasons ;) |
CMSCompOps/WmAgentScripts@0becde6#diff-699b8f6dbca6e1b3cf8365e884aaaf0e |
@areinsvo, I created this ACDC using our tool https://cmsweb.cern.ch/reqmgr2/config?name=vlimant_ACDC1_task_HIN-HINPbPbSpring18GS-00001__v1_T_180316_112116_2010 I checked it and I saw: |
You're right. I was missing some int() values. Can you resubmit the action? |
We have a problem with this ACDC https://cmsweb.cern.ch/reqmgr2/fetch?rid=vlimant_ACDC0_task_HIN-HINPbPbSpring18GS-00001__v1_T_180312_140104_4651
I changed the memory using the recovery tool. When I check the request's JSON in reqmgr, this is the task configuration:
But in config:
https://cmsweb.cern.ch/reqmgr2/config?name=vlimant_ACDC0_task_HIN-HINPbPbSpring18GS-00001__v1_T_180312_140104_4651
vlimant_ACDC0_task_HIN-HINPbPbSpring18GS-00001__v1_T_180312_140104_4651.tasks.HIN-HINPbPbSpring18GS-00001_0.input.splitting.performance.memoryRequirement = 4695.0
There might be something broken on the actor side. @vlimant, @areinsvo, could you please help me take a look?