Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue changing the memory value #97

Open
paorozo opened this issue Mar 14, 2018 · 12 comments
Open

Issue changing the memory value #97

paorozo opened this issue Mar 14, 2018 · 12 comments

Comments

@paorozo
Copy link

paorozo commented Mar 14, 2018

We have a problem with this ACDC https://cmsweb.cern.ch/reqmgr2/fetch?rid=vlimant_ACDC0_task_HIN-HINPbPbSpring18GS-00001__v1_T_180312_140104_4651
I changed the memory using the recovery tool. When I check the request's JSON in reqmgr, this is the task configuration:

"Task2": {
    "KeepOutput": true, 
    "PrepID": "HIN-HINPbPbSpring18DR-00001", 
    "InputFromOutputModule": "RAWSIMoutput", 
    "GlobalTag": "100X_upgrade2018_realistic_v10", 
    "TimePerEvent": 16.7631, 
    "FilterEfficiency": 1, 
    "SplittingAlgo": "EventAwareLumiBased", 
    "ProcessingString": "NoPU_100X_upgrade2018_realistic_v10", 
    "ScramArch": [
      "slc6_amd64_gcc630"
    ], 
    "SizePerEvent": 16234.1829, 
    "ConfigCacheID": "76ef2a5da99194730d69d73d8b52df82", 
    "Memory": "8000", 
    "Multicore": 4, 
    "TaskName": "HIN-HINPbPbSpring18DR-00001_0", 
    "AcquisitionEra": "HINPbPbSpring18DR", 
    "PrimaryDataset": "Hydjet_Quenched_Cymbal5Ev8_PbPbMinBias_5020GeV_2018", 
    "Campaign": "HINPbPbSpring18DR", 
    "InputTask": "HIN-HINPbPbSpring18GS-00001_0", 
    "CMSSWVersion": "CMSSW_10_0_3"
  }, 

But in config:
https://cmsweb.cern.ch/reqmgr2/config?name=vlimant_ACDC0_task_HIN-HINPbPbSpring18GS-00001__v1_T_180312_140104_4651

vlimant_ACDC0_task_HIN-HINPbPbSpring18GS-00001__v1_T_180312_140104_4651.tasks.HIN-HINPbPbSpring18GS-00001_0.input.splitting.performance.memoryRequirement = 4695.0

There might be something broken at the actor side, @vlimant, @areinsvo could you please help me to take a look?

@areinsvo
Copy link

Looking at the actor logs I don't see anything suspicious. @vlimant any idea where I should try to track down the issue? https://cms-unified.web.cern.ch/cms-unified//logs/actor/2018-03-12_14:00:48.log

@vlimant
Copy link

vlimant commented Mar 14, 2018

I think, and @amaltaro will confirm, that for memory to have an effect on ACDC, it has to be set at assignment time.

@vlimant
Copy link

vlimant commented Mar 14, 2018

and it used to "work" because the MaxRSS was updated at assignment time (using "Memory": "8000") while now it's slaved to the spec "memoryRequirement = 4695.0"

@amaltaro
Copy link

It works during creation as well, but mind this small detail:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/Resubmission.py#L44

which means, if you're ACDCing a TaskChain workflow, then Memory argument has to have a dictionary value.

@vlimant
Copy link

vlimant commented Mar 14, 2018

@vlimant
Copy link

vlimant commented Mar 14, 2018

wait. do you mean that the value in the nested TaskX do not matter, but the base Memory Parameter has to be a dict with Task:Memory ?

@amaltaro
Copy link

For Resubmission, yes, that's correct! We don't re-evaluate all the parameters and call the setters, ACDC simply truncates the original workload (so there are no attributes changed).
Honestly, I think ACDCs should not support any updates during creation, only during assignment (as it already does).

@vlimant
Copy link

vlimant commented Mar 14, 2018

https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/Resubmission.py#L44 and the following lines makes -no sense- what so ever, except that it is a copy paste from the assignment code.

because of a time de-correlation between creation time and assignement time, the change at creation should be allowed and supported without having to do unnatural conversions.

Can you please motivate why it should only be done at assignement time (if not only for practical reason of coding this in wmcore) ?

Bo, at least we know why the ACDC are failing now. we dropped the maxrss overriding by unified.

@areinsvo can you please go ahead and change actor so that it creates a dictionary TaskName:Memory and set payload['Memory'] = that_dictionnary.

@amaltaro
Copy link

https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/StdSpecs/Resubmission.py#L44 and the following lines makes -no sense- what so ever, except that it is a copy paste from the assignment code.

I'm pretty sure there was a reason to make it that complicated, unfortunately I don't remember and can't find what was that. I'll look at it again and see if we can remove this over-complication.

The reason it shouldn't be supported during creation time is:

  • the concept of ACDC workflow is simply a resubmission of a previously created workflow (read, a clone)
  • we made several of these parameters modifiable during assignment over the last years, as Ops was requesting to have this flexibility
  • having it in two locations, doubles the effort of maintaining that code and making sure it works as expected.

I think these are pretty good reasons ;)

@areinsvo
Copy link

CMSCompOps/WmAgentScripts@0becde6#diff-699b8f6dbca6e1b3cf8365e884aaaf0e
Memory is now passed as a dict for task chains. @prozober please submit a test workflow and let me know if it doesn't work.

@paorozo
Copy link
Author

paorozo commented Mar 16, 2018

@areinsvo, I created this ACDC using our tool https://cmsweb.cern.ch/reqmgr2/config?name=vlimant_ACDC1_task_HIN-HINPbPbSpring18GS-00001__v1_T_180316_112116_2010

I checked it and I saw:
vlimant_ACDC1_task_HIN-HINPbPbSpring18GS-00001__v1_T_180316_112116_2010.tasks.HIN-HINPbPbSpring18GS-00001_0.input.splitting.performance.memoryRequirement = '8000'
The memory requirement is a string, this is not OK it should be an integer. I am not quite sure what kind of failure it will bring @vlimant

@areinsvo
Copy link

You're right. I was missing some int() values. Can you resubmit the action?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants