Add Slurm API logic #50

Open · djperrefort opened this issue Nov 19, 2023 · 9 comments

@djperrefort (Member)
There are placeholder functions in the allocations/tasks.py module for interfacing with Slurm, but the underlying logic is still missing.

@Comeani (Contributor) commented Dec 11, 2023

To give us an idea of what functions we'll need, the table below provides a set of functionality that the bank uses to interact with SLURM, and the corresponding command line tools/syntax we've been wrapping to accomplish that:

| Bank Functionality | SLURM Tool Command |
| --- | --- |
| Gather Cluster Names | `sacctmgr show clusters format=Cluster --noheader --parsable2` |
| Gather Partition Names | `sinfo -M {cluster} -o "%P" --noheader` |
| Check if a SLURM account exists | `sacctmgr -n show assoc account={account_name}` |
| Get the locked state of a cluster | `sacctmgr -n -P show assoc account={account_name} format=GrpTresRunMins clusters={cluster}`, then check whether `cpu=0` is in the output |
| Set the locked state of a cluster | `sacctmgr -i modify account where account={account_name} cluster={cluster} set GrpTresRunMins=cpu=0` (or `-1`) for CPUs, and `sacctmgr -i modify account where account={account_name} cluster={cluster} set GrpTresRunMins=gres/gpu=0` (or `-1`) for GPUs |
| Get usage per user on a cluster | `sreport cluster AccountUtilizationByUser -Pn -T Billing -t Hours cluster={cluster} Account={account_name} start={start.strftime('%Y-%m-%d')} end={end.strftime('%Y-%m-%d')} format=Proper,Used` |
| Get the total usage on a cluster | Same command as the per-user usage, just summed over the users |
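For reference, here's a minimal Python sketch of what wrapping the first and third commands with `subprocess` might look like; the function names and return types are illustrative, not the actual `allocations/tasks.py` interface:

```python
from subprocess import run


def get_cluster_names() -> list[str]:
    """Return cluster names reported by sacctmgr (one name per line with --parsable2)."""
    result = run(
        ["sacctmgr", "show", "clusters", "format=Cluster", "--noheader", "--parsable2"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def slurm_account_exists(account_name: str) -> bool:
    """Return True if sacctmgr reports at least one association for the account."""
    result = run(
        ["sacctmgr", "-n", "show", "assoc", f"account={account_name}"],
        capture_output=True, text=True, check=True,
    )
    return bool(result.stdout.strip())
```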

I haven't spent a significant amount of time looking at the API specification yet to verify how many of these are already available to us in that format.

Some questions/comments that came up while I was looking into this:

  • If we are already going to be setting GrpTresRunMins for locking/unlocking, we should consider just using GrpTresRunMins to track the SUs remaining and let SLURM keep count, instead of an ad hoc account status update (a rough sketch of a lock/unlock wrapper follows this list).

  • sreport only lists users with more than 1 hour of TRES usage; users below that threshold are omitted entirely. The reported user list may therefore be only a subset of the full association:

[root@moss ~]# sacctmgr show association Account=sam where cluster=smp format=User
      User 
---------- 
           
  bmooreii 
     chx33 
     djp81 
  fangping 
      fis7 
   gnowmik 
      jar7 
     ketan 
   kimwong 
    leb140 
    mak189 
     nlc60 
    sak236 
    shs159 
     yak73

We may need to fetch the full user list associated with a SLURM allocation separately, like the above, and attribute usage only to the users who actually have it in a "usage per user" overview of the allocations in keystone.

  • In some video content I found demoing the API, Nick Ihli explains that the API is intended for distributed systems, not websites, so API calls don't use HTTPS by default and should be wrapped in TLS accordingly.

  • In the same video, Nick explains that much of the functionality for interacting with associations and job accounting is available in the API, but that roll-up report generation (sreport) is not yet present. The video is from Jan 27, 2022, so it's possible they've been working on this since then (I can't remember if he explicitly mentioned its status at SC). We may also be able to piece the information together from individual job accounting over a time window for all users in a specific association.
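To illustrate the lock/unlock row from the table above, here is a rough sketch of toggling the CPU limit with `subprocess`; the function name is hypothetical and not part of the existing codebase:

```python
from subprocess import run


def set_cpu_lock_state(account_name: str, cluster: str, locked: bool) -> None:
    """Set GrpTresRunMins for CPUs: 0 locks the account on the cluster, -1 clears the limit."""
    limit = 0 if locked else -1
    run(
        [
            "sacctmgr", "-i", "modify", "account", "where",
            f"account={account_name}", f"cluster={cluster}",
            "set", f"GrpTresRunMins=cpu={limit}",
        ],
        check=True,
    )
```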

@Comeani (Contributor) commented Dec 11, 2023

Slides from the most recent SLURM User Group meeting talk on the state of the REST API: https://slurm.schedmd.com/SLUG23/REST-API-SLUG23.pdf
(no mention of sreport ☹️)

@chnixi (Contributor) commented Dec 12, 2023

> Slides from the most recent SLURM User Group meeting talk on the state of the REST API: https://slurm.schedmd.com/SLUG23/REST-API-SLUG23.pdf (no mention of sreport ☹️)

We could just use sacctmgr show to display the information instead of sreport?

@Comeani (Contributor) commented Dec 12, 2023

The sacctmgr functionality can show you all the information about associations, the users within them, and any properties/limits that have been imposed. It does not interact with job accounting at all.

A possible workaround is to use the API to gather information equivalent to sacct, which shows individual job accounting, and build custom logic around it to sum usage for each user over the duration of their group's proposal:

[nlc60@moss sam_scripts] main : sacct -p -M smp -X -u chx33 -S 2023-04-26 --format=AllocTres | grep billing
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=9,cpu=12,mem=1G,node=1|
billing=9,cpu=12,mem=1G,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=1,cpu=2,mem=1G,node=1|
billing=25,cpu=32,mem=1G,node=1|
billing=25,cpu=32,mem=1G,node=1|
billing=25,cpu=32,mem=1G,node=1|
billing=25,cpu=32,mem=1G,node=1|
billing=25,cpu=32,mem=1G,node=1|
billing=1,cpu=2,mem=12G,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=11,cpu=14,mem=56252M,node=1|
billing=12,cpu=15,mem=60270M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=1G,node=1|
billing=25,cpu=32,mem=1G,node=1|
billing=19,cpu=24,mem=1G,node=1|
billing=12,cpu=16,mem=1G,node=1|
billing=19,cpu=24,mem=96432M,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=15,mem=60270M,node=1|
billing=14,cpu=18,mem=72324M,node=1|
billing=12,cpu=15,mem=60270M,node=1|
billing=14,cpu=18,mem=72324M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=15,mem=60270M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=19,cpu=24,mem=96432M,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=12,cpu=16,mem=1G,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=8,mem=120G,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=8,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=1,cpu=2,mem=8036M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=6,cpu=8,mem=32144M,node=1|
billing=1,cpu=2,mem=8036M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=51,cpu=64,mem=257152M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=25,cpu=32,mem=1G,node=1|
billing=6,cpu=8,mem=1G,node=1|
billing=6,cpu=8,mem=1G,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=1,cpu=2,mem=8036M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=4,cpu=6,mem=1G,node=1|
billing=19,cpu=24,mem=1G,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=51,cpu=64,mem=1G,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=3,cpu=4,mem=16072M,node=1|
billing=9,cpu=12,mem=48216M,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=3,cpu=4,mem=1G,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=22,cpu=28,mem=112504M,node=1|
billing=22,cpu=28,mem=112504M,node=1|
billing=22,cpu=28,mem=112504M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=12,cpu=16,mem=64288M,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=38,cpu=48,mem=192864M,node=1|
billing=51,cpu=64,mem=257152M,node=1|
billing=26,cpu=32,mem=256G,node=1|
billing=26,cpu=12,mem=256G,node=1|
billing=48,cpu=32,mem=1T,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=25,cpu=32,mem=128576M,node=1|
billing=1,cpu=2,mem=8036M,node=1|
billing=1,cpu=2,mem=8036M,node=1|
billing=1,cpu=2,mem=8036M,node=1|
billing=26,cpu=12,mem=256G,node=1|
billing=26,cpu=12,mem=256G,node=1|
billing=26,cpu=12,mem=256G,node=1|
billing=26,cpu=12,mem=256G,node=1|
billing=26,cpu=12,mem=256G,node=1|
billing=26,cpu=12,mem=256G,node=1|
billing=6,cpu=8,mem=12G,node=1|
billing=26,cpu=12,mem=256G,node=1|
billing=4,cpu=6,mem=1G,node=1|

I believe we'd have to sum the billing numbers here to get usage in TRES hours since the start of the proposal (although this could include cancelled jobs; I haven't verified against the sreport output yet), and then sum across users to get the cluster total for the group.
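As a rough sketch of that summation, assuming the pipe-delimited `sacct -p` output shown above (this sums the allocated billing value per job and nothing more):

```python
def sum_billing(sacct_lines: list[str]) -> int:
    """Sum the billing= values from `sacct -p ... --format=AllocTres` output lines."""
    total = 0
    for line in sacct_lines:
        for field in line.rstrip("|").split(","):
            if field.startswith("billing="):
                total += int(field.split("=", 1)[1])
    return total


# e.g. sum_billing(["billing=9,cpu=12,mem=48216M,node=1|", "billing=6,cpu=8,mem=32144M,node=1|"]) == 15
```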

The ideal would be SchedMD incorporating the rollup stats into the API; we'd then simply display the Used column and sum it to get the cluster total.

[nlc60@moss ~] : sreport cluster AccountUtilizationByUser -T Billing -t Hours cluster=smp Account=sam start=2023-04-26
--------------------------------------------------------------------------------
Cluster/Account/User Utilization 2023-04-26T00:00:00 - 2023-12-11T23:59:59 (19875600 secs)
Usage reported in TRES Hours
--------------------------------------------------------------------------------
  Cluster         Account     Login     Proper Name      TRES Name     Used 
--------- --------------- --------- --------------- -------------- -------- 
      smp             sam                                  billing     1212 
      smp             sam     chx33           chx33        billing      816 
      smp             sam     djp81           djp81        billing        0 
      smp             sam  fangping        fangping        billing        4 
      smp             sam      jar7            jar7        billing        0 
      smp             sam   kimwong         kimwong        billing       41 
      smp             sam    leb140          leb140        billing      201 
      smp             sam     nlc60           nlc60        billing        0 
      smp             sam    sak236          sak236        billing        0 
      smp             sam     yak73           yak73        billing      150
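If the per-user sreport command from the earlier table is run with `-Pn` (parsable, no header), each row should come out as `proper_name|used_hours`; a rough parsing sketch, with the row shape assumed from the output above:

```python
def parse_user_usage(sreport_lines: list[str]) -> dict[str, int]:
    """Map each user's proper name to used TRES hours; skip the account-total row (empty name)."""
    usage = {}
    for line in sreport_lines:
        name, used = line.strip().split("|")
        if name:
            usage[name] = int(used)
    return usage
```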

@Comeani (Contributor) commented Dec 12, 2023

Grepping and awking my way to a sum of those values after billing= does not yield 816 billing hours... We'd need to do some further digging to see how sreport arrives at its total.

@djperrefort (Member Author)

If we use trackable resource limits to enforce account usage limits, the following equation gives the new resource limit when a proposal expires (a small sketch follows the definitions below):

new_limit = old_limit - min(0, proposal_sus + historical_usage + initial_usage - total_usage)

Where:

  • proposal_sus is the number of resources awarded by the expiring proposal
  • historical_usage is the total number of SUs used under previous (already expired) proposals
  • initial_usage is the number of resources used by the account before the application was deployed
  • total_usage is the total resources used by the account
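A minimal sketch of that adjustment, using the formula exactly as written above (all values in service units):

```python
def updated_limit(old_limit, proposal_sus, historical_usage, initial_usage, total_usage):
    """Compute the new resource limit when a proposal expires, per the formula above."""
    return old_limit - min(0, proposal_sus + historical_usage + initial_usage - total_usage)
```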

@Comeani (Contributor) commented Jan 23, 2024

| Bank Functionality | SLURM Tool Command | Corresponding REST API Endpoint |
| --- | --- | --- |
| Gather Cluster Names | `sacctmgr show clusters format=Cluster --noheader --parsable2` | `/slurmdb/v0.0.38/clusters` |
| Check if a SLURM account exists | `sacctmgr -n show assoc account={account_name}` | `/slurmdb/v0.0.40/associations` |
| Get the GrpTresRunMins limit for an account on a cluster | `sacctmgr -n -P show assoc account={account_name} format=GrpTresRunMins clusters={cluster}`, then check whether `cpu=0` is in the output | `/slurmdb/v0.0.38/associations` |
| Set the locked state of a cluster | `sacctmgr -i modify account where account={account_name} cluster={cluster} set GrpTresRunMins=cpu=0` (or `-1`) for CPUs, and `sacctmgr -i modify account where account={account_name} cluster={cluster} set GrpTresRunMins=gres/gpu=0` (or `-1`) for GPUs | `/slurmdb/v0.0.38/associations` |
| Get usage per user on a cluster | `sreport cluster AccountUtilizationByUser -Pn -T Billing -t Hours cluster={cluster} Account={account_name} start={start.strftime('%Y-%m-%d')} end={end.strftime('%Y-%m-%d')} format=Proper,Used` | TODO: see if the associations endpoint can provide this somewhere with non-null usage (teach?) |
| Get the total usage on a cluster | Same command as the per-user usage, just summed over the users | TODO: see if the associations endpoint can provide this somewhere with non-null usage (teach?) |
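For illustration, a rough sketch of hitting the clusters endpoint above with `requests`; the base URL is hypothetical, and the `X-SLURM-USER-NAME`/`X-SLURM-USER-TOKEN` headers and response shape assume a typical slurmrestd JWT-authentication setup:

```python
import requests

BASE_URL = "https://slurm-api.example.edu"  # hypothetical TLS-terminating proxy in front of slurmrestd


def get_clusters(user: str, token: str) -> list[str]:
    """Return cluster names from the slurmdb clusters endpoint (response shape is an assumption)."""
    response = requests.get(
        f"{BASE_URL}/slurmdb/v0.0.38/clusters",
        headers={"X-SLURM-USER-NAME": user, "X-SLURM-USER-TOKEN": token},
        timeout=30,
    )
    response.raise_for_status()
    return [cluster["name"] for cluster in response.json().get("clusters", [])]
```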

@djperrefort (Member Author)

Looks like the Slurm API isn't mature enough for our needs yet. See #140 for a temporary workaround. Moving this to backlog until the API is further along.

@Comeani Comeani added the enhancement New feature or request label Mar 5, 2024
@djperrefort djperrefort removed the enhancement New feature or request label Mar 27, 2024