Slurm

Qlustar is missing the Slurm man pages. You'll (have to) find information online™.

The bulk of the cluster consists of the 2021-era EPYC nodes, which are grouped in the rome queue. Ever since their initial installation there have been (probably thermal) issues: most nodes can sustain full load 24/7, but not all of them (see the logbook for more details). On the Slurm side we account for this by adjusting the nodes' Weight (scheduling priority). In QluMan there are two Slurm Node Groups, EPYC and EPYC-HiPrio. The latter are assigned compute jobs first; only when all of them are occupied does Slurm start assigning jobs to the EPYC nodes.
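A minimal sketch of how the weights could be inspected from the command line. Slurm allocates the nodes with the lowest Weight first, so sorting the sinfo output by weight shows the order in which nodes get filled. The helper name sort_by_weight is hypothetical, and which Weight values QluMan actually assigns to the two node groups is not stated here.

```shell
# Sort "NODE WEIGHT STATE" lines by weight (lowest first), then by node name.
# Slurm prefers the lowest-weight nodes that satisfy a job's requirements.
sort_by_weight() {
    sort -k2,2n -k1,1
}

# On the head node (requires the Slurm client tools):
#   sinfo -h -N -p rome -o "%N %w %T" | sort_by_weight
# %N = node name, %w = scheduling weight, %T = node state
```

Nodes from the EPYC-HiPrio group should then appear at the top of the list.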

For nodes that have crashed, there is the script hn-1:/apps/local/setup/nodes/restart_stale_nodes.sh. However, nodes sometimes also go into the DRAIN state without crashing: they are online from the IPMI perspective, but no longer get scheduled for users. UNDRAIN such nodes (sometimes an additional ssh sun-XX systemctl restart slurmd is necessary too).
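The undrain step can be sketched as follows. This is an illustration, not the content of restart_stale_nodes.sh; the helper names run and undrain_node and the DRY_RUN switch are assumptions made for this example. By default the commands are only printed.

```shell
# Print the command (DRY_RUN=1, the default in this sketch) or execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Return a drained node to service; pass "restart" as the second argument
# to also restart slurmd on the node via ssh.
undrain_node() {
    node="$1"
    run scontrol update "NodeName=$node" State=UNDRAIN
    if [ "${2:-}" = "restart" ]; then
        run ssh "$node" systemctl restart slurmd
    fi
}

# Drained nodes and the recorded reason can be listed first with:
#   sinfo -R -p rome
# Then, for example:
#   DRY_RUN=0 undrain_node sun-XX restart
```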

It is important to realise that Slurm has its own user accounting, which is more or less independent from the Unix accounts (as supplied by FZJ AD or the Qlustar LDAP server on the head node).

Access to queues is regulated per Slurm account. Each Slurm account usually contains many Unix users.

This has to be set up in both directions: configure the partitions to enforce the account restrictions, and associate the users with the corresponding Slurm accounts.

Convention:

Slurm Account | LDAP group | Scope
--------------+------------+------
hi-ern        | hi-ern     | People with FZJ contracts (PhD & higher, HiWi), HI ERN students (official project, BSc, MSc)
fau-puls      | fau-puls   | People from the PULS group (Ana Smith, FAU)
guests        | guests     | People without FZJ contracts (external cooperations)
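As a command-line counterpart to the QluMan dialogs, the two directions can be sketched with the standard Slurm tools. The helper names and the DRY_RUN switch are assumptions made for this example, and <user> stands for a Unix login; by default the commands are only printed.

```shell
# Print the command (DRY_RUN=1, the default in this sketch) or execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Direction 1: associate a Unix user with a Slurm account.
add_user_to_account() {
    run sacctmgr -i add user "name=$1" "account=$2"
}

# Check which Slurm accounts a user is associated with.
show_user_accounts() {
    run sacctmgr show associations "user=$1" format=account,user,partition
}

# Direction 2: inspect a partition; the AllowAccounts field in the output
# lists the Slurm accounts that may submit to it.
show_partition() {
    run scontrol show partition "$1"
}

# Example:
#   DRY_RUN=0 add_user_to_account <user> hi-ern
#   DRY_RUN=0 show_user_accounts <user>
#   DRY_RUN=0 show_partition rome
```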

Configuration

Keep in mind:

  • Membership in a Unix/LDAP group is not sufficient for cluster access
  • Always check in the → Accounting → Manage accounts dialog whether users are associated with the right Slurm account(s)
  • Don't forget to upgrade user accounts when the type of affiliation changes (e.g. visitor → postdoc with contract)

compflu/backstage/slurm.txt · Last modified: 2023-06-09 14:16 by j.hielscher