Lab - Build a Cluster: install Scheduler

Objective: Install and configure the SLURM scheduler on the head and compute nodes using an Ansible playbook.

Steps:

  • Create SLURM user & group.

  • Create directories for SLURM services and log files.

  • Build & Install SLURM packages on the head node.

  • Set up munge on the head node.

  • Create a MySQL database for SLURM accounting on the head node.

  • Start slurmdbd and slurmctld on the head node.

  • Copy munge key to the compute nodes.

  • Install the SLURM packages on the compute nodes.

  • Start slurmd on the compute nodes.


Create SLURM user, group, and directories

The SLURM services (slurmctld, slurmdbd, and slurmd) should run as user slurm. Therefore, we need to create the slurm group and user account on all the nodes.

The SLURM services need the following directories, all owned by user slurm:

  • /var/spool/slurm

  • /var/spool/slurmctld

  • /var/spool/slurm/cluster_state

  • /var/log/slurm
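
As a rough sketch, the tasks in a role such as slurm-user might look like the following. The file layout, the system: true setting, and the 0755 mode are assumptions for illustration and are not taken from the lab's actual role.

```yaml
# roles/slurm-user/tasks/main.yml (hypothetical layout)
- name: Create the slurm group
  ansible.builtin.group:
    name: slurm
    system: true        # assumption: a system group
    state: present

- name: Create the slurm user
  ansible.builtin.user:
    name: slurm
    group: slurm
    system: true        # assumption: a system account
    create_home: false
    state: present

- name: Create the SLURM spool and log directories owned by slurm
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
    owner: slurm
    group: slurm
    mode: "0755"        # assumption: mode is not specified in the lab
  loop:
    - /var/spool/slurm
    - /var/spool/slurmctld
    - /var/spool/slurm/cluster_state
    - /var/log/slurm
```

SLURM expects the slurm user to have the same UID on every node, so a real role may also set explicit uid and gid values.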

In your build.yml playbook, add the slurm-user role:

    - slurm-user
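
For orientation, a role entry like this sits under the roles key of a play. The excerpt below is only a sketch: the play name, the hosts pattern, and become: true are assumptions, and your build.yml will already contain roles from the earlier labs.

```yaml
# build.yml (excerpt); play name, hosts pattern, and become are assumptions
- name: Configure all cluster nodes
  hosts: all
  become: true
  roles:
    - slurm-user
```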

Run the playbook:

ansible-playbook build.yml

Check that user slurm has been created on all the nodes:

ansible all -m ansible.builtin.shell -a 'id slurm'

Build and install SLURM on the head node

The latest SLURM source can be downloaded from schedmd.com, compiled and packaged into rpm or deb packages on the head node, and then installed.

The head node already has all the development tools for that.
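
A minimal sketch of what a build role could do on a Rocky head node is shown below. The variable slurm_version, the download URL pattern, the /tmp staging path, and the /root/rpmbuild output path (which assumes the build runs as root on x86_64) are all assumptions, not the contents of the lab's slurm-rpm-build role.

```yaml
# Hypothetical tasks for a role such as slurm-rpm-build
- name: Download the SLURM source tarball
  ansible.builtin.get_url:
    # assumption: slurm_version is defined elsewhere, URL pattern may differ
    url: "https://download.schedmd.com/slurm/slurm-{{ slurm_version }}.tar.bz2"
    dest: "/tmp/slurm-{{ slurm_version }}.tar.bz2"
    mode: "0644"

- name: Build RPM packages directly from the tarball
  ansible.builtin.command:
    cmd: "rpmbuild -ta /tmp/slurm-{{ slurm_version }}.tar.bz2"
    # assumption: skip the build if an output directory already exists
    creates: /root/rpmbuild/RPMS/x86_64
```

The resulting packages would then be installed with dnf in a later task. On Ubuntu the equivalent role builds deb packages instead.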

Add the role for the SLURM package build. On a Rocky head node:

    - slurm-rpm-build

On an Ubuntu head node:

    - slurm_build

Run the build.yml playbook:

ansible-playbook build.yml

Set up munge on the head node

Munge authenticates the SLURM services communicating between the head and compute nodes.

Munge can be installed with dnf or apt.

The key can be generated by the command below:

/usr/bin/dd if=/dev/urandom bs=1 count=1024 of=munge.key

The key is then copied into the /etc/munge directory, with owner munge and permission mode 400.
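
The same steps could be expressed as Ansible tasks roughly as follows. This is a sketch, not the actual head-node_munge_key role: it assumes the munge package is already installed (so the munge user and /etc/munge exist), and it writes the key directly into /etc/munge instead of generating it elsewhere and copying it.

```yaml
# Hypothetical tasks for a role such as head-node_munge_key
- name: Generate a 1024-byte munge key if none exists yet
  ansible.builtin.command:
    cmd: /usr/bin/dd if=/dev/urandom bs=1 count=1024 of=/etc/munge/munge.key
    creates: /etc/munge/munge.key   # do not overwrite an existing key

- name: Restrict the key to the munge user
  ansible.builtin.file:
    path: /etc/munge/munge.key
    owner: munge
    group: munge
    mode: "0400"

- name: Start and enable the munge service
  ansible.builtin.systemd:
    name: munge
    state: started
    enabled: true
```

The creates argument keeps the task idempotent, which matters here because regenerating the key would break authentication with nodes that hold the old one.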

This is accomplished by the head-node_munge_key role in the playbook:

    - head-node_munge_key

Apply it with:

ansible-playbook build.yml

Check if munge is running on the cluster:

ansible all -m ansible.builtin.shell -a 'systemctl status munge'

Create MySQL database for SLURM accounting

To store associations, accounts, QOS, job accounting records, etc., SLURM needs the database slurm_acct_db to be created in MySQL.
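
One possible way to create the database from Ansible is with the community.mysql collection, as sketched below. The database user name slurm and the variables slurmdbd_db_password and mysql_socket are hypothetical, and the lab's slurmdbd role may do this differently.

```yaml
# Hypothetical tasks; requires the community.mysql collection
- name: Create the SLURM accounting database
  community.mysql.mysql_db:
    name: slurm_acct_db
    state: present
    login_unix_socket: "{{ mysql_socket }}"   # assumption: socket path differs by distribution

- name: Create a database user for slurmdbd
  community.mysql.mysql_user:
    name: slurm                               # assumption: user name not given in the lab
    host: localhost
    password: "{{ slurmdbd_db_password }}"    # hypothetical variable
    priv: "slurm_acct_db.*:ALL"
    state: present
    login_unix_socket: "{{ mysql_socket }}"
```

The user name and password chosen here would have to match the storage settings in slurmdbd.conf.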

Ansible playbook entry for the head node play:

   - slurmdbd

Run the playbook:

ansible-playbook build.yml

Copy munge key and install the SLURM packages on the compute nodes

Copy the munge key to the compute nodes.

Copy the SLURM rpm/deb packages to the compute nodes and install them.

Copy the configuration files into /etc/slurm.

Start slurmd on the nodes.

The roles that accomplish these tasks:

    - compute-node_munge_key
    - slurmd
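
As an illustration only, the compute node side of these steps might be written as the tasks below. The src file names assume the munge key and slurm.conf were placed in the role's files directory beforehand, and the 0644 mode is an assumption. Package installation is left out because the package file names depend on the build.

```yaml
# Hypothetical tasks for roles such as compute-node_munge_key and slurmd
- name: Copy the head node's munge key to the compute node
  ansible.builtin.copy:
    src: munge.key              # assumption: key stored in the role's files/ directory
    dest: /etc/munge/munge.key
    owner: munge
    group: munge
    mode: "0400"

- name: Start and enable munge
  ansible.builtin.systemd:
    name: munge
    state: started
    enabled: true

- name: Copy the SLURM configuration
  ansible.builtin.copy:
    src: slurm.conf             # assumption: same file as on the head node
    dest: /etc/slurm/slurm.conf
    owner: slurm
    group: slurm
    mode: "0644"                # assumption: mode is not specified in the lab

- name: Start and enable slurmd
  ansible.builtin.systemd:
    name: slurmd
    state: started
    enabled: true
```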

The final playbook, build.yml, on Rocky Linux:

On Ubuntu Linux:

Run the playbook:

ansible-playbook build.yml

Check if you can run SLURM commands:

sinfo -Nl
scontrol show node compute2

Your HPC cluster is now fully functional and ready for HPC application installation.