Configuration Management




Linux Cluster Institute:

Introduction to Configuration Management



Alexei Kotelnikov, PhD
School of Engineering, Rutgers University
https://ecs.rutgers.edu

Outline of the topic:


  • Major HPC installation types and configuration management challenges

  • Detail what configuration management is and why it is useful

  • Review the current landscape of available tools

  • Ansible basics and examples:

    • configuration files

    • playbooks

    • commands for remote tasks

Major HPC installation types and management challenges

Traditional diskful (“stateful”) compute nodes

  • The operating system and applications reside on the local system drive.

  • They are preserved at system reboot.

  • Configuration management challenge:

    • keeping the OS and apps identical across the cluster nodes.

  • Configuration tools: Ansible, Puppet, Chef, Salt, CFEngine.

  • This is the type of installation we have here, at the LCI workshop.

Network booted diskless (“stateless”) compute nodes

  • The operating system boots from the network.

  • The root file system (/) resides either in a ramdisk or on an NFS root (mounted read-only).

  • Applications can reside in the ramdisk, on NFS (mounted read-only), or on a local disk (satellite installation).

  • Configuration management challenges:

    • ramdisk (initrd) configuration

    • possible excessive network traffic

    • nfs-root redundancy and caching

    • maintaining identical software content on the local disks of satellite systems.

  • Configuration tools: initramfs-tools, dracut, xCAT, Warewulf3 (outdated).

Defining Configuration Management


At its broadest level, configuration management is the management and maintenance of operating system and service configuration via code instead of manual intervention.

More formally:

  • Declaring the system state in a repeatable and auditable fashion, and using tools to impose that state and prevent systems from deviating from it.

State


All systems have a ‘state’ composed of:

  • Files on Disk

  • Running services

State can be supplied by:

  • Installation / provisioning systems

  • Golden Images

  • Manual steps including direct configuration changes and setup scripts

Modern Configuration Management Features


  • Idempotency

    • Files and services are declared and managed to reach a ‘desired state’; reapplying the same configuration to a system already in that state changes nothing.

  • Revision Control

    • Systems are managed with an ‘Infrastructure as code’ model

  • Composable and flexible

Benefits of configuration management


  • Centralized catalog of all system configuration

  • Automated enforcement of system state from an authoritative source

  • Ensured consistency between systems

  • Rapid system provisioning from easily-composed components

  • Preflight tests to ensure deployments generate expected results

  • Collection of system ‘ground truths’ for better decision making

Modern configuration-management systems


  • Puppet

    • Ruby based

  • Chef Infra

    • Ruby based

  • CFEngine

    • C based

  • Salt

    • Python based

  • Ansible

    • Python based

How Ansible works


  • Ansible connects to compute nodes via ssh as a regular user.

    • Needs either a public/private key pair for the user running ansible, or host-based authentication configured.

  • Forks several instances to ssh concurrently to multiple nodes.

  • Elevates to root privileges via sudo.

    • Needs sudo privileges on the nodes for the user running ansible.

  • Runs configuration/management tasks via python modules.

    • Needs python3 on the nodes; Ansible itself (ansible-core) needs to be installed only on the machine that runs ansible.

  • The tasks are defined in Ansible playbooks (yaml files).

    • The admin needs to understand yaml syntax for ansible tasks.

  • Doesn’t touch the system or its configuration if they are already in the desired target state.
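
As a minimal sketch of the last point, the play below declares a desired state rather than a sequence of commands. The path /scratch and the service name chronyd are illustrative assumptions, not part of the workshop setup; the group all_nodes is the one used in the inventory of this cluster. On a first run Ansible would report a task as changed only if it had to modify something; on a repeated run against unchanged nodes it reports ok and modifies nothing.

```yaml
---
# Illustrative sketch: declare what should be true on the nodes,
# and let the modules decide whether any change is needed.
- name: Declare the desired state of a directory and a service
  hosts: all_nodes
  become: true                # escalate to root via sudo
  tasks:

    - name: Ensure a scratch directory exists with fixed ownership and mode
      ansible.builtin.file:
        path: /scratch        # hypothetical path
        state: directory
        owner: root
        group: root
        mode: '1777'

    - name: Ensure the time service is enabled and running
      ansible.builtin.service:
        name: chronyd         # hypothetical service name
        state: started
        enabled: true
```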

A simple Ansible setup example


  • Ansible file structure for the MPI installation on the compute nodes. All the files are under Lab_MPI/Ansible:

Lab_MPI
└── Ansible
    ├── ansible.cfg
    ├── Files
    │   └── openmpi-4.1.5-1.el8.x86_64.rpm
    ├── hosts.ini
    ├── install_mpi.yml
    └── setup_mpiuser.yml

The main config file, ansible.cfg, on our cluster



[defaults]
inventory = hosts.ini
remote_user = instructor
host_key_checking = false
remote_tmp = /tmp/.ansible/tmp
interpreter_python = /bin/python3
forks = 4

[privilege_escalation]
become = true
become_method = sudo
become_user = root
become_ask_pass = false

Inventory (hosts) file hosts.ini



[all_nodes]
compute1
compute2
compute3
compute4

[head]
localhost ansible_connection=local
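
Ansible also accepts inventories written in YAML. The sketch below is an assumed equivalent of the INI inventory above, shown only to illustrate how groups and host variables map between the two formats; the file name hosts.yml is hypothetical, and ansible.cfg would need `inventory = hosts.yml` to use it.

```yaml
# hosts.yml (hypothetical file name): YAML form of the same inventory
all:
  children:
    all_nodes:                # group of compute nodes
      hosts:
        compute1:
        compute2:
        compute3:
        compute4:
    head:                     # the head node, managed without ssh
      hosts:
        localhost:
          ansible_connection: local
```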

Package installation example, install_mpi.yml


  • The file can have any name.

  • The extension is .yml or .yaml.

  • The configuration syntax is YAML.

  • Ansible copies the rpm file, openmpi-4.1.5-1.el8.x86_64.rpm, into the local /tmp directory on the nodes, then installs it.

  • All the work is done by the tasks:

---
- name: Install a package on the head and compute nodes
  hosts: head, all_nodes
  gather_facts: no
  tasks:

    - name: copy mpi rpm file
      ansible.builtin.copy:
        src: Files/openmpi-4.1.5-1.el8.x86_64.rpm
        dest: /tmp
        owner: root
        group: root
        mode: '0644'

    - name: install openmpi
      ansible.builtin.dnf:
        name: /tmp/openmpi-4.1.5-1.el8.x86_64.rpm
        disable_gpg_check: yes
        state: present

Ansible organization: playbook, play, role, and task.



  • A playbook is a YAML file that contains one or more plays.

  • Plays are associated with groups of hosts in the inventory.

  • Roles contain collections of reusable tasks.

  • Tasks perform all the work by utilizing modules.
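
To make the hierarchy concrete, here is a minimal sketch of one playbook holding two plays. The role name mpi, the path /opt/apps, and the debug message are hypothetical and used only for illustration; the role would have to exist (for example under roles/mpi/tasks/main.yml) for the playbook to run. The host groups are the ones from hosts.ini.

```yaml
---
# One playbook (this file) containing two plays.

# Play 1: bound to the compute node group
- name: Configure the compute nodes
  hosts: all_nodes
  become: true
  roles:
    - mpi                     # hypothetical role: a reusable collection of tasks
  tasks:
    - name: Task that calls the ansible.builtin.file module
      ansible.builtin.file:
        path: /opt/apps       # hypothetical path
        state: directory
        mode: '0755'

# Play 2: bound to the head node group
- name: Report from the head node
  hosts: head
  gather_facts: false
  tasks:
    - name: Task that calls the ansible.builtin.debug module
      ansible.builtin.debug:
        msg: "Head node play finished"
```

Within a play, the roles listed under `roles` run before the tasks listed under `tasks`.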

Ansible modules


  • The modules are used by tasks to do work.

  • The most commonly used modules:

    • copy files: ansible.builtin.copy

    • set file attributes: ansible.builtin.file

    • install packages: ansible.builtin.dnf and ansible.builtin.apt

    • execute shell commands: ansible.builtin.shell

    • restart a service: ansible.builtin.service

    To list all the modules installed on your system:

    ansible-doc -l
    

    To read the documentation for a specific module, for example ansible.builtin.file:

    ansible-doc ansible.builtin.file
    

Examples of using the ansible.builtin.shell module for remote commands



  • Use the ansible.builtin.shell module to run a command on remote nodes.

  • For example, check the status of slurmd on all the nodes:

ansible all -m ansible.builtin.shell -a "systemctl status slurmd"

  • Restart slurmd on compute1:

ansible compute1 -m ansible.builtin.shell -a "systemctl restart slurmd"
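
The same restart could also be written as a playbook task using the ansible.builtin.service module from the module list, instead of calling systemctl through the shell. This is a sketch under the assumption that slurmd is managed as a system service on compute1, as the command above implies.

```yaml
---
# Sketch: restart slurmd through the service module rather than a shell command
- name: Restart slurmd on one compute node
  hosts: compute1
  become: true                # restarting a service requires root
  tasks:
    - name: Restart the slurmd service
      ansible.builtin.service:
        name: slurmd
        state: restarted      # always restarts, so this task reports "changed" on every run
```

Using `state: started` instead would only start the service when it is not already running.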

Ansible playbook development steps



  • Set up the configuration files: ansible.cfg and hosts.ini.

  • Identify the groups of hosts that execute identical tasks (plays).

  • Define the top-level playbook tasks (roles).

  • Add features you need in new yaml files.

  • Place configuration files and packages in the files directories for the roles.

  • Tag the tasks for debugging purposes.

  • Check the playbooks for syntax errors:

ansible-playbook playbook.yml --syntax-check

  • Perform a dry run:

ansible-playbook playbook.yml --check

  • List the tags defined in the playbook:

ansible-playbook --list-tags playbook.yml

  • Run only the tasks carrying a given tag, for example compilation:

ansible-playbook --tags compilation playbook.yml

  • Run the playbook:

ansible-playbook playbook.yml
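
For the tagging step, a task is tagged with the `tags` keyword. The playbook below is a hypothetical sketch: the source directory /opt/src/myapp and the make targets are assumptions chosen only to give the compilation tag something to select.

```yaml
---
# Sketch: two tagged tasks, so each can be listed or run on its own
- name: Build and install an application
  hosts: all_nodes
  gather_facts: false
  tasks:

    - name: Compile the application
      ansible.builtin.command:
        cmd: make
        chdir: /opt/src/myapp     # hypothetical source directory
      tags:
        - compilation

    - name: Install the application
      ansible.builtin.command:
        cmd: make install
        chdir: /opt/src/myapp     # hypothetical source directory
      become: true                # only the install step needs root
      tags:
        - install
```

Selecting the compilation tag with the `--tags` option of ansible-playbook would run only the first task and skip the second.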

Questions and discussion: