Configuration Management
Outline of the topic:
Major HPC installation types and configuration management challenges
Detail what configuration management is and why it is useful
Review the current landscape of available tools
Ansible basics and examples:
configuration files
playbooks
commands for remote tasks
Major HPC installation types and management challenges
Traditional diskful (“stateful”) compute nodes
The operating system and applications reside on the local system drive.
They persist across system reboots.
Configuration management challenge:
keeping the OS and apps identical across the cluster nodes.
Configuration tools: Ansible, Puppet, Chef, Salt, CFEngine.
This is the type of installation we have here, at the LCI workshop.
Network booted diskless (“stateless”) compute nodes
The operating system boots from the network.
The root file system / resides either in a ramdisk or on an nfs-root (ro-mounted).
Applications can be either in the ramdisk, on NFS (ro-mounted), or on a local disk (satellite installation).
Configuration management challenges:
ramdisk (initrd) configuration
possible excessive network traffic
nfs-root redundancy and caching
maintaining identical software content on the local disks of satellite systems
Configuration tools: initramfs-tools, dracut, xCAT, Warewulf3 (outdated).
Defining Configuration Management
At its broadest level, the management and maintenance of operating system and service configuration via code instead of manual intervention.
More formally:
Declaring the system state in a repeatable and auditable fashion, and using tools to impose that state and prevent systems from deviating from it.
State
All systems have a ‘state’ comprised of:
Files on Disk
Running services
State can be supplied by:
Installation / provisioning systems
Golden Images
Manual steps including direct configuration changes and setup scripts
Modern Configuration Management Features
Idempotency
Declaration and management of files and services to reach a ‘desired state’
Revision Control
Systems are managed with an ‘Infrastructure as code’ model
Composable and flexible
Benefits of configuration management
Centralized catalog of all system configuration
Automated enforcement of system state from an authoritative source
Ensured consistency between systems
Rapid system provisioning from easily-composed components
Preflight tests to ensure deployments generate expected results
Collection of system ‘ground truths’ for better decision making
Modern configuration-management systems
Puppet (Ruby based)
Chef Infra (Ruby based)
CFEngine (C based)
Salt (Python based)
Ansible (Python based)
How Ansible works
Ansible connects to the compute nodes via ssh as a regular user.
Needs either a public/private key pair for the user running Ansible, or host-based authentication configured (a key setup sketch follows this list).
Forks several instances to ssh concurrently to multiple nodes.
Elevates to root privileges via sudo.
Needs sudo privilege on the nodes for the user running Ansible.
Runs configuration/management tasks via python modules.
Needs python3 as well as the Ansible python modules (ansible-core) installed.
The tasks are defined in Ansible playbooks (yaml files).
The admin needs to understand yaml syntax for Ansible tasks.
Doesn’t touch the system or its configuration if they are already in the desired target state.
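A minimal sketch of the key setup, assuming the instructor account from ansible.cfg below and the compute1 node from the inventory (repeat ssh-copy-id for each node):
ssh-keygen -t ed25519                  # generate a key pair for the user running ansible
ssh-copy-id instructor@compute1        # install the public key on a compute node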
A simple Ansible setup example
Ansible file structure for MPI installation on the compute nodes. All the files are under:
Lab_MPI/Ansible
├── ansible.cfg
├── Files
│ └── openmpi-4.1.5-1.el8.x86_64.rpm
├── hosts.ini
├── install_mpi.yml
└── setup_mpiuser.yml
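The second playbook, setup_mpiuser.yml, is not reproduced here; a hypothetical minimal sketch, assuming it only creates an mpiuser account on the compute nodes with the ansible.builtin.user module, might look like:
---
- name: Set up the MPI user on the compute nodes
  hosts: all_nodes
  gather_facts: no

  tasks:
    - name: create the mpiuser account    # account name is an assumption, for illustration
      ansible.builtin.user:
        name: mpiuser
        shell: /bin/bash
        state: present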
The main config file, ansible.cfg, on our cluster
[defaults]
inventory = hosts.ini
remote_user = instructor
host_key_checking = false
remote_tmp = /tmp/.ansible/tmp
interpreter_python = /bin/python3
forks = 4
[privilege_escalation]
become = true
become_method = sudo
become_user = root
become_ask_pass = false
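Ansible looks for ansible.cfg in the current working directory (among other locations); which config file and settings are actually in effect can be verified with:
ansible --version                       # the output includes a "config file = ..." line
ansible-config dump --only-changed      # show only the settings that differ from the defaults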
Inventory (hosts) file, hosts.ini
[all_nodes]
compute1
compute2
compute3
compute4
[head]
localhost ansible_connection=local
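A quick ad-hoc check that the inventory, ssh access, and python interpreter all work on the nodes (run from the directory containing ansible.cfg):
ansible all_nodes -m ansible.builtin.ping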
Package installation example, install_mpi.yml
The file can have any name; the extension is .yml or .yaml.
The configuration syntax is YAML.
Ansible copies the rpm file, openmpi-4.1.5-1.el8.x86_64.rpm, into the local /tmp directory on the nodes, then installs it.
All the work is done by the tasks:
---
- name: Install a package on the head and compute nodes
  hosts: head, all_nodes
  gather_facts: no

  tasks:
    - name: copy mpi rpm file
      ansible.builtin.copy:
        src: Files/openmpi-4.1.5-1.el8.x86_64.rpm
        dest: /tmp
        owner: root
        group: root
        mode: '0644'

    - name: install openmpi
      ansible.builtin.dnf:
        name: /tmp/openmpi-4.1.5-1.el8.x86_64.rpm
        disable_gpg_check: yes
        state: present
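With ansible.cfg and hosts.ini in the same directory, the playbook can then be run from Lab_MPI/Ansible as (see also the development steps below):
ansible-playbook install_mpi.yml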
Ansible organization: playbook, play, role, and task
Playbooks are yaml files containing one or more plays.
Plays are associated with groups of hosts in the inventory.
Roles contain collections of reusable tasks.
Tasks perform all the work by utilizing modules.
Ansible modules
The modules are used by tasks to do work.
The most commonly used modules:
copy files:
ansible.builtin.copy
set file attributes:
ansible.builtin.file
install packages:
ansible.builtin.dnf
and ansible.builtin.apt
execute shell commands:
ansible.builtin.shell
restart a service:
ansible.builtin.service
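A short sketch of how two of these modules look inside playbook tasks; the directory path is hypothetical, the slurmd service name matches the examples below:
    - name: make sure the application directory exists
      ansible.builtin.file:
        path: /opt/apps                 # hypothetical path, for illustration
        state: directory
        mode: '0755'

    - name: make sure slurmd is enabled and running
      ansible.builtin.service:
        name: slurmd
        state: started
        enabled: yes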
To list all the modules installed on your system:
ansible-doc -l
Read the info on a specific module, for example, ansible.builtin.file:
ansible-doc ansible.builtin.file
Examples of utilizing the ansible.builtin.shell module for a remote command
Use the ansible.builtin.shell module for a remote command.
For example, check the status of slurmd on all the nodes:
ansible all -m ansible.builtin.shell -a "systemctl status slurmd"
Restart slurmd on compute1:
ansible compute1 -m ansible.builtin.shell -a "systemctl restart slurmd"
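The same restart can also be done with the ansible.builtin.service module instead of a raw shell command; shown here only as an alternative form:
ansible compute1 -m ansible.builtin.service -a "name=slurmd state=restarted"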
Ansible playbook development steps
Set the configuration files: ansible.cfg and hosts.ini.
Identify groups of hosts to execute identical tasks on (plays).
Define the top-level playbook tasks (roles).
Add the features you need in new yaml files.
Place configuration files and packages in the files directories for the roles.
Tag the tasks for debugging purposes (a tag example is sketched at the end of this section).
Check the playbooks for syntactic errors:
ansible-playbook playbook.yml --syntax-check
Perform a dry run:
ansible-playbook playbook.yml --check
List the tagged tasks:
ansible-playbook --list-tags playbook.yml
Run only the tasks tagged compilation, for example:
ansible-playbook --tags compilation playbook.yml
Run the playbook:
ansible-playbook playbook.yml
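A minimal sketch of a tagged task, so that the --list-tags and --tags options above have something to select; the task itself and the make command are hypothetical:
    - name: compile the example code    # hypothetical task, for illustration
      ansible.builtin.shell: make -C /tmp/example
      tags:
        - compilation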