Ansible Linux OS patching

7 min read
VoidQuark
Open-source enthusiast

Everyone knows that patching packages is not fun. I decided to create an Ansible role that can patch Enterprise Linux (RHEL) and other Red Hat derivatives (e.g. CentOS, Rocky, Alma, Fedora). This role supports three modes of patching:

  1. Patch all OS packages to the latest version
  2. Apply all security patches
  3. Apply all bugfix patches

Sometimes a restart is necessary. In that case, yum-utils (dnf-utils) provides an excellent utility, “needs-restarting”, which can tell us whether a restart is required. If we put all of this together, we realize that automated patching is not that hard.
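
You can try it manually first. With the -r flag, needs-restarting only reports whether a full reboot is needed and signals the result through its exit code:

# Install the package that ships needs-restarting (dnf-utils on newer systems)
sudo dnf install -y yum-utils

# Exit code 1 means a reboot is required, 0 means it is not
needs-restarting -r
echo $?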

Why is OS patching essential?

Everyone knows that the easiest path is to not patch anything: it requires no action, and as long as the application is running, everyone is happy. My suggestion is to create a proper patching strategy and stick to it. A patching strategy is good practice, and from a long-term perspective you will have less pain than when updating a legacy system that has not been touched for a few years.

I prepared four simple rules to help you build a patching strategy:

  1. Ensure that you have visibility of which systems require patching.
  2. Make sure you have calculated risk and potential impact if you patch or do not patch your systems.
  3. Test patches in the development/staging environment before pushing them into production.
  4. Patch often.

I want to show you that automating OS patching on multiple systems is not magic. Let's check what is required to accomplish that:

  1. Ansible Inventory
  2. Ansible EL_Patching role
  3. Playbook
  4. AWX/Red Hat Ansible Automation Platform, an Ansible controller host, or a workstation

Directory structure example

The example below illustrates where the playbook, inventory, and role are located.

.
├── automation
│   ├── function_el-patching_deploy.yml #Playbook
│   └── roles
│       └── el_patching
│           ├── defaults
│           │   └── main.yml
│           ├── LICENSE
│           ├── meta
│           │   └── main.yml
│           ├── README.md
│           └── tasks
│               └── main.yml
└── inventory
    ├── group_vars
    │   ├── el_patching_all
    │   │   └── el_patching_vars.yml
    │   ├── el_patching_all_reboot
    │   │   └── el_patching_vars.yml
    │   ├── el_patching_bugfix
    │   │   └── el_patching_vars.yml
    │   └── el_patching_security
    │       └── el_patching_vars.yml
    ├── hosts
    └── host_vars
        ├── apache1.voidquark.com
        │   └── host_vars.yml
        ├── nginx1.voidquark.com
        │   └── host_vars.yml
        ├── nginx2.voidquark.com
        │   └── host_vars.yml
        └── postgresql1.voidquark.com
            └── host_vars.yml

Ansible inventory example

I decided to use four groups to demonstrate what this role does.

Content of inventory/hosts:

[el_patching_all]
nginx1.voidquark.com

[el_patching_bugfix]
nginx2.voidquark.com

[el_patching_security]
postgresql1.voidquark.com

[el_patching_all_reboot]
apache1.voidquark.com

Let me explain the purpose of each group:

  • el_patching_all: Patch all packages to the latest version and do not reboot the host
  • el_patching_bugfix: Apply all bugfix patches and do not reboot the host
  • el_patching_security: Apply all security patches and do not reboot the host
  • el_patching_all_reboot: Patch all packages to the latest version and perform a reboot if required

Keep in mind these groups are just examples; production setups can look different depending on your strategy and specific requirements.

#CONTENT OF: inventory/group_vars/el_patching_all/el_patching_vars.yml
---
el_patching_method: "all"
#CONTENT OF: inventory/group_vars/el_patching_bugfix/el_patching_vars.yml
---
el_patching_method: "bugfix"
#CONTENT OF: inventory/group_vars/el_patching_security/el_patching_vars.yml
---
el_patching_method: "security"
#CONTENT OF: inventory/group_vars/el_patching_all_reboot/el_patching_vars.yml
---
el_patching_method: "all"
el_patching_auto_reboot: true
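
The host_vars files from the directory tree are not shown in this post; they can hold per-host overrides of the same variables. A hypothetical example (this content is an assumption, not taken from the repository):

#CONTENT OF: inventory/host_vars/nginx1.voidquark.com/host_vars.yml (hypothetical)
---
# Per-host override: reboot this particular host even if its group does not
el_patching_auto_reboot: true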

Ansible Enterprise Linux patching role

This role is simple: it applies patches and then verifies whether a reboot is required. If auto reboot is enabled and the verification reports that a reboot is required, the host is rebooted. The patching itself is done by the Ansible dnf module.
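
The real tasks live in the role repository; a minimal sketch of the core logic, assuming the variables from the inventory above, could look like this (task structure and defaults here are illustrative, not the role's exact code):

# tasks/main.yml (illustrative sketch)
- name: Update all packages
  ansible.builtin.dnf:
    name: "*"
    state: latest
  when: el_patching_method == "all"

- name: Apply security patches only
  ansible.builtin.dnf:
    name: "*"
    state: latest
    security: true
  when: el_patching_method == "security"

- name: Apply bugfix patches only
  ansible.builtin.dnf:
    name: "*"
    state: latest
    bugfix: true
  when: el_patching_method == "bugfix"

- name: Verify if restart is required
  ansible.builtin.command: needs-restarting -r
  register: el_patching_needs_restarting
  # needs-restarting -r exits 0 (no reboot) or 1 (reboot required)
  failed_when: el_patching_needs_restarting.rc not in [0, 1]
  changed_when: el_patching_needs_restarting.rc == 1

- name: Reboot host
  ansible.builtin.reboot:
  when:
    - el_patching_auto_reboot | default(false)
    - el_patching_needs_restarting.rc == 1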

The role also supports check_mode, which is useful if you want to simulate patching. Keep in mind that check_mode cannot predict whether a reboot will be required; it only shows you how this role would execute on multiple hosts.
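
A dry run uses the standard --check flag:

ansible-playbook -i inventory/hosts automation/function_el-patching_deploy.yml --check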

I recommend reading the README.md that is part of the role repository!

Ansible Playbook example

---
- name: Apply OS Patches
  hosts: el_patching_all:el_patching_bugfix:el_patching_security:el_patching_all_reboot
  gather_facts: false
  become: true
  roles:
    - el_patching

This playbook is configured to run only on the following groups:

  • el_patching_all
  • el_patching_bugfix
  • el_patching_security
  • el_patching_all_reboot

So this acts like a handbrake if you use a shared inventory with many other groups and hosts: the play only ever targets hosts inside these four groups.
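
If you want to narrow a run even further, the standard --limit option restricts it to a subset of the targeted groups or hosts:

# Patch only the hosts in the el_patching_security group
ansible-playbook -i inventory/hosts automation/function_el-patching_deploy.yml --limit el_patching_security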

Execute OS patching

I decided to trigger patching from the CLI on my workstation to show you how it works. In the real world, you would at least have a cron job (if you want auto-patching) or use AWX/Red Hat Ansible Automation Platform. Keep in mind that you can have multiple playbooks that target different groups at different times.
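
As a sketch of the cron approach (the repository path, schedule, and log file below are assumptions, not part of the role):

# Patch every Sunday at 03:00 and keep a log of each run
0 3 * * 0 cd /opt/ansible && ansible-playbook -i inventory/hosts automation/function_el-patching_deploy.yml >> /var/log/el_patching.log 2>&1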

I executed the playbook with:

ansible-playbook -i inventory/hosts automation/function_el-patching_deploy.yml

PLAY [Apply OS Patches] **********************************************************************

TASK [el_patching : Ensure that need-restarting binary is present] ***
ok: [nginx2.voidquark.com]
ok: [nginx1.voidquark.com]
ok: [postgresql1.voidquark.com]
ok: [apache1.voidquark.com]

TASK [el_patching : Update all packages] ***************************
skipping: [nginx2.voidquark.com]
skipping: [postgresql1.voidquark.com]
changed: [apache1.voidquark.com]
changed: [nginx1.voidquark.com]

TASK [el_patching : Apply security patches only] *******************
skipping: [nginx1.voidquark.com]
skipping: [nginx2.voidquark.com]
skipping: [apache1.voidquark.com]
changed: [postgresql1.voidquark.com]

TASK [el_patching : Apply bugfix patches only] *********************
skipping: [nginx1.voidquark.com]
skipping: [postgresql1.voidquark.com]
skipping: [apache1.voidquark.com]
changed: [nginx2.voidquark.com]

TASK [el_patching : Verify if restart is required] *****************
changed: [nginx1.voidquark.com]
changed: [apache1.voidquark.com]
changed: [postgresql1.voidquark.com]
changed: [nginx2.voidquark.com]

TASK [el_patching : Inform user if reboot is required] *************
skipping: [nginx1.voidquark.com]
skipping: [nginx2.voidquark.com]
skipping: [postgresql1.voidquark.com]
ok: [apache1.voidquark.com] => {
"msg": "Reboot is required to apply patches."
}

TASK [el_patching : Reboot host] ***********************************
skipping: [nginx1.voidquark.com]
skipping: [nginx2.voidquark.com]
skipping: [postgresql1.voidquark.com]
changed: [apache1.voidquark.com]

PLAY RECAP ************************************************************************************************************
apache1.voidquark.com : ok=5 changed=3 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
nginx1.voidquark.com : ok=3 changed=2 unreachable=0 failed=0 skipped=4 rescued=0 ignored=0
nginx2.voidquark.com : ok=3 changed=2 unreachable=0 failed=0 skipped=4 rescued=0 ignored=0
postgresql1.voidquark.com : ok=3 changed=2 unreachable=0 failed=0 skipped=4 rescued=0 ignored=0

Patching is done, and all hosts inside the groups were patched exactly as we defined in the inventory.

Result recap in table format:

Hostname                    Applied patches   Rebooted
nginx1.voidquark.com        all               No
nginx2.voidquark.com        bugfix            No
postgresql1.voidquark.com   security          No
apache1.voidquark.com       all               Yes

One more thing

Is this role ready for production use?

It depends. Each company has a different patching strategy, and the workflow differs accordingly. I can't develop a general role for every patching strategy, as it is a very company-specific topic. Take this role as a template: maybe it fits your patching strategy as-is, or maybe you will need to tweak the code. The role is free and open source for everyone.

How can this work if we extend the role or combine it with a more complex workflow?

I assume that you have monitoring in place, and you should not be notified while auto-patching is in progress. Rather, you should be notified if patching was not successful or if a host gets stuck in the boot process after the reboot. Here is how it could work in the real world:

  1. You can have pre-tasks, which may differ per environment or application.
  2. Automatically silence notifications during the reboot to prevent false alerts from monitoring. Generate metrics from this role for the node-exporter textfile collector; this can help you build a Grafana dashboard about patching status and develop new alerting rules (see the sketch after this list).
  3. Once patching is done, notifications should be re-enabled. You can even run post-tasks that perform various checks on your system or applications.
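
As a sketch of the metrics idea from point 2, a follow-up task could drop a small metric file into the node-exporter textfile collector directory. The path, metric name, and reuse of the el_patching_needs_restarting register from the earlier sketch are all assumptions that depend on your setup:

# Illustrative sketch: expose the patching result to Prometheus via the
# node-exporter textfile collector (path and metric name are assumptions)
- name: Export patching status metric for node-exporter textfile collector
  ansible.builtin.copy:
    dest: /var/lib/node_exporter/textfile_collector/el_patching.prom
    content: |
      # HELP el_patching_reboot_required 1 if the last patching run left a pending reboot
      # TYPE el_patching_reboot_required gauge
      el_patching_reboot_required {{ el_patching_needs_restarting.rc | default(0) }}
    owner: root
    group: root
    mode: "0644"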

Happy Linux patching 🚀🚀🚀🚀

Thanks for reading. I'm entering the void. 🛸 ➡️ 🕳️