Ansible at scale 2 of 2

Template (with Jinja2) and files

In an Ansible role, we can use either files or templates to deliver configuration files. If the configuration file is identical across all targets, we can place it in the role's files directory and push it out as is. If its content varies from host to host or with the cluster size, we use a Jinja2 template. For example, in a ZooKeeper configuration, one entry may require the total number of nodes in the cluster, a second may require the hostname of the server itself, and a third may require a comma-separated line with the hostnames of all nodes in the cluster. This is a typical use case for a Jinja2 template.
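
The difference can be sketched in a role's task file. This is only an illustration: the file names, destination paths, and the zookeeper group name are assumptions, not taken from a real role.

```yaml
# tasks/main.yml of a hypothetical role (all names below are illustrative)
- name: Push an identical file to every target (read from the role's files/ directory)
  copy:
    src: log4j.properties
    dest: /etc/zookeeper/log4j.properties

- name: Render a per-host file (read from the role's templates/ directory)
  template:
    src: zoo.cfg.j2
    dest: /etc/zookeeper/zoo.cfg

- name: Preview the three kinds of values such a template typically needs
  debug:
    msg:
      - "node count: {{ groups['zookeeper'] | length }}"
      - "this host: {{ inventory_hostname }}"
      - "all nodes: {{ groups['zookeeper'] | sort | join(',') }}"
```

The debug task is only there to show the expressions; in practice they would live inside the .j2 file.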

We need to make sure the installed Jinja2 version is at least 2.11.2 (as of May 2020), because older versions such as 2.7.2 have known issues with namespaces. To check the version and then upgrade Jinja2, we use pip:

pip show Jinja2
pip install -U Jinja2

The Ansible template module takes a Jinja2 file as input and delivers the rendered file to the target host. Note that if the template references host variables or facts, Ansible needs to gather facts about the host first. This means you will have to use a basic playbook like the one below instead of an ad hoc command.

A basic playbook to test a Jinja2 template is:

- hosts: '{{ansible_limit}}'
  gather_facts: yes
  tasks:
  - template:
      src: cassandra_xml.j2
      dest: /tmp/cassandra.xml

Although Jinja2 offers a lot of flexibility with loops and if-else statements, it is a templating language, not a programming language. It takes some tricks to achieve what you could otherwise easily do in a programming language. One example is persisting a variable beyond the end of a loop. According to the Jinja2 documentation, it is not possible to set variables inside a block and have them show up outside of it. This also applies to loops. The only exception to that rule is if statements, which do not introduce a scope. To carry a value out of a loop, you have to use a namespace object, one for each loop whose variables you need to access afterwards:

{% block db_cluster_config_nobackup %}
{% set ns=namespace(nodeid=0) %}
{% for host in groups[my_db_group]|sort %}
   <var name="DBHost{{ns.nodeid+1}}" value="{{hostvars[host].inventory_hostname}}" />
{% set ns.nodeid=ns.nodeid+1 %}
{% endfor %}
   <var name="DBClusterHosts" value="{% for i in range(ns.nodeid) %}
${DBHost{{i+1}}}{% if not loop.last %},{% endif %}
{% endfor %}" />
{% endblock db_cluster_config_nobackup %}

For the same reason, it is worth clearly marking the start and end of each block to avoid trouble with the scoping behaviour of variables. These limitations make Jinja2 templates hard to read, and a template may take several rounds of playbook runs to troubleshoot.

Handler vs regular task

Sometimes you only want to run a task when a previous task results in a change. Although this can be achieved by registering the result of the previous task and testing it in a when clause, an easier way is a handler. Handlers are defined in a yml file in the handlers directory of the role, and the task that triggers them needs to notify them by name. See the Ansible documentation on handlers for details. Handlers are a great way to shorten a task file or playbook.
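
A minimal sketch of the notify/handler pairing follows. It is written as a single play so that it stands on its own; in a role, the handlers section would go in handlers/main.yml. The nginx service, the template name, and the webservers group are assumptions chosen for illustration.

```yaml
- hosts: webservers          # hypothetical group
  become: yes
  tasks:
    - name: Deploy nginx configuration
      template:
        src: nginx.conf.j2   # hypothetical template
        dest: /etc/nginx/nginx.conf
      notify: restart nginx  # must match the handler name below

  handlers:
    - name: restart nginx    # runs once at the end of the play, only if notified
      service:
        name: nginx
        state: restarted
```

If the template task reports no change, the handler is not triggered, so there is no need for a register and when pair.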

Ansible commands

In operations, engineers often need to run a command on a group of servers. I encourage the use of Ansible ad hoc commands whenever possible. I recommend starting with the following two commands:

ansible-inventory --graph
ansible all -m ping

The ping module triggers an “Ansible ping” to the targets in the specified group. Over the years, the Ansible community has developed many helpful modules, such as yum, yum_repository, apt_rpm, uri, synchronize, find, and copy, and many of them can be used instead of bash commands. Sometimes, however, the module you would expect either does not exist or lacks a function you need. For example, Ansible’s uri module cannot replace a curl command with the following switches:

curl -s -XGET http://{{inventory_hostname}}:8080/objects/{{object_id}}/binary/all -o /dev/null -w '%{response_code} %{size_download} %{time_total} %{speed_download}\n' | awk '{if ($1==200) print "size="$2/1048576"MB,time="$3"s,speed="$4/1048576"MB/s"; else if($1==404) print "Cannot find study {{study_iuid}}"; else print "Unknown error. Code "$1 " when retrieving object {{object_id}}";}'

To leverage all these curl options, we still need to use the shell module in Ansible to call the command in shell.
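
As a rough sketch, a shortened version of such a call could be wrapped in a shell task like the one below. The /health path and port 8080 are placeholders, and the awk post-processing from the command above is left out for brevity.

```yaml
- hosts: all
  gather_facts: no
  tasks:
    - name: Probe an HTTP endpoint with curl (URL path is a placeholder)
      shell: >-
        curl -s -o /dev/null
        -w '%{response_code} %{time_total}'
        http://{{ inventory_hostname }}:8080/health
      register: probe
      changed_when: false    # a read-only probe should not report a change

    - name: Show status code and total time
      debug:
        var: probe.stdout
```

Setting changed_when to false keeps the play output honest, since the shell module otherwise reports every run as a change.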

Other helpful Ansible commands include ansible-pull, for pulling playbooks from a VCS repo, and ansible-console, for interactive ad hoc command execution.

Tags and extra variables

Both tags (-t) and extra variables (-e) are great ways to achieve flow control in playbooks. You can run only the tasks with certain tags, or skip the tasks with certain tags. Extra variables can override the default variables from the host or the group.
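
The following sketch shows both mechanisms in one small playbook. The playbook name site.yml, the tag names, and the app_port variable are made up for the example.

```yaml
# site.yml (hypothetical); example invocations:
#   ansible-playbook site.yml -t config
#   ansible-playbook site.yml --skip-tags restart
#   ansible-playbook site.yml -e "app_port=9090"
- hosts: all
  gather_facts: no
  vars:
    app_port: 8080           # default, overridden by -e on the command line
  tasks:
    - name: Show the effective port
      debug:
        msg: "app_port is {{ app_port }}"
      tags: config

    - name: Stand-in for a restart step
      debug:
        msg: "restart would happen here"
      tags: restart
```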

Speed up execution

There are several ways to speed up the execution of Ansible tasks. For example, we can disable fact gathering by default so that Ansible only gathers facts when explicitly told to. This is done by setting gathering = explicit under the defaults section of the Ansible configuration file. If you do have to gather facts, you can cache them using the following:

[defaults]
gathering = smart
fact_caching_timeout = 86400
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_fact_cache

Other than caching, Ansible lets you select from several execution strategies for a playbook. The linear strategy introduces configurable parallelization per task. The free strategy introduces parallelization per play.

  • linear (the default): each task runs on up to the fork limit of hosts at a time, then on the next set of hosts, until every host in the batch has finished that task. Only then does Ansible go on to the next task. This mode keeps progress synchronized at each task.
  • free: each host runs through the play without waiting for the others. This is preferred when there is no need to coordinate progress between targets. It is a “free run” for each host all the way to the end of the play.
  • debug: essentially the linear strategy, except that progress is controlled by an interactive debug session.

The fork limit, with a conservative default of 5, can be adjusted in Ansible configuration. The execution strategy can be either specified in Ansible configuration, or specified per play. For example, the following snippet sets the strategy to free for the current play:

---
- hosts: all
  strategy: free
  tasks:
...

The Ansible documentation also mentions some play-level keywords to control execution. The serial keyword is one of them. It can be combined with any of the strategies above, and it splits the hosts into batches. The value can be a single number, a percentage, or a list of numbers (if the batches differ in size). Note that the batch size should not exceed the fork limit. This is particularly useful in rolling upgrades. For example:

---
- name: test play
  hosts: webservers
  serial: "30%"
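
The list form of serial can be sketched as follows, assuming the same webservers group. The batch sizes are arbitrary, and the single debug task stands in for real upgrade steps.

```yaml
---
- name: rolling upgrade sketch
  hosts: webservers
  serial:
    - 1        # first batch: a single canary host
    - 5        # second batch: five hosts
    - "30%"    # every later batch: 30% of the hosts in the play
  tasks:
    - name: Stand-in for the real upgrade steps
      debug:
        msg: "upgrading {{ inventory_hostname }}"
```

Starting with a batch of one limits the damage if the upgrade turns out to be broken.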

With the parallelization outlined above, a potential concern is that a heavy-lifting task may consume a lot of resources if it is executed for all hosts at the same time. Luckily, Ansible has a task- and block-level keyword, throttle, which “de-parallelizes” the multi-host progress at a particular task or block. Here is an example from the Ansible documentation:

tasks:
- command: /path/to/cpu_intensive_command
  throttle: 1

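If there are long-running tasks, we can specify async and poll values so that Ansible starts the task in the background and checks back on it periodically. For example, the following task is polled every 5 seconds, and if it takes longer than 45 seconds it is considered failed. Note that with a positive poll value the play still waits for the task to finish. To let Ansible move on to later tasks while it runs, set poll to 0 and check the job later with the async_status module.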

---
  - hosts: all
    remote_user: root
    tasks:
      - name: simulate long running task for 15 sec, wait for up to 45 sec, poll every 5 sec
        command: /bin/sleep 15
        async: 45
        poll: 5
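
For comparison, here is a sketch of the fire-and-forget variant, where poll is 0 and a later async_status task collects the result. The registered variable names and the retry values are arbitrary choices for the example.

```yaml
---
  - hosts: all
    tasks:
      - name: start the long running task without waiting for it
        command: /bin/sleep 15
        async: 45
        poll: 0
        register: sleeper

      - name: other tasks run while the job continues in the background
        debug:
          msg: "doing something else"

      - name: check back until the job has finished
        async_status:
          jid: "{{ sleeper.ansible_job_id }}"
        register: job_result
        until: job_result.finished
        retries: 10          # 10 checks, 5 seconds apart
        delay: 5
```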

Python Version

The recommendation is to use Python 3 for any new development, so that nothing depends on Python 2. If no preference is specified, Ansible tries to discover an appropriate interpreter, and its choice can be seen in the response of the ping module. You can also force the interpreter by providing the additional variable ansible_python_interpreter. To change the default interpreter, specify interpreter_python in ansible.cfg. For example:

[defaults]
inventory=~/ansible/inventories/site.yml
library=~/ansible/library/
vault_password_file = ~/ansible/.vault_key
host_key_checking = False
display_skipped_hosts = False
retry_files_enabled = False
interpreter_python=/usr/bin/python3

[privilege_escalation]
become_method=sudo

[ssh_connection]
ssh_args = -C -o ControlMaster=auto -o ControlPersist=1h
pipelining = True
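
The ansible_python_interpreter variable can also be set per host or per group in the inventory. A sketch of a YAML inventory follows. The host name and the Python 2 path are invented for illustration.

```yaml
# inventories/site.yml (hypothetical content)
all:
  vars:
    ansible_python_interpreter: /usr/bin/python3    # default for every host
  hosts:
    legacy01.example.com:
      # host-level value takes precedence over the group-level one above
      ansible_python_interpreter: /usr/bin/python2
```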

My open issues

I have some minor details that I have not been able to address, after a lot of time googling around. So I have to leave them for future reference.

If an Ansible playbook involves multiple plays, each with its own hosts, I cannot persist a variable across these plays. Although I could define multiple variables and make them all available to every single host (under the all directory), this is really messy.

In a Jinja2 template, if I need to access the group of a target host (as defined in the inventory), and the target belongs to multiple groups, I cannot filter to match the group I need.
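
One approach that may help with persisting a variable across plays is to record the value with set_fact on one host and read it back through hostvars in a later play of the same run. This is only a sketch, not a confirmed fix for the setup described here. The group names and the schema_version variable are hypothetical.

```yaml
- hosts: dbservers[0]        # first host of a hypothetical group
  gather_facts: no
  tasks:
    - name: Record a value on this host
      set_fact:
        schema_version: "42"

- hosts: webservers          # a different set of hosts
  gather_facts: no
  tasks:
    - name: Read the value recorded by the earlier play
      debug:
        msg: "{{ hostvars[groups['dbservers'][0]]['schema_version'] }}"
```

Facts set this way last for the duration of the playbook run, so both plays need to be in the same invocation.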