Ansible at scale 1 of 2

Our original automation scheme involved several Ansible playbooks that started off simple but kept sprawling. So recently I spent some time revamping the entire set of playbooks, following some of the best practices from the official documentation. The goals are to:

  1. Reduce the number of playbooks;
  2. Improve code re-usability (by implementing roles);
  3. Increase portability across different customer environments;
  4. Improve security;
  5. Re-organize the directory so that more team members can contribute to different parts of it.

This set of playbooks is used by our customer service engineers, some of whom are ingrained with the way they have been using certain playbooks. The playbooks are used for configuration management rather than system provisioning, which means our engineers run them repeatedly in their daily tasks. The target hosts may be different on every run, and our engineers are used to specifying a host pattern with -l for each playbook run. Throughout the automation development I need to make sure their commands are kept as short as possible.

Inventory and variables

We cannot guarantee that every customer environment is identical, but what we can do is make sure that for a new environment, the only changes to make are the inventory and variables. This is how we define portability: no changes should be made to tasks, roles or playbooks when the Ansible directory is deployed in a different customer environment.

Although the example in the Ansible documentation comes with multiple inventory files, I do not want to use multiple inventory files in our implementation, because that would mean one more switch to specify in each command. Luckily the server fleet is fewer than 100 hosts and all of them fit within a single inventory file, under different groups.

all:
  children:
    prod_dc1_app:
      hosts:
        apphost01:
        apphost03:
        apphost05:
        apphost07:
        apphost09:
        apphost11:
        apphost13:
    prod_dc2_app:
      hosts:
        apphost02:
        apphost04:
        apphost06:
        apphost08:
        apphost10:
        apphost12:
        apphost14:
    test_dc1_app:
      hosts:
        tapphost01:
    test_dc1_db:
      hosts:
        tapphost01:
    test_dc2_app:
      hosts:
        tapphost02:
    test_dc2_db:
      hosts:
        tapphost02:
    prod_dc1_db:
      hosts:
        dbhost01:
        dbhost03:
        dbhost05:
        dbhost07:
        dbhost09:
        dbhost11:
    prod_dc2_db:
      hosts:
        dbhost02:
        dbhost04:
        dbhost06:
        dbhost08:
        dbhost10:
        dbhost12:

Users of a playbook can then use -l to limit the run to a group.
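For example, to run one of the playbooks only against the production DC1 database group defined above:

```shell
ansible-playbook deploy-db.yml -l prod_dc1_db
```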

Use of roles vs playbooks

The entire Ansible community advocates the use of roles in place of playbooks. The concept of an Ansible “role” seems fairly abstract and confusing at first. “Role” sounds like a static server state semantically, whereas our existing playbooks are full of actions (think of shell scripts). How would one convert an action list into static states? After some thought, I came to the understanding that a role should be thought of as a desired end state. Yes, the end state is static, but that’s all we care about. This is essentially the whole idea of Ansible roles: you start from the end state and then design what needs to be done to reach that state. The concept of a role reflects how Ansible wants you to think about solving an infrastructure problem: stop thinking about what you need to do. Instead, think about what you ultimately want, start from the desired state and work backwards.

In our own setup, the best practice turned out to be: if the playbook involves a single play with fewer than 5 tasks, just stick to a playbook. We don’t want to get rid of playbooks just for the sake of it. Otherwise, once a playbook has grown to more than 5 tasks, we need to think about our desired state and either implement a new role or fold the playbook into an existing role. This is when we have to transition from playbook-oriented thinking to role-oriented thinking. Each role directory can include a tasks sub-directory with a main.yml that references the rest of the task files. Each role can define its own role-related variables. If two roles have a lot in common, we can even have a common role, with or without its own main.yml.

A simplified version of our Ansible directory structure looks like this:

├── deploy-app.yml
├── deploy-db.yml
├── inventories
│   ├── group_vars
│   │   ├── all
│   │   │   ├── all.yml
│   │   │   └── vault_all.yml
│   │   ├── prod_dc1_db.yml
│   │   ├── prod_dc2_db.yml
│   │   ├── test_dc1_db.yml
│   │   └── test_dc2_db.yml
│   ├── host_vars
│   └── site_inventory.yml
├── roles
│   ├── common
│   │   ├── files
│   │   └── tasks
│   │       ├── log.yml
│   │       ├── skip_self.yml
│   │       └── validate_path.yml
│   ├── db_conf
│   │   ├── defaults
│   │   │   └── main.yml
│   │   ├── files
│   │   ├── handlers
│   │   │   └── main.yml
│   │   ├── meta
│   │   │   └── main.yml
│   │   ├── README.md
│   │   ├── tasks
│   │   │   ├── main.yml
│   │   │   ├── start_db.yml
│   │   │   ├── stop_db.yml
│   │   │   └── update_cluster_var.yml
│   │   ├── templates
│   │   │   ├── myid.j2
│   │   │   ├── db_properties.j2
│   │   │   └── zookeeper_properties.j2
│   │   └── vars
│   │       └── main.yml
│   └── app_conf
│       ├── defaults
│       │   └── main.yml
│       ├── files
│       ├── handlers
│       │   └── main.yml
│       ├── meta
│       │   └── main.yml
│       ├── README.md
│       ├── tasks
│       │   ├── bk_app_conf.yml
│       │   ├── empty_app_conf.yml
│       │   ├── main.yml
│       │   ├── push_app_conf.yml
│       │   ├── start_app.yml
│       │   ├── stop_app.yml
│       │   ├── tar_app_conf.yml
│       │   ├── untar_app_conf.yml
│       │   └── update_cluster_var.yml
│       ├── templates
│       │   └── dbref_xml.j2
│       └── vars
│           └── main.yml
├── service-app.yml
└── service-db.yml
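As a concrete sketch, the tasks/main.yml of the db_conf role above could simply chain the role's individual task files (the ordering shown here is a hypothetical example, not our actual sequence):

```yaml
# roles/db_conf/tasks/main.yml (hypothetical ordering)
- name: Stop the database before reconfiguring
  import_tasks: stop_db.yml

- name: Update cluster-related variables
  import_tasks: update_cluster_var.yml

- name: Start the database again
  import_tasks: start_db.yml
```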

Variables specific to a group of hosts or to individual hosts can be kept in different YAML files. When the entire directory is moved to a different customer environment, our engineers only need to update the inventory and variable files. The tasks, roles and playbooks should build their logic on top of those variables.
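For example, a group variable file might look like this (the variable names and values below are made up for illustration):

```yaml
# inventories/group_vars/prod_dc1_db.yml (hypothetical values)
db_data_dir: /data/db
db_listen_port: 5432
cluster_name: prod-dc1
```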

Vault

Our previous implementation stored the sudo password base64-encoded and used no_log to avoid displaying its value. Now we move such secrets into an encrypted variable YAML file using ansible-vault. We reference the value to encrypt as a regular variable:

ansible_become_pass: '{{ passtoencrypt }}'
ansible_become_method: sudo
ansible_become: yes

Then we run the following:

ansible-vault create vault_all.yml

This prompts for a key, and once you type it in, it opens a text editor where we can store the real password. For example:

passtoencrypt: MyP@ssw0rd4real!

Save the file from the text editor. The file is now stored encrypted and must be opened with the correct key (aka the vault password). If we call ansible-playbook with the --ask-vault-pass switch, the playbook will prompt for the key; alternatively, use include_vars to pull variables in from the vault file. If we want to skip even that, we can store the key in a file and point vault_password_file in ansible.cfg at it.
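A minimal ansible.cfg sketch for the last option (the key file path here is an assumption; keep that file readable only by the automation user):

```
[defaults]
vault_password_file = ~/.vault_pass.txt
```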

Optimize connection

OpenSSH 5.6 and later supports the ControlPersist option for connection multiplexing, where multiple SSH sessions share one TCP connection. With this turned on, subsequent SSH connections save the time of the TCP and SSH handshakes. It can be configured in the Ansible configuration file under ssh_connection. Below is an example with ControlPersist=1h, so the idle master connection is kept around for one hour before it is torn down.

[ssh_connection]
ssh_args = -C -o ControlMaster=auto -o ControlPersist=1h
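To verify that a master connection is actually being reused, you can ask the ssh client about it (the control path below is a placeholder; by default Ansible generates its own socket paths under ~/.ansible/cp):

```shell
ssh -O check -o ControlPath=~/.ansible/cp/example remote-host
```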

The other option we can leverage is pipelining. Ansible takes three steps to execute a task:

  1. build a Python script based on the module used
  2. copy the Python script to the remote host
  3. execute the Python script on the remote host

If pipelining is turned on, the Python script is piped to the remote interpreter over the existing SSH session instead of being copied first, which saves a round trip and increases performance. Pipelining can be configured under ssh_connection in the Ansible configuration file:

[ssh_connection]
pipelining = True

In the example below we can see that pipelining cuts the number of connections by more than half:

# with pipelining
[ghunch@control-host ~]$ ansible remote-host -vvvv -m ping | grep EST
<remote-host> ESTABLISH SSH CONNECTION FOR USER: ghunch
<remote-host> ESTABLISH SSH CONNECTION FOR USER: ghunch
<remote-host> ESTABLISH SSH CONNECTION FOR USER: ghunch


# without pipelining
[ghunch@control-host ~]$ ansible remote-host -vvvv -m ping | grep EST
<remote-host> ESTABLISH SSH CONNECTION FOR USER: ghunch
<remote-host> ESTABLISH SSH CONNECTION FOR USER: ghunch
<remote-host> ESTABLISH SSH CONNECTION FOR USER: ghunch
<remote-host> ESTABLISH SSH CONNECTION FOR USER: ghunch
<remote-host> ESTABLISH SSH CONNECTION FOR USER: ghunch
<remote-host> ESTABLISH SSH CONNECTION FOR USER: ghunch
<remote-host> ESTABLISH SSH CONNECTION FOR USER: ghunch

Note that if the tasks use sudo, pipelining requires requiretty to be disabled in /etc/sudoers on the remote host.

Custom Module

It’s fairly straightforward to build a custom module in Ansible. Just place the module file (modulename.py) in a library directory next to your playbooks (or in a path listed in ANSIBLE_LIBRARY), and use it as you would a regular Ansible module. The module is written in Python and must return its result as JSON. Before creating a custom module, look for existing modules to avoid re-inventing the wheel. You may also need to determine whether you simply need to run a Python script on the target host (with Ansible’s script module), or whether you really need an Ansible module: the former is procedural, while the latter focuses on desired state. Custom modules are mostly useful for proprietary integrations.
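At its core, an Ansible module is just a program that prints a JSON result to stdout. Below is a minimal sketch of that contract (the function and fields are hypothetical; a real module would use AnsibleModule from ansible.module_utils.basic to parse its arguments and exit):

```python
#!/usr/bin/env python
# Minimal sketch of what an Ansible module ultimately does:
# perform some work, then print a JSON result to stdout.
import json

def run(name):
    # A real module would do its work here and set "changed"
    # to True only when it actually modified the target host.
    return {"changed": False, "msg": "hello %s" % name}

if __name__ == "__main__":
    print(json.dumps(run("world")))
```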