Automating Nutanix with Ansible

I’ve been begging for Nutanix Ansible modules and dynamic inventory for a long time. Hopefully one day we’ll see them. With a couple of day jobs and a 3-year-old, I don’t have much time to develop them on my own. While we wait for our lives to get even better, I did what I could with some custom roles and PowerShell.

After years of maintaining Windows servers, PowerShell is my comfort zone. The Nutanix cmdlets for Windows Server are very good. I’ll walk through some brief demonstrations of how I’m using PowerShell and Ansible to drive some common Nutanix administration tasks.

Before we start with that, let’s look at the environment. In order for you to do what I’m doing, you’ll need a few things:

  • Jenkins (https://jenkins.io/)
  • Ansible (https://www.ansible.com/)
  • Ansible-friendly credential manager (such as Ansible Vault)
  • A Nutanix platform of some sort
  • Intermediate PowerShell skills

Putting these components together is out of scope for this article, but I will summarize my environment quickly. Jenkins runs in AWS, and we have a direct connection from AWS to our datacenter. Ansible is built to run in a container on any host configured as an Ansible host. We use SecretServer for credential management, which has a pretty decent API. Every playbook has a step to retrieve any required credentials before executing the defined role.

Starting with Jenkins, here’s what almost every single job looks like for us. I’ve obfuscated some of the private information, but this example runs our Let’s Encrypt playbook:

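#!/bin/bash
# bash rather than sh: the here-string (<<<) below is not POSIX sh.
echo "${JOB_NAME}"
echo "${BUILD_NUMBER}"
# Docker container names cannot contain "/", so folder-style job names are flattened.
JOB_NAME=$(tr -s '/' '-' <<< "$JOB_NAME")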
DOCKERIMG="<# Docker Hub account name #>/<# Docker Hub image name #>"
PLAYBOOK="playbooks/<# environment #>/centos-letsencrypt.yml"
INVENTORY="environments/<# environment #>/centos-inventory"
ANSIBLE_HOST="<# Ansible host #>"

set -ex
ssh -l root -o StrictHostKeyChecking=no -t $ANSIBLE_HOST "docker pull ${DOCKERIMG} && docker run --rm -i --name ${JOB_NAME}-${BUILD_NUMBER} ${DOCKERIMG} /bin/bash -xec 'cd /etc/ansible && git pull && ansible-playbook -i ${INVENTORY} ${PLAYBOOK}'"

What’s nice about this format is that you can clone this Jenkins job 100 times and only have to update a couple of things: primarily, the playbook and the environment. The Ansible host will vary depending on the environment. We mostly run Ansible out of AWS, but have use cases where a local Ansible host is more effective.
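
One way to make that cloning less error-prone is to keep the parts that change in one place. The sketch below is not our actual job: sanitize_job_name and build_remote_cmd are hypothetical helper names, and the image, job, and environment values are placeholders. It only prints the command that would be handed to ssh; nothing is executed.

```shell
#!/bin/sh
# Hypothetical helper: flatten a folder-style Jenkins job name ("a/b/c" -> "a-b-c").
sanitize_job_name() {
    printf '%s' "$1" | tr -s '/' '-'
}

# Hypothetical helper: print the remote command for one environment/playbook pair.
# Arguments: <docker image> <container name> <environment> <playbook file> <inventory file>
build_remote_cmd() {
    image=$1 name=$2 env=$3 playbook=$4 inventory=$5
    printf "docker pull %s && docker run --rm -i --name %s %s /bin/bash -xec 'cd /etc/ansible && git pull && ansible-playbook -i environments/%s/%s playbooks/%s/%s'\n" \
        "$image" "$name" "$image" "$env" "$inventory" "$env" "$playbook"
}

# Example with placeholder values (build number 42 is made up).
name="$(sanitize_job_name 'infra/letsencrypt')-42"
build_remote_cmd "example/ansible" "$name" "prod" "centos-letsencrypt.yml" "centos-inventory"
```

With this shape, a cloned job differs only in the last three arguments.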

Now, let’s take a look at some playbooks. I’ve created a Nutanix role that handles a number of different functions. There’s no real limit to how many it can hold: I just pass a variable at run time (nutanix_task_type) that tells the role which set of tasks to use.

The tasks used below create new VMs. We’ll look at portions of the script later on. You can see that a lot of variables are being passed here: WinRM credentials for Ansible to run commands on Windows, Nutanix credentials for the API connection, and Nutanix service account credentials for domain joins and other domain-related activities. All of the create_* variables are defined using “Build with Parameters” in Jenkins and are then fed to Ansible as environment variables.
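
The job script shown earlier does not include the step that carries those parameters into the container. One possible approach, assuming the playbook reads them with Ansible’s env lookup, is to forward them as docker run -e flags. This is a sketch under that assumption; the helper name env_flags and the NTNX_* variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "-e NAME=value" for each named environment variable.
# Assumes values contain no whitespace or shell metacharacters.
env_flags() {
    flags=""
    for name in "$@"; do
        eval "value=\${$name-}"   # indirect lookup of the variable called $name
        flags="$flags -e $name=$value"
    done
    printf '%s\n' "${flags# }"    # drop the leading space
}

# Placeholder values standing in for Jenkins "Build with Parameters" fields.
NTNX_CPU=2
NTNX_MEM=12
NTNX_TEMPLATE=CentOS_7_Template
env_flags NTNX_CPU NTNX_MEM NTNX_TEMPLATE
```

Writing the template name with underscores keeps the value free of spaces on its way through the shell; the replace('_', ' ') filter in the playbook below turns it back into the real template name.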

## Host to run commands from.
## It can be any Windows server with Nutanix cmdlets installed.
- hosts: <# Windows server fqdn #>
  gather_facts: no
  roles:
    - nutanix
  vars:
    # WinRM Credentials for Prod
    prod_winrm_user: "{{ hostvars['localhost']['prod_winrm_user'] }}"
    prod_winrm_password: "{{ hostvars['localhost']['prod_winrm_password'] }}"
    # Required for role
    nutanix_task_type: "create-vm"
    ntnxusername: "{{ hostvars['localhost']['ntnx_local_username'] }}"
    ntnxpw: "{{ hostvars['localhost']['ntnx_local_password'] }}"
    prod_ntnx_user: "{{ hostvars['localhost']['prod_ntnx_user'] }}"
    prod_ntnx_password: "{{ hostvars['localhost']['prod_ntnx_password'] }}"
    create_os: "{{ ntnx_create_os | string }}"
    create_host: "{{ ntnx_create_host | string }}"
    create_docker: "{{ ntnx_docker | bool }}"
    create_cpu: "{{ ntnx_cpu | string }}"
    create_mem: "{{ ntnx_mem | string }}"
    create_ip: "{{ ntnx_ip | string }}"
    create_template: "{{ ntnx_template | replace('_', ' ') }}"

From here, we’ll go into the Nutanix role. Here’s a look at our tasks/main.yml. Depending on a few variables, we’ll execute certain task lists.

---
- include_tasks: decom-windows.yml
  when:
    - ansible_os_family == "Windows"
    - nutanix_task_type == "decom"

- include_tasks: ngt-windows.yml
  when:
    - ansible_os_family == "Windows"
    - nutanix_task_type == "ngt"

- include_tasks: create-vm-windows.yml
  when:
    - ansible_os_family == "Windows"
    - nutanix_task_type == "create-vm"

- include_tasks: dr-testing-windows.yml
  when:
    - ansible_os_family == "Windows"
    - nutanix_task_type == "dr-testing"

- include_tasks: decom-centos.yml
  when:
    - ansible_distribution == "CentOS"
    - nutanix_task_type == "decom"

- include_tasks: ngt-centos.yml
  when:
    - ansible_distribution == "CentOS"
    - nutanix_task_type == "ngt"

- include_tasks: powershell-cleanup-tasks.yml
  when:
    - nutanix_task_type == "cleanup"
...
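
If nutanix_task_type is misspelled, none of the when conditions match and every include is skipped without an error. A wrapper could guard against that before Ansible runs. This is a sketch only: is_valid_task_type is a hypothetical helper, and its list of values is copied from the conditions above.

```shell
#!/bin/sh
# Hypothetical guard: succeed only for task types that tasks/main.yml dispatches on.
is_valid_task_type() {
    case "$1" in
        decom|ngt|create-vm|dr-testing|cleanup) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: report an error instead of letting a typo become a silent no-op run.
task_type="create-vm"
if is_valid_task_type "$task_type"; then
    echo "ok: $task_type"
else
    echo "unknown nutanix_task_type: $task_type" >&2
fi
```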

Let’s continue with the theme and roll with “create-vm-windows.yml”, which runs some PowerShell code to deploy a new server from a template. The “-windows” suffix refers to the host that runs the script, not the guest being built; the same task list creates CentOS VMs too. The task downloads the latest “Create-VM” script from GitHub every time it runs, then applies the variables passed through the playbook to execute it.

---
- name: Check Variables
  debug:
    msg: "OS: {{ create_os }} Template: {{ create_template }} Hostname: {{ create_host }} CPU: {{ create_cpu }} MEM: {{ create_mem }} Cluster: {{ create_ip }}"

- name: Ensure Code directory exists
  win_file:
    path: C:\Code
    state: directory

- name: Update Nutanix Create-VM script
  win_get_url:
    url: https://raw.githubusercontent.com/{{ github_account }}/{{ github_repo }}/master/{{ item }}
    dest: C:\Code\{{ item }}
    force: yes
    headers:
      Authorization: "token {{ hostvars['localhost']['github_auth_token'] }}"
  ignore_errors: True
  loop:
    - Create-VM.ps1
    - AnswerFile.xml

- name: Execute Create-VM Script (Windows)
  win_shell: |
    C:\Code\Create-VM.ps1 -Windows -WindowsTemplate "{{ create_template }}" -Hostname {{ create_host }} -CPU {{ create_cpu }} -MEM {{ create_mem }} -NTNXUsername "{{ ntnxusername }}" -NTNXPassword "{{ ntnxpw }}" -Cluster {{ create_ip }} -WindowsUsername "{{ prod_ntnx_user }}" -WindowsPassword "{{ prod_ntnx_password }}"
  register: create_job_1
  when:
    - create_os is defined
    - create_os == "Windows"

- name: Show Create-VM script output
  debug:
    var: create_job_1.stdout_lines

- name: Execute Create-VM Script (CentOS with Docker)
  win_shell: |
    C:\Code\Create-VM.ps1 -CentOS -CentOSTemplate "{{ create_template }}" -Docker -Hostname {{ create_host }} -CPU {{ create_cpu }} -MEM {{ create_mem }} -NTNXUsername "{{ ntnxusername }}" -NTNXPassword "{{ ntnxpw }}" -Cluster {{ create_ip }}
  register: create_job_2
  when:
    - create_os is defined
    - create_os == "CentOS"
    - create_docker

- name: Show Create-VM script output
  debug:
    var: create_job_2.stdout_lines

- name: Execute Create-VM Script (CentOS without Docker)
  win_shell: |
    C:\Code\Create-VM.ps1 -CentOS -CentOSTemplate "{{ create_template }}" -Hostname {{ create_host }} -CPU {{ create_cpu }} -MEM {{ create_mem }} -NTNXUsername "{{ ntnxusername }}" -NTNXPassword "{{ ntnxpw }}" -Cluster {{ create_ip }}
  register: create_job_3
  when:
    - create_os is defined
    - create_os == "CentOS"
    - not create_docker

- name: Show Create-VM script output
  debug:
    var: create_job_3.stdout_lines
...

The “Create-VM” script performs the following actions:

  1. Generate “user data” (cloud-init for CentOS, sysprep XML for Windows)
  2. Validate and connect to the Nutanix cluster
  3. Clone the VM template to a new VM and inject the user data (cloud-init or sysprep)
  4. Update VM to specified CPU/RAM
  5. Power on the VM
  6. Validate CPU/RAM configuration
  7. Collect the MAC address
  8. Communicate with IPAM API and reserve an IP address (we use Bluecat Address Manager)
  9. Assign the VM to a default protection domain
  10. Post information to Slack regarding the new VM

We’re not going to look at the entire script here, just some of the Nutanix “special sauce” portions. Here’s an example of the YAML that defines the cloud-init for CentOS VMs:

$DockerYAML = "
#cloud-config
ssh_pwauth: true
disable_root: false
hostname: $NewHostname
fqdn: $NewHostname.<# OBFUSCATED #>
repo_update: true
repo_upgrade: all
runcmd:
  - ip=`$(/sbin/ip -o -4 addr list eth0 `| awk `'{print `$4}`' `| cut -d/ -f1)
  - curl -X POST --data-urlencode `'payload={`"text`":`"`'`"`$HOSTNAME [`$ip] deployment started ``date```"`'`",`"username`":`"cloud-init`"}`' https://hooks.slack.com/services/<# obfuscated #>
  - [ sh, -c, `"sudo yum -y install yum-utils yum-cron expect`" ]
  - [ sh, -c, `"sudo yum -y makecache fast`" ]
  - [ sh, -c, `"sudo yum update -y --skip-broken`" ]
  - [ sh, -c, `"sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo`" ]
  - [ sh, -c, `"sudo yum -y install docker-ce`" ]
  - [ sh, -c, `"sudo yum -y --enablerepo=extras install epel-release`" ]
  - [ sh, -c, `"sudo yum -y install python-pip`" ]
  - sed -i -e `'s/apply_updates = no/apply_updates = yes/`' /etc/yum/yum-cron.conf
  - [ systemctl, enable, yum-cron.service ]
  - [ systemctl, start, yum-cron.service, --ignore-dependencies ]
  - [ systemctl, enable, docker.service ]
  - [ systemctl, start, docker.service, --ignore-dependencies ]
  - [ sh, -c, `"sudo pip install --upgrade pip`" ]
  - [ sh, -c, `"sudo yum -y install awscli`" ]
  - git clone https://<# obfuscated #>.git ~/linux-scripts
  - [ sh, -c, `"sudo /root/linux-scripts/ad-authentication/auth_configuration-<# obfuscated #>.sh`" ]
  - [ sh, -c, `"sudo python /root/linux-scripts/nutanix-ngt/installer/linux/install_ngt.py`" ]
  - [ sh, -c, `"sudo /root/linux-scripts/barracuda/install_barracuda.sh`" ]
  - [ sh, -c, `"sudo /root/linux-scripts/ansible/ansible_remote_setup.sh`" ]
  - curl -X POST --data-urlencode `'payload={`"text`":`"`'`"`$HOSTNAME [`$ip] deployment completed ``date```"`'`",`"username`":`"cloud-init`"}`' https://hooks.slack.com/services/<# obfuscated #>
"

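Inside the PowerShell string every $, quote, and pipe is escaped with a backtick, which makes the runcmd entries hard to read. Unescaped, the first entry is an ordinary pipeline and the curl lines post a small JSON payload. The sketch below replays both on canned input; the sample address, the host name, and the helper names extract_ip and slack_payload are made up for illustration.

```shell
#!/bin/sh
# Shortened sample line in the format printed by `ip -o -4 addr list eth0`.
sample='2: eth0    inet 10.0.0.5/23 brd 10.0.1.255 scope global eth0'

# Same pipeline as the first runcmd entry: field 4 is ADDRESS/PREFIX,
# and cut drops everything from the slash onward.
extract_ip() {
    awk '{print $4}' | cut -d/ -f1
}

# Build the JSON payload the curl entries send.
# Arguments: <hostname> <ip> <message> <timestamp>
slack_payload() {
    printf '{"text":"%s [%s] %s %s","username":"cloud-init"}\n' "$1" "$2" "$3" "$4"
}

ip=$(printf '%s\n' "$sample" | extract_ip)
slack_payload "web01" "$ip" "deployment started" "$(date)"
```
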
Working with Windows is slightly different. Instead of generating YAML, we package a pre-configured AnswerFile.xml and use PowerShell to update sections of the XML with on-demand variables. There’s some extra logic built in because the script was designed to accept either interactive input or command-line parameters:

$AnswerFileOrig = "$RunningDirectory\AnswerFile.xml"
$AnswerFileNew = "$RunningDirectory\$NewHostname.answer.xml"
$xml = New-Object XML
$xml.Load($AnswerFileOrig)
$xml.unattend.settings[1].component[0].computername="$NewHostname"
$ClusterDetails = get-ntnxcluster
$ClusterName = $ClusterDetails.name
$xml.unattend.settings[1].component[0].registeredorganization="$ClusterName"
$xml.unattend.settings[1].component[0].registeredowner="$NTNXUsername"
if ($WindowsUsername -eq "") {
    $xml.unattend.settings[1].component[1].Identification.Credentials.Username="$env:username"
    $xml.unattend.settings[0].component[0].autologon.username="$env:username"
} else {
    $xml.unattend.settings[1].component[1].Identification.Credentials.Username="$WindowsUsername"
    $xml.unattend.settings[0].component[0].autologon.username="$WindowsUsername"
}
$xml.unattend.settings[0].component[0].autologon.domain="$LogonDomain"
$WindowsDomain = $xml.unattend.settings[1].component[1].Identification.Credentials.Domain
if ($WindowsPassword -eq "") {
    $WindowsCreds = read-host -prompt "Please enter your credentials [$env:username] for $WindowsDomain" -assecurestring
    $WindowsCreds = ConvertTo-PlainText $WindowsCreds
    $xml.unattend.settings[1].component[1].Identification.Credentials.Password="$WindowsCreds"
    $xml.unattend.settings[0].component[0].autologon.password.value="$WindowsCreds"
} else {
    $xml.unattend.settings[1].component[1].Identification.Credentials.Password="$WindowsPassword"
    $xml.unattend.settings[0].component[0].autologon.password.value="$WindowsPassword"
}
$xml.unattend.settings[0].component[0].useraccounts.administratorpassword.value="$WindowsLocalPW"
$xml.Save($AnswerFileNew)
$sysprepdata = get-content $AnswerFileNew
$CloneCustomConfig.freshInstall = $false
$CloneCustomConfig.userdata = $sysprepdata

All of that work… and now what? Simple. Clone the template VM:

$CloneResult = Clone-NTNXVirtualMachine -Vmid $vmid -SpecList $CloneSpec -VmCustomizationConfig $CloneCustomConfig

This entire process looks very complex, and it is. But the end result is extremely satisfying. The self-service portal included with Nutanix Prism is nice, but this does exactly what we want with all the integrations we want (Slack, IPAM, etc.). You can’t get this functionality out of the base product.

We end up with a simple little form that works off of GitHub permissions, with all the auditing and logging centralized on a platform we leverage everywhere in the environment. Here’s the output from a run:

PLAY [localhost] ***************************************************************

TASK [Gathering Facts] *********************************************************
ok: [localhost]

TASK [Retrieve Secrets] ********************************************************
ok: [localhost] => (item=4493)
ok: [localhost] => (item=3697)
ok: [localhost] => (item=3810)

TASK [Assign Secrets to Variables] *********************************************
ok: [localhost]

PLAY [<# OBFUSCATED #>] ********************************************
included: /etc/ansible/roles/nutanix/tasks/create-vm-windows.yml for <# OBFUSCATED #>

TASK [nutanix : Assign Variables] **********************************************
ok: [<# OBFUSCATED #>]

TASK [nutanix : Check Variables] ***********************************************
ok: [<# OBFUSCATED #>] => {
    "msg": "OS: CentOS Template: CentOS 7 Template Hostname: <# OBFUSCATED #> CPU: 2 MEM: 12 Cluster: <# OBFUSCATED #>"
}

TASK [nutanix : Ensure Code directory exists] **********************************
ok: [<# OBFUSCATED #>]

TASK [nutanix : Update Nutanix Create-VM script] *******************************
changed: [<# OBFUSCATED #>] => (item=Create-VM.ps1)
changed: [<# OBFUSCATED #>] => (item=AnswerFile.xml)

TASK [nutanix : Show Create-VM script output] **********************************
ok: [<# OBFUSCATED #>] => {
    "create_job_1.stdout_lines": "VARIABLE IS NOT DEFINED!"
}

TASK [nutanix : Execute Create-VM Script (CentOS with Docker)] *****************
changed: [<# OBFUSCATED #>]

TASK [nutanix : Show Create-VM script output] **********************************
ok: [<# OBFUSCATED #>] => {
    "create_job_2.stdout_lines": [
        "", 
        "You have selected CentOS, which will use CentOS 7 Template", 
        "", 
        "Cluster IP: <# OBFUSCATED #>", 
        "VM Hostname: <# OBFUSCATED #>", 
        "VM CPU: 2", 
        "VM MEM: 12", 
        "Nutanix snappin not installed or loaded, trying to load...", 
        "", 
        "Hello <# OBFUSCATED #>.", 
        "", 
        "", 
        "Connecting to Nutanix...", 
        "Connected to <# OBFUSCATED #> as <# OBFUSCATED #>!", 
        "", 
        "Cloning CentOS 7 Template to <# OBFUSCATED #>...", 
        "...Waiting 15 seconds...", 
        "", 
        "<# OBFUSCATED #> found! [761dd739-1443-478c-be01-cdc1817de800]", 
        "Updating <# OBFUSCATED #> to CPU: 2 MEM: 12288...", 
        "...Waiting 5 seconds...", 
        "", 
        "Powering up <# OBFUSCATED #>.", 
        "...Waiting 15 seconds...", 
        "", 
        "<# OBFUSCATED #> is powered on.", 
        "", 
        "MAC Address:", 
        "<# OBFUSCATED #>", 
        "", 
        "Creating a DHCP reservation for <# OBFUSCATED #> on 10.<# OBFUSCATED #> under the pool: <# OBFUSCATED #> Deployment Pool", 
        "", 
        "Assigned IP:", 
        "10.<# OBFUSCATED #>", 
        "255.255.254.0", 
        "10.<# OBFUSCATED #>", 
        "", 
        "Adding <# OBFUSCATED #> to PD: YDC_99-NoReplication...", 
        "Disconnecting from Nutanix...", 
        ""
    ]
}

TASK [nutanix : Show Create-VM script output] **********************************
ok: [<# OBFUSCATED #>] => {
    "create_job_3.stdout_lines": "VARIABLE IS NOT DEFINED!"
}

PLAY RECAP *********************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0   
<# OBFUSCATED #> : ok=9    changed=2    unreachable=0    failed=0   

Finished: SUCCESS