2017-07-17

First Humble Steps With Terraform

If you are interested in cloud provisioning and infrastructure as code, you will almost certainly come across Terraform these days. So did I, and I want to share my first experiences with it on AWS. All code used in this post can be found here.

Installation

Installation is fairly easy: Terraform can be downloaded from its homepage as a single binary, and then you're ready to go. For convenience, you may want to put that binary into a location referenced by your PATH environment variable.

Authentication

Terraform can leverage the authentication scheme used by the AWS CLI. That is, maintain your ~/.aws/credentials as you do for the AWS CLI, and you are ready to start with Terraform. Profiles are supported as well, as shown in the following Terraform snippet, which sets up the provider:

provider "aws" {
  region  = "${var.region}"
  profile = "${var.profile}"
}

Dry run

With terraform plan you can always check what actions would be taken by Terraform. This is also very helpful during development.

Collaboration

By default, Terraform stores the current state in a local file called terraform.tfstate. If you work on provisioning in a team, you should set up a remote state with locking support. Terraform supports several backends, and one of those is S3. Locks are held in a DynamoDB table. The following snippet declares the usage of a remote state:

terraform {
  backend "s3" {
    bucket         = "my-first-humble-steps-with-terraform-staging"
    key            = "staging/terraform.tfstate"
    region         = "ca-central-1"
    encrypt        = "true"
    dynamodb_table = "my-first-humble-steps-with-terraform-staging-lock"
  }
}

Before you can initialize the remote state with terraform init, the referenced AWS resources (S3 bucket and DynamoDB table) have to exist. You can create them by point-and-click in the console or, guess what, by using Terraform itself:

provider "aws" {
  region  = "${var.region}"
  profile = "${var.profile}"
}

resource "aws_s3_bucket" "terraform_shared_state" {
  bucket = "my-first-humble-steps-with-terraform-${lower(var.profile)}"
  acl    = "private"

  versioning {
    enabled = true
  }

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_dynamodb_table" "terraform_shared_state_lock" {
  name           = "my-first-humble-steps-with-terraform-${lower(var.profile)}-lock"
  read_capacity  = 5
  write_capacity = 5
  hash_key       = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

DRY

Modularization

What if you want to provision several environments in a similar way? Terraform provides a module concept. Simply bundle the shared resources into a module (here: resources) and put the actual settings in a subdir per environment (here: staging):

.
├── resources
│   ├── outputs.tf
│   ├── resources_test.py
│   ├── resources.tf
│   └── variables.tf
└── staging
    ├── outputs.tf
    ├── remote_state.tf
    ├── staging.tf
    ├── terraform.tfvars
    └── variables.tf

The file resources/resources.tf provides the resources to be provisioned. The file staging/staging.tf just calls the module resources with the variables expected by the module:

module "resources" {
  source  = "../resources"

  region  = "${var.region}"
  profile = "${var.profile}"
  vpc_id  = "${var.vpc_id}"
  count   = "${var.count}"
}

If you use a module for the first time, you have to run terraform get within the environment subdir once.
A word about variables: resources/variables.tf defines the input expected by the module, and resources/outputs.tf defines the output handed back to the caller. Together they act as the module's interface definition. The caller supplies the input variables in one of two ways: either by writing the actual values directly into the module call (i.e. in staging/staging.tf), or by defining (see staging/variables.tf), setting (see staging/terraform.tfvars) and passing the variables (see staging/staging.tf). The latter introduces more places to edit when you add a new variable; on the other hand, it gives the environment itself a more visible interface definition. It's up to you to decide.
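To make the interface idea concrete, here is a minimal sketch of what the two module files could look like. The variable names are the ones passed in staging/staging.tf; the descriptions, the default for count and the output name instance_ids are assumptions made for illustration, not taken from the actual repository. The syntax follows the interpolation style used throughout this post.

```hcl
# resources/variables.tf: inputs the module expects from its caller
variable "region" {
  description = "AWS region to provision into"
}

variable "profile" {
  description = "Name of the AWS CLI profile to use"
}

variable "vpc_id" {
  description = "ID of the VPC that hosts the instances"
}

variable "count" {
  description = "Number of instances and volumes to create"
  default     = 1 # hypothetical default
}

# resources/outputs.tf: values handed back to the caller
# (the output name is hypothetical)
output "instance_ids" {
  value = "${join(",", aws_instance.humblebee_instance.*.id)}"
}
```

With such an output in place, the environment could re-export the value from staging/outputs.tf by referencing "${module.resources.instance_ids}".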

More of the same kind

You don't want to repeat yourself and add redundant code for several resources of the same kind. Therefore, use the resource's count attribute. The count.index attribute then lets you pick a single resource from a collection of resources. In the following snippet n EBS volumes and n instances are created, plus a single volume attachment for each instance/volume pair:

resource "aws_ebs_volume" "humblebee_volume" {
  count             = "${var.count}"
  availability_zone = "..."
  size              = 8
  encrypted         = true
  tags {
    Name = "Humblebee"
  }
}

resource "aws_volume_attachment" "humblebee_attachment" {
  count       = "${var.count}"
  device_name = "/dev/sdz"
  volume_id   = "${aws_ebs_volume.humblebee_volume.*.id[count.index]}"
  instance_id = "${aws_instance.humblebee_instance.*.id[count.index]}"
}

resource "aws_instance" "humblebee_instance" {
  count                       = "${var.count}"
  ami                         = "..."
  instance_type               = "t2.micro"
  subnet_id                   = "..."
  tags {
    Name = "HumbleBee"
  }
}

By the way, use volume attachments instead of direct EBS block devices mapped to your instances - if you ever want to replace an instance, e.g. because of an AMI upgrade, only the instances and the attachment get replaced, but the EBS volume itself is left untouched, which avoids undesired data loss.
EDIT: I've updated the code snippet above and switched from element(...) to the indexing operator [...] to avoid unnecessary rebuilds of unaffected resources. See here for details.
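For comparison, here is a sketch of the attachment resource with both variants side by side. The commented-out element(...) line is reconstructed from the EDIT note above and is an assumption about what the earlier version looked like, not a copy of it.

```hcl
resource "aws_volume_attachment" "humblebee_attachment" {
  count       = "${var.count}"
  device_name = "/dev/sdz"

  # Assumed earlier variant: look the ID up with the element() function
  # volume_id = "${element(aws_ebs_volume.humblebee_volume.*.id, count.index)}"

  # Current variant: index directly into the list of IDs
  volume_id   = "${aws_ebs_volume.humblebee_volume.*.id[count.index]}"
  instance_id = "${aws_instance.humblebee_instance.*.id[count.index]}"
}
```

Both forms select the same ID for a given count.index; the difference only shows up in how Terraform plans changes when the number of resources is altered.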

Data Sources

Besides resources, Terraform also provides data sources to access data outside of the current Terraform environment. E.g. you can query the ID of a certain AMI using the aws_ami data source:

data "aws_ami" "amazon_linux" {
  most_recent = true

  filter {
    name   = "owner-alias"
    values = ["amazon"]
  }

  filter {
    name   = "name"
    values = ["amzn-ami-hvm-*-gp2"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  filter {
    name   = "state"
    values = ["available"]
  }
}

Or you can query the subnets for a given VPC:

data "aws_subnet_ids" "humblebee_subnet_ids" {
  vpc_id = "${var.vpc_id}"
}

With that, you can provision an instance with the latest AMI. If you provision more than one instance, the instances are spread across the available subnets, because element() wraps around to the start of the list when count.index exceeds the number of subnets:

resource "aws_instance" "humblebee_instance" {
  count                       = "${var.count}"
  ami                         = "${data.aws_ami.amazon_linux.id}"
  instance_type               = "t2.micro"
  subnet_id                   = "${element(data.aws_subnet_ids.humblebee_subnet_ids.ids, count.index)}"
  tags {
    Name = "HumbleBee"
  }
}

Testing

One advantage of infrastructure as code is the ability to run automated tests against that code. When it comes to infrastructure, I think of integration tests as tests against running resources. In the Terraform world, kitchen-terraform is such an integration test framework. But testing against running resources usually incurs costs.
Therefore, I decided to use unit tests to check what would be performed in case of a Terraform run. I came across Terraform Validate for this.

Setup

The setup is quite straightforward - I used virtualenv to set up a separate Python 3.5 environment for this:

$ git clone https://github.com/elmundio87/terraform_validate
$ virtualenv -p python3.5 tfenv
$ source tfenv/bin/activate
$ pip install -r terraform_validate/requirements.txt
$ cd terraform_validate
$ python setup.py install

Test runs

The following test suite checks for name tags being set, direct EBS block device mappings not being used and for EBS volume encryption:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Test suite for Terraform resources."""
import os
import unittest

import terraform_validate


class TestResources(unittest.TestCase):
    """Tests related to resources."""

    def setUp(self):
        self.path = os.path.join(os.path.dirname(
            os.path.realpath(__file__)), '.')
        self.validator = terraform_validate.Validator(self.path)

    def test_tags(self):
        """Checks resources for required tags."""
        tagged_resources = ['aws_ebs_volume', 'aws_instance']
        required_tags = ['Name']
        self.validator.error_if_property_missing()
        self.validator.resources(tagged_resources).property('tags'). \
            should_have_properties(required_tags)

    def test_ebs_block_device(self):
        """Checks instances for NOT having a EBS block device directly mapped."""
        self.validator.resources(['aws_instance']). \
            should_not_have_properties(['ebs_block_device'])

    def test_ebs_volume_encryption(self):
        """Checks EBS volume for enabled encryption."""
        self.validator.error_if_property_missing()
        self.validator.resources(['aws_ebs_volume']).property('encrypted'). \
            should_equal(True)


if __name__ == '__main__':
    SUITE = unittest.TestLoader().loadTestsFromTestCase(TestResources)
    unittest.TextTestRunner(verbosity=0).run(SUITE)

# vim:ts=4:sw=4:expandtab

Terraform Validate only parses the Terraform code files, but does not trigger a terraform plan run to find out what actions would be performed. This has some implications:
  • *.tfvars files are not recognized, so variables are never expanded to the values set there. Therefore, running tests within the environment subdir is not meaningful.
  • Variable expansion only takes default values (if defined) into account.
  • Checks on properties that are derived from variables or data sources do not work.
  • Data sources cannot be checked.
So, unit tests do not replace integration tests, but can complement them.

Cleanup

That's simple: terraform destroy removes all the resources tracked in Terraform's state file. All other resources are left untouched.

The end

It was fun dealing with Terraform. I hope this might be useful for others as well. If you have questions or suggestions, please leave a comment.

2017-07-15

2017-07-09

Chased the Dragon

On the way to the concert: view of the Oberbaumbrücke, the landmark of my former home district

A nod to Grossstadtgeflüster

Support No. 1: Bass Sick Shit

Support No. 2: Mindfall

Main Act: Wisdom in Chains