Iterative Resources in Terraform

Introduction

You are probably already aware that you can iterate over lists, sets, and maps in Terraform to deploy many instances from a single resource block. However, I’m surprised by how few people know the difference between the two methods Terraform offers for iteration.

count and for_each work in similar ways but have a subtle difference that could spell disaster for your infrastructure: how each resource instance is addressed in the state file. When you use count, each instance is stored at a numeric index in a list, like this:

aws_instance.this[0]
aws_instance.this[1]
aws_instance.this[2]

When using for_each, each instance is instead stored under a key in a map, where the key comes from the key => value pair at each iteration.

aws_instance.this["bastion"]
aws_instance.this["prod_service_a"]
aws_instance.this["prod_service_b"]

The Problem

Let’s assume both examples above represent the same resources: an EC2 instance for Service A, one for Service B, and a bastion host for administrative purposes. Both approaches will deploy the infrastructure and allow you to update it without any issue. So, why do we even have the option to choose?

A problem arises when we want to change our infrastructure. Consider the bastion host, which, for security purposes, we don’t want running all of the time alongside our production servers. We want to deploy it just for the time needed to perform some administrative tasks and then destroy it to reduce our attack surface. How does Terraform react differently in each of the two examples above? Let’s look at some code and real-world output.

Countdown to Detonation

First, we’ll establish an instance list that we’ll iterate over in our resource block. You likely wouldn’t use such a list unless you wanted all of your instances to be identical, but we’ll keep things simple for the purposes of this article.

locals {
  instances = [
    "bastion",
    "prod_service_a",
    "prod_service_b"
  ]
}

Next, we establish an aws_instance resource and use the count meta-argument to tell Terraform how many instances we want. A Name tag is set using count.index to pull the string for the current iteration from the local.instances list.

resource "aws_instance" "this" {
  count         = length(local.instances)
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  subnet_id     = data.aws_subnet.private.id
  tags = {
    Name = local.instances[count.index]
  }
}

We run terraform apply and it creates three EC2 instances as expected. Now let’s remove our bastion instance by commenting it out in local.instances and run terraform plan.
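For reference, the list now looks like this:

locals {
  instances = [
    # "bastion",
    "prod_service_a",
    "prod_service_b"
  ]
}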

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  ~ update in-place
  - destroy

Terraform will perform the following actions:

  # aws_instance.this[0] will be updated in-place
  ~ resource "aws_instance" "this" {
        id                                   = "i-0578762313b033b3e"
      ~ tags                                 = {
          ~ "Name" = "bastion" -> "prod_service_a"
        }
      ~ tags_all                             = {
          ~ "Name" = "bastion" -> "prod_service_a"
        }
        # (28 unchanged attributes hidden)

        # (8 unchanged blocks hidden)
    }

  # aws_instance.this[1] will be updated in-place
  ~ resource "aws_instance" "this" {
        id                                   = "i-0a8032179039f04f5"
      ~ tags                                 = {
          ~ "Name" = "prod_service_a" -> "prod_service_b"
        }
      ~ tags_all                             = {
          ~ "Name" = "prod_service_a" -> "prod_service_b"
        }
        # (28 unchanged attributes hidden)

        # (8 unchanged blocks hidden)
    }

  # aws_instance.this[2] will be destroyed
  # (because index [2] is out of range for count)
  - resource "aws_instance" "this" {
      - ami                                  = "ami-0e001c9271cf7f3b9" -> null
      - tags                                 = {
          - "Name" = "prod_service_b"
        } -> null
      - tags_all                             = {
          - "Name" = "prod_service_b"
        } -> null
    }

Plan: 0 to add, 2 to change, 1 to destroy.

Can you see what happened? We expect one instance to be destroyed, but why are the other two being changed? If you look closely at the plan output, you’ll see that the two prod instances shifted left in the array. Where the bastion used to be at aws_instance.this[0], we now see prod_service_a. Likewise, we see prod_service_b at aws_instance.this[1], where prod_service_a used to be. Since they’ve changed indexes, Terraform sees them as needing to be updated.

Also, notice that Terraform is just updating the tags. We’re not even getting the desired result of removing the bastion. The bastion host is being renamed to prod_service_a, prod_service_a is being renamed to prod_service_b, and prod_service_b is being destroyed! The only reason these are in-place updates at all is that our instances happen to be otherwise identical; if any attribute that forces replacement differed, those instances would be destroyed and recreated as well. It’s going to be a long night for the poor engineer who applies this plan.

A Surgical Approach

Fortunately, there is a better way using for_each. Let’s convert our list of instances into a map. This also gives us the ability to define custom attributes for each instance if we so desire (more on that at the end of this section).

locals {
  instances = {
    bastion        = {}
    prod_service_a = {}
    prod_service_b = {}
  }
}

Next, we update our resource block, replacing count with for_each and changing the Name tag to use each.key, which pulls the key from the current iteration of the map. The code looks cleaner now without the calls to length() and count.index.

resource "aws_instance" "this" {
  for_each      = local.instances
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  subnet_id     = data.aws_subnet.private.id
  tags = {
    Name = each.key
  }
}

When we run a plan, the output looks slightly different than before: the key from each map entry is used as the instance’s address in state. Other than that, this is the same code as before, and we get the same result: three EC2 instances.

Terraform will perform the following actions:

  # aws_instance.this["bastion"] will be created
  ...
  # aws_instance.this["prod_service_a"] will be created
  ...
  # aws_instance.this["prod_service_b"] will be created
  ...

Plan: 3 to add, 0 to change, 0 to destroy.
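A nice side effect of keying on names is that other parts of the configuration can reference a specific instance by its stable key rather than by a positional index. For example, a hypothetical output for one of the production instances:

output "prod_service_a_private_ip" {
  # Addressed by map key, so this reference is unaffected by
  # instances being added or removed elsewhere in the map
  value = aws_instance.this["prod_service_a"].private_ip
}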

Now, let’s comment out the bastion host once again and run a new plan. This time our instances don’t get shifted around. Instead, we get exactly the desired result: only the bastion host is targeted for destruction.
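For reference, local.instances now looks like this:

locals {
  instances = {
    # bastion        = {}
    prod_service_a = {}
    prod_service_b = {}
  }
}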

Terraform used the selected providers to generate the following execution plan. Resource
actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  # aws_instance.this["bastion"] will be destroyed
  # (because key ["bastion"] is not in for_each map)
  - resource "aws_instance" "this" {
      - tags                                 = {
          - "Name" = "bastion"
        } -> null
      - tags_all                             = {
          - "Name" = "bastion"
        } -> null
    }

Plan: 0 to add, 0 to change, 1 to destroy.
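As promised earlier, those empty maps in local.instances give us room to attach per-instance settings. Here is a minimal sketch of that idea (the instance_type values are purely illustrative), reading the settings back through each.value:

locals {
  instances = {
    bastion        = { instance_type = "t3.nano" }
    prod_service_a = { instance_type = "t3.small" }
    prod_service_b = { instance_type = "t3.small" }
  }
}

resource "aws_instance" "this" {
  for_each      = local.instances
  ami           = data.aws_ami.ubuntu.id
  # Pull the per-instance setting from the current map value
  instance_type = each.value.instance_type
  subnet_id     = data.aws_subnet.private.id
  tags = {
    Name = each.key
  }
}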

When Should I Count?

Does this mean we should abandon all use of count? Not quite; we just need to know when to use it. One such time is when we want resources to be optional. Let’s reuse our bastion host example, and this time we’ll add a variable to enable or disable the resource.

variable "enable_bastion" {
  description = "Whether to deploy a bastion instance."
  type        = bool
  default     = false
}

resource "aws_instance" "bastion" {
  count         = var.enable_bastion ? 1 : 0
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  subnet_id     = data.aws_subnet.private.id
  tags = {
    Name = "bastion"
  }
}

Notice how this time we separated the bastion instance into its own resource and used count with a conditional expression to determine whether we deploy one instance or zero. Now, when we set the enable_bastion variable to true, the instance will be deployed; when it is set to false, the instance will be destroyed.
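One thing to keep in mind with this pattern is that a resource using count is always a list, even when it holds at most one element, so downstream references need to handle the zero-instance case. A minimal sketch (the output name is just for illustration; one() requires Terraform 0.15 or later):

output "bastion_private_ip" {
  # one() returns the single element when the bastion exists,
  # or null when count is 0
  value = one(aws_instance.bastion[*].private_ip)
}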

We could have also done this with for_each and a single resource block by conditionally adding the bastion element to the local.instances map.

locals {
  instances = merge(
    {
      prod_service_a = {}
      prod_service_b = {}
    },
    var.enable_bastion ? { bastion = {} } : {}
  )
}

Whichever option is easier for your team to read and maintain is usually the better choice.

Avoiding Pitfalls

There is one major pitfall I must warn you about when using for_each. If you’re fairly new to Terraform, I worry that you’ll run into this issue and revert to using count for “simplicity”. The error you might encounter is:

Error: Invalid for_each argument … The "for_each" map includes keys derived from resource attributes that cannot be determined until apply, and so Terraform cannot determine the full set of keys that will identify the instances of this resource.

You’ll encounter this error if you attempt to build a map’s keys from values that won’t be known until apply time, e.g. an EC2 instance ID. The good news is that this restriction applies only to the keys of the map, not the values. The solution is to restructure your map so the keys are static strings.
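Here’s a rough sketch of that restructuring, reusing the instances we already have (aws_eip is just an illustrative downstream resource):

# Fails at plan time: the instance IDs aren't known until apply,
# so Terraform can't enumerate the for_each keys.
resource "aws_eip" "per_instance" {
  for_each = { for inst in aws_instance.this : inst.id => inst }
  instance = each.key
}

# Works: the keys are the static names from local.instances;
# the not-yet-known IDs only appear in the values.
resource "aws_eip" "per_instance_fixed" {
  for_each = { for name, inst in aws_instance.this : name => inst.id }
  instance = each.value
}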

Here’s the relevant passage from the Terraform docs:

Unlike most arguments, the for_each value must be known before Terraform performs any remote resource actions. This means for_each can’t refer to any resource attributes that aren’t known until after a configuration is applied (such as a unique ID generated by the remote API when an object is created).

Conclusion

Iterating over data structures can be a powerful way to deploy your resources and keep your code tidy. However, you must be aware of the different use cases for Terraform’s iterative meta-arguments and know when to apply each. Used correctly, they can transform your modules from a kludgy mess into an elegant solution without compromising operational stability.

This post is licensed under CC BY 4.0 by the author.