Hello World in EC2 Container Service

31 Mar 2015   docker, ruby, aws

EC2 Container Service (ECS) is a new AWS deployment option for Docker containers. It was launched at the AWS re:Invent conference in November 2014. It's still in a limited preview and it's not yet in the AWS web console, so you have to use the AWS CLI.

The first half of this post compares the Docker support in ECS and Elastic Beanstalk. The second half shows how to deploy a simple Hello World app to ECS. The Hello World example consists of 2 Docker containers. The client creates a JSON message and posts it to an SQS queue. The server polls the queue for messages and outputs the contents to stdout. The example is written in Ruby and split into 2 containers so you can see how multiple containers can be deployed to the same EC2 instance.

Part 1: Why ECS not Beanstalk?

I’ve deployed several Rails apps to Elastic Beanstalk. As a PaaS it's pretty good. It's not as slick as Heroku but it's improving over time. It has also supported Docker containers since April 2014, and the Docker support is in GA, so it's not a preview release like ECS.

Stepping back a moment, for me the big 3 advantages of Docker, and Linux containers in general, are:

  • Ease of deployment
  • Higher server density
  • Auto scaling

Ease of deployment

Practically any Linux based stack can be packaged as a set of containers. It also provides dev-prod parity. The containers running on your development machine are identical to the containers running in production.

Higher server density

By stripping out the operating system layer its possible to load many more containers onto a physical machine than with virtual machines. This is the kind of major efficiency gain we haven’t seen since the move from physical to virtual servers in the mid 2000s.

Auto scaling

Since there is no delay while the operating system boots, containers can be scaled up and down in close to real time. In contrast, it takes several minutes to scale VMs, meaning they often arrive too late. So auto scaling with VMs requires workarounds like scaling up quickly and scaling down slowly.

Issues with Beanstalk

With Elastic Beanstalk there is a fixed limit of 1 container per EC2 instance. This helps with deployment, since most stacks can be deployed rather than just those Beanstalk supports natively. However it does nothing for server density and very little for auto scaling.

Auto scaling is supported, but each EC2 instance has to be booted and joined to an auto scaling group, and only then is the Docker container launched.

In fairness, though, using Docker with Elastic Beanstalk means that service discovery is done for you: the EC2 instance is automatically registered with the Elastic Load Balancer that Beanstalk manages.

Issues with ECS

With ECS currently you have to do service discovery yourself with a tool like Consul or Zookeeper. For me this is the biggest challenge with using ECS at the moment.

Update: Elastic Beanstalk now supports multiple containers via ECS

Thanks to Chris Kalafarski for pointing out that you can now run multiple containers on Elastic Beanstalk, and that this works by integrating with ECS. Integrating the 2 services makes a lot of sense and should help with the service discovery problem.

This was released last week and I’d missed the announcement. It looks great and I’ll give it a try next week.

Part 2: ECS Hello World

The Hello World app is written in Ruby and consists of 2 Rake tasks. The client takes in a message and posts a JSON message to an SQS queue.

rake hello:client['hello world!']

Sent: fbe73a51-0684-49de-9195-020c03704c9b
{
  "container": "909fbdc5351e",
  "payload": "hello world!",
  "timestamp": "2015-03-31T09:14:24+02:00"
}

The server polls the queue for messages and outputs them to stdout.

rake hello:server

Received: fbe73a51-0684-49de-9195-020c03704c9b
Container 909fbdc5351e said 'hello world!' at 31 Mar 2015 09:14:24 +02:00
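The message format above is simple enough to sketch in a few lines. This is a hypothetical reconstruction of the two sides (the function names are mine, not from the repo, and the real tasks also wrap the SQS send and receive calls):

```ruby
require 'json'
require 'socket'
require 'time'

# Client side: build the JSON message body. Inside a Docker container the
# hostname defaults to the short container ID.
def build_message(payload)
  {
    'container' => Socket.gethostname,
    'payload'   => payload,
    'timestamp' => Time.now.iso8601
  }.to_json
end

# Server side: turn a received message body back into the log line.
def format_message(json)
  msg  = JSON.parse(json)
  time = Time.parse(msg['timestamp'])
  "Container #{msg['container']} said '#{msg['payload']}' " \
    "at #{time.strftime('%d %b %Y %H:%M:%S %z')}"
end

puts format_message(build_message('hello world!'))
```

Wiring these through SQS is then just a `send_message` on the client and a receive loop on the server.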

The code is on GitHub and the container image is on DockerHub so it can be deployed to ECS.

Running locally using Docker Compose

Locally, Docker Compose is used to run the client and server containers. In this example they communicate via the message queue, but they could also communicate directly via a network link.

ECS Hello World running in development

The client and server containers use the same image but run different commands. All the configuration data is loaded from the .env file, and the environment variables are populated in the container by Docker Compose.

# docker-compose.yml

client:
  build: .
  env_file:
    - .env
  command: bundle exec rake hello:client['hello world!']

server:
  build: .
  env_file:
    - .env
  command: bundle exec rake hello:server
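In the app itself these variables are read straight from the environment. A minimal sketch, assuming the variable names from the task definitions below (the endpoint default is a placeholder, not a real queue URL):

```ruby
# Configuration comes from environment variables: Docker Compose populates
# them from .env in development, and ECS injects them in production.
sqs_endpoint = ENV.fetch('SQS_ENDPOINT') do
  'https://sqs.eu-west-1.amazonaws.com/ACCOUNT_NUM/ecs-hello-world'
end
sleep_millis = Integer(ENV.fetch('SLEEP_MILLIS', '1000'))

# The client and server loops sleep this long between iterations.
sleep_seconds = sleep_millis / 1000.0
puts "polling #{sqs_endpoint} every #{sleep_seconds}s"
```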

TIL: Always create a .dockerignore

The .env file is excluded from Git in the .gitignore file, but it should also be excluded from the Docker image in a .dockerignore file. The .git directory is also excluded, which keeps the image size down. The Dockerfile Best Practices page has lots of good tips like this.

# .dockerignore
.git/
.env

Deploying to ECS

Before deploying the app, let's start with the definitions of the core ECS concepts.

  • Cluster - a logical grouping of EC2 container instances that run tasks.
  • Container Instance - an EC2 instance running the ECS agent.
  • Task Definition - an application consisting of one or more container definitions.
  • Task - an instance of a task definition running on a container instance.
  • Container - a Docker container that is part of a task.

I found some of the terminology confusing at first, especially that a container instance is an EC2 instance. So it's the virtual machine hosting your container and not the container itself.

For the Hello World app we’re going to use the default cluster and launch 2 container instances. We’ll define separate client and server task definitions and run 2 tasks for each definition. So there will be 4 Docker containers running on 2 EC2 instances.

Hello World running on EC2 Container Service

ECS Setup

The ECS setup instructions are long because they cover everything from registering for an AWS account onwards.

Make sure you’ve done all the steps, but if you're already using AWS and EC2 the only steps you'll definitely have to do are creating the IAM policy and role for your container instances. The policy grants the ECS agent on the container instances access to register with ECS clusters.

For the demo you also need an SQS queue called ecs-hello-world.

$ aws sqs create-queue --queue-name ecs-hello-world --region eu-west-1

{
    "QueueUrl": "https://eu-west-1.queue.amazonaws.com/187012023547/ecs-hello-world"
}

In the IAM section of the AWS console, create an IAM policy called ecs-hello-world. It grants full control, but only over the ecs-hello-world queue. The queue ARN includes your AWS account number, so make sure it matches the output of the create-queue command.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1426021402000",
            "Effect": "Allow",
            "Action": [
                "sqs:*"
            ],
            "Resource": [
                "arn:aws:sqs:eu-west-1:187012023547:ecs-hello-world"
            ]
        }
    ]
}

Now create a new IAM user and attach the ecs-hello-world policy to it. Make sure you save the access credentials, as you'll need them for your task definitions.

Launching EC2 container instances

In the EC2 section of the AWS console go to Launch Instances and use the wizard.

  • Step 1: select Community AMIs and search for amazon-ecs-optimized. Currently this returns amzn-ami-2014.09.1-amazon-ecs-optimized-preview3, but make sure you use the latest version.
  • Step 2: use the default t2.micro as the instance type.
  • Step 3: specify 2 instances and select the AmazonECSContainerInstanceRole you created in the ECS setup instructions. Select Terminate as the shutdown behaviour.
  • Now click Review and Launch and then click Launch.
  • Select a key pair so you can SSH into the container instances and see the demo running.

Check for container instances

We’re now ready to start using the ECS CLI. The first command lists the container instances. Once your container instances have joined the cluster they’ll appear in the list.

$ aws ecs list-container-instances --region eu-west-1

{
    "containerInstanceArns": [
        "arn:aws:ecs:eu-west-1:187012023547:container-instance/df7f7bde-199e-44de-8ba3-0732abb1810e",
        "arn:aws:ecs:eu-west-1:187012023547:container-instance/e680c93c-5ad3-4dd2-9046-5a556453c4d6"
    ]
}

Create Task Definitions

The next step is to register the task definitions. The CLI takes in a JSON file, and the information is similar to the docker-compose.yml. Edit the JSON so it has your IAM user credentials and SQS queue URL.

$ aws ecs register-task-definition --cli-input-json file://hello-server-task.json --region eu-west-1

# hello-server-task.json

{
  "containerDefinitions": [
    {
      "name": "hello-server",
      "image": "rossf7/ecs-hello-world",
      "cpu": 384,
      "memory": 384,
      "essential": true,
      "environment": [
        { "name": "AWS_ACCESS_KEY_ID", "value": "* IAM ACCESS KEY *" },
        { "name": "AWS_SECRET_ACCESS_KEY", "value": "* IAM SECRET KEY *" },
        { "name": "AWS_REGION", "value": "eu-west-1" },
        { "name": "SLEEP_MILLIS", "value": "1000" },
        { "name": "SQS_ENDPOINT", "value": "https://sqs.eu-west-1.amazonaws.com/* ACCOUNT NUM */ecs-hello-world" }
      ]
    }
  ],
  "family": "hello-server"
}

The task has a single container.

  • family - the name of the task definition.
  • name - the container name.
  • image - the public image on DockerHub.
  • cpu - the number of CPU units to allocate (there are 1024 units per core).
  • memory - the amount of memory to allocate in MB.
  • essential - set to true so that if the container fails the task also fails.

Make sure you edit the environment section to specify your IAM user and SQS URL.

Now let's create the client task definition. It's identical apart from also specifying the rake hello:client command.

$ aws ecs register-task-definition --cli-input-json file://hello-client-task.json --region eu-west-1

# hello-client-task.json

{
  "containerDefinitions": [
    {
      "name": "hello-client",
      "image": "rossf7/ecs-hello-world",
      "command": [
        "bundle",
        "exec",
        "rake",
        "hello:client['Hello World!']"
      ],
      "cpu": 384,
      "memory": 384,
      "essential": true,
      "environment": [
        { "name": "AWS_ACCESS_KEY_ID", "value": "* IAM ACCESS KEY *" },
        { "name": "AWS_SECRET_ACCESS_KEY", "value": "* IAM SECRET KEY *" },
        { "name": "AWS_REGION", "value": "eu-west-1" },
        { "name": "SLEEP_MILLIS", "value": "1000" },
        { "name": "SQS_ENDPOINT", "value": "https://sqs.eu-west-1.amazonaws.com/* ACCOUNT NUM */ecs-hello-world" }
      ]
    }
  ],
  "family": "hello-client"
}

List Task Definitions

The task definitions are versioned, so they will be referenced as hello-client:1 and hello-server:1.

$ aws ecs list-task-definitions --region eu-west-1

{
    "taskDefinitionArns": [
        "arn:aws:ecs:eu-west-1:187012023547:task-definition/hello-client:1",
        "arn:aws:ecs:eu-west-1:187012023547:task-definition/hello-server:1"
    ]
}

Run First Task

We’re now ready to run our first client task.

$ aws ecs run-task --task-definition hello-client:1 --count 1 --region eu-west-1

{
    "failures": [],
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:eu-west-1:187012023547:task/70d2824b-578f-45ed-93ac-19e65cfd9ae8",
            "overrides": {
                "containerOverrides": [
                    {
                        "name": "hello-client"
                    }
                ]
            },
            "lastStatus": "PENDING",
            "containerInstanceArn": "arn:aws:ecs:eu-west-1:187012023547:container-instance/e680c93c-5ad3-4dd2-9046-5a556453c4d6",
            "clusterArn": "arn:aws:ecs:eu-west-1:187012023547:cluster/default",
            "desiredStatus": "RUNNING",
            "taskDefinitionArn": "arn:aws:ecs:eu-west-1:187012023547:task-definition/hello-client:2",
            "containers": [
                {
                    "containerArn": "arn:aws:ecs:eu-west-1:187012023547:container/87ebdb38-99dc-41bf-810d-7f0db9cd7545",
                    "taskArn": "arn:aws:ecs:eu-west-1:187012023547:task/70d2824b-578f-45ed-93ac-19e65cfd9ae8",
                    "lastStatus": "PENDING",
                    "name": "hello-client"
                }
            ]
        }
    ]
}

We can see if the client is working by checking the SQS queue length. The client sends 1 message a second, so you should see the count going up.

$ aws sqs get-queue-attributes --queue-url https://sqs.eu-west-1.amazonaws.com/187012023547/ecs-hello-world --attribute-names ApproximateNumberOfMessages --region eu-west-1

{
    "Attributes": {
        "ApproximateNumberOfMessages": "42"
    }
}

Run More Tasks

Now let's run another client and 2 server containers.

$ aws ecs run-task --task-definition hello-client:1 --count 1 --region eu-west-1
$ aws ecs run-task --task-definition hello-server:1 --count 2 --region eu-west-1

If you check the queue length, it should have stopped going up. The numbers of clients and servers are now balanced, so the servers remove messages from the queue as fast as the clients send them.
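A toy model of the queue makes the balance obvious (it ignores SQS visibility timeouts and polling latency):

```ruby
# Each client adds 1 message/sec and each server removes 1 message/sec,
# so the queue's net growth rate is just the difference.
def net_rate(clients, servers)
  clients - servers
end

net_rate(1, 0)  # one client, no servers: the queue grows
net_rate(2, 2)  # two of each: the queue length stays flat
```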

Container Instance Capacity

Let's try running another client task.

$ aws ecs run-task --task-definition hello-client:1 --count 1 --region eu-west-1

{
    "failures": [
        {
            "reason": "RESOURCE:MEMORY",
            "arn": "arn:aws:ecs:eu-west-1:187012023547:container-instance/e680c93c-5ad3-4dd2-9046-5a556453c4d6"
        },
        {
            "reason": "RESOURCE:MEMORY",
            "arn": "arn:aws:ecs:eu-west-1:187012023547:container-instance/df7f7bde-199e-44de-8ba3-0732abb1810e"
        },
        {
            "reason": "AGENT",
            "arn": "arn:aws:ecs:eu-west-1:187012023547:container-instance/b2cddd66-435c-4a41-aace-b3f77409e76f"
        }
    ],
    "tasks": []
}

This fails because the container instances don't have sufficient memory. The task definitions each specify 384 MB of RAM, and the t2.micro instances have 1 GB of RAM, meaning only 2 containers fit per instance.
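A quick back-of-the-envelope check shows why the third task is rejected (this assumes the full 1 GB is available to ECS; in practice the agent registers slightly less):

```ruby
# How many tasks with a given memory reservation fit on one instance.
# Integer division, since a partial task can't be scheduled.
def tasks_that_fit(instance_memory_mb, task_memory_mb)
  instance_memory_mb / task_memory_mb
end

tasks_that_fit(1024, 384)  # 2: a third task would need 1152 MB in total
```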

Accessing the container via SSH

Get the IP address of one of the container instances from the Instances section of the EC2 console. SSH into it as ec2-user using the key pair you specified earlier. You can now run any Docker commands.

$ docker ps

CONTAINER ID        IMAGE                            COMMAND                CREATED             STATUS              PORTS                        NAMES
d9df03723f33        rossf7/ecs-hello-world:latest    "bundle exec rake he   19 minutes ago      Up 19 minutes                                    ecs-hello-server-2-hello-server-a2efc4b3f8b7d3c88801
cbc3b002ec5d        rossf7/ecs-hello-world:latest    "bundle exec rake 'h   29 minutes ago      Up 29 minutes                                    ecs-hello-client-2-hello-client-a2cdf8d38dd698a98601
fb390240adbc        amazon/amazon-ecs-agent:latest   "/agent"               About an hour ago   Up About an hour    127.0.0.1:51678->51678/tcp   ecs-agent

$ docker logs cbc3b002ec5d

Sent: 612fbaf4-c9ad-4e19-83ba-8125088246a8
Sent: e103f379-44d5-47fc-88ea-daeec7b8e4a1

$ docker logs d9df03723f33

Received: c9a06426-80ab-4e58-a61a-08836cd648c0
Container cbc3b002ec5d said ''Hello World!'' at  2 Apr 2015 00:00:00 +0000

So there is Hello World running on ECS!

Shutting Down

That completes the demo. Make sure you delete the SQS queue and terminate the 2 EC2 instances via the AWS console. Otherwise you'll keep paying for them.

$ aws sqs delete-queue --queue-url https://eu-west-1.queue.amazonaws.com/187012023547/ecs-hello-world --region eu-west-1

Problems and Debugging

The big mistake I made when creating the task definitions was not specifying enough CPU and RAM. The ECS tutorial uses a busybox image and runs a wait command, so it only allocates 10 MB of RAM and 10 CPU units.

I copied the example, but since this example uses Ruby and calls SQS, my containers were immediately crashing due to insufficient memory. This took a while to debug; I eventually found the problem by checking the ECS agent log on the container instances (/var/log/ecs/ecs-agent.log).

Conclusion

I think it's still very early days for ECS, which makes sense as it's still a preview release. There is no support yet in the web console, and even the CLI and API support is not 100% complete: at the moment you can't delete task definitions.

However, being able to load multiple containers onto EC2 instances is a huge improvement. It means you can increase server density and start doing auto scaling with containers, although at the moment you'd need to develop the auto scaling yourself.

Another area that needs improvement is fault tolerance, which the default ECS scheduler doesn't yet support. It is possible to use other schedulers, such as Marathon, which is part of Apache Mesos. However, instead of using ECS you could then just run a Mesos cluster on EC2 or DigitalOcean.

What’s Next?

As I mentioned earlier, service discovery is currently the hardest part of using ECS. For this demo I completely ducked it by connecting the clients and servers via an SQS queue. Obviously you can't do that in most real-world situations.

So the next step is to split the server into API and worker servers. The API will be a simple REST API using Sinatra. For service discovery I'm going to use Consul to register the API servers with an Elastic Load Balancer.

More Information

The ECS Developer Guide and Forum were both essential when setting up the demo.
