From #19:
Identify a service to orchestrate the deployment of containers. In addition to orchestrating container deployment, this service should at minimum provide a web interface to container logs and provisioning container access by application. Ideally, it would also allow managing by access by user and centralize other key devops tasks, such as DNS and SSL management. We have identified two immediate candidate services: ECS and Heroku. We propose evaluating these services by standing up a containerized application in both and selecting the one we prefer. If we prefer ECS but desire further abstraction, we then propose trialing Fargate.
We evaluated each candidate along the following dimensions:
- Server provisioning
- Image management
- Container orchestration
- Logging
- Secret management
- Routing traffic
ECS is AWS's contribution to the container orchestration space. It works in concert with several other AWS microservices to "deploy, manage, and scale Docker containers."
In general, I followed the approach outlined here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-cli-tutorial-ec2.html
My notes, as well as the modified `docker-compose.ecs.yml` and `ecs-params.yml` files, live in a branch of `dedupe-service`.
EC2.
I used the Amazon ECS CLI (`ecs-cli`) to provision the cluster's EC2 instances, as recommended in the docs linked above.
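For reference, a rough sketch of the provisioning commands from the linked tutorial; the cluster name, keypair, and instance sizing here are illustrative, not what we'd necessarily use:

```shell
# Point the CLI at a (new) cluster in our region
ecs-cli configure --cluster dedupe-service --default-launch-type EC2 --region us-east-1

# Stand up an EC2 instance in the cluster;
# --capability-iam lets the CLI create the IAM roles it needs
ecs-cli up --keypair my-keypair --capability-iam --size 1 --instance-type t2.medium
```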
Fargate would abstract this provisioning away entirely. I had the notion that it was an expensive option; however, there was a significant price reduction in early 2019. That doesn't make me any more able to decipher AWS pricing pages, but it's heartening just the same.
ECR.
I used ECR, since I didn't have access to the DataMade DockerHub organization and our image was private. DockerHub is an equally viable option.
ECS.
ECS supports a modified `docker-compose.yml` syntax that allows you to skip defining your services by hand. It also allows you to define AWS-specific blocks, e.g., for logging integration with CloudWatch and secret management in the Parameter Store.
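Deploying then looks roughly like this, assuming the compose and params files from the branch mentioned above:

```shell
# Create (or update) an ECS service from the modified compose file
ecs-cli compose --file docker-compose.ecs.yml --ecs-params ecs-params.yml service up
```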
CloudWatch.
This was actually quite easy to set up. I just needed to add a `logging` block to each service:

```yaml
logging:
  driver: awslogs
  options:
    awslogs-group: dedupe-service-ecs
    awslogs-region: us-east-1
    awslogs-stream-prefix: redis
```
Parameter Store.
I didn't set this up, but it seems possible via the AWS Systems Manager Parameter Store. Related documentation for the `ecs-params.yml` file needed to define secrets: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cmd-ecs-cli-compose-ecsparams.html.
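Per that documentation, a hypothetical `ecs-params.yml` secrets block might look like the following; the service name, parameter name, and role here are illustrative:

```yaml
version: 1
task_definition:
  # The execution role needs permission to read the referenced parameters
  task_execution_role: ecsTaskExecutionRole
  services:
    app:
      secrets:
        - value_from: /dedupe-service/DB_PASSWORD  # SSM parameter name or ARN
          name: DB_PASSWORD                        # env var exposed to the container
```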
ELB.
The documentation I was following suggested that I should be able to view my application at the instance's IP address and port, but that didn't seem to be the case.
TO-DO: Can I do this if I map my container to port 80 (the default when spinning up EC2 instances)?
Instead, it's recommended to use a load balancer to route web traffic.
When I tried to set up an ELB for my ECS-configured EC2 instance, the instance was not available during configuration. This is where I put it down.
There are a lot of microservices to configure here, and while some of that configuration can be scripted, much of it takes place in far-flung corners of the AWS console.
It also feels a bit off to transition to a service optimized for scaling, when our real problem is managing single instances of an application. It makes things like using a load balancer instead of a web server feel unnecessary.
That said, there are some things to like:
- You can use familiar docker-compose syntax (with some exceptions).
- Wow, why aren't we using `ecs-cli` to provision our servers now?
- It's a breeze to set up logging.
- I like the idea of managing secrets in a central location, with access controlled by IAM roles we already manage. I didn't realize this was something AWS offered.
@jeancochrane explored Heroku-based deployments in May: #9 (comment).
Start there for useful context.
The changes I made to stand up `dedupe-service` on Heroku are in a branch.
Heroku abstracts this away; however, I only wanted to use the cheapo, single-dyno tier. We will want to research, select, and document a sane default for most applications. You can upgrade dynos in the Heroku UI, or via the CLI: https://devcenter.heroku.com/articles/scaling#manual-scaling.
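For reference, scaling and resizing from the CLI look like this; the process names and dyno sizes are illustrative:

```shell
# Run two web dynos instead of one
heroku ps:scale web=2

# Move the worker process to a larger dyno type
heroku ps:resize worker=standard-2x
```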
For advanced needs, e.g., Dedupe.io, Heroku offers enterprise accounts with features like Private Spaces (with PCI, HIPAA, ISO, and SOC compliance) and autoscaling. https://www.heroku.com/enterprise
Heroku offers two options for containerized deployment: building your container and publishing it to a registry, then releasing it to your dyno; or defining your application in a yml file and building your application container/s from bundled Dockerfile/s as part of each release: https://devcenter.heroku.com/categories/deploying-with-docker
I gave the former a try, but ultimately decided to pilot the latter: I preferred the expressive syntax, I liked that you could define add-ons and multiple processes, and I wanted to run migrations as part of the release cycle, prior to starting the app.
So the answer here is: none. Your container is built from your app's Dockerfile on each deploy.
Heroku allows you to define an application, as well as add-on services and pre-deployment steps, in a `heroku.yml` file. (@jeancochrane mentions that this is also configurable in the UI!)
All told, it took me about three hours to research, troubleshoot, and deploy Dedupe.io on Heroku, via the following steps.
- Create and verify a Heroku account.

- Install and authenticate with the Heroku CLI. https://devcenter.heroku.com/articles/heroku-cli#download-and-install

- Create a Heroku app.

  ```bash
  heroku apps:create dedupeio
  ```

- Add the Postgres add-on.

  N.b., while you can specify add-ons in your `heroku.yml` file, the `setup` block does not appear to fire as part of regular builds. Creating an application from a `setup` block, including the addition of add-ons, is in beta.

  ```bash
  heroku addons:create heroku-postgresql:hobby-dev
  ```

  Also note that the Postgres add-on injects a `DATABASE_URL` environment variable into your container. You need to update your application settings to reference that variable, instead of a hard-coded URI.

- Enter the `psql` shell and add extensions. I did this manually; however, it could also be automated with an `init_db` script in your application's language.

  ```bash
  # locally
  heroku pg:psql
  ```

  ```sql
  -- in the heroku psql shell
  create extension if not exists pgcrypto;
  create extension if not exists intarray;
  ```

- Write a `heroku.yml` file.

  ```yaml
  # Setup doesn't actually run unless you create your app from setup
  # https://devcenter.heroku.com/articles/build-docker-images-heroku-yml#creating-your-app-from-setup
  setup:
    addons:
      - plan: heroku-postgresql
      - plan: heroku-redis

  # Specify the processes you want to run. These will be created in separate dynos.
  build:
    docker:
      web: Dockerfile
      worker: Dockerfile

  # Define scripts to run before your application/s start
  release:
    command:
      - alembic upgrade head || (python init_db.py && alembic upgrade head)
    image: web

  # Define commands to run application/s.
  run:
    # PORT is generated by Heroku
    web: python runserver.py --host 0.0.0.0 --port $PORT
    worker: python run_queue.py
  ```

- Commit your changes and push them to Heroku.

  ```bash
  git add heroku.yml [updated_config.py updated_script.py ...]
  git commit -m "add heroku.yml"
  git push heroku your-branch:master
  ```
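On the `DATABASE_URL` note in the add-on step above: a minimal sketch of what "reference that variable" might look like in Python. The fallback URI and the `DB_SETTINGS` structure are hypothetical, not part of the actual app.

```python
import os
from urllib.parse import urlparse

# Heroku's Postgres add-on injects DATABASE_URL into the dyno's environment.
# The fallback here is a hypothetical local development database.
url = urlparse(os.environ.get("DATABASE_URL", "postgres://localhost:5432/dedupe"))

DB_SETTINGS = {
    "host": url.hostname,
    "port": url.port,
    "user": url.username,
    "password": url.password,
    "name": url.path.lstrip("/"),
}
```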
Heroku exposes build and release logs in the UI. Application logs can be viewed via the CLI, like `heroku logs --tail`.
Cribbed from @jeancochrane:
> Like Travis CI, Heroku lets you configure secret environment variables for each app through its console (or its CLI). Those environment variables then get threaded into the application at build time and at runtime. Public environment variables can be configured in the Heroku configuration files that you keep in your repo.
>
> This sort of paradigm works a lot better with containerized applications, where you can define the environment through an `.env` file or `environment` attribute in a Docker Compose file and then pass those into the container. I could see a simple secrets management solution where we have an encrypted `.env` file for local development (or just download it from S3), and then for review/staging/production we configure secret environment variables in the Heroku console.
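The `.env`-file half of that paradigm is simple enough to sketch; `load_env` here is a hypothetical helper, not something Heroku or Compose provides:

```python
from pathlib import Path

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```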
Heroku provides a yourapp.herokuapp.com URL automatically. This note on custom URLs is also cribbed from @jeancochrane:
> On Heroku, if you want to put your app behind a custom domain, you have to create an ALIAS or CNAME record in your DNS provider that delegates DNS to a Heroku hostname.
>
> This pattern works perfectly fine for subdomains, but for root domains it's a little tricky because our domain registrar of choice (NameCheap) doesn't currently offer ALIAS or ANAME records for root domains. To get custom root domains to work with Heroku, we'd need to use a different domain registrar. (Luckily, this is a pretty common feature these days, and I'm surprised NameCheap doesn't offer it.)
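In zone-file terms, the two cases look roughly like this. The `example.com` names are placeholders, and the ALIAS syntax is illustrative, since ALIAS/ANAME are provider-specific extensions rather than standard record types:

```
; subdomain: a plain CNAME works everywhere
app.example.com.   300  IN  CNAME  yourapp.herokuapp.com.

; root domain: needs a provider that supports ALIAS/ANAME flattening
example.com.       300  IN  ALIAS  yourapp.herokuapp.com.
```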
Overall, Heroku is several times easier than ECS, and even than our current deployment practices with EC2 and CodeDeploy.
It also appears to play nicely with Travis: https://docs.travis-ci.com/user/deployment/heroku/
With that said, I only had a single dyno to use, so I didn't trial interactions between dynos, e.g., between an app and worker process. There may yet be some kinks to iron out there.
There also seems to be a price for that convenience.
For our most complicated app (Dedupe.io), I think we'd need a $25/month 1x dyno for the Dedupe.io frontend and a $50/month 2x dyno for the worker processes, since they do some intense computing.
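Back-of-the-envelope, using those figures (dyno costs only, excluding add-ons):

```python
# Rough dyno cost for Dedupe.io, per the figures above
web_dyno = 25     # standard-1x for the frontend, $/month
worker_dyno = 50  # standard-2x for the workers, $/month

monthly = web_dyno + worker_dyno
annual = monthly * 12
print(monthly, annual)  # 75 900
```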
For most apps, though:
> Dynos are comparably priced to EC2 instances: there's a free tier that works nicely for review and staging apps, a super-cheap "hobby" version that's $7/mo but doesn't come with horizontal scaling or long-term metrics, and then a "standard" version that's $25/mo... However, the $7/mo offering seems really appealing to me, and for low-traffic sites I can't figure out a reason why it wouldn't work. In this case, Heroku would probably be cheaper than EC2.
Add-ons are more persistently costly:
> There's a "hobby" tier that gives you 10,000 rows for free, and you can bump it up to 1,000,000 rows for $9/mo, but these databases are colocated with other customers' databases and don't come with uptime guarantees or advanced monitoring... The cheapest option that isn't colocated is $50/mo. This is actually pretty comparable with AWS RDS pricing, but it's a lot more expensive than our current practice of colocating our Postgres and app installs on the same EC2 instance.
To really assess whether Heroku is more expensive, it would be very helpful to know our current hosting costs, both for Dedupe.io in particular and for a run-of-the-mill application.
(Thanks to @jeancochrane for putting in the legwork here!)