This is a remote service which can receive an HTML document from a client application and return a URL to a rendered PDF document.
The service is implemented using the following AWS services:
- A Lambda function written in Go
- An API Gateway to provide HTTP access to the Lambda function
- An S3 bucket to cache the rendered PDF document
The actual rendering of the PDF document is handled by wkhtmltopdf 0.12.4 (with patched qt).
Security is implemented on two levels:
- The client includes an Authorization header when it posts HTML to the service through the API Gateway.
- The service returns a signed URL to the rendered PDF document with an expiry.
The setup required for this framework is extensive. I'll be providing step-by-step instructions on how to set up this service in AWS, and anything not covered will be linked to the AWS documentation.
It is my hope this document will not only help you generate PDF documents for your applications, but also serve as a learning tool for AWS services.
This approach to generating PDF documents has potential drawbacks:
- wkhtmltopdf may not render the provided HTML and CSS according to current standards. It's up to the developer to adapt their source code to the renderer.
- Image assets aren't included with the source code payload, meaning that wkhtmltopdf must request each image to add to the PDF document. If the source code includes a large number of images, or those images are very large, the time needed to download them may cause the Lambda function or the API Gateway to time out.
- I'm not an expert in AWS. The infrastructure choices made here may not be the best. Use at your own risk.
This service is written in the Go programming language. You should have Go already installed. Install all dependencies by running go get ./... from your project directory.
Clone this project into your ~/go/src project directory with:
git clone https://github.com/jjmontgo/pdf-service
Create your AWS Account if you don't already have one.
You'll also need to set up the AWS command line interface on your system.
- Choose Services from the main menu. Search for Lambda and open the Lambda service.
- Click the "Create a Function" button.
- Leave the default option "Author from scratch."
- Enter the name of the Lambda function. The name of the function should be the same as the folder name of your Go project. In this case, it's pdf-service.
- Choose Go 1.x for the runtime.
- Under Role, choose Create a custom role.
- Leave the default IAM Role option "Create a new IAM role" selected.
- Enter a descriptive role name. For example: pdf-service-role.
- A basic policy document is attached to the role. This policy allows the role to execute the Lambda function and write CloudWatch logs. You'll be giving the role access to other services later. Create the role.
- You'll be redirected back to the "Create function" form. Make sure your new role is selected, and click the "Create function" button.
- In the function's Configuration tab, scroll down to the section called "Function code". In the Handler text box, change the handler name from "hello" to the Go executable file name. It should be the same name as your function, which is pdf-service.
- Scroll down to the Basic Settings section and change the Timeout to 5 minutes.
- There is a deployment script in the project, deploy.sh, which will compile the Go code and deploy it to your Lambda function. Run it with sh deploy.sh.
Now you're going to create an S3 bucket to cache rendered PDF files. The service caches requests by generating an MD5 digest from the request and using it as part of the object key.
- Choose Services from the main menu. Search for S3 and open the S3 service.
- Click the "Create bucket" button.
- Enter a meaningful bucket name, e.g. "pdf-service-cache".
- Leave all other settings as default. By default, the bucket is not publicly accessible.
- Click the bucket's table row in the list of buckets. An information panel slides in from the right. Click the button "Copy Bucket ARN" (Amazon Resource Name) and keep it in your buffer. You'll need the bucket ARN in the following section to give the Lambda function access to the bucket.
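The caching scheme mentioned at the start of this section (an MD5 digest of the request used as part of the object key) can be sketched as follows. This is an illustration rather than the project's actual code: the function name cacheKey and the ".pdf" suffix are assumptions, and the real service may hash more than the HTML or lay out its keys differently.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// cacheKey derives an S3 object key from the request payload.
// Identical payloads map to the same key, so a repeated request can be
// served from the bucket instead of being rendered again.
// The ".pdf" suffix is an assumption made for this sketch.
func cacheKey(payload string) string {
	sum := md5.Sum([]byte(payload))
	return hex.EncodeToString(sum[:]) + ".pdf"
}

func main() {
	fmt.Println(cacheKey("<h1>Invoice</h1>"))
}
```

MD5 is used here only to produce a stable cache key, not as a security measure.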
Return to the Lambda function and scroll down to the section "Environment Variables". Add a new variable called BUCKET_NAME with the name of the S3 bucket you just created. The function will now know which bucket to store the PDF in.
The function has to have permission to write to the bucket. You'll set this permission by adding a permission policy to the role you created when you set up the function.
- Choose Services from the main menu. Search for IAM and open the IAM service.
- Choose Roles from the left-hand menu.
- Choose the role you created when you set up the Lambda function, eg. "pdf-service-role".
- The Permissions tab should be open by default. Under this tab, click "Attach policies."
- Click the "Create Policy" button. A new tab is opened.
- Click Service, and choose S3.
- Click Actions, and click the checkbox for All S3 Actions.
- Click Resources, and beside bucket, click Add ARN. Paste the Bucket ARN you copied from the previous section, and click Save Changes. Beside object, click Add ARN. Paste the Bucket ARN again, and for Object name click the Any checkbox. Click the Add button.
- Click the Review Policy button.
- In the Name field, choose a highly descriptive name for the policy, e.g. full-access-pdf-service-cache.
- Click the Create Policy button.
- Return to the pdf-service-role in the previous tab. You can now add the created policy to the role.
- Click the Filter Policies link. Check the Customer Managed checkbox. Choose the policy you just created and click the Attach Policy button.
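For reference, the console steps above produce a JSON policy document roughly like the one this sketch prints: one statement that allows the chosen actions on the bucket itself and on every object in it. The type and function names are hypothetical, and the exact document the visual editor generates may differ in detail (for example, it may add statement IDs).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statement and policy model the parts of an IAM policy document used here.
type statement struct {
	Effect   string   `json:"Effect"`
	Action   []string `json:"Action"`
	Resource []string `json:"Resource"`
}

type policy struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// bucketPolicy allows the given actions on a bucket and on all of its objects.
// bucketARN + "/*" corresponds to ticking "Any" for the object name.
func bucketPolicy(bucketARN string, actions ...string) policy {
	return policy{
		Version: "2012-10-17",
		Statement: []statement{{
			Effect:   "Allow",
			Action:   actions,
			Resource: []string{bucketARN, bucketARN + "/*"},
		}},
	}
}

func main() {
	// "s3:*" mirrors the "All S3 Actions" checkbox from the steps above.
	doc, err := json.MarshalIndent(bucketPolicy("arn:aws:s3:::pdf-service-cache", "s3:*"), "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(doc))
}
```

If the function only ever reads and writes objects, a narrower action list such as s3:GetObject and s3:PutObject may be enough, though that has not been verified against this service.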
In order to access the Lambda function from the internet, you'll need to use the Amazon API Gateway. Here is how to set that up:
- Choose services from the main menu. Search for API Gateway and open the API Gateway service.
- If you haven't created an API before, you should see a "Get Started" button. Click it.
- Choose New API.
- Enter an API name, eg. "PDF Service API". Leave everything else as default and click "Create API".
- In Resource, click the Actions drop-down menu and choose Create Method.
- HTML is going to be passed to the API through a POST method. So choose POST and click the checkmark button.
- In Integration Type, leave Lambda Function selected.
- Check the checkbox beside Use Lambda Proxy integration. Requests will be proxied to Lambda with request details available in the "event" of the function handler.
- Leave the default Lambda Region selected.
- In the Lambda Function text box, enter the name of the Lambda function. An autocomplete dropdown will open to let you select the name.
- Leave Use Default Timeout checked and click the Save button.
- A modal opens telling you you're giving the API Gateway permission to invoke your Lambda function. Click OK.
- Under the Actions drop-down menu, choose Deploy API.
- Under Deployment Stage, choose New Stage. We're only going to have one stage. For stage name, enter "prod" and click Deploy.
- You should now see an Invoke URL in the prod Stage Editor. This is the URL to which you'll POST your HTML to convert to PDF.
The Lambda function is going to store the generated PDF in the S3 bucket, and return a signed URL to the client. The signed URL provides temporary access to the document. To sign the URL, you'll need to do the following:
- Create a CloudFront distribution as a frontend to the PDF S3 bucket, so that the PDFs are only accessible through signed URLs.
- Get your CloudFront signing keys from your root AWS account.
- Create another S3 bucket to store the CloudFront private key.
- Give the Lambda function access to the S3 bucket with the private key.
- Choose services from the main menu. Search for CloudFront and open the CloudFront service.
- Click the Create Distribution button.
- Under the Web distribution option, click the Get Started button.
- In the Origin Domain Name, choose the S3 Bucket, e.g. "pdf-service-cache".
- For the Restrict Bucket Access option, choose Yes.
- For the Origin Access Identity option, choose Create a New Identity.
- For the Grant Read Permissions on Bucket option, choose Yes, Update Bucket Policy.
- Scroll down to the option Restrict Viewer Access (Use Signed URLs or Signed Cookies) and change it to Yes.
- Leave all other options on their defaults, and click the Create Distribution button.
- Add the URL of the distribution to your Lambda function with an environment variable called CLOUDFRONT_PRIVATE_URL. The URL is listed under the Domain Name column of the CloudFront distribution list. It uses the format .cloudfront.net.
- You'll need to log in to AWS using your root account, which is the only place the keys are available.
- Click your name in the top menu and choose My Security Credentials.
- Dismiss the popup window by clicking Continue to Security Credentials.
- Open the CloudFront key pairs section.
- Click the Create New Keypair button. Download both the public and private keys and save them.
- Your Lambda function will need to know which key pair to use through an environment variable. Return to the function and add an environment variable called CLOUDFRONT_KEY_PAIR_ID, setting the value to the Access Key ID listed for the key pair you just created.
I never keep secret data in version control. AWS Secrets Manager might have been a good place for storing keys, but I don't want to pay $0.40 per secret per month. So I decided to keep the CloudFront private key in a second S3 bucket and fetch it from the Lambda function.
- Create a second S3 bucket and name it something appropriate, like "app-private-keys". Leave all the settings default so the bucket isn't accessible by anyone.
- Upload the private key you downloaded in the previous section to the new bucket. Name it something like "cloudfront-private-key.pem".
- Add a new environment variable to the Lambda function called APP_KEYS_BUCKET_NAME with the name of the bucket.
- Add another environment variable to the Lambda function called CLOUDFRONT_PRIVATE_KEY_OBJECT_NAME and set it to the key name in the bucket, "cloudfront-private-key.pem". Remember to click Save.
- Now you'll need to give your Lambda function access to the private key. You can do this by adding a permission policy to the function's role that gives it access to both the bucket and the key in the bucket.
- Click the row representing your new private keys bucket. Click the Copy Bucket ARN button in the panel that slides in from the right.
- Under Services, choose IAM again.
- Click Roles in the left-hand menu.
- Open the Lambda function's role (pdf-service-role).
- Click the Attach policies button.
- Click the Create policy button.
- Under Service, choose S3.
- Under Actions, check the Read access level.
- Under Resources, you will first add the bucket ARN you copied earlier. You will also need to provide the Object. Click Add ARN and paste the bucket ARN again beside Bucket Name. Then enter the object name for your private key, e.g. cloudfront-private-key.pem. Click the Add button.
- Click the Review policy button.
- Enter a good name for the policy, such as access-cloudfront-private-key.
- Click the Create Policy button.
- Return to the Lambda function role in the previous tab.
- Click the Filter policies link and choose Customer managed.
- Click the Refresh button in the top right-hand corner so that the new policy you created shows up in the list.
- Check the new policy and click the Attach policy button.
To create signed URLs for a CloudFront distribution, you use CloudFront keys. But to sign requests to Amazon resources such as API Gateway, you need to create an IAM user with API keys.
- Choose services from the main menu. Search for IAM and open the IAM Management Console.
- Click Users in the left hand menu.
- Click the Add user button.
- Give the user the user name pdf-service. Under "Access Type", select Programmatic Access. Click Next.
- Under "Set Permissions," choose Attach existing policies directly. Click the Create Policy button.
- Beside "Service," click Choose a service. Search for ExecuteAPI and select it.
- Under "Access level," check the box for All ExecuteAPI actions (execute-api:*).
- Click the Resources section and choose Add ARN. Unfortunately, this part is a little difficult and will require you to open the AWS console in a separate tab to get the following information.
  8.1. Enter the Region your API is in using the right code.
  8.2. Enter your Account number. To get this number, click the drop-down menu item of your name in the main menu, and click My Account. Your Account number is under "Account Settings" and beside "Account Id".
  8.3. Enter the Api id. Return to your API Gateway and you'll see your Api id in the breadcrumb menu at the end, e.g. APIs > PDF Service (Api Id is here).
  8.4. For the Stage, enter * for Any stage.
  8.5. For the method, enter POST.
- Click Review Policy. Enter the name pdf-service-api-gateway and click Create policy.
- Return to the user you were creating in step 4, and add the new policy you created to the user. You may need to click the Refresh button for the policy to appear in the list.
- Click the Create user button.
- You should now see the "Access key ID" and "Secret access key" for the new user. Record both of these values.
- Choose services from the main menu. Search for "API Gateway" and open the gateway you set up earlier.
- Open the POST method by clicking on POST.
- Click the Method Request link.
- Under Settings, beside Authorization, click the edit pencil icon.
- Choose AWS_IAM and click the checkmark.
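The region, account number, Api id, stage, and method entered for the IAM policy above are combined into a single execute-api ARN. A small sketch of how those pieces fit together is given here. The function name and all example values are placeholders, and the trailing resource path (set to * here) is an assumption, since the steps above do not mention it.

```go
package main

import (
	"fmt"
	"strings"
)

// executeAPIArn assembles the ARN that an execute-api policy statement
// refers to. resourcePath is given without its leading slash in the ARN.
func executeAPIArn(region, accountID, apiID, stage, method, resourcePath string) string {
	return fmt.Sprintf("arn:aws:execute-api:%s:%s:%s/%s/%s/%s",
		region, accountID, apiID, stage, method, strings.TrimPrefix(resourcePath, "/"))
}

func main() {
	// All values below are placeholders, not real identifiers.
	fmt.Println(executeAPIArn("us-east-1", "123456789012", "a1b2c3d4e5", "*", "POST", "*"))
}
```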
To retrieve a URL to a rendered PDF, you'll need to do the following in your client code:
- Generate an authorization header using your IAM user's access key ID and secret access key: https://docs.aws.amazon.com/apigateway/api-reference/signing-requests/
- POST the HTML to the API Gateway as URL-encoded parameters in the body of the request, with the authorization header.
- The service will return a URL to the generated PDF document. You can redirect the user to this URL, or do anything else you like with it.
You'll need to implement this in your language of choice. In my case, the client language was PHP, so I've included a class that implements the process with this gist.