No more F5!

I procrastinate a lot by reloading webpages, looking for new content. However, I don't like being a Skinner box rat, so I wrote this digest generator to tame my FOMO.

⚠️ no-more-f5 works only with Java 8! Java 9 is not supported.

Installation

Cloud stack

Rough idea of the required cloud stack:

  • Emails are sent using AWS SES via the SMTP protocol.
  • The function itself is deployed to AWS Lambda and is triggered by a scheduled CloudWatch event (cron).

If you want to know, here's the motivation for this stack:

Why SMTP protocol?

Since we need to scrape RSS feeds, we need Internet access. This can be configured in two ways:

  • Place the Lambda function outside a VPC and connect to SES via SMTP.
  • Place the Lambda function inside a VPC and route Internet traffic through a NAT Gateway. In this case we can talk to SES directly.

I use the former. SMTP emails cost a little more, but configuring a VPC and a NAT Gateway is tedious, and a NAT Gateway is certainly much more expensive than the SMTP emails. However, if you already have a NAT Gateway, you can certainly try the second option. YMMV.

Building and packaging your function

You will need Leiningen to build your uberjar. But first, create a list of your Atom/RSS feeds and save it in a file, e.g. my_feeds:

$ cat > my_feeds <<EOF
https://github.com/BurntSushi/ripgrep/releases.atom
https://github.com/atom/atom/releases.atom
EOF

Now we build a standalone uberjar and add my_feeds to it (remember, jars are just zip archives). This process is automated in prepare_package.sh (specify your feeds file as a call parameter):

$ ./prepare_package.sh my_feeds
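
Since the feeds file ends up inside the jar, it can be worth sanity-checking it before packaging. The following is a minimal sketch and not part of the repo: `check_feeds` is a hypothetical helper, and it assumes the file holds exactly one http(s) URL per line, as in the example above. How the app itself treats blank or malformed lines is not covered here.

```shell
#!/bin/sh
# Hypothetical helper (not part of no-more-f5): checks that every line of a
# feeds file looks like an http(s) URL. Blank lines are reported as well.
check_feeds() {
    feeds_file="$1"
    if [ ! -r "$feeds_file" ]; then
        echo "cannot read $feeds_file" >&2
        return 2
    fi
    # Collect numbered lines that do NOT look like a URL.
    # "|| true" keeps this safe under "set -e" when grep selects nothing.
    bad_lines=$(grep -n -v -E '^https?://[^[:space:]]+$' "$feeds_file" || true)
    if [ -n "$bad_lines" ]; then
        echo "suspicious lines in $feeds_file:" >&2
        echo "$bad_lines" >&2
        return 1
    fi
    return 0
}

# Possible usage:
#   check_feeds my_feeds && ./prepare_package.sh my_feeds
```

The function returns 0 for a clean file, 1 if any line looks suspicious, and 2 if the file cannot be read.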

Preparing your SES

  1. Verify your email address in SES.
  2. Create SMTP credentials and save them -- we'll need them later.

Important: Creating SMTP credentials also creates an IAM user. Do not use this user's credentials for the SMTP server!

Creating and configuring the Lambda function

  1. Create a new Lambda function.
  2. Use a standard IAM role, just enough to store CloudWatch logs.
  3. Select Java 8 as runtime.
  4. Add a CloudWatch event as a trigger. Schedule it to something like cron(0 6 * * ? *), i.e. every day at 6:00 UTC.
  5. Choose something around 384 MB memory and 90 seconds timeout (depends heavily on the number of feeds you want to digest).
  6. Set handler to no_more_f5.core::handler
  7. Now we need to set up environment variables. Add the following environment variables:
| Variable | Note | Example |
|---|---|---|
| FEEDS | Filename of the file with your feed URLs | my_feeds |
| USER_AGENT | See below | Mozilla/5.0 ... |
| SMTP_SERVER | Address of your AWS SES SMTP server | email-smtp.eu-west-1.amazonaws.com |
| SMTP_PORT | SMTP server port, check out your SES docs | 587 |
| SMTP_USER | Use your SES SMTP credentials here | |
| SMTP_PASS | Use your SES SMTP credentials here | |
| EMAIL_FROM | Must be verified in AWS SES | [email protected] |
| EMAIL_TO | All of them must be verified in AWS SES | [email protected], [email protected] |
| SINGLE_SITE_TIMEOUT | Timeout for each fetching connection | 2000 |

You need to specify USER_AGENT since some sites block scrapers that don't send one. Just use something similar to your main browser's user agent string.

EMAIL_TO can contain multiple addresses, separated by commas. Make sure you use only verified addresses if you are still in the SES Sandbox mode.

SINGLE_SITE_TIMEOUT is helpful if some feed is unresponsive. Instead of timing out the whole Lambda function, you'll just get an exception message for the unresponsive feed.

Ok, you should be ready to go! Create a dummy test event (an empty JSON object {} is enough as the event payload) and see if you've got a digest in your inbox!

Configuring CloudWatch logs retention

One more thing: Go to CloudWatch and configure log retention for your no-more-f5 log group. Set it to something reasonable, e.g. 7 days. Storing a lot of logs (several GBs) might be expensive and it's just not worth it in this case.
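
If you prefer the command line over the console, the same retention setting can probably be applied with the AWS CLI. This is a sketch under one assumption: the Lambda function is named no-more-f5, so its logs go to the default log group /aws/lambda/no-more-f5. Adjust the name to match your function.

```shell
# Assumption: the function is named "no-more-f5", so Lambda writes to the
# default log group /aws/lambda/no-more-f5. Replace with your function's name.
aws logs put-retention-policy \
    --log-group-name "/aws/lambda/no-more-f5" \
    --retention-in-days 7
```

The log group only exists after the function has run at least once, so run the test event first.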

Local dev environment

For local testing, create a profiles.clj file in the root repo folder. Add the following map to it:

{:dev
 {:env
  {:feeds "dev_feeds"
   :single-site-timeout "2000"
   :smtp-user "..."
   :smtp-pass "..."
   :smtp-server "email-smtp.eu-west-1.amazonaws.com"
   :smtp-port "587"
   :user-agent "..."
   :email-from "..."
   :email-to "..."}}}

Then just use lein run to run the app. Alternatively, you can set all required environment variables and call

$ java -cp <path_to_your_uberjar> no_more_f5.core
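
When running the jar this way, a forgotten variable is easy to miss. The sketch below is a hypothetical preflight check, not part of the repo: `missing_vars` and `REQUIRED_VARS` are names made up for illustration, and it assumes that every variable from the table above is required.

```shell
#!/bin/sh
# Hypothetical preflight check (not part of no-more-f5).
# Assumption: all variables from the README table are required.
REQUIRED_VARS="FEEDS USER_AGENT SMTP_SERVER SMTP_PORT SMTP_USER SMTP_PASS EMAIL_FROM EMAIL_TO SINGLE_SITE_TIMEOUT"

# Prints the name of each required variable that is unset or empty,
# one per line. Prints nothing if everything is set.
missing_vars() {
    for name in $REQUIRED_VARS; do
        # Indirect lookup; safe here because the names are fixed literals.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

# Possible usage before starting the app:
#   missing=$(missing_vars)
#   [ -z "$missing" ] || { echo "missing: $missing" >&2; exit 1; }
#   java -cp <path_to_your_uberjar> no_more_f5.core
```

The check only tests for presence; it does not validate the values themselves.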

If you have your own server running 24/7, you can schedule local execution with cron. And of course you can use your own email account, just make sure to get an app token for SMTP instead of using your password.

How much is the fish?

No idea, I'll update this when I get my first monthly bill. But probably not much.