It uses RSS feeds to find audio versions of FT articles.
The absorber reads RSS feeds from 3rd party providers of audio versions of the FT.
A check is run every two minutes.
For each item in the RSS feed, the absorber checks for a copy of the audio file (and an OGG version of it) in an S3 bucket, and for metadata associated with that file in a DynamoDB table.
If no file is found in the S3 bucket, the absorber retrieves a copy from the source the RSS feed points to, converts it to OGG, and then uploads both the source file and the OGG version to the S3 bucket.
Any metadata contained in the RSS for the file (as query parameters on the audio file URL) will be stored in the DynamoDB table.
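As a rough illustration of reading metadata from the query parameters on an audio file URL, here is a minimal Python sketch. The function name `extract_metadata`, the example URL, and the parameter names are all hypothetical; the real feeds may expose different keys.

```python
from urllib.parse import urlsplit, parse_qs


def extract_metadata(audio_url: str) -> dict:
    """Return the query parameters of an audio file URL as a flat dict."""
    query = urlsplit(audio_url).query
    # parse_qs maps each key to a list of values; keep only the first value.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical URL: the parameter names here are illustrative only.
example = "https://example.com/audio/abc.mp3?uuid=1234&is-human=true"
metadata = extract_metadata(example)
print(metadata)
```

A dict like this could then be written as a single item to the metadata table, keyed on whichever parameter identifies the article.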
After the absorption, a PURGE request will be made to the audio availability service to force an update of information at that UUID.
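The PURGE call can be sketched with Python's standard library. This is only an assumption about the request shape: the service URL below is a placeholder, and appending the UUID to the path is a guess, not something this document specifies.

```python
import urllib.request


def build_purge_request(service_url: str, uuid: str) -> urllib.request.Request:
    """Build (but do not send) a PURGE request for one item."""
    # Assumption: the item is addressed by appending its UUID to the service URL.
    target = f"{service_url.rstrip('/')}/{uuid}"
    return urllib.request.Request(target, method="PURGE")


# Placeholder host and UUID, for illustration only.
req = build_purge_request("https://audio-available.example.com/check/", "1234")
print(req.get_method(), req.full_url)

# To actually send it:
# urllib.request.urlopen(req, timeout=10)
```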
A file will only ever be overwritten if the existing file was created using an automated voice, and a new human-read version is available. At this point, all existing metadata and audio files will be replaced with the new human version.
An email will be sent to the individuals listed in the ALERT_MAIL_RECIPIENTS env var for each human-read audio file that is absorbed.
If a copy of the audio file already exists in the S3 bucket, or an entry exists in the DynamoDB table, it won't be updated. If a newer file exists, the existing MP3 file and database entry must be deleted for the absorber to add them back again.
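The overwrite rules above can be summarised as a small decision function. This is a sketch of the rules as described, not the absorber's actual code; the function and parameter names are hypothetical, and it assumes that either an S3 copy or a DynamoDB entry counts as "already absorbed".

```python
def should_absorb(in_bucket: bool, in_table: bool,
                  existing_is_human: bool, incoming_is_human: bool) -> bool:
    """Decide whether a feed item should be (re)absorbed."""
    # Nothing stored yet: absorb the item.
    if not (in_bucket or in_table):
        return True
    # Something is stored: only replace an automated-voice file
    # with a human-read version, never the other way round.
    return incoming_is_human and not existing_is_human


print(should_absorb(False, False, False, False))  # new item
print(should_absorb(True, True, False, True))     # automated voice, human version arrives
print(should_absorb(True, True, True, True))      # human version already stored
```

Any other update to an existing item needs the manual deletion described above, because the function returns False for it.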
Clone this repo, set your...
Environment variables can be set in a .env file.
The absorber needs the following environment variables to run:
- An AWS access key tied to an account with S3/DynamoDB read/write permissions
- The secret key for the access key.
- A valid JSON object describing the RSS feeds to parse. Example JSON:
{
  "data" : [
    {
      "provider" : "third-party",
      "provider_name" : "First 3rd Party Provider",
      "url" : "https://validrssurl.com/feed",
      "type" : "rss"
    }
  ]
}
The provider value is the short name that will be used to identify providers in the DynamoDB tables. The provider_name value is how you wish the 3rd party's name to be displayed on client-side applications. The type can be either rss for a standard RSS feed, or itunes for an iTunes XML feed.
- The name of the S3 bucket that audio will be uploaded to.
- The expected file format of the files described by the RSS feeds
- The name of the DynamoDB table for logging the interactions the absorber has with the RSS feeds and the audio files therein.
- The name of the DynamoDB table for storing any metadata associated with audio files.
- The AWS region code for your AWS resources.
- The URL for Sentry error alerts
- The email address of the person who should be notified if the app breaks.
- A comma separated list of email addresses that will be alerted when a new audio file has been detected and stored.
- A valid auth token for the FT's emailing API.
- An absolute URL for the FT Labs Ingestion dashboard that displays the state of audio on the FT.
- Boolean value. Set to true if you want the stderr/stdout of the FFMPEG jobs logged to the console.
- A comma-separated list of keys that each RSS feed item must expose to the absorber to be considered valid.
- A comma-separated list of keys that RSS feeds must expose to the absorber to be considered valid.
- The URL for the audio-available service.
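To show how the feed configuration and the comma-separated key lists above might be checked at startup, here is a minimal Python sketch. The helper names are hypothetical, and treating all four feed fields as mandatory is an assumption; this document only states the allowed values for type.

```python
import json

ALLOWED_TYPES = {"rss", "itunes"}
# Assumption: every feed entry must carry all four fields from the example JSON.
REQUIRED_FIELDS = ("provider", "provider_name", "url", "type")


def parse_feed_config(raw: str) -> list:
    """Parse the feeds JSON and reject entries that are incomplete or mistyped."""
    feeds = json.loads(raw)["data"]
    for feed in feeds:
        missing = [field for field in REQUIRED_FIELDS if field not in feed]
        if missing:
            raise ValueError(f"feed is missing fields: {missing}")
        if feed["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unsupported feed type: {feed['type']}")
    return feeds


def parse_key_list(raw: str) -> list:
    """Split a comma-separated env var value into trimmed, non-empty keys."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def has_required_keys(item: dict, required: list) -> bool:
    """True when a parsed feed or feed item exposes every required key."""
    return all(key in item for key in required)


config = ('{"data": [{"provider": "third-party", '
          '"provider_name": "First 3rd Party Provider", '
          '"url": "https://validrssurl.com/feed", "type": "rss"}]}')
feeds = parse_feed_config(config)
required = parse_key_list("title, link ,enclosure")  # illustrative key names
print(feeds[0]["provider"], required)
```

Failing fast on a malformed configuration keeps a bad feed entry from surfacing later as a partial absorption.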