
Lambda Function getRedditDataFunction

Kenneth Myers edited this page Apr 12, 2023 · 3 revisions

The code for the Lambda function that retrieves Reddit data can be found here. These steps assume you have already completed the previous instructions, including setting up your local AWS profile:

  1. You must create an application under your Reddit account to use the API. A personal use script with the redirect URI pointing to https://127.0.0.1 is sufficient. You need the client ID and client secret as well as your username and password. Copy example_reddit.cfg to a file named reddit.cfg in the same root directory and place these credentials in it. Storing credentials in a plain-text file is not good practice, but it's what I'm working with for now. The file should be ignored by .gitignore, but double-check that you don't commit it. The Lambda function parses this config file.
  2. Create an S3 bucket named something like packages-[my-name]. We will use it to store Python packages that the Lambda environment does not provide by default.
  3. In the scripts/ directory, zip the PRAW package and write it to S3 using sh zipPythonPackage.sh -p praw==7.7.0 -s [s3-bucket-name] -u [sso-profile-name], substituting the praw version if appropriate. This downloads the package, zips it, and pushes it to the S3 location. See this link for additional information.
  4. Create the new Lambda function. I named mine the same as the function, getRedditDataFunction. Choose a runtime that matches your project, e.g. Python 3.7. I created a new role for this function; we will add more permissions to it later. I left all other settings at their defaults.
  5. Again in the scripts/ directory, zip the lambda function code using sh zipLambdaFunction.sh -f getRedditDataFunction. This will output the zipped code to scripts/zippedLambdaFunction.
  6. On the console page for the Lambda function you created, under Code Source, upload that zipped package.
  7. Scroll to the bottom of the page to the Layers section and click "add new layer". On the page that opens, click "add new layer" again (very small text), create a new layer for praw, and upload the zipped package from S3 that you created in Step 3 (see image below). Then return to the previous page, select this new layer as a custom layer, and add it.

  8. Before this function can write data to DynamoDB, it needs permission to do so. Go to IAM > Roles, find the newly created role, which will be named something like getRedditDataFunction-role-[some-id], and click on it. I created a new permission set called dynamoAdministrator with the following definition, which also allows the role to create tables (something the Lambda function will do if you haven't already made the tables or tested the code locally):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "dynamodb:BatchGetItem",
                "dynamodb:BatchWriteItem",
                "dynamodb:ConditionCheckItem",
                "dynamodb:PutItem",
                "dynamodb:ListTables",
                "dynamodb:DeleteItem",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:DescribeStream",
                "dynamodb:UpdateItem",
                "dynamodb:ListStreams",
                "dynamodb:CreateTable",
                "dynamodb:DescribeTable",
                "dynamodb:GetShardIterator",
                "dynamodb:GetItem",
                "dynamodb:DescribeLimits",
                "dynamodb:GetRecords"
            ],
            "Resource": "*"
        }
    ]
}
  9. Back in the Lambda function, run a test to see whether it executes successfully. If there is an error, you will need to debug it; if the test succeeds, continue to the last step.
  10. The last step is to create the trigger that runs the Lambda function periodically. In the Function Overview of the Lambda function, click "Add Trigger" and on the next page under "Select a Source" choose "EventBridge". Create a new rule, give it a name and description, and under "Schedule expression" enter rate(1 minute). This is essentially a cron job that triggers the Lambda function every minute. Click Add and your Lambda function will start working!
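
The schedule expression entered for the trigger follows EventBridge's rate(value unit) form, where the unit is singular when the value is 1 and plural otherwise. If you want a different interval, a minimal sketch of a helper that builds a valid string might look like the following. The function name rate_expression is hypothetical and is not part of the repository.

```python
def rate_expression(value: int, unit: str = "minute") -> str:
    """Build an EventBridge rate() schedule expression.

    `unit` is given in its singular form: "minute", "hour", or "day".
    This helper is illustrative only; it does not call AWS.
    """
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit!r}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # EventBridge expects a singular unit for a value of 1, plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


print(rate_expression(1))   # the schedule used in this guide
print(rate_expression(5))   # a less frequent alternative
```

A longer interval reduces both the number of Lambda invocations and the number of calls made to the Reddit API.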

Additional notes:

  • The tableDefinition defines many of the parameters used to create the table, including the global secondary index (GSI) and the read and write capacity units (RCU and WCU). The GSI is needed for later work, but the RCU and WCU can be adjusted now or later. You can also add additional index definitions here or later.
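
To show how these parameters fit together, here is a rough sketch of a table definition in the shape accepted by boto3's DynamoDB create_table call. It is not the repository's actual tableDefinition: the function name, the attribute names (postId, loadDate, loadTime), the index name, and the default capacity values are all assumptions made for illustration.

```python
def build_table_definition(table_name: str, rcu: int = 5, wcu: int = 5) -> dict:
    """Return keyword arguments in the shape expected by DynamoDB create_table.

    All attribute and index names below are hypothetical placeholders.
    """
    throughput = {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu}
    return {
        "TableName": table_name,
        # Only attributes used in a key schema are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "postId", "AttributeType": "S"},
            {"AttributeName": "loadDate", "AttributeType": "S"},
            {"AttributeName": "loadTime", "AttributeType": "S"},
        ],
        # Primary key of the table (hypothetical).
        "KeySchema": [{"AttributeName": "postId", "KeyType": "HASH"}],
        "ProvisionedThroughput": dict(throughput),
        # One global secondary index with its own key schema and capacity.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "byLoadDate",
                "KeySchema": [
                    {"AttributeName": "loadDate", "KeyType": "HASH"},
                    {"AttributeName": "loadTime", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": dict(throughput),
            }
        ],
    }


definition = build_table_definition("example-table", rcu=3, wcu=4)
print(definition["ProvisionedThroughput"])
# With AWS credentials configured, the definition could be passed on as:
#   boto3.client("dynamodb").create_table(**definition)
```

Keeping the capacity values as function arguments makes it easy to adjust RCU and WCU without touching the key or index layout.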