An AWS CDK (Cloud Development Kit) project that automatically processes Instagram posts (given only their IDs) to extract and serve event information. It focuses on free food events at UBC but can be customized for other use cases. The scraping feature costs less than $0.01 per 3,000 posts, excluding AI processing fees.
This system scrapes Instagram posts, processes them using AI to extract structured event data, and serves the information via a public REST API. The architecture is designed to handle Instagram post IDs asynchronously, using a queue-based processing system with AI-powered event extraction.
- The Local Instagram ID Scraper sends IDs to the enqueue lambda to be processed (The scraper must be run on your local residential IP address, and you can create a cron job to run it locally periodically).
- The enqueue lambda checks whether the ID is already in the DynamoDB table; if the ID is new, the lambda saves it to the table.
- For a new ID, the enqueue lambda also adds the ID to the FIFO SQS queue.
- A dequeue lambda is triggered every 2 minutes to process the oldest ID from the queue (throttled due to free link preview API limit).
- Scraped data (+ AI processed data) is saved to the DynamoDB table.
- A client requests the data from the API Gateway.
- The API Gateway triggers the get-events lambda to query the DynamoDB table for the data.
- The get-events lambda fetches the data from the DynamoDB table and returns it to the client.
- Purpose: Stores processed event data
- Partition Key: `id` (Instagram post ID)
- Schema:
  - `id`: Instagram post ID (string)
  - `preview_data`: Link preview metadata (title, description, image URL)
  - `ai_data`: Structured event data extracted by AI, including:
    - `event_name`: Name of the event
    - `description`: 1-2 sentence summary
    - `has_food_or_drinks`: Boolean flag
    - `food_or_drinks`: List of food/drinks served
    - `datetime`: Event date/time (ISO format)
    - `location`: Event location
    - `free`: Whether the event is free (yes/no/unsure)
    - `instagram_url`: Link to the original Instagram post
- Capacity: Provisioned with 25 read/write capacity units
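
As a rough illustration of the schema above, a stored item might be typed as follows. This is a sketch, not the project's actual code: the key names inside `preview_data` (`title`, `description`, `image_url`) and all sample values are assumptions, and `instagram_url` is placed inside `ai_data` only because the schema lists it there.

```python
from typing import List, TypedDict


class PreviewData(TypedDict):
    # Key names here are assumed; the schema only says
    # "title, description, image URL".
    title: str
    description: str
    image_url: str


class AiData(TypedDict):
    event_name: str
    description: str           # 1-2 sentence summary
    has_food_or_drinks: bool
    food_or_drinks: List[str]
    datetime: str               # ISO-format date/time string
    location: str
    free: str                   # "yes", "no", or "unsure"
    instagram_url: str


class EventItem(TypedDict):
    id: str                     # partition key: Instagram post ID
    preview_data: PreviewData
    ai_data: AiData


# Hypothetical sample item; every value below is made up for illustration.
example_item: EventItem = {
    "id": "ABC123XYZ",
    "preview_data": {
        "title": "Club social",
        "description": "Join us for snacks this Friday",
        "image_url": "https://example.invalid/image.jpg",
    },
    "ai_data": {
        "event_name": "Club social",
        "description": "A casual social with snacks.",
        "has_food_or_drinks": True,
        "food_or_drinks": ["pizza", "soda"],
        "datetime": "2025-01-17T17:00:00",
        "location": "Student building, room 101",
        "free": "yes",
        "instagram_url": "https://www.instagram.com/p/ABC123XYZ/",
    },
}
```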
- Purpose: Queues Instagram post IDs to throttle post processing (limited by the free tier of the Link Preview API)
- Type: FIFO (First-In-First-Out) to ensure ordered processing
- Integration: Receives messages from the `enqueue-instagram-id` Lambda; consumed by the `dequeue-instagram-id` Lambda
- Runtime: Python 3.10
- Trigger: Function URL (public, no authentication)
- Purpose: Receives Instagram post IDs and enqueues them for processing
- Workflow:
  - Receives POST request with `instagramId` in body
  - Checks if Instagram ID already exists in DynamoDB
  - If new, creates a placeholder record in DynamoDB
  - Sends message to SQS FIFO queue
- Returns: JSON object with `enqueued` field (true if new, false if already exists)
- Permissions: Read/Write access to DynamoDB, Send messages to SQS
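
The enqueue workflow can be sketched with in-memory stand-ins. This is not the deployed handler: `handle_enqueue` is a hypothetical name, a plain `dict` stands in for the DynamoDB table, and a plain `list` stands in for the SQS FIFO queue. Only the `instagramId` request field and the `enqueued` response field come from the description above.

```python
import json


def handle_enqueue(body: str, table: dict, queue: list) -> dict:
    """Sketch of the enqueue decision logic.

    `table` (dict keyed by post ID) and `queue` (list) are hypothetical
    in-memory substitutes for DynamoDB and the SQS FIFO queue.
    """
    instagram_id = json.loads(body)["instagramId"]

    # ID already known: report it and do not enqueue again.
    if instagram_id in table:
        return {"enqueued": False}

    # New ID: create a placeholder record, then queue it for processing.
    table[instagram_id] = {"id": instagram_id}
    queue.append(instagram_id)
    return {"enqueued": True}
```

Calling it twice with the same ID returns `{"enqueued": True}` the first time and `{"enqueued": False}` the second, and the queue receives the ID only once.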
- Runtime: Python 3.10
- Trigger: EventBridge rule (every 2 minutes)
- Timeout: 60 seconds
- Purpose: Processes queued Instagram posts to extract event information
- Workflow:
  - Dequeues one message from SQS FIFO queue
  - Fetches link preview data using Link Preview API (title, description, image)
  - Saves preview data to DynamoDB
  - If food-related, uses AWS Bedrock (Claude Haiku) to extract structured event data:
    - Analyzes image and text context
    - Extracts event name, description, datetime, location, food info
    - Saves structured data to DynamoDB
- Dependencies:
  - Link Preview API (requires `LINK_PREVIEW_API_KEY` and `LINK_PREVIEW_API_URL` environment variables)
  - AWS Bedrock (Claude 4.5 Haiku model)
- Permissions: Read/Write access to DynamoDB, Consume messages from SQS, Invoke Bedrock models
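
The dequeue workflow above could be outlined roughly as below. Everything here is a sketch under assumptions: `process_next` is a hypothetical name, the three callables are placeholders for the Link Preview API call, the food-relatedness check, and the Bedrock extraction (how the real Lambda performs these is not described here), and a `list` and `dict` stand in for the queue and the table.

```python
from typing import Callable, Optional


def process_next(
    queue: list,
    table: dict,
    fetch_preview: Callable[[str], dict],     # placeholder for the Link Preview API call
    is_food_related: Callable[[dict], bool],  # placeholder for the food check
    extract_event: Callable[[dict], dict],    # placeholder for the Bedrock extraction
) -> Optional[str]:
    """Process the oldest queued ID; return it, or None if the queue is empty."""
    if not queue:
        return None

    # FIFO: take the oldest ID first.
    instagram_id = queue.pop(0)

    # Fetch and save the preview data for every dequeued post.
    preview = fetch_preview(instagram_id)
    record = table.setdefault(instagram_id, {"id": instagram_id})
    record["preview_data"] = preview

    # Only food-related posts go through the (more expensive) AI step.
    if is_food_related(preview):
        record["ai_data"] = extract_event(preview)

    return instagram_id
```

Passing the external calls in as parameters keeps the control flow easy to exercise locally without AWS credentials or API keys.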
- Runtime: Python 3.10
- Trigger: API Gateway REST API
- Timeout: 10 seconds
- Purpose: Public API endpoint to query events
- Workflow:
  - Queries DynamoDB for events within date range (today to 2 weeks in future)
  - Filters by `ai_data.datetime` field
  - Returns JSON array of events
- API Endpoint: `GET /` (root path)
- CORS: Enabled for all origins (configure for production)
- Throttling: 10 requests/second, burst limit of 10 (ensures users can't spam the endpoint)
- Permissions: Read access to DynamoDB
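
The date-range filter can be illustrated with a small pure function. This is a sketch, not the Lambda's actual query: `upcoming_events` is a hypothetical name, the window is assumed to run from midnight today to exactly two weeks later, and timestamps are assumed to be naive ISO strings without a timezone suffix.

```python
from datetime import datetime, time, timedelta


def upcoming_events(items, now, window=timedelta(weeks=2)):
    """Return items whose ai_data.datetime lies between the start of
    today and `window` later (both bounds inclusive).

    Assumes naive ISO-format timestamps; items with a missing or
    unparseable datetime are skipped.
    """
    start = datetime.combine(now.date(), time.min)
    end = start + window

    selected = []
    for item in items:
        raw = (item.get("ai_data") or {}).get("datetime")
        if not raw:
            continue
        try:
            when = datetime.fromisoformat(raw)
        except (ValueError, TypeError):
            continue
        if start <= when <= end:
            selected.append(item)
    return selected
```

For example, with `now = datetime(2025, 1, 10, 12, 0)`, an event at `2025-01-20T18:00:00` is kept, while events on `2025-01-09` or `2025-01-25` are dropped.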
- Type: REST API
- Stage: `prod`
- Features:
  - CORS enabled
  - Rate limiting (10 req/s, burst 10)
  - Integrated with `get-events` Lambda
- Schedule: Every 2 minutes, which uses 30 Link Preview API requests per hour. The free tier allows 60 requests per hour, so the schedule could be tightened to once per minute.
- Target: `dequeue-instagram-id` Lambda
- Purpose: Triggers automatic processing of queued Instagram posts
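
The relationship between the schedule and the Link Preview API limit is simple arithmetic, sketched below. The function name is hypothetical, and it assumes one API request per scheduled run, matching the "dequeues one message" workflow.

```python
def requests_per_hour(interval_minutes: float, requests_per_run: int = 1) -> float:
    """API requests made per hour when the Lambda runs every
    `interval_minutes` and issues `requests_per_run` requests each time."""
    return 60 / interval_minutes * requests_per_run


FREE_TIER_LIMIT = 60  # requests per hour, as stated for the free tier

# Every 2 minutes stays well under the limit; every minute reaches it exactly.
assert requests_per_hour(2) <= FREE_TIER_LIMIT
assert requests_per_hour(1) <= FREE_TIER_LIMIT
```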
1. External System
↓ (POST Instagram ID)
enqueue-instagram-id Lambda (Function URL)
↓ (if new)
DynamoDB (create placeholder) + SQS FIFO Queue
2. EventBridge (every 2 minutes)
↓ (trigger)
dequeue-instagram-id Lambda
↓ (dequeue from SQS)
Link Preview API → Preview Data
↓ (if food-related)
AWS Bedrock (Claude Haiku) → Structured Event Data
↓ (save)
DynamoDB (update with preview_data and ai_data)
3. Public Client
↓ (GET request)
API Gateway → get-events Lambda
↓ (query)
DynamoDB (filter by date range)
↓ (return)
JSON response with events
- Node.js (v22)
- AWS CLI configured with appropriate credentials
- AWS CDK CLI installed (`npm install -g aws-cdk`)
- Python 3.10 (for Lambda functions)
- Install dependencies:

      npm install

- Configure environment variables: create a `.env` file or set the following environment variables:

      ACCOUNT_ID=your-aws-account-id
      REGION=your-aws-region
      AWS_ACCESS_KEY=your-aws-access-key
      AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
      LINK_PREVIEW_API_KEY=your-api-key
      LINK_PREVIEW_API_URL=your-link-preview-api-url

- Deploy: deployment runs automatically on push to main, as configured in the repository's GitHub Actions workflow file.

Send a POST request to the enqueue-instagram-id Function URL:

    curl -X POST https://<function-url> \
      -H "Content-Type: application/json" \
      -d '{"instagramId": "ABC123XYZ"}'

The Instagram ID should be the post ID from the Instagram URL (e.g., from https://www.instagram.com/p/ABC123XYZ/, the ID is ABC123XYZ).
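
For scripted submissions, the same request can be built in Python. This is a minimal sketch using only the standard library: `extract_post_id` and `build_enqueue_request` are hypothetical helper names, and the pattern only covers the `/p/<id>/` URL form described above.

```python
import json
import re
import urllib.request
from typing import Optional

# Matches the post ID segment in URLs like https://www.instagram.com/p/ABC123XYZ/
POST_ID_PATTERN = re.compile(r"instagram\.com/p/([^/?#]+)")


def extract_post_id(url: str) -> Optional[str]:
    """Return the post ID from an Instagram post URL, or None if absent."""
    match = POST_ID_PATTERN.search(url)
    return match.group(1) if match else None


def build_enqueue_request(function_url: str, instagram_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the enqueue Function URL."""
    payload = json.dumps({"instagramId": instagram_id}).encode("utf-8")
    return urllib.request.Request(
        function_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen`, substituting your deployed Function URL for the placeholder.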
Query events via the API Gateway endpoint:

    curl https://<api-gateway-url>/prod

The API returns events from today up to 2 weeks in the future, filtered by the `ai_data.datetime` field.
