Commit
feat: Kinesis delivery stream and Athena query infrastructure to enable queries over the UCAN logs (#191)

This PR implements option 2 from #190 (comment).

It covers a fair amount of functionality: partitioning the UCAN logs into S3 buckets, configuring a Glue database and tables, adding example queries to Athena, and more. A partial list of functionality follows:

- implement UCAN log partitioning in S3
  - first partition by `type`: everything in "workflows" shows up in "receipts", so this reduces the amount of data scanned by ~50%
  - next partition by `op` to allow us to create tables that only query a specific operation (i.e., `store/add` or `provider/add`)
    - this lets us add operation-specific Glue table schemas with much less clutter in result types than we'd need if we tried to define all possible inputs and outputs in a single table
  - finally partition by date to allow queries to only load recent data
- use these partitions to implement standalone tables for receipts in general and for the `store/add`, `upload/add` and `provider/add` UCANs specifically
- add queries that demonstrate the use of all of these tables
- add dynamo connector so we can join the UCAN logs to our Dynamo tables in queries
- add queries that demonstrate using the Dynamo and Glue tables together

---------

Co-authored-by: Travis Vachon <[email protected]>
Co-authored-by: Travis Vachon <[email protected]>
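
For illustration only, the three-level partitioning described in this message (by `type`, then `op`, then date) could map onto S3 key prefixes roughly as in the sketch below. The `logs/` root, the `day` partition name, the Hive-style `key=value` layout, and the replacement of `/` in operation names are all assumptions made for the example. They are not taken from this PR.

```python
from datetime import datetime, timezone


def partition_prefix(record_type: str, op: str, ts: datetime) -> str:
    """Build a hypothetical Hive-style S3 prefix: type, then op, then day."""
    # Operation names such as "store/add" contain "/", which would add an
    # extra path level, so this sketch swaps it for "_" (an assumption).
    safe_op = op.replace("/", "_")
    # Normalize to UTC so a record always lands in the same day partition.
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"logs/type={record_type}/op={safe_op}/day={day}/"
```

With a layout like this, a query that filters on `type`, `op` and `day` only needs to read the objects under the matching prefix, which is the data-scan reduction the partitioning aims for.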