Time interval usage metrics by logs from UCAN Stream #190

Open
vasco-santos opened this issue Apr 13, 2023 · 1 comment

vasco-santos commented Apr 13, 2023

Context

Within the w3up infrastructure, we track system-wide metrics and per-space metrics through the UCAN stream. These give us an overall view of the system and let w3up users see their total usage. However, we have no visibility into the real-time volume of usage by each space. Based on our experience operating older APIs, we see value in knowing usage volume, so that we can proactively prevent abuse and learn usage patterns.

Requirements

  • Ability to visualise real time usage for the spaces with most usage (ideally in Grafana)
    • Ability to filter by the capability executed and by record type (receipt or workflow)
  • Manual query for usage of specific
  • Data durability limited (30 days?)
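
To make the first requirement concrete, here is a minimal sketch of a "top spaces in a time window" aggregation with the two filters. The record shape (`ts`, `space`, `capability`, `type`) is an assumption for illustration, not the actual UCAN stream schema.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def top_spaces(records, now, window=timedelta(hours=1),
               capability=None, record_type=None, limit=10):
    """Count invocations per space inside [now - window, now].

    `records` is an iterable of dicts with hypothetical keys:
    ts (aware datetime), space, capability, type ("receipt" or "workflow").
    """
    cutoff = now - window
    counts = Counter()
    for record in records:
        # Skip anything outside the requested time interval.
        if not (cutoff <= record["ts"] <= now):
            continue
        # Optional filters from the requirements list.
        if capability is not None and record["capability"] != capability:
            continue
        if record_type is not None and record["type"] != record_type:
            continue
        counts[record["space"]] += 1
    # Most used spaces first.
    return counts.most_common(limit)
```

Either option below would push this aggregation into the query layer (Timestream or Athena) rather than application code; the sketch only pins down what the query needs to compute.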
@vasco-santos

Available Options

  1. Kinesis Data Analytics for Apache Flink + Amazon Timestream
  2. Kinesis Data Firehose + S3 + Athena

1. Kinesis Data Analytics for Apache Flink + Amazon Timestream

  • Kinesis Data Analytics for Apache Flink

    • One consumer for the data stream that preprocesses and ingests data into Timestream.
    • Requires an Apache Flink application with tables
  • Amazon Timestream

    • Scalable, serverless time series database for ingesting and querying operational events
    • Lets you configure a different retention period per table to optimize storage costs
    • A Grafana plugin is available
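
As a rough illustration of the per-table retention setting, the sketch below builds the parameters for the Timestream `create_table` call with a 30-day magnetic store, matching the tentative durability requirement. The database and table names, and the 24-hour memory store, are placeholders rather than decisions.

```python
def timestream_table_params(database, table, memory_hours=24, magnetic_days=30):
    """Build keyword arguments for the timestream-write `create_table` call.

    Recent data lives in the memory store for `memory_hours`, then moves to
    the magnetic store and is deleted after `magnetic_days`.
    """
    return {
        "DatabaseName": database,
        "TableName": table,
        "RetentionProperties": {
            "MemoryStoreRetentionPeriodInHours": memory_hours,
            "MagneticStoreRetentionPeriodInDays": magnetic_days,
        },
    }


# Usage (needs boto3 and AWS credentials, so it is not executed here;
# "ucan_metrics" and "space_usage" are hypothetical names):
#   import boto3
#   client = boto3.client("timestream-write")
#   client.create_table(**timestream_table_params("ucan_metrics", "space_usage"))
```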


2. Kinesis Data Firehose + S3 Data Lake + Athena

  • Kinesis Data Firehose
    • fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3)
    • no need to write applications or manage resources. You configure your data producers to send data to Kinesis Data Firehose, and it automatically delivers the data to the destination that you specified
    • also possible to configure Kinesis Data Firehose to transform data before delivering it
  • Amazon Athena
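
For the Firehose transformation step mentioned above, a Lambda transform receives base64-encoded records and must return each `recordId` with a `result` and the re-encoded `data`. The sketch below trims each record to a few fields before delivery; the field names (`type`, `op`, `space`, `ts`) are assumptions about the UCAN log payload, not its real schema.

```python
import base64
import json

# Hypothetical subset of fields worth keeping for usage queries.
KEEP_FIELDS = ("type", "op", "space", "ts")


def handler(event, context):
    """Kinesis Data Firehose transformation handler (sketch)."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Hand the record back untouched and flag it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        slim = {key: payload.get(key) for key in KEEP_FIELDS}
        # Newline-delimited JSON keeps the S3 objects easy to query.
        encoded = base64.b64encode((json.dumps(slim) + "\n").encode("utf-8"))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded.decode("ascii"),
        })
    return {"records": output}
```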


@vasco-santos vasco-santos added this to the w3up phase 4 milestone Apr 13, 2023
@vasco-santos vasco-santos changed the title Real time usage metrics by logs from UCAN Stream Time interval usage metrics by logs from UCAN Stream Apr 13, 2023
travis added a commit that referenced this issue Oct 11, 2023
…le queries over the UCAN logs (#191)

This PR has an implementation of option 2 from
#190 (comment)

This encompasses a fair amount of functionality - partitioning the UCAN
logs into S3 buckets, configuring a Glue database and tables, adding
example queries to Athena and more. A partial list of functionality
follows:

- implement UCAN log partitioning in S3
  - first partition by `type` - everything in "workflows" shows up in
    "receipts" so this reduces the amount of data scanned by ~50%
  - next partition by `op` to allow us to create tables that only query a
    specific operation (e.g., `store/add` or `provider/add`) - this lets us
    add operation-specific Glue table schemas with much less clutter in
    result types than we'd need if we tried to define all possible inputs
    and outputs in a single table
  - finally partition by date to allow queries to only load recent data

- use these partitions to implement standalone tables for receipts in
general and the `store/add`, `upload/add` and `provider/add` UCANs
specifically
- add queries that demonstrate the use of all of these tables
- add dynamo connector so we can join the UCAN logs to our Dynamo tables
in queries
- add queries that demonstrate using the Dynamo and Glue tables together
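
The three-level partitioning described above (type, then op, then date) can be sketched as a key-prefix function. The `logs/` root, the `/` to `_` replacement in the op name, and the date format are illustrative assumptions, not the layout used in the PR.

```python
from datetime import datetime, timedelta, timezone


def partition_prefix(record_type, op, ts):
    """Return a hypothetical S3 prefix: type, then op, then UTC date.

    `ts` must be a timezone-aware datetime.
    """
    # Assumption: flatten "store/add" so the op occupies one path level.
    safe_op = op.replace("/", "_")
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"logs/{record_type}/{safe_op}/{day}/"


def prefixes_for_last_days(record_type, op, today, days):
    """List only the prefixes a query over the last `days` days would read."""
    return [
        partition_prefix(record_type, op, today - timedelta(days=offset))
        for offset in range(days)
    ]
```

A query for recent `store/add` receipts then touches only a handful of prefixes instead of the whole bucket, which is the point of partitioning by date last.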

---------

Co-authored-by: Travis Vachon <[email protected]>