Research on analytics infrastructure #117

Open
harshita-srivastava-yral opened this issue Jan 9, 2025 · 6 comments
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@harshita-srivastava-yral

Background

  • Investigate alternatives, since BigQuery is not good enough
  • We need to cater to 4 use cases:
  1. Dashboards where the most-tracked metrics are visible
  2. Custom querying: powerful querying on the actual data, unlike GA, where data leaks. E.g. 1024 sign-ups can end up showing as 1000; ad blockers etc. make it more difficult. Until now we were OK with trading off accuracy in these numbers
  3. Developer workflows: metrics, tracing, and logging. There is a lot of overlap among these
  4. PNs require reading the actual data and listening for specific events
  • Our ingestion pipeline should be good for getting a sense of this
  • If we choose the right tooling, we can get all of it
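
Use case 4 can be sketched as a small in-process dispatcher: handlers subscribe to an event name and are called for each matching record in the stream. This is only an illustration under assumptions; the `EventRouter` class, the event name `signup_completed`, and the record shape are hypothetical and not part of any tool discussed in this issue.

```python
from collections import defaultdict


class EventRouter:
    """Route raw event records to handlers registered per event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_name, handler):
        # Register a callable to run for every record with this event name.
        self._handlers[event_name].append(handler)

    def dispatch(self, record):
        # Use .get so unknown event names do not create empty entries.
        handlers = self._handlers.get(record["event"], [])
        for handler in handlers:
            handler(record)
        return len(handlers)  # number of handlers that ran


sent = []
router = EventRouter()
# Hypothetical PN trigger: queue a welcome notification on sign-up.
router.on("signup_completed", lambda r: sent.append(f"welcome:{r['user_id']}"))

for record in [
    {"event": "signup_completed", "user_id": "u1"},
    {"event": "video_viewed", "user_id": "u1"},
]:
    router.dispatch(record)

print(sent)
```

In a real pipeline the records would come from the ingestion layer rather than an in-memory list; the point is only that PN logic needs per-event access to raw records, not aggregates.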

Solution proposed

  • Vector.dev by Datadog has unified ingestion sources
  • It also has a Kafka-style queue-based mechanism

Actionables suggested

  • Primarily @komal-sai-yral to set up infrastructure for this and research vector.dev, which is a Rust-based platform
  • @vishnu-shankar-yral to work on the tasks at hand and push Komal on the relevant use cases to be served via vector.dev
@harshita-srivastava-yral
Author

  • Use vector.dev directly and try to align on it

@komal-sai-yral
Contributor

komal-sai-yral commented Jan 15, 2025

Task 1

  • Is there WASM support for vector.dev?
  • test fly log shipper for other sources

Task 2

Task 3
sinks for logs

  • GCP Cloud Monitoring
  • S3
    • test with Storj

Task 4

Task 5

  • canister logs

Task 6

  • migrate events to vector

@harshita-srivastava-yral
Author

@vishnu-shankar-yral to deep dive into quickwit - https://github.com/quickwit-oss/quickwit

@harshita-srivastava-yral
Author

  • Unique DAU across all domains is the metric we want; it should be triggered daily in the "Events-alerts" space
  • An Airflow job should run it periodically and push the numbers
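
A minimal sketch of the metric itself, assuming each event is a record with a user ID, a domain, and a timezone-aware timestamp. The field names `user_id`, `domain`, and `ts` are hypothetical and not taken from our schema, and the day boundary is assumed to be UTC. A user seen on several domains on the same day counts once. The periodic job could call a function like this and push the result.

```python
from datetime import date, datetime, timezone


def unique_dau(events, day):
    """Count distinct users with at least one event on `day`, across all domains."""
    users = set()
    for event in events:
        # Assumed field: `ts` is timezone-aware; compare on the UTC calendar date.
        if event["ts"].astimezone(timezone.utc).date() == day:
            users.add(event["user_id"])  # the set dedupes users across domains
    return len(users)


events = [
    {"user_id": "u1", "domain": "a.example", "ts": datetime(2025, 1, 9, 10, 0, tzinfo=timezone.utc)},
    {"user_id": "u1", "domain": "b.example", "ts": datetime(2025, 1, 9, 12, 0, tzinfo=timezone.utc)},
    {"user_id": "u2", "domain": "a.example", "ts": datetime(2025, 1, 9, 23, 59, tzinfo=timezone.utc)},
    {"user_id": "u3", "domain": "a.example", "ts": datetime(2025, 1, 10, 0, 1, tzinfo=timezone.utc)},
]
print(unique_dau(events, date(2025, 1, 9)))
```

Counting from raw events rather than from a sampled analytics view is what gives the exact number use case 2 asks for.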

@siyara-m-yral

  • Komal: Figured out that Vector isn't WASM-compatible. Will start testing out the fly log shipper version

@siyara-m-yral

  • Will be setting up quickwit today
  • Should be done around Monday

4 participants