One Click Node Deployment and Observability for Flow Nodes #268

Open
1 task done
haroldsphinx opened this issue Aug 14, 2024 · 0 comments

Grant Category

  • Developer Tools/Services

Introduction

Introducing Blockops Network

Blockops is a platform that lets builders, solo stakers, and node operators deploy, maintain, and manage their nodes ANYWHERE with complete observability.

This reduces onboarding complexity for builders and stakers.

Our Products

We built a range of developer tools, including:

  • Mission Control: for effortless multi-node deployment and maintenance ANYWHERE
  • Telescope: a privacy-preserving web3 observability tool for Node Operators and Blockchain Networks

Description

Problem Statement

The Problem

Building and maintaining blockchain node infrastructure is a tedious, costly and time-consuming process.

There are few to no observability solutions, either for Node Operators to monitor the performance of their nodes or for Blockchain Networks to monitor the performance of their Node Operators.

Onboarding Node Operators and Validators is time-consuming and expensive because setting up nodes is complex and technically demanding, which may slow the onboarding of Node Operators and Solo Stakers on the Flow blockchain.

Developers rely on general-purpose tools like Prometheus and Grafana, which don’t capture chain-specific data and are difficult to set up. This reliance delays error detection and harms network health.

The Domino Effect

  • Slashing & Offline Penalties
  • Slow pace of onboarding Solo Stakers & Node Operators
  • Reactive rather than proactive responses to onchain or client-software incidents, due to poor observability
  • Blockchain Networks spend time managing Node Operators instead of focusing on their core business
  • High operating costs of running Flow nodes might discourage Node Operators and Solo Stakers

Evidence of Need

  • The data and configs required to start a Flow node must be downloaded manually, including files like the root block and execution state during a spork
  • When running Flow nodes, secrets are stored in plain text on the filesystem
  • There is no detailed documentation that walks a less technical person through the Flow node deployment process
  • The process of making changes to an existing bootstrapped Flow config is cumbersome
  • There is no central way for the Flow team to monitor the performance of node operators or debug issues without asking them to download their logs manually and send them over

Target audience

All developers, builders and node operators on Flow

Proposed Solution

  • One Click Node Deployment platform ANYWHERE
  • Cost-effective hosting solutions tailored for running validator nodes
  • Scalable Node Deployments optimized for avoiding network penalties during migrations
  • Privacy-preserving observability with Telescope for all observability signals (metrics, logs, traces)
  • Real-time customizable monitoring dashboards and alerts
  • Detailed analytics and insights to optimize performance proactively

Our Solution Blueprint

Our proposed solution addresses the needs of both builders and node operators:

Accessible Yet Reliable Node Observability:
Comprehensive Visibility: Flow network can easily monitor all node operators running their clients.
Ready-Made Monitoring: Node operators get a one-click observability setup, a user-friendly tool to oversee their entire node network.

Decentralised and Privacy-Conscious Monitoring:
Performance Alerts: Receive early warnings about potential node performance issues.
Web3-Specific Insights: Get tailored insights into onchain events across all Flow Nodes.
Privacy-Preserving Reporting: Obtain crash reports on client tools while safeguarding privacy and avoiding the storage of PII data.

Out-of-the-Box Monitoring Solutions
Node Fleet Oversight: A robust monitoring solution for managing all node fleets for node operators and enterprise clients.
Custom Alerts: Multiple alert channels to ensure node operators get the notifications they need.
Integrated SLOs: Service Level Objectives (SLOs) are built in for both node operators and networks.
Complete Observability Signals (metrics, logs and traces)
Advanced Log Analysis: Enhanced operational intelligence through sophisticated log analysis.
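
The proposal does not define the SLOs it would ship, so the following is only a minimal sketch of the standard error-budget arithmetic that a built-in SLO could rest on. The function name, the field names, and the idea of counting periodic health checks are assumptions made for illustration, not part of Telescope.

```python
def error_budget(slo_target: float, total_checks: int, failed_checks: int) -> dict:
    """Compare observed availability with an SLO target over one window.

    slo_target is a fraction such as 0.999 (99.9% of checks must succeed).
    All names here are hypothetical; the proposal does not specify an API.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1")
    if total_checks <= 0 or not 0 <= failed_checks <= total_checks:
        raise ValueError("check counts are inconsistent")

    # Number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_checks
    availability = 1 - failed_checks / total_checks
    return {
        "availability": availability,
        "allowed_failures": allowed_failures,
        "budget_remaining": allowed_failures - failed_checks,
        "slo_met": availability >= slo_target,
    }
```

For example, with a 99.9% target and 10,000 health checks in the window, roughly 10 failed checks are tolerated, so a node with 4 failures still has budget left while one with 25 has breached the objective.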

Telescope

Telescope serves as a complete observability platform designed specifically for decentralised applications (dApps) and node operators within the Flow ecosystem. It is a one-stop tool for web3 observability, enabling builders and networks to collect, store, and visualise all monitoring data, including metrics, logs, traces, and onchain events.

Telescope Key Features:

  • Privacy-preserving observability with Telescope for all signals (metrics, logs, traces and onchain events monitoring), giving developers and operators full visibility into node performance and smart contracts.
  • Real-time customizable monitoring dashboards and alerts
  • Detailed analytics and insights to optimise performance proactively
  • Alert integration with Telegram, Slack, Email, PagerDuty and SMS as notification channels
  • Tracing instrumentation with Telescope SDKs
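
As a rough illustration of the alerting feature listed above, the sketch below evaluates threshold rules against a metric sample and fans each firing rule out to its configured channels. `AlertRule`, `dispatch`, the metric name, and the sender callables are all hypothetical; the proposal does not describe Telescope's actual rule format or channel API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AlertRule:
    """A hypothetical threshold rule: fire when metric > threshold."""
    name: str
    metric: str
    threshold: float
    channels: list[str] = field(default_factory=list)

    def fires(self, value: float) -> bool:
        return value > self.threshold


def dispatch(
    rules: list[AlertRule],
    sample: dict[str, float],
    senders: dict[str, Callable[[str], None]],
) -> list[tuple[str, str]]:
    """Notify every configured channel of every firing rule.

    Returns the (rule name, channel) pairs that were actually delivered.
    Channels without a registered sender are skipped.
    """
    delivered = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None or not rule.fires(value):
            continue
        for channel in rule.channels:
            send = senders.get(channel)
            if send is None:
                continue  # channel named in the rule but not configured
            send(f"[{rule.name}] {rule.metric}={value} exceeds {rule.threshold}")
            delivered.append((rule.name, channel))
    return delivered
```

In this sketch each sender is just a callable that takes a message string, so a Slack webhook, an email client, or an SMS gateway could be plugged in behind the same interface.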

How Telescope Works (High Level, Not Complete)

[Image: high-level diagram of how Telescope works]

Core Components, Protocols & Architecture
Collector: A Telescope agent is installed on the host (VM, k8s, web app, WebAssembly) via the CLI or SDK, depending on the host. The agent is responsible for collecting observability data from the host based on user preferences. This includes metrics, logs, traces, and runtime events.
Filters and Splitters: Filters and splitters strip sensitive information from the observability data, then encrypt the remaining data so that it can only be decrypted with the user's private key, which is generated when Telescope is installed.
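
The filter and split stage could look roughly like the sketch below. It covers only redaction and grouping by signal type; the encryption step is left out because it depends on key-management details the proposal does not give. The redaction patterns, the `signal` field, and every function name are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical redaction rules; the proposal does not list what counts as sensitive.
REDACTION_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "hex_secret": re.compile(r"\b[0-9a-fA-F]{64}\b"),
}


def redact(text: str) -> str:
    """Replace every match of a redaction pattern with a labeled placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def filter_record(record: dict) -> dict:
    """Return a copy of the record with all string values redacted."""
    return {
        key: redact(value) if isinstance(value, str) else value
        for key, value in record.items()
    }


def split_by_signal(records: list[dict]) -> dict[str, list[dict]]:
    """Filter each record, then group records by their signal type."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record.get("signal", "unknown")].append(filter_record(record))
    return dict(buckets)
```

Running the filter before anything leaves the host means the downstream stages, including whatever encryption scheme is used, never see the raw sensitive values.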

Technology Stack Used
Rust
Golang
ClickHouse
TypeScript
Apache Kafka
GraphQL
StreamingFast

Mission Control Features

  • One click node deployment platform ANYWHERE (cloud and baremetal servers)
  • Cost-effective hosting solutions tailored for running all Flow node types
  • Scalable node deployments optimised for avoiding network penalties during migrations
  • Vault integration
  • Node fleet management support (client upgrades, upscaling and downscaling of nodes with no slashing and zero downtime)

Impact

Our solution makes things easier and faster. We’ve identified key issues and built tools to help developers work more efficiently. Here’s how:

Node & Network Observability with Telescope:
Our ready-to-use observability dashboard offers Flow developers in-depth insights into node performance metrics, system health, and operational efficiency. It allows developers to make informed decisions that boost productivity and improve the reliability of both their projects and the broader blockchain networks.

Proactive Incident Management:
We help Flow developers to anticipate and resolve potential issues before they escalate. It is a proactive approach to incident management that leads to a smoother operational flow, improving overall developer experience and satisfaction.

Accelerated Time to Deployment:
Our solution simplifies the process of building decentralised applications and enterprise infrastructure on Flow, drastically reducing time to deployment.

Dedicated Node Provider Service:
Our service ensures reliable infrastructure and support for developers' blockchain and decentralised application needs.

Easy Onboarding of Builders:
We simplify the onboarding process for individual stakers within the Flow network, providing a user-friendly experience that lowers barriers to entry and encourages broader participation.

Running Highly Available Flow Nodes:
By operating highly available Flow nodes, we minimise latency and enhance network performance, ensuring seamless communication within the Flow network and supporting a more robust developer ecosystem.

Milestones and Funding

Mission Control

| Milestone | Task | Deliverables | Timeline |
|---|---|---|---|
| 1 | Node Deployment and Management | a. Node Deployment into Cloud Providers (GCP, AWS); b. Node Deployment to Bare Metal Server & Cherry Server | 4 weeks |
| 2 | Keys Management with Integration to Vault or Web3Signer | Self Custody Key Management Integration | 2 weeks |
| 3 | Support Integration | a. Integrate Support for Digital Ocean, OVH and Cherry Servers; b. Integrate Support for SSH Module Node Deployment for Bare Metal Servers; c. Integrate Kubernetes Cluster Support | 2 weeks |
| 4 | Documentation | Details and instructions for the setup | 1 week |
| 5 | Testing & Adoption | Evaluation of setup to ensure it meets specified requirements | 1 week |
| Mission Control Release | Summary | | 10 weeks |

Telescope

| Milestone | Task | Deliverables | Timeline |
|---|---|---|---|
| 1 | Offchain Monitoring (Node & Network) | a. Develop SDKs for integration into Web3 protocols (Rust, Go and TypeScript); b. Implement core system functionalities: data filtering, splitting, and encryption; secret sharing; and anomaly detection; c. Perform unit and integration testing | 8 weeks |
| 2 | OnChain Monitoring | a. ETL: Extract, Transform, Load of on-chain data for the Flow Network into a data warehouse; b. Build customisable Grafana-like analytics dashboards | 6 weeks |
| 3 | Alert Integration | a. Integrate alert customization feature for onchain metrics & analytics; b. Integrate alert customization feature for offchain (nodes) metrics and dashboards; c. Integrate support for major notification channels (Discord, Slack, Email) | 6 weeks |
| 4 | Documentation | Details and instructions for the setup | 1 week |
| 5 | Testing & Adoption | Evaluation of setup to ensure it meets specified requirements | 1 week |
| 6 | Audits | SOC 1 Certification | 2 weeks |
| Telescope v1.0.0 Release | Summary | a. Public chain monitoring for all system chains; b. Subscribe to RSS feed notifications for system chain activities: upgrades, forks, downtime; c. Set up custom alerts to different channels for custom events, both off-chain and on-chain; d. Visualise metrics, logs, and traces for all system chains; e. Provide an invite link to allow any user to view the logs dashboard for debugging or collaboration; f. Instrument observability into protocols and system chains using Telescope SDKs (supporting Rust, Go, and TypeScript) | 24 weeks |

Note: Timeline for certification may vary based on external factors.

Cost Breakdown

| Phase | USD Proposal |
|---|---|
| Telescope and Mission Control Delivery for Flow Testnet | $15,000 |
| Telescope and Mission Control Delivery for Flow Mainnet | $15,000 |

Total Funding Proposed: $30,000

Team

| Name | Role | Contact |
|---|---|---|
| Adedayo Akinpelu | Team Lead & SRE Engineer | [email protected] |
| Franklin Okpako | DevOps & SRE Engineer | [email protected] |
| Calvin Puram | DevOps & SRE Engineer | [email protected] |
| Bethel Irumudomon | Product Manager | [email protected] |
| Osaro Igbinovia | Backend Engineer | [email protected] |
| Opeyemi Ogunbode | Frontend Engineer | [email protected] |
Projects
Status: New: In review