Evaluate Running a Passport ETH Archive Node #2793

Open · 2 tasks
Jkd-eth opened this issue Aug 20, 2024 · 2 comments

Jkd-eth commented Aug 20, 2024

User Story:

As the Passport team,
I want to have unlimited and cost-effective access to Ethereum transaction data,
So that we can efficiently run analyses and query the blockchain without incurring high third-party costs.

Acceptance Criteria

GIVEN the need for detailed blockchain data analysis,
WHEN evaluating different options for hosting an Ethereum archive node,
THEN we should decide on the most cost-effective and efficient setup for Passport.

Exploration Goals

  1. Evaluate Feasibility: Explore running an Ethereum archive node using Reth and Trueblocks, including the option of using dedicated hardware.
  2. Cost Analysis: Determine the monthly operating cost for each option, both cloud-based (e.g., AWS) and local hardware setups.
  3. Comparison: Compare Reth, Trueblocks, and other potential solutions, focusing on functionality, cost, and ease of integration.
  4. Decision Making: Provide a go/no-go recommendation based on findings.
  5. Next Steps: Outline subsequent stories and tasks if the decision is to proceed with running our own node.

Product & Design Links:

Tech Details:

  • Previous cost estimate of $500/month for an AWS setup: Cost Estimate
  • Hardware specifications and requirements for local setup
  • Reth and Trueblocks installation and configuration details

Open Questions:

  • Can this setup support multiple blockchains, or is it limited to Ethereum?
  • How does this impact our current reliance on Alchemy, and will it reduce our associated costs?

Notes/Assumptions:

  • Assumption: We have access to the required hardware or cloud infrastructure for testing.
  • Note: The exploration should not exceed 3 days to keep costs and time investment reasonable.
erichfi changed the title from "Spike: Passport Eth Archive node" to "Evaluate Running a Passport ETH Archive Node" on Aug 31, 2024

erichfi commented Sep 3, 2024

Let's see if we still need a centralized transaction data cache in case we run an Archive Node: #2829

tim-schultz self-assigned this Sep 9, 2024
tim-schultz commented

Ethereum Node Requirements

Mainnet Nodes

RETH Mainnet

  • Disk: 2.2TB+ (TLC NVMe recommended)
  • RAM: 8GB+
  • Note: NVMe disk crucial for I/O performance

GETH Mainnet

  • RAM: 16GB+ recommended
  • Disk: 12TB+ for full archive node

Layer 2 Solutions

Optimism / OP Stack

  • RAM: 16GB+
  • Disk: 2TB SSD (NVMe recommended)

Arbitrum

  • RAM: 16GB
  • CPU: 4 cores (single-core performance important)
  • Storage (as of April 2024):
    • Arbitrum One: 560GB (pruned), growing ~200GB/month
    • Arbitrum Nova: 400GB (pruned), growing ~1.6TB/month
  • Note: NVMe SSD recommended for both

Polygon

Component    | Minimum    | Recommended
CPU          | 4 cores    | 16 cores
RAM          | 32GB       | 64GB
Storage      | 2.5TB      | 5TB
Bandwidth    | 100 Mbps+  | 1 Gbps
AWS instance | c5.4xlarge | m5d.4xlarge
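
Summing the per-network disk figures quoted above gives a rough sense of the total footprint if everything ran on a single machine. A minimal back-of-envelope sketch in Python, using only the numbers listed in this thread (pruned sizes and growth rates are as quoted, not independently verified):

```python
# Rough disk total across the networks listed above, using the figures quoted in this issue.
requirements_tb = {
    "reth_mainnet_archive": 2.2,   # "2.2TB+ (TLC NVMe recommended)"
    "op_stack": 2.0,               # "2TB SSD (NVMe recommended)"
    "arbitrum_one_pruned": 0.56,   # "560GB (pruned), growing ~200GB/month"
    "arbitrum_nova_pruned": 0.40,  # "400GB (pruned), growing ~1.6TB/month"
    "polygon_recommended": 5.0,    # recommended 5TB from the table above
}

total_tb = sum(requirements_tb.values())
arbitrum_growth_tb_per_month = 0.2 + 1.6  # growth rates quoted above

print(f"Approximate starting footprint: {total_tb:.1f} TB")  # ~10.2 TB
print(f"Arbitrum growth alone: ~{arbitrum_growth_tb_per_month:.1f} TB/month")
```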

AWS Configurations for Polygon Nodes

Basic Node (single to double-digit RPC requests/s)

  • Compute: m7g.4xlarge (16 vCPU, 64GB RAM)
  • Storage: 7000 GB EBS GP3, 16000 IOPS, 1000 MBps throughput

High-Performance Node (hundreds of RPC requests/s)

  • Compute: im4gn.4xlarge (16 vCPU, 64GB RAM)
  • Storage: 7500 GB instance storage, 50 GB EBS gp3 root volume

Maximum Performance Node (up to 1000 RPC requests/s)

  • Compute: m7g.4xlarge (16 vCPU, 64GB RAM)
  • Storage: 7000 GB EBS IO2 with 16000 IOPS
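
For concreteness, the 7000 GB gp3 volume in the basic configuration above could be provisioned roughly as in the sketch below. This is only an illustration using boto3; the region, availability zone, and tag name are placeholders, not part of any existing setup.

```python
# Sketch: provision the 7000 GB gp3 data volume from the "Basic Node" configuration above.
# Region, availability zone, and tag values are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # placeholder AZ; must match the instance's AZ
    Size=7000,                      # GB, per the basic Polygon node config
    VolumeType="gp3",
    Iops=16000,
    Throughput=1000,                # MB/s
    TagSpecifications=[
        {
            "ResourceType": "volume",
            "Tags": [{"Key": "Name", "Value": "polygon-node-data"}],
        }
    ],
)
print(volume["VolumeId"])
```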

Ethereum Node Testing and Deployment Summary

Local Testing

Hardware Specifications

  • RAM: 32GB
  • Storage: 4TB SSD
  • Network: 5-15 Mb/s upload and download speeds

Equipment

Node Synchronization

  • Synced RETH from a Merkle.io snapshot
  • 1.3TB file downloaded and extracted overnight (approximately 12 hours)
  • Node is keeping up with the network tip, including pending blocks

Performance Test

  • Downloaded all USDC transfer events to parquet files (Block 0 to present)
  • Command: `cryo erc20_transfers --contract 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 --rpc localhost:8545`
  • Duration: 1 hour and 10 minutes
  • Additional data types available: Cryo Datasets

Data Query Performance

  • Specific address transfer events retrieved in 6 seconds
  • Potential for joining data on a large number of addresses
  • Example query: Gist Link
  • Could also build a well-indexed DB from the pulled data (see the query sketch below)
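
A minimal sketch of the kind of query used here, assuming the cryo output lives in an `erc20_transfers/` directory and that the schema includes `from_address` and `to_address` columns stored as hex strings (adjust if cryo writes binary columns); the target address and the address list are placeholders:

```python
# Sketch: query cryo-produced parquet files for one address, then join against a larger
# address list. Column names are assumed to match cryo's erc20_transfers schema.
import polars as pl

transfers = pl.scan_parquet("erc20_transfers/*.parquet")  # lazy scan over all output files

target = "0x0000000000000000000000000000000000000000"  # placeholder address

# Transfers involving a single address (sent or received)
single_address = transfers.filter(
    (pl.col("from_address") == target) | (pl.col("to_address") == target)
).collect()

# Joining against a large set of addresses instead of filtering one at a time
addresses = pl.DataFrame({"address": [target]}).lazy()  # placeholder list
joined = transfers.join(
    addresses, left_on="to_address", right_on="address", how="inner"
).collect()

print(single_address.height, joined.height)
```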

Running Additional Nodes

  • L2 nodes need to run in parallel with the mainnet node
  • Additional storage required for each L2 node
  • Lack of good L2 snapshot sources noted

ISP Requirements

  • Unlimited bandwidth is necessary

Pricing Considerations

Latitude (Suggested by RETH)

  • Model: rs4.metal.large
  • Price: $2.20/hr or $1606/month
  • Storage: $0.64/TB
  • Capability: Could theoretically run all currently used nodes
  • Note: Can be turned on/off as needed, but requires sync time

AWS

  • Model: m6gd.16xlarge
  • Price: $2.8928/hr or $2108/month
  • Additional EBS storage costs apply
  • Potential for optimization by pairing a powerful processor with cheaper EBS storage (rough monthly cost math sketched below)
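
The hourly-to-monthly conversions above, plus storage, reduce to simple arithmetic. A rough sketch using the quoted rates, with the EBS gp3 price treated as an assumption to verify against current AWS pricing and the Latitude storage rate treated as per TB per month:

```python
# Rough monthly cost comparison from the rates quoted above.
HOURS_PER_MONTH = 730  # ~24 * 365 / 12

latitude_hourly = 2.20           # rs4.metal.large
aws_hourly = 2.8928              # m6gd.16xlarge
latitude_storage_per_tb = 0.64   # quoted above; treated here as $/TB-month
ebs_gp3_per_gb = 0.08            # assumed $/GB-month; verify current AWS pricing

storage_tb = 10.2  # back-of-envelope total from the requirements section above

latitude_monthly = latitude_hourly * HOURS_PER_MONTH + latitude_storage_per_tb * storage_tb
aws_monthly = aws_hourly * HOURS_PER_MONTH + ebs_gp3_per_gb * storage_tb * 1000

print(f"Latitude: ~${latitude_monthly:,.0f}/month")  # ~$1,613
print(f"AWS:      ~${aws_monthly:,.0f}/month")       # ~$2,928 with EBS included
```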

Kammerdeiner

  • Pricing: TBD

Individual Network Pricing

  • If certain networks are a priority, pricing can be obtained for individual networks

Other Considerations

  • Keeping nodes healthy and functional over time is labor-intensive
  • Bulk-pulling data from the current Alchemy account on a regular basis could be an efficient alternative
  • Further discussion needed on feasibility of using these tools for analysis

Saving on Costs

  • Cryo/TrueBlocks are awesome tools for data extraction from RPC nodes
  • We could consider bulk-pulling data from our current Alchemy account and running analysis on our own data (Postgres or Parquet) if this would work for the data team (see the sketch below)
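
If the bulk-pull route works for the data team, the same cryo invocation used in the local test could simply point at the existing Alchemy endpoint. A minimal sketch, with the Alchemy URL and output directory as placeholders, and subject to whatever rate limits the current plan imposes:

```python
# Sketch: bulk-pull ERC-20 transfers via cryo against the existing Alchemy endpoint,
# then load the parquet output for analysis. The RPC URL is a placeholder.
import subprocess
import polars as pl

ALCHEMY_RPC = "https://eth-mainnet.g.alchemy.com/v2/<API_KEY>"  # placeholder key

subprocess.run(
    [
        "cryo", "erc20_transfers",
        "--contract", "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48",  # USDC, as in the test above
        "--rpc", ALCHEMY_RPC,
        "--output-dir", "erc20_transfers",
    ],
    check=True,
)

df = pl.scan_parquet("erc20_transfers/*.parquet").collect()
print(df.shape)
```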
