Evaluate Running a Passport ETH Archive Node #2793

Open · 2 tasks
Jkd-eth opened this issue Aug 20, 2024 · 2 comments

Jkd-eth commented Aug 20, 2024

User Story:

As the Passport team,
I want to have unlimited and cost-effective access to Ethereum transaction data,
So that we can efficiently run analyses and query the blockchain without incurring high third-party costs.

Acceptance Criteria

GIVEN the need for detailed blockchain data analysis,
WHEN evaluating different options for hosting an Ethereum archive node,
THEN we should decide on the most cost-effective and efficient setup for Passport.

Exploration Goals

  1. Evaluate Feasibility: Explore running an Ethereum archive node using Reth and Trueblocks, including the option of using dedicated hardware.
  2. Cost Analysis: Determine the monthly operating cost for each option, both cloud-based (e.g., AWS) and local hardware setups.
  3. Comparison: Compare Reth, Trueblocks, and other potential solutions, focusing on functionality, cost, and ease of integration.
  4. Decision Making: Provide a go/no-go recommendation based on findings.
  5. Next Steps: Outline subsequent stories and tasks if the decision is to proceed with running our own node.

Product & Design Links:

Tech Details:

  • Previous cost estimate of $500/month for an AWS setup: Cost Estimate
  • Hardware specifications and requirements for local setup
  • Reth and Trueblocks installation and configuration details

Open Questions:

  • Can this setup support multiple blockchains, or is it limited to Ethereum?
  • How does this impact our current reliance on Alchemy, and will it reduce our associated costs?

Notes/Assumptions:

  • Assumption: We have access to the required hardware or cloud infrastructure for testing.
  • Note: The exploration should not exceed 3 days to keep costs and time investment reasonable.
erichfi changed the title from "Spike: Passport Eth Archive node" to "Evaluate Running a Passport ETH Archive Node" on Aug 31, 2024

erichfi commented Sep 3, 2024

Let's see if we still need a centralized transaction data cache in case we run an Archive Node: #2829

tim-schultz self-assigned this Sep 9, 2024
tim-schultz commented

Ethereum Node Requirements

Mainnet Nodes

RETH Mainnet

  • Disk: 2.2TB+ (TLC NVMe recommended)
  • RAM: 8GB+
  • Note: NVMe disk crucial for I/O performance

GETH Mainnet

  • RAM: 16GB+ recommended
  • Disk: 12TB+ for full archive node

Layer 2 Solutions

Optimism / OP Stack

  • RAM: 16GB+
  • Disk: 2TB SSD (NVMe recommended)

Arbitrum

  • RAM: 16GB
  • CPU: 4 cores (single-core performance important)
  • Storage (as of April 2024):
    • Arbitrum One: 560GB (pruned), growing ~200GB/month
    • Arbitrum Nova: 400GB (pruned), growing ~1.6TB/month
  • Note: NVMe SSD recommended for both

Polygon

Component    | Minimum    | Recommended
CPU          | 4 cores    | 16 cores
RAM          | 32GB       | 64GB
Storage      | 2.5TB      | 5TB
Bandwidth    | 100 Mbps+  | 1 Gbps
AWS instance | c5.4xlarge | m5d.4xlarge
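
Summing the per-network disk figures quoted above gives a rough sense of the total footprint if everything ran on a single machine. A minimal back-of-envelope sketch in Python, using only the numbers listed in this thread (pruned sizes and growth rates are as quoted, not independently verified):

```python
# Rough disk total across the networks listed above, using the figures quoted in this issue.
requirements_tb = {
    "reth_mainnet_archive": 2.2,   # "2.2TB+ (TLC NVMe recommended)"
    "op_stack": 2.0,               # "2TB SSD (NVMe recommended)"
    "arbitrum_one_pruned": 0.56,   # "560GB (pruned), growing ~200GB/month"
    "arbitrum_nova_pruned": 0.40,  # "400GB (pruned), growing ~1.6TB/month"
    "polygon_recommended": 5.0,    # recommended 5TB from the table above
}

total_tb = sum(requirements_tb.values())
arbitrum_growth_tb_per_month = 0.2 + 1.6  # growth rates quoted above

print(f"Approximate starting footprint: {total_tb:.1f} TB")  # ~10.2 TB
print(f"Arbitrum growth alone: ~{arbitrum_growth_tb_per_month:.1f} TB/month")
```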

AWS Configurations for Polygon Nodes

Basic Node (single to double-digit RPC requests/s)

  • Compute: m7g.4xlarge (16 vCPU, 64GB RAM)
  • Storage: 7000 GB EBS GP3, 16000 IOPS, 1000 MBps throughput

High-Performance Node (hundreds of RPC requests/s)

  • Compute: im4gn.4xlarge (16 vCPU, 64GB RAM)
  • Storage: 7500 GB instance storage, 50 GB EBS gp3 root volume

Maximum Performance Node (up to 1000 RPC requests/s)

  • Compute: m7g.4xlarge (16 vCPU, 64GB RAM)
  • Storage: 7000 GB EBS IO2 with 16000 IOPS
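
For concreteness, the 7000 GB gp3 volume in the basic configuration above could be provisioned roughly as in the sketch below. This is only an illustration using boto3; the region, availability zone, and tag name are placeholders, not part of any existing setup.

```python
# Sketch: provision the 7000 GB gp3 data volume from the "Basic Node" configuration above.
# Region, availability zone, and tag values are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # placeholder AZ; must match the instance's AZ
    Size=7000,                      # GB, per the basic Polygon node config
    VolumeType="gp3",
    Iops=16000,
    Throughput=1000,                # MB/s
    TagSpecifications=[
        {
            "ResourceType": "volume",
            "Tags": [{"Key": "Name", "Value": "polygon-node-data"}],
        }
    ],
)
print(volume["VolumeId"])
```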

Ethereum Node Testing and Deployment Summary

Local Testing

Hardware Specifications

  • RAM: 32GB
  • Storage: 4TB SSD
  • Network: 5-15 Mb/s upload and download speeds

Equipment

Node Synchronization

  • Synced RETH from a Merkle.io snapshot
  • 1.3TB file downloaded and extracted overnight (approximately 12 hours)
  • Node is keeping up with the network tip, including pending blocks

Performance Test

  • Downloaded all USDC transfer events to parquet files (Block 0 to present)
  • Command: `cryo erc20_transfers --contract 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 --rpc localhost:8545`
  • Duration: 1 hour and 10 minutes
  • Additional data types available: Cryo Datasets

Data Query Performance

  • Specific address transfer events retrieved in 6 seconds
  • Potential for joining data on a large number of addresses
  • Example query: Gist Link
  • Could also build a well-indexed DB from the pulled data (see the query sketch below)
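
A minimal sketch of the kind of query used here, assuming the cryo output lives in an `erc20_transfers/` directory and that the schema includes `from_address` and `to_address` columns stored as hex strings (adjust if cryo writes binary columns); the target address and the address list are placeholders:

```python
# Sketch: query cryo-produced parquet files for one address, then join against a larger
# address list. Column names are assumed to match cryo's erc20_transfers schema.
import polars as pl

transfers = pl.scan_parquet("erc20_transfers/*.parquet")  # lazy scan over all output files

target = "0x0000000000000000000000000000000000000000"  # placeholder address

# Transfers involving a single address (sent or received)
single_address = transfers.filter(
    (pl.col("from_address") == target) | (pl.col("to_address") == target)
).collect()

# Joining against a large set of addresses instead of filtering one at a time
addresses = pl.DataFrame({"address": [target]}).lazy()  # placeholder list
joined = transfers.join(
    addresses, left_on="to_address", right_on="address", how="inner"
).collect()

print(single_address.height, joined.height)
```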

Running Additional Nodes

  • L2 nodes need to run in parallel with the mainnet node
  • Additional storage required for each L2 node
  • Lack of good L2 snapshot sources noted

ISP Requirements

  • Unlimited bandwidth is necessary

Pricing Considerations

Latitude (Suggested by RETH)

  • Model: rs4.metal.large
  • Price: $2.20/hr or $1606/month
  • Storage: $0.64/TB
  • Capability: Could theoretically run all currently used nodes
  • Note: Can be turned on/off as needed, but requires sync time

AWS

  • Model: m6gd.16xlarge
  • Price: $2.8928/hr or $2108/month
  • Additional EBS storage costs apply
  • Potential for optimization by pairing a powerful processor with cheaper EBS storage (rough monthly cost math sketched below)
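
The hourly-to-monthly conversions above, plus storage, reduce to simple arithmetic. A rough sketch using the quoted rates, with the EBS gp3 price treated as an assumption to verify against current AWS pricing and the Latitude storage rate treated as per TB per month:

```python
# Rough monthly cost comparison from the rates quoted above.
HOURS_PER_MONTH = 730  # ~24 * 365 / 12

latitude_hourly = 2.20           # rs4.metal.large
aws_hourly = 2.8928              # m6gd.16xlarge
latitude_storage_per_tb = 0.64   # quoted above; treated here as $/TB-month
ebs_gp3_per_gb = 0.08            # assumed $/GB-month; verify current AWS pricing

storage_tb = 10.2  # back-of-envelope total from the requirements section above

latitude_monthly = latitude_hourly * HOURS_PER_MONTH + latitude_storage_per_tb * storage_tb
aws_monthly = aws_hourly * HOURS_PER_MONTH + ebs_gp3_per_gb * storage_tb * 1000

print(f"Latitude: ~${latitude_monthly:,.0f}/month")  # ~$1,613
print(f"AWS:      ~${aws_monthly:,.0f}/month")       # ~$2,928 with EBS included
```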

Kammerdeiner

  • Pricing: TBD

Individual Network Pricing

  • If certain networks are a priority, pricing can be obtained for individual networks

Other Considerations

  • Keeping nodes healthy and functional over time is labor-intensive
  • Bulk-pulling data from the current Alchemy account on a regular basis could be an efficient alternative
  • Further discussion needed on feasibility of using these tools for analysis

Saving on Costs

  • Cryo/TrueBlocks are awesome tools for data extraction from RPC nodes
  • We could consider bulk-pulling data from our current Alchemy account and running analysis on our own data (Postgres or Parquet) if this would work for the data team (see the sketch below)
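
If the bulk-pull route works for the data team, the same cryo invocation used in the local test could simply point at the existing Alchemy endpoint. A minimal sketch, with the Alchemy URL and output directory as placeholders, and subject to whatever rate limits the current plan imposes:

```python
# Sketch: bulk-pull ERC-20 transfers via cryo against the existing Alchemy endpoint,
# then load the parquet output for analysis. The RPC URL is a placeholder.
import subprocess
import polars as pl

ALCHEMY_RPC = "https://eth-mainnet.g.alchemy.com/v2/<API_KEY>"  # placeholder key

subprocess.run(
    [
        "cryo", "erc20_transfers",
        "--contract", "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48",  # USDC, as in the test above
        "--rpc", ALCHEMY_RPC,
        "--output-dir", "erc20_transfers",
    ],
    check=True,
)

df = pl.scan_parquet("erc20_transfers/*.parquet").collect()
print(df.shape)
```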
