@fivertran-karunveluru

ZeroHash Connector

Created: 2025-10-31

Business Owner: Crypto & Digital Assets Operations Team

Technical Owner: Data Engineering Team

Last Updated: 2025-10-31

Business Context

  • Data Source: ZeroHash API for cryptocurrency and digital asset infrastructure data
  • Business Criticality: Critical - supports digital asset custody, trading operations, and compliance reporting
  • Data Consumers: Crypto operations teams, finance teams, compliance officers, risk management, executive leadership
  • Business SLAs: Data must be no more than 4 hours old for trading operations and no more than 24 hours old for compliance reporting
  • Compliance Requirements: KYC/AML compliance for participants, financial reporting compliance, regulatory reporting for digital assets
  • Budget Constraints: ZeroHash API access included with subscription, rate limits based on plan tier

Technical Context

  • API Documentation: ZeroHash Developer Portal (https://api.cert.zerohash.com)
  • Authentication Method: HMAC-SHA256 signature-based authentication with API key and secret key
  • Rate Limits: Varies by ZeroHash plan, typically 1,000-10,000 requests/hour
  • Data Volume:
    • Participants: 100-10,000+ participants (customers and businesses)
    • Accounts: 200-50,000+ accounts per participant
    • Assets: 50-500+ supported cryptocurrency and digital assets
    • Transactions: Variable based on trading volume
  • Data Velocity: Account balances updated on trades/deposits, participant data updated on KYC changes, assets updated on new listings
  • Data Quality: Structured JSON with consistent schema, financial amounts stored as strings for precision, some fields may be null for incomplete records
  • Network Considerations: HTTPS only, RESTful API with standard reliability, US-based infrastructure

Operational Context

  • Deployment Environment: Development (certification), staging, and production environments
  • Monitoring Requirements: Alert on >2% error rate, >2 hour sync time, account balance discrepancies, authentication failures
  • Maintenance Windows: Weekends for non-critical updates, immediate deployment for trading-critical fixes
  • Team Structure: Data Engineering team, Crypto Operations, Compliance officers, Risk Management
  • Escalation Path: Data Engineer → Team Lead → Crypto Operations Director → CTO
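
The error-rate and sync-time thresholds above can be sketched as a small check. This is only an illustration: the function name `sync_alerts` and its inputs are hypothetical, and the balance-discrepancy and authentication-failure alerts are left out because the description does not define how they are measured.

```python
def sync_alerts(total_requests, failed_requests, sync_seconds):
    """Return the names of the thresholds that one sync run breached."""
    alerts = []
    # >2% error rate, compared with integers to avoid float rounding.
    if total_requests > 0 and failed_requests * 100 > total_requests * 2:
        alerts.append("error_rate")
    # >2 hour sync time.
    if sync_seconds > 2 * 3600:
        alerts.append("sync_time")
    return alerts
```

Both comparisons are strict, so a run at exactly 2% errors or exactly 2 hours does not alert, matching the ">" wording of the requirement.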

API-Specific Details

  • Base Endpoint: https://api.cert.zerohash.com (certification), https://api.zerohash.com (production)
  • Authentication: HMAC-SHA256 signature in X-SCX-SIGNED header with timestamp in X-SCX-TIMESTAMP header and API key in x-pk header
  • Pagination: Not applicable - responses processed as complete datasets using memory-efficient streaming
  • Date Format: ISO 8601 (e.g., 2024-01-15T10:30:00Z)
  • Response Format: JSON with nested objects and arrays
  • Key Endpoints:
    • /participants - Participant information including customers and businesses
    • /accounts - Account balances and asset holdings for each participant
    • /assets - Supported cryptocurrency and digital asset definitions

Data Schema Overview

  • participants: Participant profile, type (CUSTOMER, BUSINESS, INDIVIDUAL), status (ACTIVE, PENDING, SUSPENDED), contact information
  • accounts: Account balances (total and available), asset holdings, participant associations, account status
  • assets: Cryptocurrency and digital asset definitions, symbol, name, type, decimals, minimum/maximum amounts, status
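
As a sketch, the three tables could be declared in the list-of-dicts form that a Connector SDK `schema` function returns. The primary keys are assumptions inferred from the Required Fields and Data Validation rules in this description (account IDs are only unique within a participant, so that key is composite); the connector's actual keys may differ.

```python
def schema(configuration: dict):
    """Declare destination tables and their primary keys."""
    return [
        # Assumption: participants are keyed by their unique UUID.
        {"table": "participants", "primary_key": ["id"]},
        # Assumption: composite key, since account IDs are unique per participant.
        {"table": "accounts", "primary_key": ["participant_id", "account_id"]},
        # Assumption: the asset symbol identifies an asset.
        {"table": "assets", "primary_key": ["symbol"]},
    ]
```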

Data Replication Expectations

  • Initial Sync: Last 90 days of participant, account, and asset data for baseline
  • Incremental Sync: Data since last successful sync timestamp
  • Sync Frequency:
    • Production: Every 4 hours for account balances, daily for participants and assets
    • Development: Daily for all data types
  • Data Retention: 7 years of historical account and participant data for compliance requirements
  • Backfill Capability: Full historical data available based on ZeroHash retention policies
  • Data Consistency: Account data lags the source by at most 4 hours (one production sync interval), which is the limit trading operations can tolerate
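
The initial and incremental sync rules above amount to choosing a start timestamp from saved state. A minimal sketch, assuming the state is a dict with a hypothetical `last_sync` key holding an ISO 8601 UTC string:

```python
from datetime import datetime, timedelta, timezone

INITIAL_LOOKBACK_DAYS = 90  # Initial Sync window from the list above
ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601, e.g. 2024-01-15T10:30:00Z


def sync_start(state, now=None):
    """Return the ISO 8601 timestamp from which this sync should read."""
    now = now or datetime.now(timezone.utc)
    last_sync = state.get("last_sync")  # hypothetical state key
    if last_sync:
        # Incremental sync: resume from the last successful sync.
        return last_sync
    # Initial sync: go back 90 days for the baseline.
    return (now - timedelta(days=INITIAL_LOOKBACK_DAYS)).strftime(ISO_FORMAT)


def next_state(now=None):
    """State to save only after a sync has completed successfully."""
    now = now or datetime.now(timezone.utc)
    return {"last_sync": now.strftime(ISO_FORMAT)}
```

Saving the new state only after a successful run means a failed sync is retried from the same start point rather than skipping data.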

Operational Requirements

  • Uptime SLA: 99.7% availability during business hours (trading operations critical)
  • Performance SLA:
    • Initial sync: <4 hours for 90 days of data
    • Incremental sync: <30 minutes for 4-hour updates
  • Error Handling:
    • Automatic retry with exponential backoff and jitter
    • Dead letter queue for failed account/participant records
    • Alert on consecutive sync failures during trading hours
    • Rate limit handling with Retry-After header respect
  • Monitoring:
    • API response times and error rates
    • Participant count trends and anomaly detection
    • Account balance completeness validation
    • Asset listing changes and updates
  • Security:
    • HMAC signatures generated per request with timestamp
    • API keys and secrets encrypted at rest
    • Access logs maintained for 7 years (compliance)
    • Participant PII handling per privacy regulations

Rate Limiting Strategy

  • Standard Plan: 1,000 requests/hour, 10,000 requests/day
  • Professional Plan: 5,000 requests/hour, 50,000 requests/day
  • Enterprise Plan: 10,000 requests/hour, 100,000 requests/day
  • Recommended: Implement exponential backoff with jitter for 429 responses
  • Error Handling: 429 status code indicates rate limit exceeded, respect Retry-After header
  • Monitoring: Track rate limit utilization and plan for subscription upgrades
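
One way to combine the Retry-After rule with exponential backoff and jitter is sketched below. The function name and the `base` and `cap` defaults are illustrative assumptions, not values taken from the connector.

```python
import random


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    if retry_after is not None:
        try:
            # Prefer the server's hint when Retry-After is given in seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            # Retry-After can also be an HTTP date; fall back to backoff here.
            pass
    # Full jitter: a random wait up to the capped exponential bound.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

The jitter spreads retries out so that parallel calls for participants, accounts, and assets do not all hit the API again at the same instant.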

Data Quality Considerations

  • Required Fields:
    • participants: id, participant_code
    • accounts: account_id, participant_id, asset_symbol
    • assets: symbol, name, decimals
  • Optional Fields:
    • participants: email, name, type, status, created_at, updated_at
    • accounts: balance, available_balance, status
    • assets: type, status, minimum_amount, maximum_amount
  • Data Validation:
    • Participant IDs must be unique UUIDs
    • Account IDs must be unique within participant
    • Asset symbols must be valid cryptocurrency codes
    • Balance amounts must be non-negative decimal values, kept as strings to preserve precision
    • Decimal places must match asset specifications
  • Data Completeness:
    • Participants: 100% have basic identification data
    • Accounts: 95%+ have complete balance information
    • Assets: 100% have symbol, name, and decimal specifications
  • Duplicate Handling: Primary key constraints prevent duplicate participant and account records
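
A few of the validation rules above can be sketched with the standard library. The helper names are hypothetical, and parsing amounts with `Decimal` rather than `float` is what keeps the string precision intact.

```python
import uuid
from decimal import Decimal, InvalidOperation


def is_uuid(value):
    """True if the value parses as a UUID."""
    try:
        uuid.UUID(str(value))
        return True
    except ValueError:
        return False


def check_balance(amount, asset_decimals):
    """Return a list of problems for one balance value (empty if valid)."""
    if not isinstance(amount, str):
        return ["not a string"]
    try:
        value = Decimal(amount)
    except InvalidOperation:
        return ["not a decimal string"]
    if not value.is_finite():
        return ["not finite"]
    problems = []
    if value < 0:
        problems.append("negative")
    # A negative exponent is the number of digits after the decimal point.
    if -value.as_tuple().exponent > asset_decimals:
        problems.append("too many decimal places")
    return problems
```

For example, `check_balance("0.123456789", 8)` reports too many decimal places for an asset defined with 8 decimals.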

Integration Points

  • Fivetran Destinations: Snowflake, BigQuery, Redshift, PostgreSQL
  • Downstream Systems:
    • Trading and execution platforms
    • Risk management systems
    • Compliance and reporting platforms
    • Financial reporting systems
    • Portfolio management systems
  • Data Dependencies: None - standalone cryptocurrency infrastructure data source
  • External Dependencies: ZeroHash API availability, cryptocurrency market trading hours

Disaster Recovery

  • Backup Strategy: Daily snapshots of all participant, account, and asset tables
  • Recovery Time Objective: 4 hours for full data recovery
  • Recovery Point Objective: 2 hours maximum data loss for trading-critical account data
  • Failover: Automatic failover to backup API credentials
  • Testing: Monthly disaster recovery drills with crypto operations team validation

Compliance & Security

  • Data Classification: Participant PII - highly sensitive, account balances - financial sensitive, asset definitions - public
  • Retention Policy: 7 years for account and transaction data (compliance), 3 years for operational participant data
  • Access Controls: Strict role-based access with principle of least privilege
  • Audit Trail: All data access logged and monitored for compliance audits
  • Encryption: Data encrypted in transit and at rest with enterprise-grade security
  • Privacy: GDPR compliance for EU participants, CCPA compliance for California participants, KYC/AML compliance for all participants

Performance Optimization

  • Parallel Processing: Multiple API calls for different data types (participants, accounts, assets)
  • Caching: Asset definitions cached for 24 hours (rarely change)
  • Indexing: Participant ID, account ID, asset symbol, and date columns indexed
  • Partitioning: Account data partitioned by date and participant for efficient querying
  • Compression: Historical account data compressed for storage efficiency
  • Streaming: Memory-efficient generator-based processing prevents data accumulation
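
The generator-based streaming idea can be illustrated as follows. This is a sketch, not the connector's code: the function name is hypothetical and the field names are taken from the accounts fields listed under Data Quality Considerations.

```python
from typing import Any, Dict, Iterable, Iterator


def normalize_accounts(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one cleaned account row at a time instead of building a list."""
    for record in records:
        yield {
            "account_id": record.get("account_id"),
            "participant_id": record.get("participant_id"),
            "asset_symbol": record.get("asset_symbol"),
            # Amounts stay as strings so no precision is lost downstream.
            "balance": record.get("balance"),
            "available_balance": record.get("available_balance"),
        }
```

Because each row is produced only when the caller asks for it, memory use stays flat regardless of how many accounts the response contains.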

Troubleshooting Guide

  • Common Issues:
    • Rate limit exceeded: Reduce sync frequency or upgrade ZeroHash plan
    • HMAC signature invalid: Verify secret key and timestamp generation
    • Missing account data: Check participant ID and account active status
    • Timeout errors: Increase timeout values or reduce batch size
    • Balance precision issues: Verify string handling for decimal amounts
    • Asset listing discrepancies: Validate asset symbol mappings
  • Debug Mode: Enable detailed logging for participant and account data troubleshooting
  • Support Contacts:
    • Technical: Data Engineering team
    • Business: Crypto Operations team
    • Vendor: ZeroHash support (for API and account issues)
    • Compliance: Legal/Compliance team (for KYC/AML and regulatory issues)

Checklist

Some tips and links to help validate your PR:

  • Tested the connector with fivetran debug command.
  • Added/Updated example specific README.md file, refer here for template.
  • Followed Python Coding Standards, refer here

@fivertran-karunveluru fivertran-karunveluru requested review from a team as code owners October 31, 2025 22:11
@fivertran-karunveluru fivertran-karunveluru added the hackathon For all the PRs related to the internal Fivetran 2025 Connector SDK Hackathon. label Oct 31, 2025
@github-actions github-actions bot added the size/XL PR size: extra large label Oct 31, 2025
@github-actions

github-actions bot commented Oct 31, 2025

🧹 Python Code Quality Check

✅ No issues found in Python Files.

This comment is auto-updated with every commit.

Updated the README to improve clarity and formatting.

@fivetran-dejantucakov fivetran-dejantucakov left a comment
