GCP BigQuery Pipeline: Data Discovery & Schema Analysis #50

@JGrubb

Overview

Connect to BigQuery billing exports and analyze data structure, granularity, and partitioning to inform pipeline design.

Tasks

  • Implement BigQuery client with service account authentication
  • Query and analyze billing export table schema
  • Examine data granularity (resource-level vs aggregated)
  • Analyze partition structure (_PARTITIONTIME usage)
  • Document findings for pipeline design decisions
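As a starting point for the tasks above, here is a minimal sketch of what `vendors/gcp/discovery.py` might contain. It assumes the `google-cloud-bigquery` and `google-auth` packages and a service account JSON key file. The function names are hypothetical. The check for `resource.name` assumes that resource-level detail shows up as that column, as it does in the detailed usage cost export, and the partition query assumes an ingestion-time partitioned table that exposes `_PARTITIONTIME`. Both assumptions are among the things this issue is meant to verify.

```python
"""Hypothetical sketch for vendors/gcp/discovery.py (names are illustrative)."""


def make_client(key_path):
    """Build a BigQuery client from a service account JSON key file."""
    # Imported lazily so the pure helpers below work without the GCP libraries.
    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(key_path)
    return bigquery.Client(credentials=credentials, project=credentials.project_id)


def flatten_schema(fields, prefix=""):
    """Yield (dotted_name, type, mode) per field, descending into RECORD fields."""
    for field in fields:
        name = f"{prefix}{field.name}"
        yield name, field.field_type, field.mode
        if field.fields:
            yield from flatten_schema(field.fields, prefix=f"{name}.")


def has_resource_detail(columns):
    """Assumption: resource-level exports expose a `resource.name` column."""
    return any(name == "resource.name" for name, _type, _mode in columns)


def partition_summary_sql(table):
    """Row counts per ingestion-time partition (assumes _PARTITIONTIME exists)."""
    return (
        "SELECT _PARTITIONTIME AS partition_time, COUNT(*) AS row_count\n"
        f"FROM `{table}`\n"
        "GROUP BY partition_time\n"
        "ORDER BY partition_time DESC"
    )


def describe_table(client, table):
    """Collect schema, volume, and partitioning metadata for one table."""
    tbl = client.get_table(table)
    columns = list(flatten_schema(tbl.schema))
    return {
        "columns": columns,
        "num_rows": tbl.num_rows,
        "num_bytes": tbl.num_bytes,
        "time_partitioning": tbl.time_partitioning,
        "resource_level": has_resource_detail(columns),
    }
```

Calling `describe_table(make_client("key.json"), "<project>.<dataset>.<table>")` with a real key path and table ID would give the raw material for the README findings. The placeholders are not real values.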

Acceptance Criteria

  • Can successfully connect to GCP BigQuery billing exports
  • Clear understanding of data schema and volume
  • Documented analysis of partition strategy
  • Recommendations for incremental loading approach
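One possible shape for the incremental loading recommendation, offered as a sketch rather than a conclusion: reload a trailing window of daily partitions on every run, so that rows arriving late for recent days are picked up. The helper names, the default `lookback_days=3`, and the assumption that the export is partitioned by day on `_PARTITIONTIME` are all illustrative and should be confirmed by the discovery work.

```python
from datetime import date, datetime, time, timedelta, timezone


def partitions_to_reload(last_loaded, today, lookback_days=3):
    """Return the daily partitions to (re)load, oldest first.

    Starts `lookback_days` before the last loaded partition so that
    late-arriving rows in recent partitions are refreshed.
    """
    start = last_loaded - timedelta(days=lookback_days)
    span = (today - start).days
    return [start + timedelta(days=i) for i in range(span + 1)]


def incremental_sql(table):
    """Parameterized query over a half-open _PARTITIONTIME range."""
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        "WHERE _PARTITIONTIME >= @start_ts AND _PARTITIONTIME < @end_ts"
    )


def fetch_window(client, table, first_day, last_day):
    """Run the query for partitions first_day..last_day inclusive."""
    # Imported lazily so the date helpers work without the GCP libraries.
    from google.cloud import bigquery

    start_ts = datetime.combine(first_day, time.min, tzinfo=timezone.utc)
    end_ts = datetime.combine(last_day + timedelta(days=1), time.min, tzinfo=timezone.utc)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_ts", "TIMESTAMP", start_ts),
            bigquery.ScalarQueryParameter("end_ts", "TIMESTAMP", end_ts),
        ]
    )
    return client.query(incremental_sql(table), job_config=job_config).result()
```

Filtering on `_PARTITIONTIME` lets BigQuery prune partitions, which keeps the bytes scanned per run proportional to the window instead of the whole table.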

Dependencies

Architecture

  • Create vendors/gcp/discovery.py for exploration
  • Document findings in vendors/gcp/README.md

    Labels

    enhancement, gcp, research