AWS CUR Parquet conversion fails: 'bill_invoicing_entity' not found in data #358

Open
saurav955 opened this issue Aug 7, 2024 · 1 comment

Comments

@saurav955

Describe the bug
After downloading an AWS CUR Parquet file and running the converter on it, the conversion fails with the error: Column(s) 'bill_invoicing_entity' not found in data.

To Reproduce
Steps to reproduce the behavior:
Download a Parquet file from AWS.

Run:
focus-converter convert --provider aws --data-path /opt/aws-plaform-00005.snappy.parquet --data-format parquet --parquet-data-format dataset --export-path /tmp/test/

Error:
ValueError: Column(s) 'bill_invoicing_entity' not found in data
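
To see which columns the downloaded file actually contains before running the converter, the Parquet schema can be read directly. The following is a minimal sketch, assuming pyarrow is installed. The helper names `missing_columns` and `parquet_column_names` are hypothetical, and the required list holds only the one column named in the error above, not the converter's full set of inputs.

```python
from typing import Iterable, List

# Only the column named in the error; the converter may need others (assumption).
REQUIRED = ["bill_invoicing_entity"]


def missing_columns(available: Iterable[str],
                    required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the required column names that are absent from `available`."""
    have = set(available)
    return [name for name in required if name not in have]


def parquet_column_names(path: str) -> List[str]:
    """Read only the schema of a Parquet file and return its column names."""
    import pyarrow.parquet as pq  # imported lazily so the helper above has no dependency

    return pq.read_schema(path).names


# Hypothetical usage (the path is a placeholder):
# print(missing_columns(parquet_column_names("cur.snappy.parquet")))
```

An empty list means the column is present, in which case the failure would have to come from somewhere else.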

Expected behavior
The converter runs as expected.

Additional context
OS : Debian 12
focus-converter version : 1.0

The error seems self-explanatory, but we are not sure about two things:

  1. How can we add this field to the Cost and Usage Report?
  2. Can we convert the file even if the report does not contain this field?
@saurav955
Author

I converted the Parquet file to CSV using pandas and tried adding a bill_invoicing_entity column.
The file also has a similar field named bill_billing_entity, so I also tried renaming that field to bill_invoicing_entity (on the assumption that bill_invoicing_entity had been renamed to bill_billing_entity).
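
The workaround described above might look roughly like the sketch below. It is not the exact code used in this thread: the function name `add_invoicing_entity` and the file paths are hypothetical, and treating bill_billing_entity as a stand-in for bill_invoicing_entity is an untested assumption, not something the converter or AWS documents.

```python
import pandas as pd


def add_invoicing_entity(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` that has a bill_invoicing_entity column.

    If the column is missing, it is filled from bill_billing_entity.
    Assumption: the two fields are interchangeable for this purpose.
    """
    out = df.copy()
    if "bill_invoicing_entity" not in out.columns:
        out["bill_invoicing_entity"] = out["bill_billing_entity"]
    return out


# Hypothetical usage (paths are placeholders):
# df = pd.read_parquet("cur.snappy.parquet")
# # index=False keeps pandas from writing the row index as an extra unnamed column
# add_invoicing_entity(df).to_csv("sample.csv", index=False)
```

Copying rather than renaming keeps bill_billing_entity available in case the converter also reads it.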

I then ran focus-converter on both CSV files using the following command.

Command:
focus-converter convert --provider aws --data-path sample.csv --data-format csv --export-path /tmp/test/

With this I was able to get past the column-not-found error, but now I am getting a conversion error:

ComputeError: expected duration or datetime, got str

Error originated just after this operation:
 WITH_COLUMNS:
 [col("line_item_currency_code").alias("BillingCurrency")]
   WITH_COLUMNS:
   [col("bill_payer_account_id").alias("BillingAccountId")]
     WITH_COLUMNS:
     [col("line_item_availability_zone").alias("AvailabilityZone")]
       WITH_COLUMNS:
       [col("line_item_resource_id").str.split([String(:)]).list.get([6]).str.titlecase().alias("ResourceName")]
         WITH_COLUMNS:
         [col("line_item_resource_id").str.split([String(:)]).list.get([6]).alias("tmp_resource_id_ResourceType")]
           WITH_COLUMNS:
           [col("line_item_resource_id").str.split([String(:)]).list.get([5]).alias("tmp_resource_type_ResourceType")]
             WITH_COLUMNS:
             [null.alias("CommitmentDiscountName").strict_cast(String)]
               WITH_COLUMNS:
               [null.alias("SubAccountName").strict_cast(String)]
                 WITH_COLUMNS:
                 [null.alias("BillingAccountName").strict_cast(String)]
                   WITH_COLUMNS:
                   [col("product_purchase_option").cast(String)]
                     WITH_COLUMNS:
                     [col("line_item_resource_id").cast(String)]
                       WITH_COLUMNS:
                       [col("pricing_public_on_demand_rate").cast(Float64)]
                         WITH_COLUMNS:
                         [col("reservation_unused_recurring_fee").cast(Float64)]
                           WITH_COLUMNS:
                           [col("reservation_unused_amortized_upfront_fee_for_billing_period").cast(Float64)]
                             WITH_COLUMNS:
                             [col("savings_plan_total_commitment_to_date").cast(Float64)]
                               WITH_COLUMNS:
                               [col("savings_plan_used_commitment").cast(Float64)]
                                 WITH_COLUMNS:
                                 [col("savings_plan_used_commitment").cast(Float64)]
                                   WITH_COLUMNS:
                                   [col("line_item_unblended_cost").cast(Float64)]
                                     WITH_COLUMNS:
                                     [col("product_region").alias("product_region")]
                                       WITH_COLUMNS:
                                       [null.strict_cast(String).alias("savings_plan_savings_plan_arn")]
                                         WITH_COLUMNS:
                                         [null.strict_cast(String).alias("reservation_reservation_arn")]
                                           WITH_COLUMNS:
                                           [col("reservation_reservation_a_r_n").alias("reservation_reservation_a_r_n")]
                                             WITH_COLUMNS:
                                             [null.strict_cast(Float64).alias("line_item_net_unblended_cost")]
                                              DF ["", "identity_line_item_id", "identity_time_interval", "bill_invoice_id"]; PROJECT */273 COLUMNS; SELECTION: "None"

LogicalPlan had already failed with the above error; after failure, 30 additional operations were attempted on the LazyFrame
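
One possible reading of the ComputeError, not confirmed in this thread, is that the CSV round trip dropped the column types: Parquet stores timestamps as datetimes, while a CSV holds only text, so a later date operation would receive strings. The empty first column name in the `DF ["", ...]` line is also consistent with pandas having written its row index into the CSV. The sketch below reproduces both effects with pandas alone. The column name `line_item_usage_start_date` and the sample value are assumptions chosen for illustration.

```python
import io

import pandas as pd

COL = "line_item_usage_start_date"  # assumed example of a CUR timestamp column

# A frame with a real datetime column, standing in for data read from Parquet.
original = pd.DataFrame({COL: pd.to_datetime(["2024-08-01T00:00:00Z"])})

# Round-trip through CSV with pandas defaults (the row index is written too).
buf = io.StringIO()
original.to_csv(buf)
csv_text = buf.getvalue()
roundtrip = pd.read_csv(io.StringIO(csv_text))

print(original[COL].dtype)        # a datetime dtype
print(roundtrip[COL].dtype)       # no longer a datetime dtype
print(csv_text.splitlines()[0])   # header begins with an empty field for the index
```

If this is the cause, writing the patched data back to Parquet instead of CSV, or passing `index=False` to `to_csv`, may be worth trying. Whether the converter then succeeds is not something this thread establishes.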
