I am storing time series DataFrames (the index is a DatetimeIndex, with multiple columns of data).
I add a column "year" computed from df.index.year.
I write to the collection with collection.write(item_name, df, overwrite=True, partition_on=["year"]).
When I read it back, I use item = collection.item(item_name, filters=[("year", "==", year)]). For performance, I would like to avoid reading the "year" column, since it is only used for partitioning. I can get the column names from item.data.columns and drop "year" from that Index, but item.to_pandas() does not let me specify which columns to read.
Is there a proper way to do this?