Update pickled dataframe to support pandas >=2.0 #3

Open

@millsks commented Aug 23, 2023

Loading the pickled dataframes with the latest version of pandas raises the error below. The pickles reference the internal pandas.core.indexes.numeric module, which was removed when pandas 2.0 broke backwards compatibility with pandas 1.x.

>>> import pickle
>>> with open('2018-04-01.pkl', 'rb') as fd:
...   df = pickle.load(fd)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'
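
For context, a pickle stores each class as a module path plus a name, so unpickling fails as soon as that module can no longer be imported. The sketch below reproduces the failure mode with the standard library only and shows one generic workaround, remapping the old location in `Unpickler.find_class`. The module names `legacy_mod_demo` and `current_mod_demo` and the `Point` class are hypothetical stand-ins, not pandas code, and this is not necessarily how pandas handles old pickles internally.

```python
import io
import pickle
import sys
import types


class Point:
    def __init__(self, x):
        self.x = x


# Pretend Point lives in an old module (hypothetical name).
Point.__module__ = "legacy_mod_demo"
legacy = types.ModuleType("legacy_mod_demo")
legacy.Point = Point
sys.modules["legacy_mod_demo"] = legacy

# The payload records the reference "legacy_mod_demo.Point".
payload = pickle.dumps(Point(3))

# Simulate a library upgrade that removed the old module and
# moved the class to a new one (also a hypothetical name).
del sys.modules["legacy_mod_demo"]
current = types.ModuleType("current_mod_demo")
current.Point = Point
sys.modules["current_mod_demo"] = current

# Plain unpickling now fails, as in the traceback above.
try:
    pickle.loads(payload)
    error = None
except ModuleNotFoundError as exc:
    error = exc


class RemappingUnpickler(pickle.Unpickler):
    """Redirect references to moved classes before importing them."""

    RENAMES = {
        ("legacy_mod_demo", "Point"): ("current_mod_demo", "Point"),
    }

    def find_class(self, module, name):
        module, name = self.RENAMES.get((module, name), (module, name))
        return super().find_class(module, name)


restored = RemappingUnpickler(io.BytesIO(payload)).load()
print(type(error).__name__, restored.x)
```

A remapping table like this only helps readers who use the custom unpickler; rewriting the files, as this PR does, removes the stale reference for everyone.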

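To fix this, the pickled dataframes were rewritten with pandas==2.0.3, so they no longer reference pandas.core.indexes.numeric. As the session below shows, pd.read_pickle can still read the old files under 2.0.3 even though plain pickle.load cannot, which is what makes the in-place rewrite possible. This should let end users read the files with pandas >=2.0, though pickles are not guaranteed to stay readable across every future release.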

>>> import pathlib
>>> import pandas as pd
>>> pd.__version__
'2.0.3'
>>> for f in pathlib.Path('.').glob('*.pkl'):
...   pd.read_pickle(f).to_pickle(f)
...
>>> with open('2018-04-01.pkl', 'rb') as fd:
...   df = pickle.load(fd)
...
>>> df
      TRANSACTION_ID         TX_DATETIME  ...  TERMINAL_ID_NB_TX_30DAY_WINDOW  TERMINAL_ID_RISK_30DAY_WINDOW
0                  0 2018-04-01 00:00:31  ...                             0.0                            0.0
1                  1 2018-04-01 00:02:10  ...                             0.0                            0.0
2                  2 2018-04-01 00:07:56  ...                             0.0                            0.0
3                  3 2018-04-01 00:09:29  ...                             0.0                            0.0
4                  4 2018-04-01 00:10:34  ...                             0.0                            0.0
...              ...                 ...  ...                             ...                            ...
9483            9483 2018-04-01 23:56:50  ...                             0.0                            0.0
9484            9484 2018-04-01 23:58:14  ...                             0.0                            0.0
9485            9485 2018-04-01 23:58:31  ...                             0.0                            0.0
9486            9486 2018-04-01 23:59:28  ...                             0.0                            0.0
9487            9487 2018-04-01 23:59:51  ...                             0.0                            0.0

[9488 rows x 23 columns]
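
One caveat with the loop above is that it overwrites each file in place, so an interrupted write could leave a truncated pickle. A minimal sketch of a more defensive variant follows, assuming a temporary file in the same directory is acceptable. The helper names `rewrite_atomically`, `load_pickle`, and `dump_pickle` are hypothetical, and the standard-library pickle functions stand in for the pandas calls so the example runs without pandas.

```python
import os
import pickle
import tempfile
from pathlib import Path


def rewrite_atomically(path, load, dump):
    """Round-trip `path` through load/dump, keeping the original on failure."""
    path = Path(path)
    obj = load(path)
    # Write next to the original so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    os.close(fd)
    try:
        dump(obj, tmp_name)
        # Swap the new file into place only after the write succeeded.
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise


# Stand-in loader and dumper built on the standard library only.
def load_pickle(p):
    with open(p, "rb") as fh:
        return pickle.load(fh)


def dump_pickle(obj, p):
    with open(p, "wb") as fh:
        pickle.dump(obj, fh)
```

With pandas, the same helper could be called as `rewrite_atomically(f, pd.read_pickle, lambda df, p: df.to_pickle(p))` for each file matched by the glob.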

Resolves #2

Development

Successfully merging this pull request may close these issues.

Migrate pickled dataframes to pandas >=2.0