A tool for processing Anki spaced repetition review logs using Protocol Buffers and Parquet files.
This project provides utilities to:
- Convert Anki review logs from Protocol Buffer format to Parquet files
- Upload/download datasets to/from Hugging Face Hub
- Show an example of how to use the dataset
The converted dataset is available on Hugging Face Hub: Anki Revlogs 10K
Required dependencies:
- pandas
- pyarrow
- protobuf
- huggingface_hub
- tqdm
Use the build_parquet.py script to convert Protocol Buffer files to Parquet format:
python build_parquet.py
To download the processed dataset:
python download_from_hf.py
To analyze data for a specific user:
python process_dataset.py
Processing pipeline:
- Raw review logs are read from .revlog files (Protocol Buffer format)
- Data is processed and transformed using pandas DataFrames
- Results are saved as Parquet files partitioned by user_id
- Results are uploaded to the Hugging Face Hub for sharing
Project structure:
.
├── README.md
├── build_parquet.py      # Main conversion script
├── stats.proto           # Protocol Buffer definition
├── download_from_hf.py   # Dataset download utility
├── upload_to_hf.py       # Dataset upload utility
└── process_dataset.py    # Per-user data analysis
License: GNU AGPL, version 3 or later