Implement data handling for RP forum dumps #4
Labels
enhancement
New feature or request
Comments
You can assign this to me. I'll be working on it today. I'll follow up in Matrix and put a summary here when that's done.
Summary of the discussion:
lloorree added a commit to lloorree/data-toolbox that referenced this issue Feb 10, 2023
lloorree added a commit to lloorree/data-toolbox that referenced this issue Feb 12, 2023
PR #9 is for this.
Very old issue, closing for now.
Summary
The scope of this task is to implement support for Enjin forum dumps in the `data-toolbox`.
Source file formats
Source files are SQLite3 databases generated by the encuum utility. Please reach out to me in private (via email, Matrix, Discord, etc.) for example files if you're interested in tackling this implementation.
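Since the example files are only shared privately, a sensible first step is to inspect whatever schema a given dump actually contains. The following is a minimal sketch using only the Python standard library; the helper name `describe_sqlite_schema` is hypothetical, and nothing here assumes specific table or column names from encuum.

```python
from __future__ import annotations

import sqlite3


def describe_sqlite_schema(db_path: str) -> dict[str, list[str]]:
    """Return a mapping of table name -> column names for a SQLite file.

    Hypothetical helper for exploring a dump; not part of the toolbox.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every object in the database; keep tables only.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        schema: dict[str, list[str]] = {}
        for table in tables:
            # Escape embedded double quotes so the identifier stays valid.
            quoted = table.replace('"', '""')
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        conn.close()
```

Calling `describe_sqlite_schema("path/to/dump.db")` on a real dump should show which tables hold threads and posts before any dataset code is written.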
Implementation details
An `EnjinDataset` class should be implemented under `toolbox/datasets/enjim.py`, following the general format of the other datasets. Threads should map to `Episode`s, and posts within threads should map to `Turn`s within the `Episode`.
An `EnjinVDM` should then be implemented under `toolbox/modules/enjim_pdm.py`. Feel free to look at the `light_pdm.py` file in that same folder for an example.
A lot of data processing will then need to take place. Off the top of my head:
...and maybe more. This will probably be the trickiest part. Feel free to reach out, we can discuss these points here or in the Matrix.
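As a starting point for discussion, the thread-to-episode mapping described above could look roughly like the sketch below. Everything schema-related is an assumption: the `posts` table and its `thread_id`, `author`, `content`, and `posted_at` columns are placeholders rather than the real encuum layout, and the `Turn` and `Episode` dataclasses are stand-ins for illustration, not the toolbox's own classes.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, field
from typing import Iterator


@dataclass(frozen=True)
class Turn:
    """One forum post (stand-in type, not the toolbox's class)."""
    speaker: str
    utterance: str


@dataclass
class Episode:
    """One forum thread (stand-in type, not the toolbox's class)."""
    thread_id: int
    turns: list[Turn] = field(default_factory=list)


def iter_episodes(conn: sqlite3.Connection) -> Iterator[Episode]:
    """Yield one Episode per thread, with posts in chronological order.

    Table and column names are ASSUMED for illustration only.
    """
    query = """
        SELECT thread_id, author, content
        FROM posts
        ORDER BY thread_id, posted_at
    """
    current: Episode | None = None
    for thread_id, author, content in conn.execute(query):
        # Rows arrive sorted by thread, so a new thread_id closes the
        # previous episode and opens a new one.
        if current is None or current.thread_id != thread_id:
            if current is not None:
                yield current
            current = Episode(thread_id=thread_id)
        current.turns.append(Turn(speaker=author, utterance=content))
    # Emit the final thread, if any rows were read at all.
    if current is not None:
        yield current
```

Streaming rows in sorted order avoids loading an entire dump into memory, which may matter for large forums. Any cleanup of post content would happen after this step and is left out here.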