We present a recommendation system for data scientists that, given a cell of user code, recommends what the next line of code should be.
The recommendation system consists of three main parts (explained thoroughly here):
- Dataset Builder: Collects the data needed to build the system (see data_gathering).
- Workflow-Stage Classifier: Classifies the code into the relevant Data Science workflow stage and provides context for the code (see Classification).
- Recommendation Engine: Generates the next-line recommendation (see Chatbot).
The system architecture scheme:
The entire flow of creating the system is explained in the Flow.ipynb notebook.
Required libraries can be installed using the requirements.txt file. Alternatively, you can create an environment using the environment.yml file.
Note that to use the Dataset Builder you must have Kaggle credentials set up.
Follow the instructions at: https://github.com/Kaggle/kaggle-api#api-credentials
You also need to configure your Kaggle username and password in the data_gathering/consts.py file.
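Before running the Dataset Builder, it can help to confirm that the Kaggle API will find your credentials. The snippet below is a minimal sketch and not part of this repository: the function name `kaggle_credentials_available` is hypothetical. It assumes the standard Kaggle API lookup locations (the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` token in `~/.kaggle` or in the directory named by `KAGGLE_CONFIG_DIR`). It does not inspect data_gathering/consts.py.

```python
import os
from pathlib import Path


def kaggle_credentials_available(env=None, home=None):
    """Return True if Kaggle API credentials appear to be configured.

    Hypothetical helper (not part of this repository). `env` and `home`
    can be overridden, which makes the check easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Option 1: both environment variables are set and non-empty.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: a kaggle.json token file exists in the config directory
    # (KAGGLE_CONFIG_DIR if set, otherwise ~/.kaggle).
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR") or home / ".kaggle")
    return (config_dir / "kaggle.json").is_file()


if __name__ == "__main__":
    if kaggle_credentials_available():
        print("Kaggle credentials found.")
    else:
        print("Kaggle credentials not found; see the Kaggle API instructions.")
```

The check only tests that credentials are present, not that they are valid; an invalid token will still fail when the Kaggle API is first called.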
For the weak supervision process in Classification/Exploration_and_WeakSupervision.ipynb, you must have Snorkel v0.7 installed.
Snorkel v0.7 cannot be installed with pip. Follow the installation instructions at: https://github.com/HazyResearch/snorkel#installation
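Because the notebook depends on a specific Snorkel release, a quick version check before running it may save some debugging. The following is a rough sketch under stated assumptions: the helper names are hypothetical, it needs Python 3.8+ for `importlib.metadata`, and it assumes your Snorkel installation registers package metadata under the name `snorkel`. A source install used only via `PYTHONPATH` may not register metadata, in which case the lookup returns `None`.

```python
from importlib import metadata

# Release series required by the weak supervision notebook.
REQUIRED_PREFIX = "0.7"


def is_required_snorkel(version, required_prefix=REQUIRED_PREFIX):
    """Return True if `version` belongs to the required release series."""
    return version == required_prefix or version.startswith(required_prefix + ".")


def installed_snorkel_version():
    """Return the installed snorkel version string, or None if no metadata is found."""
    try:
        return metadata.version("snorkel")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_snorkel_version()
    if found is None:
        print("No snorkel package metadata found.")
    elif is_required_snorkel(found):
        print(f"snorkel {found} matches the required {REQUIRED_PREFIX} series.")
    else:
        print(f"snorkel {found} found, but the notebook expects v{REQUIRED_PREFIX}.")
```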