
Guided Data Science - recommendation system for Data Scientists


guryaniv/GDS



Guided Data Science

We present a recommendation system for Data Scientists that, given a user's code cell, recommends what the next line of code should be.

The recommendation system consists of three main parts (explained in detail here):

  • Dataset Builder : Collects the data needed to build the system (see data_gathering).
    • Downloaded datasets, notebooks and metadata are stored in the datasets directory.
    • The parsed tsv files used to train our models are stored in the Data directory.
  • Workflow-Stage Classifier : Classifies the code into the relevant Data Science workflow stage and provides context for the code (see Classification).
  • Recommendation Engine : Generates the next-line recommendation (see Chatbot).
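
To make the division of labour between the last two parts concrete, here is a toy sketch of the classify-then-recommend idea. It is an illustration only: the stage names, keyword rules, and suggested lines are hypothetical and are not taken from this repository, whose classifier and recommendation engine are trained models rather than lookup tables.

```python
# Toy illustration only: the stages, keyword rules and suggestions below are
# hypothetical and do not reflect the trained models in this repository.
from typing import Dict, List

# Hypothetical workflow stages, each detected by a few substrings.
STAGE_KEYWORDS: Dict[str, List[str]] = {
    "load": ["read_csv", "read_excel"],
    "explore": [".head(", ".describe(", ".plot("],
    "model": [".fit(", ".predict(", "train_test_split"],
}

# Hypothetical next-line suggestion for each stage.
NEXT_LINE: Dict[str, str] = {
    "load": "df.head()",
    "explore": "df = df.dropna()",
    "model": "print(model.score(X_test, y_test))",
    "unknown": "import pandas as pd",
}


def classify_stage(cell: str) -> str:
    """Return the first stage whose keywords appear in the code cell."""
    for stage, keywords in STAGE_KEYWORDS.items():
        if any(keyword in cell for keyword in keywords):
            return stage
    return "unknown"


def recommend_next_line(cell: str) -> str:
    """Classify the cell, then look up a suggestion for that stage."""
    return NEXT_LINE[classify_stage(cell)]


if __name__ == "__main__":
    print(recommend_next_line("df = pd.read_csv('train.csv')"))
```

The point of the sketch is the shape of the flow: the stage label produced by the classifier is the context that the recommender conditions on.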

The system architecture scheme:

The entire flow of creating the system is explained in the Flow.ipynb notebook.

Prerequisites:

The required libraries can be installed using the requirements.txt file. Alternatively, you can create an environment using the environment.yml file.
Note that in order to use the Dataset Builder you must have Kaggle credentials set up.
Follow the instructions at: https://github.com/Kaggle/kaggle-api#api-credentials
You also need to configure your Kaggle username and password in the data_gathering/consts.py file.
For the weak supervision process in Classification/Exploration_and_WeakSupervision.ipynb, snorkel v0.7 must be installed. It cannot be installed with pip, so follow the installation instructions at: https://github.com/HazyResearch/snorkel#installation
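
Before running the Dataset Builder, it can help to check that the Kaggle API credentials are discoverable. The following is a minimal sketch, assuming the standard Kaggle API conventions (a kaggle.json token in ~/.kaggle containing "username" and "key", or the KAGGLE_USERNAME and KAGGLE_KEY environment variables). The function name and its parameters are hypothetical helpers and are not part of this repository or of the Kaggle client.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def kaggle_credentials_available(
    config_dir: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True if Kaggle API credentials appear to be configured.

    Hypothetical helper: checks the environment variables first, then the
    kaggle.json token file (assumed to live in ~/.kaggle by default).
    """
    env = os.environ if env is None else env
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    config_dir = Path.home() / ".kaggle" if config_dir is None else config_dir
    token_file = config_dir / "kaggle.json"
    if not token_file.is_file():
        return False

    try:
        token = json.loads(token_file.read_text())
    except (OSError, json.JSONDecodeError):
        # Unreadable or malformed token file counts as "not configured".
        return False

    return (
        isinstance(token, dict)
        and bool(token.get("username"))
        and bool(token.get("key"))
    )


if __name__ == "__main__":
    if not kaggle_credentials_available():
        print("Kaggle credentials not found; see the kaggle-api README.")
```

This only checks that credentials are present, not that Kaggle accepts them, and it does not cover the separate settings in data_gathering/consts.py.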
