
richatechiie/datascience-project


Learning Data Science

In this repository, I'll keep the code I write as I learn about Data Science.

I write about what I am learning here: https://medium.com/@gabrieltseng/

For all notebooks that require a GPU, I use an AWS P2 instance.

The projects below are listed in the order I approached them, from latest to earliest:

Table of Contents

I train a WGAN (Wasserstein GAN) to generate MNIST digits.
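What makes a WGAN different from a standard GAN is mostly its loss: a critic is trained to widen the gap between its mean score on real and on generated samples, and its weights are clipped to a small range after each update. The sketch below shows only those loss and clipping computations on plain Python lists. The function names are hypothetical, and the clip value of 0.01 is an assumption borrowed from the original WGAN paper's defaults, not from this repository's notebook.

```python
from statistics import mean


def critic_loss(real_scores, fake_scores):
    # The critic maximizes mean(real) - mean(fake); written as a loss to
    # minimize, that becomes mean(fake) - mean(real).
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores):
    # The generator tries to raise the critic's score on generated samples.
    return -mean(fake_scores)


def clip_weights(weights, c=0.01):
    # Hypothetical helper: clamp every critic weight into [-c, c]
    # after each critic update (c = 0.01 is an assumed default).
    return [max(-c, min(c, w)) for w in weights]


print(critic_loss([2.0, 4.0], [0.0, 1.0]))  # -2.5
print(clip_weights([0.5, -0.02, 0.005]))   # [0.01, -0.01, 0.005]
```

Unlike a standard discriminator, the critic outputs an unbounded score rather than a probability, which is why no sigmoid or log appears in these losses.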

I experiment with two language models: one based on the weight-dropped LSTM, and one based on temporal convolutional networks. Both are trained on the WikiText-2 dataset.
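Temporal convolutional networks are built from causal, dilated 1-D convolutions: the output at time step t may only depend on inputs at t and earlier, which is what lets them act as language models. Here is a minimal single-channel sketch with left zero-padding and no bias. The function name and example values are illustrative assumptions, not code from the repository.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """y[t] = sum_i kernel[i] * x[t - i * dilation], treating x[j] = 0 for j < 0."""
    y = []
    for t in range(len(x)):
        total = 0.0
        for i, k in enumerate(kernel):
            j = t - i * dilation  # only ever look backward in time
            if j >= 0:
                total += k * x[j]
        y.append(total)
    return y


# A kernel of size 2 with dilation 2 sums each input with the one two steps back.
print(causal_dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2))
# [1.0, 2.0, 4.0, 6.0, 8.0]
```

Stacking such layers with dilations 1, 2, 4, ... lets the receptive field grow exponentially with depth while each layer stays cheap.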

I build single-object and multi-object detectors, which label and locate objects in an image. Both models are trained on the Pascal VOC 2007 challenge dataset.
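Locating objects is usually scored with intersection over union (IoU) between a predicted and a ground-truth bounding box; Pascal VOC evaluation counts a detection as correct when its IoU with a ground-truth box exceeds 0.5. A minimal sketch, assuming boxes given as `(x_min, y_min, x_max, y_max)` in continuous coordinates (ignoring VOC's +1 pixel convention); the function is illustrative, not taken from the repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The same function is also what multi-object detectors typically use to match predictions to ground-truth boxes during training and to suppress duplicate predictions.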

I work through Stanford's Databases course by Jennifer Widom. The folder contains my solutions to the exercises. Where a section had no exercises (only a quiz), I include just the statement of accomplishment.

I work through Think Bayes by Allen Downey. The folder contains my solutions to the exercises.

I took the exercises from both the book and its GitHub repository.
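Think Bayes frames a Bayesian update as multiplying each hypothesis's prior by the likelihood of the observed data and then normalizing. The book opens with the "cookie problem": bowl 1 holds 30 vanilla and 10 chocolate cookies, bowl 2 holds 20 of each, and you draw a vanilla cookie from a random bowl. This is a plain-dictionary sketch of that update, not the book's own `Pmf`/`Suite` classes; `bayes_update` is a hypothetical name.

```python
def bayes_update(prior, likelihood):
    """Posterior for each hypothesis: prior * likelihood, normalized to sum to 1."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


prior = {"bowl 1": 0.5, "bowl 2": 0.5}
# Probability of drawing a vanilla cookie from each bowl.
likelihood_vanilla = {"bowl 1": 30 / 40, "bowl 2": 20 / 40}

posterior = bayes_update(prior, likelihood_vanilla)
print(posterior)  # bowl 1 ~ 0.6, bowl 2 ~ 0.4
```

Because the result is itself a distribution over hypotheses, it can be fed back in as the prior for the next observation.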

I build a tweet summarizer (COWTS) that aims to give a rescue team a useful summary of tweets in a disaster scenario. This involves experimenting with integer linear programming, term frequency-inverse document frequency (tf-idf) scores, and word graphs.
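Tf-idf scores a term highly when it is frequent in one tweet but rare across the collection, which helps a summarizer pick out informative words. Below is a sketch of one common unsmoothed variant, tf(t, d) * log(N / df(t)); the tokenized tweets are made up for illustration, and the repository may use a different weighting or smoothing.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term in each tokenized document as tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


tweets = [
    ["flood", "rescue", "needed"],
    ["flood", "water", "rising"],
    ["flood", "rescue", "team"],
]
scores = tf_idf(tweets)
print(scores[0])  # "flood" appears in every tweet, so it scores 0.0
```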

I train machine learning algorithms on a small dataset (~3,000 data points) to recognize bullying in online discussions, as part of Kaggle's Detecting Insults in Social Commentary competition. By implementing word embeddings, I significantly improve on the competition's best result.
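This README does not spell out how the embeddings feed the classifier. One simple, common approach is to average the pretrained vectors of a comment's words into a fixed-length feature vector that any standard classifier can consume. A sketch under that assumption; the two-dimensional toy vectors are invented, and real pretrained embeddings have hundreds of dimensions.

```python
def average_embedding(tokens, embeddings, dim):
    """Mean of the vectors for known tokens; a zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical toy embeddings; unknown words ("are", "an") are skipped.
toy_embeddings = {"you": [0.0, 1.0], "idiot": [1.0, 0.5]}
features = average_embedding(["you", "are", "an", "idiot"], toy_embeddings, dim=2)
print(features)  # [0.5, 0.75]
```

Averaging discards word order, but on a small dataset it transfers knowledge from the pretrained vectors without adding many parameters to fit.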

I experiment with generative neural networks by building a neural style network, which takes two images as input and outputs an image with the content of the first image and the style of the second. I improve on the original neural style network (A Neural Algorithm of Artistic Style) by implementing two additional papers (Incorporating Long-Range Consistency in CNN-Based Texture Generation, and Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses).
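In the original neural style approach, "style" is captured by Gram matrices: correlations between the channels of a CNN layer's feature maps, summed over all spatial positions. Below is a pure-Python sketch of that statistic and a squared-error style loss. The normalization (dividing by the number of positions, then averaging over entries) is a simplifying assumption; the paper uses its own constant factors, and in practice the features come from a pretrained network rather than hand-written lists.

```python
def gram_matrix(features):
    """features: C channels, each a flat list of M activations.
    Returns the C x C matrix of channel correlations divided by M."""
    c = len(features)
    m = len(features[0])
    return [
        [sum(features[i][k] * features[j][k] for k in range(m)) / m for j in range(c)]
        for i in range(c)
    ]


def style_loss(generated_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    g = gram_matrix(generated_features)
    a = gram_matrix(style_features)
    c = len(g)
    return sum((g[i][j] - a[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


# Two channels, two spatial positions each.
print(gram_matrix([[1.0, 2.0], [3.0, 4.0]]))  # [[2.5, 5.5], [5.5, 12.5]]
```

Summing over positions throws away where features occur, which is why this statistic captures texture rather than layout.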

I build a recurrent neural network on top of GloVe word embeddings to recognize whether pairs of questions posted on Quora share the same intent, as part of Kaggle's Quora Question Pairs competition.
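Building an RNN on GloVe typically starts by loading the pretrained vectors, which GloVe distributes as text files with one word per line followed by its space-separated components, and copying them into an embedding matrix indexed by your own vocabulary. This sketch assumes a word-to-index mapping that starts at 1 with row 0 reserved for padding, as Keras's `Tokenizer` does; the helper names and toy vectors are hypothetical.

```python
def load_glove(lines):
    """Parse GloVe's text format: a word followed by its vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with index i; row 0 (padding)
    and words missing from GloVe stay all zeros."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
    return matrix


# With a real file this would be, e.g.:
#   with open("glove.6B.300d.txt", encoding="utf-8") as f:
#       vectors = load_glove(f)
vectors = load_glove(["what 0.1 0.2\n", "is 0.3 0.4\n"])
matrix = build_embedding_matrix({"what": 1, "is": 2, "quora": 3}, vectors, dim=2)
print(matrix)  # [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
```

The resulting matrix can initialize an embedding layer that a shared RNN uses to encode both questions of a pair.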

In this project, I use the MovieLens dataset to explore a variety of data science tools, including dimensionality reduction and word embeddings. I build a recommender system using a recurrent neural network, and I implement Google's Wide and Deep recommender neural network.
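Wide and Deep joins two models through a single output: a "wide" linear model over sparse features and their cross-products (which memorizes specific feature combinations), and a "deep" feed-forward network over dense inputs such as embeddings (which generalizes). Their logits are summed and passed through one sigmoid, so both parts are trained jointly. The pure-Python forward pass below is only a sketch of that structure; the feature names, weights, and layer sizes are all made-up assumptions, and no training is shown.

```python
import math


def cross_product(features, pairs):
    # Cross-product transformation for binary features: the crossed
    # feature is 1 only when both features in the pair are 1.
    return [features[a] * features[b] for a, b in pairs]


def dense_relu(x, weights, bias):
    # One fully connected layer (one weight row per output unit) plus ReLU.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def wide_and_deep(wide_x, deep_x, w_wide, deep_layers, w_deep, bias):
    # Wide part: a linear model over raw and crossed sparse features.
    wide_logit = sum(w * x for w, x in zip(w_wide, wide_x))
    # Deep part: dense inputs (e.g. concatenated embeddings) through ReLU layers.
    h = deep_x
    for weights, b in deep_layers:
        h = dense_relu(h, weights, b)
    deep_logit = sum(w * x for w, x in zip(w_deep, h))
    # Both logits share a single sigmoid output.
    return 1.0 / (1.0 + math.exp(-(wide_logit + deep_logit + bias)))


# Hypothetical binary features for one user/movie pair.
features = {"watched_comedy": 1, "user_young": 1, "watched_drama": 0}
crosses = cross_product(features, [("watched_comedy", "user_young"),
                                   ("watched_drama", "user_young")])
wide_x = [features["watched_comedy"], features["user_young"],
          features["watched_drama"]] + crosses
p = wide_and_deep(wide_x, [0.2, -0.1], [0.5, 0.1, 0.3, 0.8, 0.2],
                  [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])], [0.4, 0.6], -1.0)
print(p)  # predicted probability, e.g. that the user likes the movie
```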

In this project, I fine-tune and ensemble a variety of pretrained convolutional neural networks in Keras to identify invasive plant species in images, as part of Kaggle's Invasive Species Monitoring competition.
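The README does not say how the fine-tuned networks are combined. A simple and common choice is to average each model's predicted probabilities per image, optionally weighting stronger models more heavily. A sketch under that assumption, with made-up probabilities; `average_ensemble` is a hypothetical name.

```python
def average_ensemble(model_probs, weights=None):
    """model_probs: one list of per-image probabilities per model.
    Returns the (optionally weighted) mean probability for each image."""
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    # zip(*model_probs) groups the models' predictions image by image.
    return [sum(w * p for w, p in zip(weights, preds)) / total
            for preds in zip(*model_probs)]


# Three hypothetical models scoring two images for "invasive species present".
probs = [[0.9, 0.2], [0.7, 0.4], [0.8, 0.3]]
print(average_ensemble(probs))                     # roughly [0.8, 0.3]
print(average_ensemble(probs, weights=[2, 1, 1]))  # favors the first model
```

Averaging tends to help most when the models make different mistakes, for example when they come from different pretrained architectures.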

About

A collection of personal data science projects


Languages

  • Jupyter Notebook 97.5%
  • C++ 1.3%
  • Python 1.2%