Welcome!
Welcome to the Wharton Undergraduate Data Analytics Club (WUDAC). In this repository we host and compile resources for students, in the hope that they will aid the learning process.
Getting Started
If you are a member of WUDAC, our private resources are hosted below. Please make sure you have a GitHub account, and sign up for the GitHub Student Developer Pack to get free private repositories (2 years).
For the long haul
Get the tools
- Sublime Text 3 - Several add-ins available
- VBScript (See VBA below)
- Anaconda IDE
- Amazon Web Services - AWS RDS
- AWS - Database Products Overview
- CouchDB
Penn Academics & Organizations
From our WUDAC Alum, James Wang: What classes should I take at UPenn if I want to become a data scientist?
- CIS545 - Big Data Analytics (Spring 2017)
Penn has several organizations and academic programs tailored towards Data Analytics and Data Science; our club is only one of many.
- Wharton Customer Analytics Initiative - WCAI Home
- WCAI Workshops: Events
- WCAI Recommended Courses for Analytics
- NYU: MS in Data Science
- The Data Incubator
- Insight - Data Engineering Fellows Program
- Insight - Data Science Fellows Program
- Udacity: Nanodegree courses
- Curated list of Machine Learning and NLP resources for healthcare
- Bento - Learning Tracks
- Machine Learning Glossary - Google Developers
- Data Viz - Roundup
- A. OpenDataSoft - 2600+ Open Data Portals Around the World
- A. Kaggle
- A. Springboard
- B. Healthcare
- A. Airbnb Listings
- B. Mode Analytics - 5 Public Datasets
- Incredible Repository: Everything you need to know to get the job
- More tailored towards Software Engineering, but worth checking out.
- Uncubed
- Velvet Jobs - Paid but can help find applications
- The 2018 Wealthfront Career Launching Companies List
A. Blogs / Sources
- Towards Data Science: Medium
B. Opinion Articles / Tips
- (Op) Cover Letters + Data Science = What You Need to Know
- (Op) Even for Data and Tech Jobs, a Cover Letter is the Best Way to Sell Your Human Skills
- To-do: Elevator pitch
- Cracking the PM Interview, Ch. 14-15
- Customer Lifetime Value: How to calculate CLV
- The Power of CLV: Managing Customer Lifetime Value at IBM
- Intro to A/B Testing
- Quora: Expected A/B testing D.S. interviews and how to prepare
- Statistical Confidence: How is it calculated?
- A/B Testing & SEO: Guide
- 19 A/B Testing Q&A's
- Statistical Significance vs. Validity
- Columbia Paper: P-values and Statistical Practice
Sample Q&A:
- Explaining CIs and significance: If the statistical test returns a significant result, you conclude that the effect is unlikely to arise from random chance alone. If you reject the null hypothesis at the 95% confidence level, then, assuming there is no true effect, a result like ours (or one more extreme) would occur in less than 5% of all possible samples.
- Why is randomization important in experimental design? How would you answer the question: does attending local meetups cause Etsy sellers to make more sales?
- Randomization is at the core of experimentation because it balances out confounding variables such as seller commitment. By randomly assigning 50% of users to a control group and 50% to a treatment group, you ensure that the level of seller commitment is balanced on average between the two groups, as is every other possible confounding variable, measured or not.
- What might we need to worry about if we have an experiment with 20 different metrics? What if we run 20 experiments simultaneously?
- The more metrics you measure, the more likely you are to get at least one false positive. Ways to correct for this include tightening the significance threshold for each test (e.g. the Bonferroni correction) or running a family-wide test before you dive into the individual metrics (e.g. Fisher's Protected LSD). However, these are not used often in practice, and most people just proceed with caution and stay wary of spurious results.
- Null Hypothesis: "Disputing a null hypothesis is a matter of running the experiment long enough to rule out an incidental outcome. This concept is also referred to as reaching statistical significance."
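To make the 20-metrics answer above concrete, here is a minimal Python sketch of how quickly false positives accumulate and what the Bonferroni correction does about it. It assumes the 20 metrics are statistically independent, which real metrics rarely are, and the function names are hypothetical helpers written for this illustration.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across num_tests tests.

    Assumption: the tests are independent and every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / num_tests


alpha = 0.05       # conventional 5% significance level
num_metrics = 20   # number of metrics tracked in the experiment

# Uncorrected: each metric is tested at the 5% level.
uncorrected = familywise_error_rate(alpha, num_metrics)

# Corrected: each metric is tested at alpha / num_metrics.
per_test_alpha = bonferroni_alpha(alpha, num_metrics)
corrected = familywise_error_rate(per_test_alpha, num_metrics)

print(f"Uncorrected chance of a false positive: {uncorrected:.1%}")
print(f"Bonferroni per-test threshold: {per_test_alpha:.4f}")
print(f"Corrected chance of a false positive: {corrected:.1%}")
```

Under these assumptions, the uncorrected chance of at least one spurious "significant" metric is roughly 64%, while the corrected chance stays just under 5%. The cost is a much stricter threshold per metric, which is one reason practitioners often skip the correction and simply treat isolated wins with suspicion.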
- Practice problems: Leetcode
- HackerRank
- Quora: Repository
- Jane Street: Probability & Markets Guide
- ChainerCV: a Library for Deep Learning in Computer Vision
- Tutorial: Road-traffic counting with Computer Vision
- Deep Learning Reading Roadmap
- Generating text with deep learning
- Cornell CS 4740/5740 : Introduction to Natural Language Processing
- Computer Science
  - Harvard: CS50 - Introduction to the intellectual enterprises of computer science and the art of programming.
- Machine Learning
- Recommender Systems
- Deep Learning
- Databases
- Natural Language Processing (NLP) / Advanced Text Mining
- How to share your data science portfolio
- GitHub Project - Best Practices
- Stack Overflow - Types of Databases
- Codecademy: Learn the command line
- NYT Configuration: Tutorial
- Awesome Python - A curated list of awesome Python frameworks, libraries, software and resources.
- Python Style Guide
- Practice Python
- Resources for Learner
- Scikit learn - Main site
- Data Cleaning / Wrangling / Manipulation
- Other (e.g. C, AI)
- Mode Analytics: Python Tutorial
- FAQs / Troubleshooting: (a) Dealing with dataframes/dictionaries: (i) Dataframe from dictionary with different lengths
- PCA in Python
- Other Guides: 2
- Introduction to Ensembling/Stacking in Python
- Kaggle's Data Science Bowl 2017: Full Preprocessing Tutorial
- TensorFlow
- TensorBoard API
- TensorBoard GitHub
- Simple Reinforcement Learning
- CIS 545 - Deep Learning with TensorFlow Recitation
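As a quick illustration of the troubleshooting item above on building a dataframe from a dictionary whose values have different lengths, here is a minimal sketch. It assumes pandas is installed; the dictionary `ragged` and its contents are made up for the example. Wrapping each list in a `pd.Series` is one common workaround, not necessarily the approach the linked answer takes.

```python
import pandas as pd

# Hypothetical input: lists of unequal length keyed by column name.
ragged = {"a": [1, 2, 3], "b": [4, 5], "c": [6]}

# Passing `ragged` straight to pd.DataFrame raises a ValueError because
# the lists differ in length. Wrapping each list in a Series lets pandas
# align on the index and pad the shorter columns with NaN.
df = pd.DataFrame({key: pd.Series(values) for key, values in ragged.items()})

print(df)
print(df.isna().sum())  # number of padded (missing) cells per column
```

Note that padded columns are stored as floats, since NaN is a floating-point value. If the missing cells should stay distinguishable from real data later on, it is worth deciding up front whether to fill them, drop them, or reshape the data into a long format instead.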
Getting started: Installation, setup, learning from the basics
Check the WUDAC Dropbox for the main resources (WIP)
- Using SQL in R: Database Strategy
- Tidyverse: Home
- Data Manipulation: dplyr package
- Crash course: SQL Teaching (!)
- MySQL Workbench: Download here
- Quora: Learning
- Database/SQL Interview Questions - ProgrammerInterview.com
- WCAI SQL Course (Availability varies per semester, register here)
- Khan Academy
  - Intro to SQL: Querying and Managing Data
  - Google BigQuery Tutorial
  - New SQL Script
- Tutorials Point SQL
- Mode Analytics: SQL Tutorial for Data Analysis
- W3Schools: SQL Tutorial
- Head First Labs: Practice Problems
- SQL Zoo: Tutorial
- SQL Course: Tutorial
- HackerRank: SQL Challenges
- SQL Indexing Advanced Tutorial
- LinkedIn Learning (Paywall): SQL
- DataWorld: Intro to SQL Functions and Groupby
- Overview: SQL As Understood By SQLite
- Data types
- Functions
- Databases
- Other
- LinkedIn Learning (Paywall): Android
- LinkedIn Learning (Paywall): AngularJS
- LinkedIn Learning (Paywall): Bootstrap
- Quora: Learning
- Stack Overflow: The Definitive C++ Book Guide and List
- LinkedIn Learning (Paywall):
- LinkedIn Learning (Paywall): Cassandra
- Sublime Add-in: VBScript
- LinkedIn Learning (Paywall):
- LinkedIn Learning (Paywall): Go
- Hadoop MapReduce Tutorial - Github Wiki
- Quora: Learning
- Cognitive Class.ai: Hadoop 101
- MapReduce: Tutorial - See NETS212
- Haskell Tutorial: Learn You a Haskell
- LinkedIn Learning (Paywall): Haskell
- Quora: Learning
- Dash: LEARN TO CODE AWESOME WEBSITES IN HTML, CSS, AND JAVASCRIPT
- W3Schools
- Mozilla Developer Network (MDN) - GitHub Repo
- LinkedIn Learning (Paywall):
- Khan Academy Tutorials
  - Intro to HTML/CSS: Making webpages
  - HTML/JS: Making webpages interactive
  - HTML/JS: Making webpages interactive with jQuery
- LinkedIn Learning (Paywall): iOS
- CIS 110 / 120 (...)
- Testing Framework - CompileJava
- LinkedIn Learning (Paywall): Java
- Udacity - Introduction to JavaScript
- Airbnb - JavaScript Style Guide
- Quora: What is the best way to learn JS?
- Book: JavaScript: The Good Parts
- Khan Academy Tutorials
  - Intro to JS: Drawing & Animation
  - Advanced JS: Games & Visualizations
  - Advanced JS: Natural Simulations
  - HTML/JS: Making webpages interactive
- Asynchronous JavaScript:
- Chat Interface: Socket.io
- Guide to Node.js - InfoWorld
- Art of Node - Intro Guide
- LinkedIn Learning (Paywall): Julia
- LinkedIn Learning (Paywall): Kotlin
- Quora:
- LinkedIn Learning (Paywall): Perl
- LinkedIn Learning (Paywall): PHP
- Quora: Learning
- LinkedIn Learning (Paywall): Scala
- Apache Spark
- Spark Examples
- LinkedIn Learning (Paywall): Apache Spark
- LinkedIn Learning (Paywall): SPSS
- LinkedIn Learning (Paywall): Swift
- LinkedIn Learning (Paywall): Tableau
- LinkedIn Learning (Paywall): WordPress
- Research Areas - Data Science:
https://research.fb.com/category/data-science/ https://research.fb.com/blog/
- Publications:
- Engineering Blog:
https://code.facebook.com/posts/datascience/ https://code.facebook.com/posts/data/
- Research Areas - Data Science: