datacamp_courses.csv

,title,hours,videos,exercises,participants,xp,clean,descriptions,tracks,instructors,datasets,prereqs,link,topic,tech
0,,4,0,62,"1,317,429","6,200",,,[],[],[],[],https://www.datacamp.com/courses/free-introduction-to-r,,
1,,4,11,57,"1,656,629","4,700",,,[],[],[],[],https://www.datacamp.com/courses/intro-to-python-for-data-science,,
2,,4,0,35,"31,090","3,500",,,[],[],[],[],https://www.datacamp.com/courses/importing-cleaning-data-in-r-case-studies,,
3,,4,0,41,"361,803","3,450",,,[],[],[],[],https://www.datacamp.com/courses/intro-to-sql-for-data-science,,
4,A/B Testing in R,4,16,60,"3,203","4,700",A/B Testing in R,"A/B Testing in R
In this course, you will learn the foundations of A/B testing, including hypothesis testing, experimental design, and confounding variables. You will also be exposed to a couple more advanced topics, sequential analysis and multivariate testing. The first dataset will be a generated example of a cat adoption website. You will investigate if changing the homepage image affects conversion rates (the percentage of people who click a specific button). For the remainder of the course you will use another generated dataset of a hypothetical data visualization website.
Short case study on building and analyzing an A/B experiment.
In this chapter we'll continue with our case study, now moving to our statistical analysis. We'll also discuss how to do follow-up experiment planning.
In this chapter we'll dive deeper into the core concepts of A/B testing. This will include discussing A/B testing research questions, assumptions and types of A/B testing, as well as what confounding variables and side effects are.
In the final chapter we'll go over more types of statistical tests and power analyses for different A/B testing designs. We'll also introduce the concepts of stopping rules, sequential analysis, and multivariate analysis.",[],"['Page Piccinini', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Click dataset', 'https://assets.datacamp.com/production/repositories/2292/datasets/4407050e9b8216249a6d5ff22fd67fd4c44e7301/click_data.csv'), ('Experiment dataset', 'https://assets.datacamp.com/production/repositories/2292/datasets/52b52cb1ca28ce10f9a09689325c4d94d889a6da/experiment_data.csv'), ('Data Visualization Website - April 2018', 'https://assets.datacamp.com/production/repositories/2292/datasets/b502094e5de478105cccea959d4f915a7c0afe35/data_viz_website_2018_04.csv')]","['Intermediate R', 'Foundations of Inference', 'Experimental Design in R']",https://www.datacamp.com/courses/ab-testing-in-r,Probability & Statistics,R
5,ARIMA Modeling with R,4,13,45,"16,735","3,600",ARIMA Modeling R,"ARIMA Modeling with R
In this course, you will become an expert in fitting ARIMA models to time series data using R.  First, you will explore the nature of time series data using the tools in the R stats package.  Next, you learn how to fit various ARMA models to simulated data (where you will know the correct model) using the R package astsa.  Once you have mastered the basics, you will learn how to fit integrated ARMA models, or ARIMA models to various real data sets. You will learn how to check the validity of an ARIMA model and you will learn how to forecast time series data.  Finally, you will learn how to fit ARIMA models to seasonal data, including forecasting using the astsa package.
You will  investigate the nature of time series data and learn the basics of ARMA models that can explain the behavior of such data.  You will learn the basic R commands needed to help set up raw time series data to a form that can be analyzed using ARMA models.
You will discover the wonderful world of ARMA models and how to fit these models to time series data. You will learn how to identify a model, how to choose the correct model, and how to verify a model once you fit it to data.  You will learn how to use R time series commands from the stats and astsa packages.
Now that you know how to fit ARMA models to stationary time series, you will learn about integrated ARMA (ARIMA) models for nonstationary time series.  You will fit the models to real data using R time series commands from the stats and astsa packages.
You will learn how to fit and forecast seasonal time series data using seasonal ARIMA models. This is accomplished using what you learned in the previous chapters and by learning how to extend the R time series commands available in the stats and astsa packages.","['Quantitative Analyst with R', 'Time Series with R']","['David Stoffer', 'Lore Dirick', 'Matt Isaacs']",[],"['Introduction to R', 'Intermediate R', 'Introduction to Time Series Analysis']",https://www.datacamp.com/courses/arima-modeling-with-r,Probability & Statistics,R
6,Advanced Deep Learning with Keras in Python,4,13,46,"6,620","3,950",Advanced Deep Learning Keras,"Advanced Deep Learning with Keras in Python
This course shows you how to solve a variety of problems using the versatile Keras functional API. You will start with simple, multi-layer dense networks (also known as multi-layer perceptrons), and continue on to more complicated architectures. The course will cover how to build models with multiple inputs and a single output, as well as how to share weights between layers in a model. We will also cover advanced topics such as category embeddings and multiple-output networks. If you've ever wanted to train a network that does both classification and regression, then this course is for you!
In this chapter, you'll become familiar with the basics of the Keras functional API.  You'll build a simple functional network using functional building blocks, fit it to data, and make predictions.
In this chapter, you will build two-input networks that use categorical embeddings to represent high-cardinality data, shared layers to specify re-usable building blocks, and merge layers to join multiple inputs to a single output. By the end of this chapter, you will have the foundational building blocks for designing neural networks with complex data flows.
In this chapter, you will extend your 2-input model to 3 inputs, and learn how to use Keras' summary and plot functions to understand the parameters and topology of your neural networks. By the end of the chapter, you will understand how to extend a 2-input model to 3 inputs and beyond.
In this chapter, you will build neural networks with multiple outputs, which can be used to solve regression problems with multiple targets.  You will also build a model that solves a regression problem and a classification problem simultaneously.",[],"['Zachary Deane-Mayer', 'Sumedh Panchadhar']","[('Basketball data', 'https://assets.datacamp.com/production/repositories/2189/datasets/78cfc4f848041e10a64e72a9cd4f0a8e6f80ab21/basketball_data.zip'), ('Basketball models', 'https://assets.datacamp.com/production/repositories/2189/datasets/87408a711961f0d640f7c31faa9cfbf8248e6a23/basketball_models.zip')]",['Deep Learning in Python'],https://www.datacamp.com/courses/advanced-deep-learning-with-keras-in-python,Machine Learning,Python
7,Advanced Dimensionality Reduction in R,4,16,51,846,"4,300",Advanced Dimensionality Reduction in R,"Advanced Dimensionality Reduction in R
Dimensionality reduction techniques are based on unsupervised machine learning algorithms and their application offers several advantages. In this course you will learn how to apply dimensionality reduction techniques to exploit these advantages, using interesting datasets like the MNIST database of handwritten digits, the fashion version of MNIST released by Zalando, and a credit card fraud detection dataset. Firstly, you will have a look at t-SNE, an algorithm that performs non-linear dimensionality reduction. Then, you will also explore some useful characteristics of dimensionality reduction to apply in predictive models. Finally, you will see the application of GLRM to compress big data (with numerical and categorical values) and impute missing values. Are you ready to start compressing high dimensional data?
Are you ready to become a master of dimensionality reduction?
In this chapter, you'll start by understanding how to represent handwritten digits using the MNIST dataset.  You will learn what a distance metric is and which ones are the most common, along with the problems that arise with the curse of dimensionality.
Finally, you will compare the application of PCA and t-SNE .
Now, you will learn how to apply the t-Distributed Stochastic Neighbour Embedding (t-SNE) algorithm. After finishing this chapter, you will understand the different hyperparameters that have an impact on your results and how to optimize them. Finally,  you will do something really cool: compute centroids prototypes of each digit to classify other digits.
In this chapter, you'll apply t-SNE to train predictive models faster. This is one of the many advantages of dimensionality reduction. You will learn how to train a random forest with the original features and with the embedded features and compare them. You will also apply t-SNE to understand the patterns learned by a neural network. And all of this using a real credit card fraud dataset!
In the final chapter, you will practice another useful dimensionality reduction algorithm: GLRM. Here you will make use of the Fashion MNIST data to classify clothes, impute missing data and also train random forests using the low dimensional embedding.",['Unsupervised Machine Learning with R'],"['Federico Castanedo', 'Chester Ismay', 'Sara Billen']","[('MNIST sample', 'https://assets.datacamp.com/production/repositories/1680/datasets/68b37d6c5f7f6768d5e11796687993b6f3da1f72/mnist-sample-200.RData'), ('Credit card fraud', 'https://assets.datacamp.com/production/repositories/1680/datasets/5b6c593225dc1f417f82822cb5fce83887890f4a/creditcard.RData'), ('Fashion MNIST sample', 'https://assets.datacamp.com/production/repositories/1680/datasets/8d19bc657cc5b03e9b368eb6d3ff0527c50d184d/fashion_mnist_500.RData')]",['Dimensionality Reduction in R'],https://www.datacamp.com/courses/advanced-dimensionality-reduction-in-r,Machine Learning,R
8,Advanced NLP with spaCy,5,15,55,"4,517","4,450",Advanced NLP spaCy,"Advanced NLP with spaCy
If you're working with a lot of text, you'll eventually want to know more about it. For example, what's it about? What do the words mean in context? Who is doing what to whom? What companies and products are mentioned? Which texts are similar to each other? In this course, you'll learn how to use spaCy, a fast-growing industry standard library for NLP in Python, to build advanced natural language understanding systems, using both rule-based and machine learning approaches.
This chapter will introduce you to the basics of text processing with spaCy. You'll learn about the data structures, how to work with statistical models, and how to use them to predict linguistic features in your text.
In this chapter, you'll use your new skills to extract specific information from large volumes of text. You'll learn how to make the most of spaCy's data structures, and how to effectively combine statistical and rule-based approaches for text analysis.
This chapter will show you to everything you need to know about spaCy's processing pipeline. You'll learn what goes on under the hood when you process a text, how to write your own components and add them to the pipeline, and how to use custom attributes to add your own meta data to the documents, spans and tokens.
In this chapter, you'll learn how to update spaCy's statistical models to customize them for your use case – for example, to predict a new entity type in online comments. You'll write your own training loop from scratch, and understand the basics of how training works, along with tips and tricks that can make your custom NLP projects more successful.",[],"['Ines Montani', 'Mari Nazary', 'Adrián Soto']",[],['Natural Language Processing Fundamentals in Python'],https://www.datacamp.com/courses/advanced-nlp-with-spacy,Data Manipulation,Python
9,Analyzing Business Data in SQL,4,15,46,"4,241","3,700",Analyzing Business Data in SQL,"Analyzing Business Data in SQL
Businesses track data on everything, from operations to marketing to HR. Leveraging this data enables businesses to better understand themselves and their customers, leading to higher profits and better performance. In this course, you’ll learn about the key metrics that businesses use to measure performance. You'll write SQL queries to calculate these metrics and produce report-ready results. Throughout the course, you'll study data from a fictional food delivery startup, modeled on data from real companies.
Profit is one of the first things people use to assess a company's success. In this chapter, you'll learn how to calculate revenue and cost, and then combine the two calculations using Common Table Expressions to calculate profit.
Financial KPIs like profit are important, but they don't speak to user activity and engagement. In this chapter, you'll learn how to calculate the registrations and active users KPIs, and use window functions to calculate the user growth and retention rates.
Since a KPI is a single number, it can't describe how data is distributed. In this chapter, you'll learn about unit economics, histograms, bucketing, and percentiles, which you can use to spot the variance in user behaviors.
Executives often use the KPIs you've calculated in the three previous chapters to guide business decisions. In this chapter, you'll package the KPIs you've created into a readable report you can present to managers and executives.",[],"['Michel Semaan', 'Mona Khalil', 'Sara Billen']",[],['Intermediate SQL'],https://www.datacamp.com/courses/analyzing-business-data-in-sql,Case Studies,SQL
10,Analyzing Election and Polling Data in R,4,15,55,"3,189","4,650",Analyzing Election and Polling Data in R,"Analyzing Election and Polling Data in R
This is an introductory course to the R programming language as applied in the context of political data analysis. In this course students learn how to wrangle, visualize, and model data with R by applying data science techniques to real-world political data such as public opinion polling and election results. The tools that you'll use in this course, from the dplyr, ggplot2, and choroplethr packages, among others, are staples of data science and can be used to analyze almost any dataset you get your hands on. Students will learn how to mutate columns and filter datasets, graph points and lines on charts, make maps, and create models to understand relationships between variables and predict the future. This course is suitable for anyone who already has downloaded R and knows the basics, like how to install packages.
Chapter one uses a dataset of job approval polling for US presidents since Harry Truman to introduce you to data wrangling and visualization in the tidyverse.
In this chapter, you will embark on a historical analysis of ""generic ballot"" US House polling and use data visualization and modeling to answer two big questions: Has the country changed over time? Can polls predict elections?
This chapter teaches you how to make maps and understand linear regression in R. With election results from the United States and the United Kingdom, you'll also learn how to use regression models to analyze the relationship between two (or more!) variables.
In this ensemble of applied statistics and data analysis, you will wrangle, visualize, and model polling and prediction data for two sets of very important US elections: the 2018 House midterms and 2020 presidential election.",[],"['G Elliott Morris', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Brexit Polls', 'https://assets.datacamp.com/production/course_6778/datasets/brexit_polls.csv'), ('Brexit Results', 'https://assets.datacamp.com/production/course_6778/datasets/brexit_results.csv'), ('Gallup Approval Polls', 'https://assets.datacamp.com/production/course_6778/datasets/gallup_approval_polls.csv'), ('Generic Ballot', 'https://assets.datacamp.com/production/course_6778/datasets/generic_ballot.csv'), ('US Pres 2016 by County', 'https://assets.datacamp.com/production/course_6778/datasets/us_pres_2016_by_county.csv')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/analyzing-election-and-polling-data-in-r,Case Studies,R
11,Analyzing IoT Data in Python,4,16,53,843,"4,250",Analyzing IoT Data,"Analyzing IoT Data in Python
Have you ever heard about Internet of Things devices? Of course, you have. Maybe you also have a Raspberry PI in your house monitoring the temperature and humidity.
  IoT devices are everywhere around us, collecting data about our environment.
  You will be analyzing Environmental data, Traffic data as well as energy counter data.
  Following the course, you will learn how to collect and store data from a data stream.
  You will prepare IoT data for analysis, analyze and visualize IoT data, before  implementing
  a simple machine learning model to take action when certain events occur and deploy this model
  to a real-time data stream.
In this chapter, you will first understand what IoT data is.
Then, you learn how to aquire IoT data through a REST API and using an MQTT data stream to collect data in real time.
In the second chapter, you will look at the  data  you gathered during the first chapter.  You will visualize the data and learn the importance of timestamps when dealing with data streams. You will also implement caching to an MQTT data stream.
In this chapter, you will combine multiple datasoures with different time intervals.
You will then analyze the data to detect correlations, outliers and trends.
In this final chapter, you will use the data you analyzed during the previous chapters to build a machine learning pipeline. You will then learn how to implement this pipeline into a data stream to make realtime predictions.",[],"['Matthias Voppichler', 'Hadrien Lacroix', 'Hillary Green-Lerman']","[('Environment', 'https://assets.datacamp.com/production/repositories/4296/datasets/b7afe8572732cb5c8e6687c7e0c51216983a2b12/environ_MS83200MS_nowind_3m-10min.json'), ('Traffic (heavy vehicles)', 'https://assets.datacamp.com/production/repositories/4296/datasets/2ab899af51dac636796c5b3fce2f27f69010a6d6/traffic_raw_siemens_heavy-veh.json'), ('Traffic (light vehicles)', 'https://assets.datacamp.com/production/repositories/4296/datasets/d4eae90c370fa6912c035323737c9fc276792514/traffic_raw_siemens_light-veh.json')]",['pandas Foundations'],https://www.datacamp.com/courses/analyzing-iot-data-in-python,Data Manipulation,Python
12,Analyzing Marketing Campaigns with pandas,4,14,53,"1,878","4,500",Analyzing Marketing Campaigns pandas,"Analyzing Marketing Campaigns with pandas
One of the biggest challenges when studying data science technical skills is understanding how those skills and concepts translate into real jobs. Whether you're looking to level up in your marketing job by incorporating Python and pandas or you're trying to get a handle on what kinds of work a data scientist in a marketing organization might do, this course is a great match for you. We'll practice translating common business questions into measurable outcomes, including ""How did this campaign perform?"", ""Which channel is referring the most subscribers?"", ""Why is a particular channel underperforming?"" and more using a fake marketing dataset based on the data of an online subscription business. This course will build on Python and pandas fundamentals, such as merging/slicing datasets, groupby(), correcting data types and visualizing results using matplotlib.
In this chapter, you will review pandas basics including importing datasets, exploratory analysis, and basic plotting.
In this chapter, you will learn about common marketing metrics and how to calculate them using pandas. You will also visualize your results and practice user segmentation.
In this chapter, you will build functions to automate common marketing analysis and determine why certain marketing channels saw lower than usual conversion rates during late January.
In this chapter, you will analyze an A/B test and learn about the importance of segmentation when interpreting the results of a test.",[],"['Jill Rosok', 'Mona Khalil', 'Sumedh Panchadhar']","[('Marketing dataset 1', 'https://assets.datacamp.com/production/repositories/3879/datasets/bdbbd97f839ef5cafebcc15363201d0e7b08881a/marketing.csv'), ('Marketing dataset 2', 'https://assets.datacamp.com/production/repositories/3879/datasets/6d86195bbc39785128d437e26a14ffb7cf68f9dc/marketing_new.csv')]","['Intermediate Python for Data Science', 'pandas Foundations']",https://www.datacamp.com/courses/analyzing-marketing-campaigns-with-pandas,Case Studies,Python
13,Analyzing Police Activity with pandas,4,16,50,"11,246","4,100",Analyzing Police Activity pandas,"Analyzing Police Activity with pandas
Now that you have learned the foundations of pandas, this course will give you the chance to apply that knowledge by answering interesting questions about a real dataset! You will explore the Stanford Open Policing Project dataset and analyze the impact of gender on police behavior. During the course, you will gain more practice cleaning messy data, creating visualizations, combining and reshaping datasets, and manipulating time series data. Analyzing Police Activity with pandas will give you valuable experience analyzing a dataset from start to finish, preparing you for your data science career!
Before beginning your analysis, it is critical that you first examine and clean the dataset, to make working with it a more efficient process. In this chapter, you will practice fixing data types, handling missing values, and dropping columns and rows while learning about the Stanford Open Policing Project dataset.
Does the gender of a driver have an impact on police behavior during a traffic stop? In this chapter, you will explore that question while practicing filtering, grouping, method chaining, Boolean math, string methods, and more!
Are you more likely to get arrested at a certain time of day? Are drug-related stops on the rise? In this chapter, you will answer these and other questions by analyzing the dataset visually, since plots can help you to understand trends in a way that examining the raw data cannot.
In this chapter, you will use a second dataset to explore the impact of weather conditions on police behavior during traffic stops. You will practice merging and reshaping datasets, assessing whether a data source is trustworthy, working with categorical data, and other advanced skills.","['Data Analyst with Python', 'Data Manipulation with Python', 'Data Scientist with Python']","['Kevin Markham', 'Becca Robins', 'Sara Snell']","[('Traffic stops in Rhode Island', 'https://assets.datacamp.com/production/repositories/1497/datasets/62bd9feef451860db02d26553613a299721882e8/police.csv'), ('Weather in Providence, Rhode Island', 'https://assets.datacamp.com/production/repositories/1497/datasets/02f3fb2d4416d3f6626e1117688e0386784e8e55/weather.csv')]","['pandas Foundations', 'Manipulating DataFrames with pandas', 'Merging DataFrames with pandas']",https://www.datacamp.com/courses/analyzing-police-activity-with-pandas,Data Manipulation,Python
14,Analyzing Social Media Data in Python,4,14,51,"5,121","4,000",Analyzing Social Media Data,"Analyzing Social Media Data in Python
Twitter produces hundreds of million messages per day, with people around the world discussing sports, politics, business, and entertainment. You can access thousands of messages flowing in this stream in a matter of minutes. In this course, you will learn how to collect Twitter data and analyze tweet text, Twitter networks, and the geographical origin of the tweet. We'll be doing this with datasets on tech companies, data science hashtags, and the 2018 State of the Union address. Using these methods, you will be able to inform business and political decision-making by discovering the prevalence of important topics, the diversity of discussion networks, and a topic's geographical reach.
Why analyze Twitter, how to access Twitter APIs, and understanding Twitter JSON.
How to process Twitter text.
Network analysis with Twitter data.
How to map Twitter data.",[],"['Alex Hanna', 'Greg Wilson', 'Kara Woo', 'David Campos', 'Shon Inouye', 'Eunkyung Park']","[('Data Science Hashtag dataset', 'https://assets.datacamp.com/production/repositories/2161/datasets/43d85d27d293323c1a4effec25717d0c2eb43169/data-science-hashtags.csv'), ('State of the Union Reply Network dataset', 'https://assets.datacamp.com/production/repositories/2161/datasets/55860c218310485e9400997ae33aecd0e97f8b51/sotu2018-reply.csv'), ('State of the Union Retweet Networking dataset', 'https://assets.datacamp.com/production/repositories/2161/datasets/51e79668580cdb86969c2c625172eaed2ded684a/sotu2018-rt.csv')]",['pandas Foundations'],https://www.datacamp.com/courses/analyzing-social-media-data-in-python,Data Manipulation,Python
15,Analyzing Survey Data in R,4,14,49,"3,211","3,950",Analyzing Survey Data in R,"Analyzing Survey Data in R
You've taken a survey (or 1000) before, right? Have you ever wondered what goes into designing a survey and how survey responses are turned into actionable insights? Of course you have! In Analyzing Survey Data in R, you will work with surveys from A to Z, starting with common survey design structures, such as clustering and stratification, and will continue through to visualizing and analyzing survey results. You will model survey data from the National Health and Nutrition Examination Survey using R's survey and tidyverse packages. Following the course, you will be able to successfully interpret survey results and finally find the answers to life's burning questions!
Our exploration of survey data will begin with survey weights.  In this chapter, we will learn what survey weights are and why they are so important in survey data analysis.  Another unique feature of survey data are how they were collected via clustering and stratification.  We'll practice specifying and exploring these sampling features for several survey datasets.
Now that we have a handle of survey weights, we will practice incorporating those weights into our analysis of categorical data in this chapter.  We'll conduct descriptive inference by calculating summary statistics, building summary tables, and constructing bar graphs.  For analytic inference, we will learn to run chi-squared tests.
Of course not all survey data are categorical and so in this chapter, we will explore analyzing quantitative survey data.  We will learn to compute survey-weighted statistics, such as the mean and quantiles. For data visualization, we'll construct bar-graphs, histograms and density plots.  We will close out the chapter by conducting analytic inference with survey-weighted t-tests.
To model survey data also requires careful consideration of how the data were collected.  We will start our modeling chapter by learning how to incorporate survey weights into scatter plots through aesthetics such as size, color, and transparency.  We'll model the survey data with linear regression and will explore how to incorporate categorical predictors and polynomial terms into our models.",[],"['Kelly McConville', 'Chester Ismay', 'Becca Robins', 'Eunkyung Park']","[('Quarter 4 of the 2016 BLS Consumer Expenditure Survey', 'https://assets.datacamp.com/production/repositories/1932/datasets/54e81635756ae4b5a0207b661586c108e6dc5566/ce.csv')]","['Introduction to the Tidyverse', 'Foundations of Inference']",https://www.datacamp.com/courses/analyzing-survey-data-in-r,Probability & Statistics,R
16,Analyzing US Census Data in Python,5,16,57,"1,179","4,850",Analyzing US Census Data,"Analyzing US Census Data in Python
Data scientists in diverse fields, from marketing to public health to civic hacking, need to work with demographic and socioeconomic data. Government census agencies offer richly detailed, high-quality datasets, but the number of variables and intricacies of administrative geographies (what is a Census tract anyway?) can make approaching this goldmine a daunting process. This course will introduce you to the Decennial Census and the annual American Community Survey, and show you where to find data on household income, commuting, race, family structure, and other topics that may interest you. You will use Python to request this data using the Census API for large and small geographies. You will manipulate the data using pandas, and create derived data such as a measure of segregation. You will also get a taste of the mapping capabilities of geopandas.
Start exploring Census data products with the Decennial Census. Use the Census API and the requests package to retrieve data, load into pandas data frames, and conduct exploratory visualization in seaborn. Learn about important Census geographies, including states, counties, and tracts.
Explore topics such as health insurance coverage and gentrification using the American Community Survey. Calculate Margins of Error and explore change over time. Create choropleth maps using geopandas.
Explore racial segregation in America. Calculate the Index of Dissimilarity, and important measure of segregation. Learn about and use Metropolitan Statistical Areas, and important geography for urban research. Study segregation changes over time in Chicago.
In this chapter, you will apply what you have learned to four topical studies. Explore unemployment by race and ethnicity; commuting patterns and worker density; immigration and state-to-state population flows; and rent burden in San Francisco.",[],"['Lee Hachadoorian', 'Mari Nazary', 'Adrián Soto']","[('Hispanic Origin & Race by State, 2010', 'https://assets.datacamp.com/production/repositories/2155/datasets/68d8a7bcd8383e4e631d561d6ddc9cf61aa74d6b/states.csv'), ('Household Internet Access by State, 2017', 'https://assets.datacamp.com/production/repositories/2155/datasets/3be5b05dd02bff25b4f5efdb22d1aa1777fe799e/states_internet.gpkg'), ('Brooklyn Tract Demographics, 2000', 'https://assets.datacamp.com/production/repositories/2155/datasets/75a53f4cafd31c368d147e0b64755d74d18cff66/tracts_brooklyn_2000.pickle'), ('Brooklyn Tract Geometries, 2000', 'https://assets.datacamp.com/production/repositories/2155/datasets/5246046c1acde7183c46fe07925437d1d6c43382/brooklyn_tract_2000.gpkg'), ('Brooklyn Tract Demographics, 2010', 'https://assets.datacamp.com/production/repositories/2155/datasets/75c9d53f047b5e8e69307d22a2b3f0069760c7da/tracts_brooklyn_2010.pickle'), ('Brooklyn Tract Geometries, 2010', 'https://assets.datacamp.com/production/repositories/2155/datasets/cafe61e927146e7c0e655bdacc1647243b62dc84/brooklyn_tract_2010.gpkg')]","['Intermediate Python for Data Science', 'pandas Foundations']",https://www.datacamp.com/courses/analyzing-us-census-data-in-python,Case Studies,Python
17,Analyzing US Census Data in R,4,17,59,"1,722","5,050",Analyzing US Census Data in R,"Analyzing US Census Data in R
Analysts across industries rely on data from the United States Census Bureau in their work. In this course, students will learn how to work with Census tabular and spatial data in the R environment. The course focuses on the tidycensus package for acquiring data from the decennial US Census and American Community survey in a tidyverse-friendly format, and the tigris package for accessing Census geographic data within R. By the end of this course, students will be able to rapidly visualize and explore demographic data from the Census Bureau using ggplot2 and other tidyverse tools, and make maps of Census demographic data with only a few lines of R code.
In this chapter, students will learn the basics of working with Census data in R with tidycensus.  They will acquire data using tidycensus functions, search for data, and make a basic plot.
In this chapter, students learn how to use tidyverse tools to wrangle data from the US Census and American Community Survey.  They also learn about handling margins of error in the ACS.
In this chapter, students will learn how to work with US Census Bureau geographic data in R using the tigris R package.
In this chapter, you will learn how to obtain feature geometry with the tidycensus package, and use ggplot2 and mapview to make customized static and interactive maps of US Census data.",[],"['Kyle Walker', 'Chester Ismay', 'Becca Robins']",[],"['Introduction to the Tidyverse', 'Spatial Analysis in R with sf and raster']",https://www.datacamp.com/courses/analyzing-us-census-data-in-r,Other,R
18,Anomaly Detection in R,4,13,47,"2,926","3,900",Anomaly Detection in R,"Anomaly Detection in R
Are you concerned about inaccurate or suspicious records in your data, but not sure where to start? An anomaly detection algorithm could help! Anomaly detection is a collection of techniques designed to identify unusual data points, and are crucial for detecting fraud and for protecting computer networks from malicious activity. In this course, you'll explore statistical tests for identifying outliers, and learn to use sophisticated anomaly scoring algorithms like the local outlier factor and isolation forest. You'll apply anomaly detection algorithms to identify unusual wines in the UCI Wine quality dataset and also to detect cases of thyroid disease from abnormal hormone measurements.
In this chapter, you'll learn how numerical and graphical summaries can be used to informally assess whether data contain unusual points.  You'll use a statistical procedure called Grubbs' test to check whether a point is an outlier, and learn about the Seasonal-Hybrid ESD algorithm, which can help identify outliers when the data are a time series.
In this chapter, you'll learn how to calculate the  k-nearest neighbors distance and the local outlier factor, which are used to construct continuous anomaly scores for each data point when the data have multiple features.   You'll learn the difference between local and global anomalies and how the two algorithms can help in each case.
k-nearest neighbors distance and local outlier factor use the distance or relative density of the nearest neighbors to score each point.  In this chapter, you'll explore an alternative tree-based approach called an isolation forest, which is a fast and robust method of detecting anomalies that measures how easily points can be separated by randomly splitting the data into smaller and smaller regions.
You've now been introduced to a few different algorithms for anomaly scoring. In this final chapter, you'll learn to compare the detection performance of the algorithms in instances where labeled anomalies are available.  You'll learn to calculate and interpret the precision and recall statistics for an anomaly score, and how to adapt the algorithms so they can accommodate data with categorical features.",[],"['Alastair Rushworth', 'Chester Ismay', 'Amy Peterson']","[('Furniture', 'https://assets.datacamp.com/production/repositories/2385/datasets/8977d3e5d10f1ac243696e86a64e6470f578cf57/furniture.csv'), ('Wine', 'https://assets.datacamp.com/production/repositories/2385/datasets/ee4b58d16708ae7647f7f6278c041623de6e3ad4/big_wine.csv'), ('Thyroid', 'https://assets.datacamp.com/production/repositories/2385/datasets/735c85adc275d9265b6b1bdef11a78020a31e9e3/thyroid.csv')]",['Intermediate R'],https://www.datacamp.com/courses/anomaly-detection-in-r,Probability & Statistics,R
19,Applying SQL to Real-World Problems,4,13,47,102,"3,550",Applying SQL Real-World Problems,"Applying SQL to Real-World Problems
Now that you’ve learned the basic tools of SQL you are ready to synthesize them into practical, real-world skills. In this course, you will work with a database of a fictional movie rental company. The size and complexity of this database will allow you to experience the challenges of working with databases firsthand. Throughout this course, you will use SQL to answer business-driven questions. You will learn new skills that will empower you to find the tables you need. You will then learn how to store and manage this data in tables and views that you create. Best of all you will also learn how to write code that not only clearly conveys your intent but is also legible.
You will review some of the most commonly used SQL commands to ensure you are prepared to tackle both real-world problems as well as every exercise covered in this course.
How do you find the data you need in your database in order to answer real-world business questions?  Here you will learn how to use system tables to explore your database. You will use these tables to create a new tool that contains a list of all tables and columns in your database. Finally, you will create an Entity Relationship Diagram (ERD) which will help you connect multiple tables.
Working with SQL to solve real-world problems will oftentimes require you to do more than retrieve the data you need, oftentimes you will need to manage the data in your database. This includes creating data, updating it and, when necessary, deleting it.
How do you ensure that the SQL scripts you write will be easy to understand for anyone who needs to read them? This chapter will cover approaches you can leverage to ensure that your code clearly conveys your intent, is readable by others and follows best practices.",[],"['Dmitriy Gorenshteyn', 'Chester Ismay', 'Adrián Soto']","[('DVD Rental Database', 'https://assets.datacamp.com/production/repositories/3868/datasets/3509e6592ac9ccc8ed3084649dd1be809d9c55a9/pagilla_fixed_v3.sql')]","['Intro to SQL for Data Science', 'Joining Data in SQL']",https://www.datacamp.com/courses/applying-sql-to-real-world-problems,Data Manipulation,SQL
20,Bayesian Modeling with RJAGS,4,15,58,"1,936","4,650",Bayesian Modeling RJAGS,"Bayesian Modeling with RJAGS
The Bayesian approach to statistics and machine learning is logical, flexible, and intuitive. In this course, you will engineer and analyze a family of foundational, generalizable Bayesian models. These range in scope from fundamental one-parameter models to intermediate multivariate & generalized linear regression models.  The popularity of such Bayesian models has grown along with the availability of computing resources required for their implementation. You will utilize one of these resources - the rjags package in R. Combining the power of R with the JAGS (Just Another Gibbs Sampler) engine, rjags provides a framework for Bayesian modeling, inference, and prediction.
Bayesian models combine prior insights with insights from observed data to form updated, posterior insights about a parameter. In this chapter, you will review these Bayesian concepts in the context of the foundational Beta-Binomial model for a proportion parameter. You will also learn how to use the rjags package to define, compile, and simulate this model in R.
The two-parameter Normal-Normal Bayesian model provides a simple foundation for Normal regression models.  In this chapter, you will engineer the Normal-Normal and define, compile, and simulate this model using rjags.  You will also explore the magic of the Markov chain mechanics behind rjags simulation.
In this chapter, you will extend the Normal-Normal model to a simple Bayesian regression model.  Within this context, you will explore how to use rjags simulation output to conduct posterior inference.  Specifically, you will construct posterior estimates of regression parameters using posterior means & credible intervals, you will test hypotheses using posterior probabilities, and you will construct posterior predictive distributions for new observations.
In this final chapter, you will generalize the simple Normal regression model for application in broader contexts. You will incorporate categorical predictors, engineer a multivariate regression model with two predictors, and finally extend this methodology to Poisson multivariate regression models for count variables.",[],"['Alicia Johnson', 'Chester Ismay', 'Nick Solomon', 'Eunkyung Park']","[('Sleep study data', 'https://assets.datacamp.com/production/repositories/2096/datasets/62737a3d23519405d7bfe3eceb85be0f97a07862/sleep_study.csv')]","['Fundamentals of Bayesian Data Analysis in R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/bayesian-modeling-with-rjags,Probability & Statistics,R
21,Bayesian Regression Modeling with rstanarm,4,15,45,"1,713","3,400",Bayesian Regression Modeling rstanarm,"Bayesian Regression Modeling with rstanarm
Bayesian estimation offers a flexible alternative to modeling techniques where the inferences depend on p-values. In this course, you’ll learn how to estimate linear regression models using Bayesian methods and the rstanarm package. You’ll be introduced to prior distributions, posterior predictive model checking, and model comparisons within the Bayesian framework. You’ll also learn how to use your estimated model to make predictions for new data.
A review of frequentist regression using lm(), an introduction to Bayesian regression using stan_glm(), and a comparison of the respective outputs.
Learn how to modify your Bayesian model including changing the number and length of chains, changing prior distributions, and adding predictors.
In this chapter, we'll learn how to determine if our estimated model fits our data and how to compare competing models.
In this chapter, we'll learn how to use the estimated model to create visualizations of your model and make predictions for new data.",[],"['Jake Thompson', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Spotify dataset', 'https://assets.datacamp.com/production/repositories/2199/datasets/3c921f85674c92085b3428c303b9364573a8bd4f/datacamp-spotify-data.csv')]","['Data Visualization with ggplot2 (Part 1)', 'Multiple and Logistic Regression', 'Bayesian Modeling with RJAGS']",https://www.datacamp.com/courses/bayesian-regression-modeling-with-rstanarm,Probability & Statistics,R
22,Big Data Fundamentals via PySpark,4,16,55,"7,142","4,600",Big Data Fundamentals via PySpark,"Big Data Fundamentals via PySpark
There's been a lot of buzz about Big Data over the past few years, and it's finally become mainstream for many  companies. But what is this Big Data? This course covers the fundamentals of Big Data via PySpark. Spark is “lightning fast cluster computing""  framework for Big Data. It provides a general data processing platform engine and lets you run programs up to 100x faster in memory, or 10x faster on disk,  than Hadoop. You’ll use PySpark, a Python package for spark programming and its powerful, higher-level libraries such as SparkSQL, MLlib (for machine learning), etc.,  to interact with works of William Shakespeare, analyze Fifa football 2018 data and perform clustering of genomic datasets. At the end of this course, you will gain an in-depth  understanding of PySpark and it’s application to general Big Data analysis.
This chapter introduces the exciting world of Big Data, as well as the various concepts and different frameworks for processing Big Data. You will understand why Apache Spark is considered the best framework for BigData.
The main abstraction Spark provides is a resilient distributed dataset (RDD), which is the fundamental and backbone data type of this engine. This chapter introduces RDDs and shows how RDDs can be created and executed using RDD Transformations and Actions.
In this chapter, you'll learn about Spark SQL which is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine. This chapter shows how Spark SQL allows you to use DataFrames in Python.
PySpark MLlib is the Apache Spark scalable machine learning library in Python consisting of common learning algorithms and utilities. Throughout this last chapter, you'll learn important Machine Learning algorithms. You will build a movie recommendation engine and a spam filter, and use k-means clustering.",[],"['Upendra Kumar Devisetty', 'Hadrien Lacroix', 'Chester Ismay']","[('Complete Shakespeare', 'https://assets.datacamp.com/production/repositories/3514/datasets/d9e4e9c9a26e932e3164ad7585bc30fc06596a50/Complete_Shakespeare.txt'), ('Movie ratings', 'https://assets.datacamp.com/production/repositories/3514/datasets/cab267d2a4c482f3323aec8dd9278875d2048a01/ratings.csv'), ('5000 points', 'https://assets.datacamp.com/production/repositories/3514/datasets/84f3b6bab25357840cfc90c4276edc9604553fd7/5000_points.txt'), ('FIFA 2018', 'https://assets.datacamp.com/production/repositories/3514/datasets/1ad7ffa377b5ba5d42e95efc9944293be97efd62/Fifa2018_dataset.csv'), ('People', 'https://assets.datacamp.com/production/repositories/3514/datasets/db8a991f6a506fb50fff7f7baf32d2ae02e7c480/people.csv'), ('Spam', 'https://assets.datacamp.com/production/repositories/3514/datasets/2d331f8b1b3c80e205850e38c53c1284b54c46cc/spam.txt'), ('Ham', 'https://assets.datacamp.com/production/repositories/3514/datasets/26b670b5ae766aecf7ebf7bf364fe9a590c2788b/ham.txt')]",['Introduction to Python'],https://www.datacamp.com/courses/big-data-fundamentals-via-pyspark,Machine Learning,Python
23,Biomedical Image Analysis in Python,4,15,54,"4,884","4,400",Biomedical Image Analysis,"Biomedical Image Analysis in Python
The field of biomedical imaging has exploded in recent years - but for the uninitiated, even loading data can be a challenge! In this introductory course, you'll learn the fundamentals of image analysis using NumPy, SciPy, and Matplotlib. You'll navigate through a whole-body CT scan, segment a cardiac MRI time series, and determine whether Alzheimer’s disease changes brain structure. Even if you have never worked with images before, you will finish the course with a solid toolkit for entering this dynamic field.
Prepare to conquer the Nth dimension! To begin the course, you'll learn how to load, build and navigate N-dimensional images using a CT image of the human chest. You'll also leverage the useful ImageIO package and hone your NumPy and matplotlib skills.
Cut image processing to the bone by transforming x-ray images. You'll learn how to exploit intensity patterns to select sub-regions of an array, and you'll use convolutional filters to detect interesting features. You'll also use SciPy's ndimage module, which contains a treasure trove of image processing tools.
In this chapter, you'll get to the heart of image analysis: object measurement. Using a 4D cardiac time series, you'll determine if a patient is likely to have heart disease. Along the way, you'll learn the fundamentals of image segmentation, object labeling, and morphological measurement.
For the final chapter, you'll need to use your brain... and hundreds of others! Drawing data from more than 400 open-access MR images, you'll learn the basics of registration, resampling, and image comparison. Then, you'll use the extracted measurements to evaluate the effect of Alzheimer's Disease on brain structure.",[],"['Stephen Bailey', 'Lore Dirick', 'Becca Robins', 'Sara Snell']","[('RSNA Hand Radiograph', 'https://assets.datacamp.com/production/repositories/2085/datasets/61bc2353b17eb6929d6169109bff447b6d00b6bc/hand.png'), ('OASIS Brain Measurements', 'https://assets.datacamp.com/production/repositories/2085/datasets/bbf1f4e91437f8d830880d30b31ab930578a7b4b/oasis_all_volumes.csv'), ('Sunnybrook Cardiac MRI', 'https://assets.datacamp.com/production/repositories/2085/datasets/fabaa1f1675549d624eb8f5d1bc94e0b11e30a8e/sunnybrook-cardiac-mr.zip'), ('TCIA Chest CT (Sample)', 'https://assets.datacamp.com/production/repositories/2085/datasets/f44726fefae841afd24ddf83c58f34722212e67a/tcia-chest-ct-sample.zip')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/biomedical-image-analysis-in-python,Data Manipulation,Python
24,Bond Valuation and Analysis in R,4,13,43,"7,032","3,350",Bond Valuation and Analysis in R,"Bond Valuation and Analysis in R

The fixed income market is large and filled with complex instruments. In this course, we focus on plain vanilla bonds to build solid fundamentals you will need to tackle more complex fixed income instruments.  In this chapter, we demonstrate the mechanics of valuing bonds by focusing on an annual coupon, fixed rate, fixed maturity, and option-free bond.
Estimating Yield To Maturity - The YTM measures the expected return to bond investors if they hold the bond until maturity. This number summarizes the compensation investors demand for the risk they are bearing by investing in a particular bond. We will discuss how one can estimate YTM of a bond.
Interest rate risk is the biggest risk that bond investors face. When interest rates rise, bond prices fall. Because of this, much attention is paid to how sensitive a particular bond's price is to changes in interest rates. In this chapter, we start the discussion with a simple measure of bond price volatility - the Price Value of a Basis Point. Then, we discuss duration and convexity, which  are two common measures that are used to manage interest rate risk.
We will put all of the techniques that the student has learned from Chapters One through Three into one comprehensive example. The student will be asked to value a bond by using the yield on a comparable bond and estimate the bond's duration and convexity.","['Applied Finance with R', 'Quantitative Analyst with R']","['Clifford Ang', 'Lore Dirick']",[],"['Introduction to R for Finance', 'Intermediate R for Finance', 'Importing and Managing Financial Data in R']",https://www.datacamp.com/courses/bond-valuation-and-analysis-in-r,Applied Finance,R
25,Building Chatbots in Python,4,15,49,"38,461","4,100",Building Chatbots,"Building Chatbots in Python
Messaging and voice-controlled devices are the next big platforms, and conversational computing has a big role to play in creating engaging augmented and virtual reality experiences. This course will get you started on the path toward building such applications. There are a number of unique challenges to building these kinds of programs, like how do I turn human language into instructions for machines? In this course, you'll tackle this first with rule-based systems and then with machine learning. Some chat systems are designed to be useful, while others are just good fun. You will build one of each and put everything together to make a helpful, friendly chatbot. Once you complete the course, you’ll also learn how to  connect your chatbot to Facebook Messenger!
In this chapter, you'll learn how to build your first chatbot. After gaining a bit of historical context, you'll set up a basic structure for receiving text and responding to users, and then learn how to add the basic elements of personality. You'll then build rule-based systems for parsing text.
Here, you'll use machine learning to turn natural language into structured data using spaCy, scikit-learn, and rasa NLU. You'll start with a refresher on the theoretical foundations and then move onto building models using the ATIS dataset, which contains thousands of sentences from real people interacting with a flight booking system.
In this chapter, you'll build a personal assistant to help you plan a trip. It will be able to respond to questions like ""are there any cheap hotels in the north of town?"" by looking inside a hotel’s database for matching results.
Everything you've built so far has statelessly mapped intents to actions and responses. It's amazing how far you can get with that! But to build more sophisticated bots you will always want to add some statefulness. That's what you'll do here, as you build a chatbot that helps users order coffee.",[],"['Alan Nichol', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('ATIS (Airline Travel Information System)', 'https://assets.datacamp.com/production/repositories/925/datasets/bc9aa8fd897dfee464fb48b21ea0182f7d57edaa/atis.zip'), ('Hotels database', 'https://assets.datacamp.com/production/repositories/925/datasets/309bcfc999bb88b593066a0acd7d8d83bd5e175e/hotels.db')]",['Natural Language Processing Fundamentals in Python'],https://www.datacamp.com/courses/building-chatbots-in-python,Machine Learning,Python
26,Building Dashboards with flexdashboard,4,14,50,"3,153","4,150",Building Dashboards flexdashboard,"Building Dashboards with flexdashboard
Communication is a key part of the data science process. Dashboards are a popular way to present data in a cohesive visual display. In this course you'll learn how to assemble your results into a polished dashboard using the flexdashboard package. This can be as simple as adding a few lines of R Markdown to your existing code, or as rich as a fully interactive Shiny-powered experience. You will learn about the spectrum of dashboard creation tools available in R and complete this course with the ability to produce a professional quality dashboard.
In this chapter you will learn how R Markdown and the flexdashboard package are used to create a dashboard, and how to customize the layout of components on your dashboard.
This chapter will introduce the many options for including data visualizations in your dashboard. You'll learn about how to optimize your plots for display on the web.
In this chapter you will learn about other components that will allow you to create a complete dashboard. This includes ways to present everything from a single value to a complete dataset.
This chapter will demonstrate how you can use Shiny to make your dashboard interactive. You'll keep working with the San Francisco bike sharing data and build a dashboard for exploring this data set.",['Shiny Fundamentals with R'],"['Elaine McVey', 'Chester Ismay', 'Nick Solomon']","[('San Francisco bike share data', 'https://assets.datacamp.com/production/repositories/1448/datasets/1f12031000b09ad096880bceb61f6ca2fd95e2eb/sanfran_bikeshare_joined_oneday.csv'), ('San Francisco bike share station data', 'https://assets.datacamp.com/production/repositories/1448/datasets/38f4fbe05ad1b7b13a6a8f5c680eeeed67cd7cf0/stations_data.csv')]","['Building Web Applications in R with Shiny', 'Reporting with R Markdown']",https://www.datacamp.com/courses/building-dashboards-with-flexdashboard,Reporting,R
27,Building Dashboards with shinydashboard,4,13,45,"11,250","3,750",Building Dashboards shinydashboard,"Building Dashboards with shinydashboard
Once you've started learning tools for building interactive web applications with shiny, this course will translate this knowledge into building dashboards. Dashboards, a common data science deliverable, are pages that collate information, often tracking metrics from a live-updating data source. You'll gain more expertise using shiny while learning to build and design these dynamic dashboards. In the process, you'll pick up tips to optimize performance as well as best practices to create a visually appealing product.
In this chapter you will learn the basic structure of a Shiny Dashboard and how to fill it with static content.
In this chapter you will learn how to add dynamic content to your Shiny Dashboard.
In this chapter you will focus on customizing the style of your Shiny Dashboard.
In this chapter you will participate in a case study, practicing the skills you have acquired in the previous chapters.",['Shiny Fundamentals with R'],"['Lucy D’Agostino McGowan', 'Chester Ismay', 'Nick Solomon']","[('NASA fireball dataset', 'https://assets.datacamp.com/production/repositories/1661/datasets/6a69952e67540acd76ffa28386e534297c1db32b/nasa_fireball.rda'), ('Starwars dataset', 'https://assets.datacamp.com/production/repositories/1661/datasets/2d751e7a11001e8d4d4ac263ac9878361cad959d/starwars.csv')]",['Building Web Applications in R with Shiny'],https://www.datacamp.com/courses/building-dashboards-with-shinydashboard,Reporting,R
28,Building Recommendation Engines with PySpark,4,15,56,"3,531","4,550",Building Recommendation Engines PySpark,"Building Recommendation Engines with PySpark
This course will show you how to build  recommendation engines using Alternating Least Squares in PySpark. Using the popular MovieLens dataset and the Million Songs dataset, this course will take you step by step through the intuition of the Alternating Least Squares algorithm as well as the code to train, test and implement ALS models on various types of customer data. 
This chapter will show you how powerful recommendations engines can be, and provide important distinctions between collaborative-filtering engines and content-based engines as well as the different types of implicit and explicit data that recommendation engines can use. You will also learn a very powerful way to uncover hidden features (latent features) that you may not even know exist in customer datasets.
In this chapter you will review basic concepts of matrix multiplication and matrix factorization, and dive into how the Alternating Least Squares algorithm works  and what arguments and hyperparameters it uses to return the best recommendations possible. You will also learn important techniques for properly preparing your data for ALS in Spark.
In this chapter you will be introduced to the MovieLens dataset. You will walk through how to assess it's use for ALS, build out a full cross-validated ALS model on it, and learn how to evaluate it's performance. This will be the foundation for all subsequent ALS models you build using Pyspark.
In most real-life situations, you won't not have ""perfect"" customer data available to build an ALS model. This chapter will teach you how to use your customer behavior data to ""infer"" customer ratings and use those inferred ratings to build an ALS recommendation engine. Using the Million Songs Dataset as well as another version of the MovieLens dataset, this chapter will show you how to use the data available to you to build a recommendation engine using ALS and evaluate it's performance.",[],"['Jamen Long', 'Lore Dirick', 'Nick Solomon', 'Adrián Soto']",[],"['Introduction to PySpark', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/recommendation-engines-in-pyspark,Machine Learning,Python
29,Building Response Models in R,4,13,53,811,"4,600",Building Response Models in R,"Building Response Models in R
Almost every company collects digital information as part of their marketing campaigns and uses it to improve their marketing tactics. Data scientists are often tasked with using this information to develop statistical models that enable marketing professionals to see if their actions are paying off. In this course, you will learn how to uncover patterns of marketing actions and customer reactions by building simple models of market response. In particular, you will learn how to quantify the impact of marketing variables, such as price and different promotional tactics, using aggregate sales and individual-level choice data.
The first chapter introduces you to the basic principles and concepts of market response models. Here, you will learn how to build simple response models for product sales. In addition, you will learn about the theoretical and practical differences between linear and non-linear models for sales responses.
An effective marketing strategy combines all the tools available to communicate the benefits of a product. The key is crafting the right mix of these tools to achieve sales increases and market share goals. In the second chapter, you will learn how to incorporate the effects of advertising and promotion in your sales-response model and how to identify the marketing strategy that is most likely to succeed.
A company can only be successful in the market if its products have a competitive advantage over those of its rivals. To develop an effective marketing strategy in a competitive environment, it is essential to understand the interrelationship between marketing activity and customer behavior. In this chapter, you will learn how to explain the effects of temporary price changes on customer brand choice by employing logistic and probit response models.
The main goal of response modeling is to enable marketers to not only see a payoff for their actions today, but also tomorrow. In order to view this future payoff, a simple but reliable statistical model is required. In this last chapter, you will learn how to evaluate the predictive performance of logistic response models.",['Marketing Analytics with R'],"['Kathrin Gruber', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Beer sales dataset', 'https://assets.datacamp.com/production/repositories/2198/datasets/00ac05c43d83841590cc74bbbc5d83d956c41131/sales.data.RData'), ('Beer choice dataset', 'https://assets.datacamp.com/production/repositories/2198/datasets/f79717c6e2300cd40ddc7320f6681cc747100685/choice.data.RData')]",['Correlation and Regression'],https://www.datacamp.com/courses/building-response-models-in-r,Probability & Statistics,R
30,Building Web Applications in R with Shiny: Case Studies,4,16,59,"7,359","4,850",Building Web Applications in R Shiny: Case Studies,"Building Web Applications in R with Shiny: Case Studies
After learning the basics of using Shiny to build web applications, this course takes you to the next level by putting your newly acquired skills into practice. You'll get experience developing fun and realistic Shiny apps for different common use cases, such as using Shiny to explore a dataset, generate a customized plot, and even create a word cloud. With all this practice and new knowledge, you will be well-equipped to develop Shiny apps for your own use.
In the first chapter, you'll review the essentials of Shiny development. You'll get reintroduced to the basic structure of a Shiny application, as well as some core Shiny concepts such as inputs, outputs, and reactivity. Completing this chapter will help refresh your Shiny knowledge and ensure you have the required skills to develop Shiny apps for real-life scenarios.
Imagine you're preparing a figure for a manuscript using R. You spend a lot of time recreating the same plot over and over again by rerunning the same code but changing small parameters each time. The size of the points, the color of the points, the plot title, the data shown on the plot—these criteria all have to be just right before publishing the figure. To save you from the hassle of rerunning the code many times, you will learn how to create a Shiny app to make a customizable plot.
Let’s say your supervisor is impressed by the plot you created with Shiny and now wants to get familiar with the dataset you used in the plot. They don't want to simply have a raw data file, they want an interactive environment where they can view the data, filter it, and download it. This chapter will guide you in creating such an application—a Shiny app for exploring the Gapminder dataset.
Your friend really likes word clouds and has written an R function to generate them. They want to share this function with all their friends, but not all of them know how to use R. You offer to help by building a Shiny app that uses their function to let people create their own word clouds. This will allow all their friends—even the ones who are unfamiliar with R—to generate word clouds using a point-and-click interface. This chapter will guide you through the steps required to build this app.",['Shiny Fundamentals with R'],"['Dean Attali', 'Sascha Mayr']",[],['Building Web Applications in R with Shiny'],https://www.datacamp.com/courses/building-web-applications-in-r-with-shiny-case-studies,Reporting,R
31,Building and Optimizing Triggers in SQL Server,4,15,49,158,"3,800",Building and Optimizing Triggers in SQL Server,"Building and Optimizing Triggers in SQL Server
Auditing your SQL Server database and keeping data integrity can be a challenging task for DBAs and database developers. SQL Server triggers are special types of stored procedures designed to help you achieve consistency and integrity of your database. This course will teach you how to work with triggers and use them in real-life examples. Specifically, you will learn about the use cases and limitations of triggers and get practice designing and implementing them. You will also learn to optimize triggers to fit your specific needs.
An introduction to the basic concepts of SQL Server triggers. Create your first trigger using T-SQL code. Learn how triggers are used and what alternatives exist.
Learn about the different types of SQL Server triggers: AFTER triggers (DML), INSTEAD OF triggers (DML), DDL triggers, and logon triggers.
Find out known limitations of triggers, as well as common use cases for AFTER triggers (DML), INSTEAD OF triggers (DML) and DDL triggers.
Learn to delete and modify triggers. Acquaint yourself with the way trigger management is done. Learn how to investigate problematic triggers in practice.",[],"['Florin Angelescu', 'Mona Khalil', 'Becca Robins', 'Marianna Lamnina']","[('Discounts table', 'https://assets.datacamp.com/production/repositories/4414/datasets/198a4c88eaee60e0af88038abc73d84f2f968ba2/discounts.csv'), ('Orders table', 'https://assets.datacamp.com/production/repositories/4414/datasets/f3e3862ffc39d47aa7b260dc6dc3efbe4c7daead/orders.csv'), ('Products table', 'https://assets.datacamp.com/production/repositories/4414/datasets/72f2c1197f5baa4b5dee40b79fddf5cfff67c633/products.csv')]","['Introduction to Relational Databases in SQL', 'Intermediate SQL Server']",https://www.datacamp.com/courses/building-and-optimizing-triggers-in-sql-server,Data Manipulation,SQL
32,Business Process Analytics in R,4,16,58,"1,875","4,550",Business Process Analytics in R,"Business Process Analytics in R
Although you might not have realized, processes take up an indispensable role in our daily lives. Your actions and those of others generate an extensive amount of data. Whether you are ordering a book, a train crosses a red light, or your thermostat heats your bathroom, every second millions of events are taking place which are stored in data centers around the world. These enormous sets of event data can be used to gain insight into processes in a virtually unlimited range of fields. However, the analysis of this data requires its own set of specific formats and techniques. This course will introduce you to process mining with R and demonstrate the different steps needed to analyze business processes.
The amount of event data has grown enormously during the last decades. A considerable amount of this data is recorded within the context of various business process. In this chapter, you will discover a methodology for analyzing process data, consisting of three stages: extraction, processing and analysis. You will have our first encounter with the specific elements of process data which are required for analysis, and have a first deep dive into the world of activities and traces, which will allow you to reveal of first glimpse of the process.
A process can be looked at from different angles: the control-flow, the performance and the organizational background. In this chapter, you will make a deep dive into each of     these perspectives. The control-flow refers to the different ways in which the process can be executed, and thus, how it is structured. Considering performance, we are both interested in     discovering how long things take, as well as when they take place. Finally, the organizational perspective looks at the actors in the process.
Event data rarely comes in a form which is ready to analyze. Therefore, you often require a set of tools to get the data in the right shape, before we can answer our research question. At the end of this chapter, you will be familiar with three common preprocessing tasks: filtering data, aggregating events and enriching data.
In this final chapter we will use everything we have learned so far to do and end-to-end analysis of an order-to-cash process. Firstly, we will transform data from various sources to an event log. Secondly, we will have a helicopter view of the process, exploring the dimensions of the data and the different activities, stages and flows in the process. Finally, we will combine preprocessing and analysis tools to formulate an answer to several research questions.",[],"['Gert  Janssenswillen', 'Yashas Roy', 'Sascha Mayr']","[('Eating patterns', 'https://assets.datacamp.com/production/repositories/1747/datasets/368f1d44a01d0b76b14e8a0a358132f66d8908d7/log_eat_patterns.RDS'), ('Order-to-cash process', 'https://assets.datacamp.com/production/repositories/1747/datasets/d46101a7b94b9b13701a66c7677731676d0bf40e/otc.zip')]",['Working with Data in the Tidyverse'],https://www.datacamp.com/courses/business-process-analytics-in-r,Probability & Statistics,R
33,Case Studies in Statistical Thinking,4,16,61,"6,065","4,850",Case Studies in Statistical Thinking,"Case Studies in Statistical Thinking
Mastery requires practice. Having completed Statistical Thinking I and II, you developed your probabilistic mindset and the hacker stats skills to extract actionable insights from your data. Your foundation is in place, and now it is time practice your craft.

In this course, you will apply your statistical thinking skills, exploratory data analysis, parameter estimation, and hypothesis testing, to two new real-world data sets. First, you will explore data from the 2013 and 2015 FINA World Aquatics Championships, where you will quantify the relative speeds and variability among swimmers. You will then perform a statistical analysis to assess the ""current controversy"" of the 2013 Worlds in which swimmers claimed that a slight current in the pool was affecting result. Second, you will study the frequency and magnitudes of earthquakes around the world. Finally, you will analyze the changes in seismicity in the US state of Oklahoma after the practice of high pressure waste water injection at oil extraction sites became commonplace in the last decade. As you work with these data sets, you will take vital steps toward mastery as you cement your existing knowledge and broaden your abilities to use statistics and Python to make sense of your data.
To begin, you'll use two data sets from Caltech researchers to rehash the key points of Statistical Thinking I and II to prepare you for the following case studies!
In this chapter, you will practice your EDA, parameter estimation, and hypothesis testing skills on the results of the 2015 FINA World Swimming Championships.
Some swimmers said that they felt it was easier to swim in one direction versus another in the 2013 World Championships. Some analysts have posited that there was a swirling current in the pool. In this chapter, you'll investigate this claim!
References - Quartz Media, Washington Post, SwimSwam (and also here), and Cornett, et al.
Herein, you'll use your statistical thinking skills to study the frequency and magnitudes of earthquakes. Along the way, you'll learn some basic statistical seismology, including the Gutenberg-Richter law. This exercise exposes two key ideas about data science: 1) As a data scientist, you wander into all sorts of domain specific analyses, which is very exciting. You constantly get to learn. 2) You are sometimes faced with limited data, which is also the case for many of these earthquake studies. You can still make good progress!
Of course, earthquakes have a big impact on society, and recently are connected to human activity. In this final chapter, you'll investigate the effect that increased injection of saline wastewater due to oil mining in Oklahoma has had on the seismicity of the region.",['Statistics Fundamentals with Python'],"['Justin Bois', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Swimming results, 2013 World Aquatics Championships', 'https://assets.datacamp.com/production/repositories/1067/datasets/ed0ba2dca1d7d515d925c62aa0badf02ef00fad8/2013_FINA.csv'), ('Swimming results, 2015 World Aquatics Championships', 'https://assets.datacamp.com/production/repositories/1067/datasets/80dc54c31868c00a584bfa3a195525fa243d839e/2015_FINA.csv'), ('Zebrafish active bout lengths', 'https://assets.datacamp.com/production/repositories/1067/datasets/8885c23f1c156149b736ca2ea0d9b01bbc727ecd/gandhi_et_al_bouts.csv'), ('Oklahoma earthquakes (1950 to mid-2017)', 'https://assets.datacamp.com/production/repositories/1067/datasets/c12865c9df2b6e63a40a53eaeee7caffb6cf87ac/oklahoma_earthquakes_1950-2017.csv'), ('Bacterial growth', 'https://assets.datacamp.com/production/repositories/1067/datasets/8c69b496a875ae9597a4962269baae2ceab341f0/park_bacterial_growth.csv'), ('Parkfield earthquakes (1950 to mid-2017)', 'https://assets.datacamp.com/production/repositories/1067/datasets/dfefd6ab5cf704d0723ec08723c9e7c9978c1700/parkfield_earthquakes_1950-2017.csv')]","['Statistical Thinking in Python (Part 1)', 'Statistical Thinking in Python (Part 2)']",https://www.datacamp.com/courses/case-studies-in-statistical-thinking,Probability & Statistics,Python
34,Categorical Data in the Tidyverse,4,13,44,"3,207","3,600",Categorical Data in Tidyverse,"Categorical Data in the Tidyverse
As a data scientist, you will often find yourself working with non-numerical data, such as job titles, survey responses, or demographic information. R has a special way of representing them, called factors, and this course will help you master working with them using the tidyverse package forcats. We’ll also work with other tidyverse packages, including ggplot2, dplyr, stringr, and tidyr and use real world datasets, such as the fivethirtyeight flight dataset and Kaggle’s State of Data Science and ML Survey. Following this course, you’ll be able to identify and manipulate factor variables, quickly and efficiently visualize your data, and effectively communicate your results. Get ready to categorize!
In this chapter, you’ll learn all about factors. You’ll discover the difference between categorical and ordinal variables, how R represents them, and how to inspect them to find the number and names of the levels. Finally, you’ll find how forcats, a tidyverse package, can improve your plots by letting you quickly reorder variables by their frequency.
You’ll continue to dive into the forcats package, learning how to change the order and names of levels and even collapse them into one another.
Having gotten a good grasp of forcats, you’ll expand out to the rest of the tidyverse, learning and reviewing functions from dplyr, tidyr, and stringr. You’ll refine graphs with ggplot2 by changing axes to percentage scales, editing the layout of the text, and more.
In this final chapter, you’ll take all that you’ve learned and apply it in a case study. You’ll learn more about working with strings and summarizing data, then replicate a publication quality 538 plot.","['Data Analyst with R', 'Tidyverse Fundamentals with R']","['Emily Robinson', 'Chester Ismay', 'Becca Robins']","[('538 Flying Etiquette survey', 'https://assets.datacamp.com/production/repositories/1834/datasets/bef2c6e1ef42a2f230383e080fa7379912860017/flying-etiquette.csv'), ('Kaggle multiple choice responses', 'https://assets.datacamp.com/production/repositories/1834/datasets/584ec6ab685e3795f79963486ea9c751b90a4bf0/smc_with_js.csv')]","['Introduction to the Tidyverse', 'Working with Data in the Tidyverse']",https://www.datacamp.com/courses/categorical-data-in-the-tidyverse,Data Manipulation,R
35,ChIP-seq Workflows in R,4,13,46,969,"3,650",ChIP-seq Workflows in R,"ChIP-seq Workflows in R
ChIP-seq analysis is an important branch of bioinformatics. It provides a window into the machinery that makes the cells in our bodies tick. Whether it is a brain cell helping you to read this web page or an immune cell patrolling your body for microorganisms that would make you sick, they all carry the same genome. What differentiates them are the genes that are active at any given time. Which genes these are is determined by a complex system of proteins that can activate and deactivate genes. When this regulatory machinery gets out of control, it can lead to cancer and other debilitating diseases. ChIP-seq analysis allows us to understand the function of regulatory proteins, how they can contribute to disease and can provide insights into how we may be able to intervene to prevent cells from spinning out of control. In this course, you will explore a real dataset while learning how to process and analyze ChIP-seq data in R.
Introduction to ChIP-seq experiments. Why are they interesting? What sort of phenomena can be studied with ChIP-seq and what can we learn from these experiments.
Now the ChIP-seq analysis begins in earnest. This chapter introduces Bioconductor tools to import and clean the data.
This chapter introduces techniques to identify and visualise differences between ChIP-seq samples.
Being able to identify differential binding between groups of samples is great, but what does it mean?  This chapter discusses strategies to interpret differential binding results to go from peak calls to biologically meaningful insights.",[],"['Peter Humburg', 'Sascha Mayr', 'David Campos', 'Shon Inouye']","[('Androgen Receptor ChIP-seq Peaks dataset', 'https://assets.datacamp.com/production/repositories/1556/datasets/c8196863474828ad64357c8327eeab64a5f3a06d/androgen_receptor_binding_peaks.zip'), ('Chromosome 20 dataset', 'https://assets.datacamp.com/production/repositories/1556/datasets/c817df755a33469a50f455735cda02d34e452050/chr20_29729372-29929372.bam.txt')]","['Intermediate R', 'Introduction to Bioconductor']",https://www.datacamp.com/courses/chip-seq-workflows-in-r,Other,R
36,Cleaning Data in Python,4,17,58,"76,388","4,800",Cleaning Data,"Cleaning Data in Python
A vital component of data science involves acquiring raw data and getting it into a form ready for analysis. It is commonly said that data scientists spend 80% of their time cleaning and manipulating data, and only 20% of their time actually analyzing it. This course will equip you with all the skills you need to clean your data in Python, from learning how to diagnose problems in your data, to dealing with missing values and outliers. At the end of the course, you'll apply all of the techniques you've learned to a case study to clean a real-world Gapminder dataset.
Say you've just gotten your hands on a brand new dataset and are itching to start exploring it. But where do you begin, and how can you be sure your dataset is clean? This chapter will introduce you to data cleaning in Python. You'll learn how to explore your data with an eye for diagnosing issues such as outliers, missing values, and duplicate rows.
Learn about the principles of tidy data, and more importantly, why you should care about them and how they make data analysis more efficient. You'll gain first-hand experience with reshaping and tidying data using techniques such as pivoting and melting.
The ability to transform and combine your data is a crucial skill in data science, because your data may not always come in one monolithic file or table for you to load. A large dataset may be broken into separate datasets to facilitate easier storage and sharing. But it’s important to be able to run your analysis on a single dataset. You’ll need to learn how to combine datasets or clean each dataset separately so you can combine them later for analysis.
Dive into some of the grittier aspects of data cleaning. Learn about string manipulation and pattern matching to deal with unstructured data, and then explore techniques to deal with missing or duplicate data. You'll also learn the valuable skill of programmatically checking your data for consistency, which will give you confidence that your code is running correctly and that the results of your analysis are reliable.
In this final chapter, you'll apply all of the data cleaning techniques you've learned in this course toward tidying a real-world, messy dataset obtained from the Gapminder Foundation. Once you're done, not only will you have a clean and tidy dataset, you'll also be ready to start working on your own data science projects using Python.","['Data Analyst with Python', 'Data Scientist with Python', 'Importing & Cleaning Data with Python', 'Python Programmer']","['Daniel Chen', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Air quality', 'https://assets.datacamp.com/production/repositories/666/datasets/c16448e3f4219f900f540c455fdf87b0f3da70e0/airquality.csv'), ('DOB job application filings', 'https://assets.datacamp.com/production/repositories/666/datasets/b54f64ca50c859e38fd68bcc7c932d09976709b8/dob_job_application_filings_subset.csv'), ('Ebola', 'https://assets.datacamp.com/production/repositories/666/datasets/6da83b3d2017245217d35989960184234a6c4e7f/ebola.csv'), ('Gapminder', 'https://assets.datacamp.com/production/repositories/666/datasets/8e869c545c913547d94b61534b2f8d336a2c8c87/gapminder.csv'), ('Tuberculosis', 'https://assets.datacamp.com/production/repositories/666/datasets/cf05b5e01009dd5d61d7db5ac5fb790042e7fd09/tb.csv'), ('Tips', 'https://assets.datacamp.com/production/repositories/666/datasets/b064fa9e0684a38ac15b0a19845367c29fde978d/tips.csv'), ('NYC Uber data', 'https://assets.datacamp.com/production/repositories/666/datasets/c202eb5e7ae1ebf87036a30dcea577096f02c861/nyc_uber_2014.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/cleaning-data-in-python,Importing & Cleaning Data,Python
37,Cleaning Data in R,4,15,58,"95,292","4,700",Cleaning Data in R,"Cleaning Data in R
It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it. For this reason, it is critical to become familiar with the data cleaning process and all of the tools available to you along the way. This course provides a very basic introduction to cleaning data in R using the tidyr, dplyr, and stringr packages. After taking the course you'll be able to go from raw data to awesome insights as quickly and painlessly as possible!
This chapter will give you an overview of the process of data cleaning with R, then walk you through the basics of exploring raw data.
This chapter will give you an overview of the principles of tidy data, how to identify messy data, and what to do about it.
This chapter will teach you how to prepare your data for analysis. We will look at type conversion, string manipulation, missing and special values, and outliers and obvious errors.
In this chapter, you will practice everything you've learned from the first three chapters in order to clean a messy dataset using R.","['Data Analyst with R', 'Data Scientist with R', 'Importing & Cleaning Data with R']","['Nick Carchedi', 'Jeff Paadre']","[('Messy weather data', 'https://assets.datacamp.com/production/repositories/34/datasets/b3c1036d9a60a9dfe0f99051d2474a54f76055ea/weather.rds'), ('BMI data', 'https://assets.datacamp.com/production/repositories/34/datasets/a0a569ebbb34500d11979eba95360125127e6434/bmi_clean.csv'), ('Census data', 'https://assets.datacamp.com/production/repositories/34/datasets/f82ab0a3ccb95fe40e18c6eac5644d288cd126ea/census-retail.csv'), ('Student data (with dates)', 'https://assets.datacamp.com/production/repositories/34/datasets/f75a87dbbdf2cf79e2286f97b2af22146cb717b1/students_with_dates.csv')]",['Introduction to R'],https://www.datacamp.com/courses/cleaning-data-in-r,Importing & Cleaning Data,R
38,Cleaning Data with Apache Spark in Python,4,16,53,799,"4,150",Cleaning Data Apache Spark,"Cleaning Data with Apache Spark in Python
Working with data is tricky - working with millions or even billions of rows is worse.
Did you receive some data processing code written on a laptop with fairly pristine data?
Chances are you’ve probably been put in charge of moving a basic data process from prototype to production.
You may have worked with real world datasets, with missing fields, bizarre formatting, and orders of magnitude more data. Even if this is all new to you, this course helps you learn what’s needed to prepare data processes using Python with Apache Spark.
You’ll learn terminology, methods, and some best practices to create a performant, maintainable, and understandable data processing platform.
A review of DataFrame fundamentals and the importance of data cleaning.
A look at various techniques to modify the contents of DataFrames in Spark.
Improve data cleaning tasks by increasing performance or reducing resource requirements.
Learn how to process complex real-world data using Spark and the basics of pipelines.",[],"['Mike Metzger', 'Hadrien Lacroix', 'Hillary Green-Lerman']","[('Dallas Council Votes', 'https://assets.datacamp.com/production/repositories/4336/datasets/ea700976560a6f1760782c6e7310a662120c63b5/DallasCouncilVotes.csv.gz'), ('Dallas Council Voters', 'https://assets.datacamp.com/production/repositories/4336/datasets/c0aa672020bce21eef0d875a484a4fd44da042cf/DallasCouncilVoters.csv.gz'), ('Flights - 2014', 'https://assets.datacamp.com/production/repositories/4336/datasets/f412f5acbef38630147c2956d8703bafbcd0f74c/AA_DFW_2014_Departures_Short.csv.gz'), ('Flights - 2015', 'https://assets.datacamp.com/production/repositories/4336/datasets/475d2803541ba8facb2c39024dd0d9497859dc6c/AA_DFW_2015_Departures_Short.csv.gz'), ('Flights - 2016', 'https://assets.datacamp.com/production/repositories/4336/datasets/c1abacafea802998597d6c68b27c7b8650a18ab8/AA_DFW_2016_Departures_Short.csv.gz'), ('Flights - 2017', 'https://assets.datacamp.com/production/repositories/4336/datasets/04db01ffbd39f7bf2f88ffd5b7924b2de0419168/AA_DFW_2017_Departures_Short.csv.gz')]","['Intermediate Python for Data Science', 'Introduction to PySpark']",https://www.datacamp.com/courses/cleaning-data-with-apache-spark-in-python,Importing & Cleaning Data,Python
39,Cluster Analysis in R,4,16,52,"15,639","3,800",Cluster Analysis in R,"Cluster Analysis in R
Cluster analysis is a powerful toolkit in the data science workbench. It is used to find groups of observations (clusters) that share similar characteristics. These similarities can inform all kinds of business decisions; for example, in marketing, it is used to identify distinct groups of customers for which advertisements can be tailored. In this course, you will learn about two commonly used clustering methods -  hierarchical clustering and k-means clustering. You won't just learn how to use these methods, you'll build a  strong intuition for how they work and how to interpret their results. You'll develop this intuition by exploring three different datasets: soccer player positions, wholesale customer spending data, and longitudinal occupational wage data.
Cluster analysis seeks to find groups of observations that are similar to one another, but the identified groups are different from each other. This similarity/difference is captured by the metric called distance. In this chapter, you will learn how to calculate the distance between observations for both continuous and categorical features. You will also develop an intuition for how the scales of your features can affect distance.
This chapter will help you answer the last question from chapter 1 - how do you find groups of similar observations (clusters) in your data using the distances that you have calculated? You will learn about the fundamental principles of hierarchical clustering - the linkage criteria and the dendrogram plot - and how both are used to build clusters. You will also explore data from a  wholesale distributor in order to perform market segmentation of clients using their spending habits.
In this chapter, you will build an understanding of the principles behind the k-means algorithm,  learn how to select the right k when it isn't previously known, and revisit the wholesale data from a different perspective.
In this chapter, you will apply the skills you have learned to explore how the average salary amongst professions have changed over time.","['Data Scientist with R', 'Unsupervised Machine Learning with R']","['Dmitriy Gorenshteyn', 'Yashas Roy', 'Richie Cotton']","[('Soccer player positions', 'https://assets.datacamp.com/production/repositories/1219/datasets/94af7037c5834527cc8799a9723ebf3b5af73015/lineup.rds'), ('Occupational Employment Statistics (OES)', 'https://assets.datacamp.com/production/repositories/1219/datasets/1e1ec9f146a25d7c71a6f6f0f46c3de7bcefd36c/oes.rds'), ('Wholesale customer spending', 'https://assets.datacamp.com/production/repositories/1219/datasets/3558d2b5564714d85120cb77a904a2859bb3d03e/ws_customers.rds')]",['Intermediate R'],https://www.datacamp.com/courses/cluster-analysis-in-r,Machine Learning,R
40,Clustering Methods with SciPy,4,14,46,"3,032","3,650",Clustering Methods SciPy,"Clustering Methods with SciPy
You have probably come across Google News, which automatically groups similar news articles under a topic. Have you ever wondered what process runs in the background to arrive at these groups? In this course, you will be introduced to unsupervised learning through clustering using the SciPy library in Python. This course covers pre-processing of data and application of hierarchical and k-means clustering. Through the course, you will explore player statistics from a popular football video game, FIFA 18. After completing the course, you will be able to quickly apply various clustering algorithms on data, visualize the clusters formed and analyze results.
Before you are ready to classify news articles, you need to be introduced to the basics of clustering. This chapter familiarizes you with a class of machine learning algorithms called unsupervised learning and then introduces you to clustering, one of the popular unsupervised learning algorithms. You will know about two popular clustering techniques - hierarchical clustering and k-means clustering. The chapter concludes with basic pre-processing steps before you start clustering data.
This chapter focuses on a popular clustering algorithm - hierarchical clustering - and its implementation in SciPy. In addition to the procedure to perform hierarchical clustering, it attempts to help you answer an important question - how many clusters are present in your data? The chapter concludes with a discussion on the limitations of hierarchical clustering and discusses considerations while using hierarchical clustering.
This chapter introduces a different clustering algorithm - k-means clustering - and its implementation in SciPy. K-means clustering overcomes the biggest drawback of hierarchical clustering that was discussed in the last chapter. As dendrograms are specific to hierarchical clustering, this chapter discusses one method to find the number of clusters before running k-means clustering. The chapter concludes with a discussion on the limitations of k-means clustering and discusses considerations while using this algorithm.
Now that you are familiar with two of the most popular clustering techniques, this chapter helps you apply this knowledge to real-world problems. The chapter first discusses the process of finding dominant colors in an image, before moving on to the problem discussed in the introduction - clustering of news articles. The chapter concludes with a discussion on clustering with multiple variables, which makes it difficult to visualize all the data.",[],"['Shaumik Daityari', 'Hillary Green-Lerman', 'Sara Billen']","[('FIFA sample', 'https://assets.datacamp.com/production/repositories/3842/datasets/10b1fd2d470d12f2486be7ffb05ab96a1b745631/fifa_18_sample_data.csv'), ('FIFA', 'https://assets.datacamp.com/production/repositories/3842/datasets/2f0473692782600a2b7c0f7d4a0dc38295c87015/fifa_18_dataset.csv'), ('Movies', 'https://assets.datacamp.com/production/repositories/3842/datasets/8bae4cc436725404038a278f6439b096bebbfd34/movies_plot.csv')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/clustering-methods-with-scipy,Machine Learning,Python
41,Command Line Automation in Python,4,16,51,453,"3,950",Command Line Automation,"Command Line Automation in Python
There are certain skills that will stay with you your entire life. One of those skills is learning to automate things. There is a motto for automation that gets straight to the point, ""If it isn't automated...it's broken"". In this course, you learn to adopt this mindset. In one of the many examples, you will create automation code that will traverse a filesystem, find files that match a pattern, and then detect which files are duplicates. Following the course, you will be able to automate many common file system tasks and be able to manage and communicate with Unix processes.
Learn to use powerful IPython shell commands that will enhance your day to day coding.  These commands include  SList objects that can sort and filter shell output all from the comfort of the IPython terminal.
Learn to harness Unix processes with the subprocess module.   By combining the output and input of scripts, processes, and applications, you'll create pipelines to automate complex tasks.
Use the pathlib module to perform file system operations in Python.  You'll learn to write tools to walk the filesystem, write files and archive directories all with a few lines of code.
Learn how to use functions to automate complex workflows.  You'll use the click command line tool module to create sophisticated command line tools in a few lines of code.",[],"['Noah  Gift', 'Hillary Green-Lerman', 'Adrián Soto']",[],"['Intermediate Python for Data Science', 'Introduction to Shell for Data Science']",https://www.datacamp.com/courses/command-line-automation-in-python,Programming,Python
42,Communicating with Data in the Tidyverse,4,15,53,"8,497","4,350",Communicating Data in Tidyverse,"Communicating with Data in the Tidyverse
They say that a picture is worth a thousand words. Indeed, successfully promoting your data analysis is not only a matter of accurate and effective graphics, but also of aesthetics and uniqueness. This course teaches you how to leverage the power of ggplot2 themes for producing publication-quality graphics that stick out from the mass of boilerplate plots out there. It shows you how to tweak and get the most out of ggplot2 in order to produce unconventional plots that draw attention on social media. In the end, you will combine that knowledge to produce a slick and custom-styled report with RMarkdown and CSS – all of that within the powerful tidyverse.
In this chapter, you will have a first look at the data you're going to work with throughout this course: the relationship between weekly working hours and monetary compensation in European countries, according to the International Labour Organization (ILO). After that, you'll dive right in and discover a stunning correlation by employing an exploratory visualization. You will then apply a custom look to that graphic – you'll turn an ordinary plot into an aesthetically pleasing and unique data visualization.
Barcharts, scatter plots, and histograms are probably the most common and effective data visualizations. Yet, sometimes, there are even better ways to visually highlight the finding you want to communicate to your audience. So-called ""dot plots"" make us better grasp and understand changes in data: development over time, for example. In this chapter, you'll build a custom and unique visualization that emphasizes and explains exactly one aspect of the story you want to tell.
Back in the old days, researchers and data analysts used to generate plots in R and then tediously copy them into their LaTeX or Word documents. Nowadays, whole reports can be produced and reproduced from within R and RStudio, using the RMarkdown language – combining R chunks, formatted prose, tables and plots. In this chapter, you'll take your previous findings, results, and graphics and integrate them into such a report to tell the story that needs to be told.
Your boss, your client, or your professor usually expects your results to be accurate and presented in a clear and concise structure. However, coming up with a nicely formatted and unique report on top of that is certainly a plus and RMarkdown can be customized to accomplish this. In this last chapter, you'll take your report from the last chapter and brand it with your own custom and unique style.","['Data Analyst with R', 'Data Scientist with R', 'Tidyverse Fundamentals with R']","['Timo Grossenbacher', 'Yashas Roy', 'Chester Ismay']","[('Hourly Compensation (ILO)', 'https://assets.datacamp.com/production/repositories/1464/datasets/a252b4b4a25229cb654fc4e4864cb1ea78e68c03/ilo_hourly_compensation.RData'), ('Weekly Working Hours (ILO)', 'https://assets.datacamp.com/production/repositories/1464/datasets/49e22cc7d46a440348c920c621e75b0681120edb/ilo_working_hours.RData')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/communicating-with-data-in-the-tidyverse,Data Visualization,R
43,Conda Essentials,3,0,28,"6,663","2,100",Conda Essentials,"Conda Essentials
Software is constantly evolving, so data scientists need a way to update the software they are using without breaking things that already work.  Conda is an open source, cross-platform tool for managing packages and working environments for many different programming languages.  This course explains how to use its core features to manage your software so that you and your colleagues can reproduce your working environments reliably with minimum effort.
This chapter shows you how to install, update and remove packages using conda.
In this chapter you will learn how to search and install packages from various channels with Conda.
This chapter shows you how work with Conda environments.
This chapter shows you how to easily manage projects using environments.","['Data Scientist with Python', 'Python Programmer']","['David Mertz', 'Albert DeFusco', 'Dhavide Aruliah', 'Sumedh Panchadhar']",[],['Introduction to Shell for Data Science'],https://www.datacamp.com/courses/conda-essentials,Programming,Shell
44,Conda for Building & Distributing Packages,3,0,28,"1,119","2,150",Conda Building & Distributing Packages,"Conda for Building & Distributing Packages
Now that you're proficient in many areas of data science with Python it's time to share your code and data with others. In this course you'll learn the fundamentals of sharing your data science assets. You'll learn how to leverage Anaconda Projects to package data, code, and conda environments into a single archive for other data scientists to run. You'll learn the basics of creating Python packages that provide importable modules. Finally, you'll learn how to write Conda recipes for your packages, build them, and share them on Anaconda Cloud.
Anaconda Projects allow you to package code, data, and Conda environments for others to use easily. Starting with with simple data science applications you'll create Anaconda Project archives that enable reproducible data science.
In this chapter you'll learn how to transform your Python scipts into modules and packages. You'll learn how to use setuptools to specify important metadata like version numbers and licenses.
Now that you have prepared your Python package using setuptools in this chapter you'll learn how to write a Conda recipe. Conda recipes describe the required Conda packages to build and run your package. You'll then build cross-platform packages and upload them to Anaconda Cloud.",[],"['Albert DeFusco', 'David Mertz', 'Dhavide Aruliah']",[],['Conda Essentials'],https://www.datacamp.com/courses/conda-for-building-distributing-packages,Programming,Shell
45,Conditional Formatting in Spreadsheets,4,14,51,"1,162","4,400",Conditional Formatting in Spreadsheets,"Conditional Formatting in Spreadsheets
Spreadsheets often suffer from having too much data. If you want to tell the underlying story that is in the data without creating additional reports, conditional formatting can help! Whether it's showing the age of your inventory by highlighting the items using a color scale, or accentuating the largest variances in year over year financial data, conditional formatting has built-in options that can be used without any complex code. It can be used instead of sorting or filtering since it works with the data that is already there! By the end, you will be creating your own report using conditional formatting to analyze a company's payroll.
Learn what conditional formatting is and how it can be used to emphasize the important data in a spreadsheet. We will discuss a variety of the built-in options you can use to apply conditional formatting rules to your data.
In this chapter, you will learn how to apply conditional formatting in more flexible ways. We'll discuss a variety of functions you can use to create conditional formatting rules with custom formulas.
Learn tricks to use conditional formatting in unique ways! In this chapter, you will learn more functions, build your own searches, and make interactive task lists with checkboxes.
In this chapter, you will use everything you have learned about conditional formatting to analyze a company's payroll. You will be working with dates, looking for duplicates, and checking for errors to create your report.",[],"['Adam Steinfurth', 'Chester Ismay', 'Amy Peterson']",[],[],https://www.datacamp.com/courses/conditional-formatting-in-spreadsheets,Data Manipulation,Spreadsheets
46,Convolutional Neural Networks for Image Processing,4,13,45,"8,809","3,650",Convolutional Neural Networks Image Processing,"Convolutional Neural Networks for Image Processing
Deep learning methods use data to train neural network algorithms to do a variety of machine learning tasks, such as classification of different classes of objects. Convolutional neural networks are deep learning algorithms that are particularly powerful for analysis of images. This course will teach you how to construct, train and evaluate convolutional neural networks. You will also learn how to improve their ability to learn from data, and how to interpret the results of the training.
Convolutional neural networks use the data that is represented in images to learn. In this chapter, we will probe data in images, and we will learn how to use Keras to train a neural network to classify objects that appear in images.
Convolutions are the fundamental building blocks of convolutional neural networks. In this chapter, you will be introducted to convolutions and learn how they operate on image data. You will also see how you incorporate convolutions into Keras neural networks.
Convolutional neural networks gain a lot of power when they are constructed with multiple layers (deep networks). In this chapter, you will learn how to stack multiple convolutional layers into a deep network. You will also learn how to keep track of the number of parameters, as the network grows, and how to control this number.
There are many ways to improve training by neural networks. In this chapter, we will focus on our ability to track how well a network is doing, and explore approaches towards improving convolutional neural networks.",[],"['Ariel Rokem', 'Lore Dirick', 'Eunkyung Park', 'Sumedh Panchadhar']","[('Shutterstock straight', 'https://assets.datacamp.com/production/repositories/1820/datasets/7ae58c178550ca7d108bcec7a9af0957b7a6a571/shutterstock_straight.jpg')]",['Deep Learning in Python'],https://www.datacamp.com/courses/convolutional-neural-networks-for-image-processing,Machine Learning,Python
47,Correlation and Regression,4,18,58,"43,009","4,200",Correlation and Regression,"Correlation and Regression
Ultimately, data analysis is about understanding relationships among variables. Exploring data with multiple variables requires new, more complex tools, but enables a richer set of comparisons. In this course, you will learn how to describe relationships between two numerical quantities. You will characterize these relationships graphically, in the form of summary statistics, and through simple linear regression models.
In this chapter, you will learn techniques for exploring bivariate relationships.
This chapter introduces correlation as a means of quantifying bivariate relationships.
With the notion of correlation under your belt, we'll now turn our attention to simple linear models in this chapter.
This chapter looks at how to interpret the coefficients in a regression model.
In this final chapter, you'll learn how to assess the ""fit"" of a simple linear regression model.","['Data Analyst with R', 'Data Scientist with R', 'Statistics Fundamentals with R']","['Ben Baumer', 'Nick Carchedi', 'Tom Jeon']",[],"['Introduction to R', 'Introduction to Data', 'Exploratory Data Analysis']",https://www.datacamp.com/courses/correlation-and-regression,Probability & Statistics,R
48,Course Creation at DataCamp,3,20,69,540,"4,050",Course Creation at DataCamp,"Course Creation at DataCamp
Welcome to the DataCamp family! You are about to begin creating a course that, in just a few months, will be available to over 3 million students worldwide! If you're new to eLearning, you'll soon learn that teaching an online course is very different from teaching in a classroom. But we're here to help! This course will provide a guide to the DataCamp Course Creation process; an introduction to the tools we use, including GitHub, Asana, and our very own course editor; and the different types of exercises and slides you can use, and how to make sure you're reaching students at the other end of the screen. While creating your course, you will find you have other questions, such as, ""How will my course be marketed?"", ""How do I recommend other instructors to DataCamp?"", or ""When do I get paid?"". This course will also provide you with direction on where to find answers to all your questions. Following this course, you should be familiar with the DataCamp Course Creation process and be ready to start your very own DataCamp course. Have fun and see you in the course!
Are you interested in creating a DataCamp course, but not sure what exactly to expect? This introductory chapter will you give an overview of the different phases of course creation and the people you'll work with during each phase. You'll focus on the first two phases: course design and course development and meet Curriculum Leads and Content Developers, who will be your guides.
Before diving deep into pedagogy and the nitty-gritty details of DataCamp exercises, it's important to learn the values we hold ourselves and our instructors to when building a course, namely accountability, predictability, and transparency. Furthermore, it is vital to  understand the tools we use, how they work, and how they support our values.
At DataCamp, we strive for quality in our content, our product, and our instructors. We do this by building our courses with a specific structure around learning objectives. We've built this structure so that our students get the best eLearning experience. We also know that this can be a challenge, so in this chapter, we provide a few tips and tricks on how to teach effectively on an eLearning platform.
Now that you know our tools and the tricks to making a great course, dive into the nitty-gritty of DataCamp courses. In this chapter, you'll learn about how to create videos and the different types of interactive exercises we support on our platform. You'll learn about the different parts of interactive exercises and the guidelines we follow to ensure our courses keep our students engaged. Lastly, you'll return to GitHub and learn about how it's used to get all the content from your videos and exercises reviewed to ensure it's top quality!
Many things happen after a course has been designed and developed; namely, it must be launched! In this chapter, you will learn about the different aspects of course launch, the work that goes into a course following its launch, and importantly, how you will get paid for your course. If you have enjoyed creating a course, and want to make more DataCamp content, you will find out all we have to offer!",[],"['Content Team', 'Chester Ismay', 'Yashas Roy', 'Adrián Soto', 'Nick Carchedi', 'Becca Robins', 'Mari Nazary', 'Hadrien Lacroix', 'Martijn Theuwissen', 'Amy Peterson', 'Sara Billen', 'Hillary Green-Lerman', 'Mona Khalil', 'Jen Bricker', 'David Venturi', 'David Campos', 'kaelen medeiros', 'Sascha Mayr', 'Jeroen Hermans', 'Shon Inouye', 'Sumedh Panchadhar']",[],[],https://www.datacamp.com/courses/course-creation-at-datacamp,Other,R
49,Creating Robust Python Workflows,4,16,47,748,"3,900",Creating Robust Python Workflows,"Creating Robust Python Workflows
The decisions we make in life are guided by our principles. No one is born with a life philosophy, instead everyone creates their own over time. In this course, you will develop a set of principles for your data science and software development projects. These principles will save time, prevent frustration, and build your confidence as a data scientist and software developer. In addition to best practices in the Python programming language, You will learn to leverage hidden gems in the Python standard library and well-known tools from Python's excellent ecosystem, such as pandas and scikit-learn. The time you invest in this course will yield dividends for you and others throughout your career. Your colleagues, community members, and future self will thank you.
In this chapter, we will discuss three principles that guide decisions made by Python programmers. You will put these principles into practice in the coding exercises and throughout the rest of the course!
Documentation and tests are often overlooked, despite being essential to the success of all projects. In this chapter, you will learn how to include documentation in our code and practice Test-Driven Development (TDD), a process that puts tests first!
Shell scripting is an essential part of any Python workflow. In this chapter, you will learn how to build command-line interfaces (CLIs) for Python programs and to automate common tasks related to version control, virtual environments, and Python packaging.
In the final chapter of this course, you will learn how to facilitate and standardize project setup using project templates. You will also consider the benefits of zipped executable projects, Jupyter notebooks parameterization, and parallel computing.",[],"['Martin Skarzynski', 'Chester Ismay', 'Sara Billen']",[],"['Python Data Science Toolbox (Part 2)', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/creating-robust-python-workflows,Programming,Python
50,Credit Risk Modeling in Python,4,15,57,164,"4,850",Credit Risk Modeling,"Credit Risk Modeling in Python
If you've ever applied for a credit card or loan, you know that financial firms process your information before making a decision.  This is because giving you a loan can have a serious financial impact on their business.  But how do they make a decision?  In this course, you will learn how to prepare credit application data.  After that, you will apply machine learning and business rules to reduce risk and ensure profitability.  You will use two data sets that emulate real credit applications while focusing on business value. Join me and learn the expected value of credit risk modeling!
In this first chapter, we will discuss the concept of credit risk and define how it is calculated.  Using cross tables and plots, we will explore a real-world data set.  Before applying machine learning, we will process this data by finding and resolving problems.
With the loan data fully prepared, we will discuss the logistic regression model which is a standard in risk modeling.  We will understand the components of this model as well as how to score its performance.   Once we've created predictions, we can explore the financial impact of utilizing this model.
Decision trees are another standard credit risk model.  We will go beyond decision trees by using the trendy XGBoost package in Python to create gradient boosted trees.  After developing sophisticated models, we will stress test their performance and discuss column selection in unbalanced data.
After developing and testing two powerful machine learning models, we use key performance metrics to compare them.  Using advanced model selection techniques specifically for financial modeling, we will select one model.  With that model, we will: develop a business strategy, estimate portfolio value, and minimize expected loss.",[],"['Michael Crabtree', 'Mona Khalil', 'Ruanne Van Der Walt']","[('Raw credit data', 'https://assets.datacamp.com/production/repositories/4876/datasets/a2d8510b4aec8d0ac14ab9bee61ba3c085805967/cr_loan2.csv'), ('Clean credit data (outliers and missing data removed)', 'https://assets.datacamp.com/production/repositories/4876/datasets/33e400c8f73329d290c6c25eef33de458b4db1bf/cr_loan_nout_nmiss.csv'), ('Credit data (ready for modeling)', 'https://assets.datacamp.com/production/repositories/4876/datasets/2f6c17f10d5156a29670d1926fdf7125c002e038/cr_loan_w2.csv')]","['Intro to Python for Finance', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/credit-risk-modeling-in-python,Applied Finance,Python
51,Credit Risk Modeling in R,4,16,52,"32,551","4,000",Credit Risk Modeling in R,"Credit Risk Modeling in R

This chapter begins with a general introduction to credit risk models. We'll explore a real-life data set, then preprocess the data set such that it's in the appropriate format before applying the credit risk models.
Logistic regression is still a widely used method in credit risk modeling. In this chapter, you will learn how to apply logistic regression models on credit data in R.
Classification trees are another popular method in the world of credit risk modeling. In this chapter, you will learn how to build classification trees using credit data in R.
In this chapter, you'll learn how you can evaluate and compare the results obtained through several credit risk models.",['Quantitative Analyst with R'],['Lore Dirick'],"[('Loan Data Chapter 1', 'https://assets.datacamp.com/production/repositories/162/datasets/8f48a2cbb6150e7ae32435e55f271cad5b4b8ecf/loan_data_ch1.rds'), ('Loan Data Chapter 2, 3 and 4', 'https://assets.datacamp.com/production/repositories/162/datasets/89fa0b5120b58ae561ac53073163bd133240ac06/loan_data_ch2.rds')]","['Introduction to R for Finance', 'Intermediate R for Finance']",https://www.datacamp.com/courses/introduction-to-credit-risk-modeling-in-r,Applied Finance,R
52,Customer Analytics & A/B Testing in Python,4,16,49,"4,389","3,750",Customer Analytics & A/B Testing,"Customer Analytics & A/B Testing in Python
The most successful companies today are the ones that know their customers so well that they can anticipate their needs.
Customer analytics and in particular A/B Testing are crucial parts of
leveraging quantitative know-how to help make business decisions that generate value. This course
 covers the ins and outs of how to use Python to
 analyze customer behavior and business trends as well as how to create, run,
 and analyze A/B tests to make proactive, data-driven business decisions.
This chapter provides a brief introduction to the content that will be covered throughout the course before transitioning into a discussion of Key Performance Indicators or KPIs. You'll learn how to identify and define meaningful KPIs through a combination of critical thinking and leveraging Python tools. These techniques are all presented in a highly practical and generalizable way. Ultimately these topics serve as the core foundation for the A/B testing discussion that follows.
This chapter teaches you how to visualize, manipulate, and explore KPIs as they change over time. Through a variety of examples, you'll learn how to work with datetime objects to calculate metrics per unit time. Then we move to the techniques for how to graph different segments of data, and apply various smoothing functions  to reveal hidden trends. Finally we walk through a complete example of how to pinpoint issues through exploratory data analysis of customer data. Throughout this chapter various functions are introduced and explained in a highly generalizable way.
In this chapter you will dive fully into A/B testing. You will learn the mathematics and knowledge needed to design and successfully plan an A/B test from determining an experimental unit to finding how large a sample size is needed. Accompanying this will be an introduction to the functions and code needed to calculate the various quantities associated with a statistical test of this type.
After running an A/B test, you must analyze the data and then effectively communicate the results. This chapter begins by interleaving the theory of statistical significance and confidence intervals with the tools you need to calculate them yourself from the data. Next we discuss  how to effectively visualize and communicate these results. This chapter is the culmination of all the knowledge built over the entire course.",[],"['Ryan Grossman', 'Lore Dirick', 'Yashas Roy', 'Eunkyung Park']","[('Customer dataset', 'https://assets.datacamp.com/production/repositories/1646/datasets/c3a701a4729471ae0b92d8c300b470fd2ec0a73a/user_demographics_v1.csv'), ('In-App Purchases dataset', 'https://assets.datacamp.com/production/repositories/1646/datasets/5decd183ef3710475958bbc903160fd6354379d5/purchase_data_v1.csv'), ('Daily Revenue dataset', 'https://assets.datacamp.com/production/repositories/1646/datasets/3afb49cad9fb91c02b71b52a2ddc0071ea13764c/daily_revenue.csv'), ('User Demographics Paywall dataset', 'https://assets.datacamp.com/production/repositories/1646/datasets/01054025eb094ac1086edf8d206b313b84d911c5/user_demographics_paywall.csv'), ('AB Testing Results', 'https://assets.datacamp.com/production/repositories/1646/datasets/2751adce60684a03d8b4132adeadab8a0b95ee56/AB_testing_exercise.csv')]","['Python Data Science Toolbox (Part 1)', 'Data Types for Data Science', 'pandas Foundations', 'Manipulating DataFrames with pandas']",https://www.datacamp.com/courses/customer-analytics-ab-testing-in-python,Probability & Statistics,Python
53,Customer Segmentation in Python,4,17,55,"5,047","4,400",Customer Segmentation,"Customer Segmentation in Python
The most successful companies today are the ones that know their customers so well that they can anticipate their needs. Data analysts play a key role in unlocking these in-depth insights, and segmenting the customers to better serve them. In this course, you will learn real-world techniques on customer segmentation and behavioral analytics, using a real dataset containing anonymized customer transactions from an online retailer. You will first run cohort analysis to understand customer trends. You will then learn how to build easy to interpret customer segments. On top of that, you will prepare the segments you created, making them ready for machine learning. Finally, you will make your segments more powerful with k-means clustering, in just few lines of code! By the end of this course, you will be able to apply practical customer behavioral analytics and segmentation techniques.
In this first chapter, you will learn about cohorts and how to analyze them. You will create your own customer cohorts, get some metrics and visualize your results.
In this second chapter, you will learn about customer segments. Specifically, you will get exposure to recency, frequency and monetary value, create customer segments based on these concepts, and analyze your results.
Once you created some segments, you want to make predictions. However, you first need to master practical data preparation methods to ensure your k-means clustering algorithm will uncover well-separated, sensible segments.
In this final chapter, you will use the data you pre-processed in Chapter 3 to identify customer clusters based on their recency, frequency, and monetary value.",[],"['Karolis Urbonas', 'Hadrien Lacroix', 'Mari Nazary']","[('Chapter 1 datasets', 'https://assets.datacamp.com/production/repositories/3202/datasets/40378e0b8f88bffddc938f335bc68baa8fdf0b18/chapter_1.zip'), ('Chapter 2 datasets', 'https://assets.datacamp.com/production/repositories/3202/datasets/9c670a495912949de0166c3ce690bad536ccf621/chapter_2.zip'), ('Chapter 3 datasets', 'https://assets.datacamp.com/production/repositories/3202/datasets/cc496bdfda1d59a462bf7ff3e4117bcd34c76b35/chapter_3.zip'), ('Chapter 4 datasets', 'https://assets.datacamp.com/production/repositories/3202/datasets/eb6a32ed7e5faa4c4b237ab8afb94df55bb4b3a5/chapter_4.zip')]","['Manipulating DataFrames with pandas', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/customer-segmentation-in-python,Machine Learning,Python
54,Data Analysis with Spreadsheets,3,0,27,"19,730","2,700",Data Analysis Spreadsheets,"Data Analysis with Spreadsheets
This course will dig deeper into some of the core functionality of Google Sheets. There's a whole bunch of predefined functions we'll cover, like `SUM()` and `AVERAGE()`, and `VLOOKUP()`. We'll apply these techniques to do some analysis on your grades in school, look at performance statistics within a company, track monthly sales, and look at some real geographical information about the countries of the world.
This chapter introduces a very useful feature in Google Sheets: predefined functions. You'll use these functions to solve complex problems without having to worry about specific calculations. We’ll cover a lot of predefined functions,  including functions for numbers, functions for strings, and functions for dates.
In the last chapter of the course, you'll master more advanced functions like IF() and VLOOKUP(). Conditional and lookup functions won’t seem so scary after you completed this chapter.",[],"['Vincent Vankrunkelsven', 'Sascha Mayr']",[],[],https://www.datacamp.com/courses/data-analysis-with-spreadsheets,Programming,Spreadsheets
55,Data Manipulation in R with data.table,4,15,59,"3,387","5,050",Data Manipulation in R data.table,"Data Manipulation in R with data.table
The data.table package provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed. This course shows you how to create, subset, and manipulate data.tables. You'll also learn about the database-inspired features of data.tables, including built-in groupwise operations. The course concludes with fast methods of importing and exporting tabular text data such as CSV files. Upon completion of the course, you will be able to use data.table in R for a more efficient manipulation and analysis process. Throughout the course you'll explore the San Francisco Bay Area bike share trip dataset from 2014.
This chapter introduces data.tables as a drop-in replacement for data.frames and shows how to use data.table's i argument to filter rows.
Just as the i argument lets you filter rows, the j argument of data.table lets you select columns and also perform computations. The syntax is far more convenient and flexible when compared to data.frames.
This chapter introduces data.table's by argument that lets you perform computations by groups.  By the end of this chapter, you will master the concise DT[i, j, by] syntax of data.table.
You will learn about a unique feature of data.table in this chapter: modifying existing data.tables in place. Modifying data.tables in place makes your operations incredibly fast and is easy to learn.
Not only does the data.table package help you perform incredibly fast computations, it can also help you read and write data to disk with amazing speeds. This chapter focuses on data.table's fread() and fwrite() functions which let you import and export flat files quickly and easily!","['Data Analyst with R', 'Data Manipulation with R']","['Matt Dowle', 'Arun Srinivasan', 'Sascha Mayr', 'Benjamin  Feder', 'Eunkyung Park', 'Sumedh Panchadhar']",[],"['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/data-manipulation-in-r-with-datatable,Programming,R
56,Data Manipulation with dplyr in R,4,13,46,869,"3,850",Data Manipulation dplyr in R,"Data Manipulation with dplyr in R
Say you've found a great dataset and would like to learn more about it. How can you start to answer the questions you have about the data? You can use dplyr to answer those questions—it can also help with basic transformations of your data. You'll also learn to aggregate your data and add, remove, or change the variables. Along the way, you'll explore a dataset containing information about counties in the United States. You'll finish the course by applying these tools to the babynames dataset to explore trends of baby names in the United States.
Learn verbs you can use to transform your data, including select, filter, arrange, and mutate. You'll use these functions to modify the counties dataset to view particular observations and answer questions about the data.
Now that you know how to transform your data, you'll want to know more about how to aggregate your data to make it more interpretable. You'll learn a number of functions you can use to take many observations in your data and summarize them, including count, group_by, summarize, ungroup, and top_n.
Learn advanced methods to select and transform columns. Also learn about select helpers, which are functions that specify criteria for columns you want to choose, as well as the rename and transmute verbs.
Work with a new dataset that represents the names of babies born in the United States each year. Learn how to use grouped mutates and window functions to ask and answer more complex questions about your data. And use a combination of dplyr and ggplot2 to make interesting graphs to further explore your data.",[],"['Chris Cardillo', 'Amy Peterson']","[('counties', 'https://assets.datacamp.com/production/repositories/4984/datasets/a924bf7063f02a5445e1f49cc1c75c78e018ac4c/counties.rds'), ('babynames', 'https://assets.datacamp.com/production/repositories/4984/datasets/a924ac5d86adba2e934d489cb9db446236f62b2c/babynames.rds')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/data-manipulation-with-dplyr-in-r,Data Manipulation,R
57,Data Privacy and Anonymization in R,4,13,45,"1,998","3,650",Data Privacy and Anonymization in R,"Data Privacy and Anonymization in R
With social media and big data everywhere, data privacy has been a growing, public concern. Recognizing this issue, entities such as Google, Apple, and the US Census Bureau are promoting better privacy techniques; specifically differential privacy, a mathematical condition that quantifies privacy risk. In this course, you will learn to code basic data privacy methods and a differentially private algorithm based on various differentially private properties. With these tools in hand, you will learn how to generate a basic synthetic (fake) data set with the differential privacy guarantee for public data release.
This chapter covers some basic data privacy techniques that statisticians use to anonymize data. You'll first learn how to remove identifiers and then generate synthetic data from probability distributions.
After covering the basic data privacy techniques, you'll learn conceptually about differential privacy as well as how to implement the most popular and common differentially private algorithm called the Laplace mechanism.
In this chapter, you will learn the various properties of differential privacy, such as the combination rules and post-processing, to properly implement the Laplace mechanism for various kinds data questions.
In this chapter, you will learn how to release simple data sets publicly using differentially private data synthesis techniques.",['R Programmer'],"['Claire Bowen', 'Chester Ismay', 'Sumedh Panchadhar']","[('Data sets', 'https://assets.datacamp.com/production/repositories/1939/datasets/5c7ae991cdefeb4897bc38c6102b11dec40889fd/data.RData')]","['Intermediate R', 'Foundations of Probability in R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/data-privacy-and-anonymization-in-r,Other,R
58,Data Processing in Shell,4,13,46,102,"3,550",Data Processing in Shell,"Data Processing in Shell
We live in a busy world with tight deadlines.  As a result, we fall back on what is familiar and easy, favoring GUI interfaces like Anaconda and RStudio. However, taking the time to learn data analysis on the command line is a great long-term investment because it makes us stronger and more productive data people.
 

In this course, we will take a practical approach to learn simple, powerful, and data-specific command-line skills.  Using publicly available Spotify datasets, we will learn how to download, process, clean, and transform data, all via the command line.  We will also learn advanced techniques such as command-line based SQL database operations.  Finally, we will combine the powers of command line and Python to build a data pipeline for automating a predictive model.
In this chapter, we learn how to download data files from web servers via the command line. In the process, we also learn about documentation manuals, option flags, and multi-file processing.
We continue our data journey from data downloading to data processing.  In this chapter, we utilize the command line library csvkit to convert, preview, filter and manipulate files to prepare our data for further analyses.
In this chapter, we dig deeper into all that csvkit library has to offer.  In particular, we focus on database operations we can do on the command line, including table creation, data pull, and various ETL transformation.
In the last chapter, we bridge the connection between command line and other data science languages and learn how they can work together.  Using Python as a case study, we learn to execute Python on the command line, to install dependencies using the package manager pip, and to build an entire model pipeline using the command line.",[],"['Susan Sun', 'Hillary Green-Lerman', 'Adrián Soto']","[('Spotify Songs Popularity Ranking', 'https://assets.datacamp.com/production/repositories/4180/datasets/82c41048fc89f03f3b0a4122642bc4fd39306071/Spotify_Popularity.csv'), ('Spotify Song Attributes', 'https://assets.datacamp.com/production/repositories/4180/datasets/513986f5ea7ed9a8565bba20d088d21c10e099dc/Spotify_MusicAttributes.csv')]","['Introduction to Shell for Data Science', 'Intermediate Python for Data Science', 'Intro to SQL for Data Science']",https://www.datacamp.com/courses/data-processing-in-shell,Data Manipulation,Shell
59,Data Science for Managers,4,14,51,"11,000","3,350",Data Science Managers,"Data Science for Managers
What is data science and how can you use it to strengthen your organization? This course will teach you about the skills you need on your data team, and how you can structure that team to meet your organization's needs. Data is everywhere! This course will provide you with an understanding of data sources your company can use and how to store that data. You'll also discover ways to analyze and visualize your data through dashboards and A/B tests. To wrap up the course, we'll discuss exciting topics in machine learning, including clustering, time series prediction, natural language processing (NLP), deep learning, and explainable AI! Along the way, you'll learn about a variety of real-world applications of data science and gain a better understanding of these concepts through practical exercises.
We'll start the course by defining what data science is. We'll cover the data science workflow, and how data science is applied to real-world business problems. We'll finish the chapter by learning about ways to structure your data team to meet your organization's needs.
Now that we understand the data science workflow, we'll dive deeper into the first step: data collection. We'll learn about the different data sources your company can draw from, and how to store that data once it's collected.
In this chapter, we'll discuss ways to explore and visualize data through dashboards. We'll discuss the elements of a dashboard and how to make a directed request for a dashboard. This chapter will also cover making ad hoc data requests and A/B tests, which are a powerful analytics tool that de-risk decision-making.
In this final chapter, we'll discuss the buzziest topic in data science: machine learning! We'll cover supervised and unsupervised machine learning, and clustering. Then, we'll move on to special topics in machine learning, including time series prediction, natural language processing, deep learning, and explainable AI!",[],"['Mari Nazary', 'Michael Chow', 'kaelen medeiros', 'Ramnath Vaidyanathan', 'Amy Peterson', 'Hillary Green-Lerman']",[],[],https://www.datacamp.com/courses/data-science-for-managers,Management,Theory
60,Data Types for Data Science,4,18,58,"12,577","4,850",Data Types Data Science,"Data Types for Data Science
Have you got your basic Python programming chops down for Data Science but are  yearning for more? Then this is the course for you. Herein, you'll consolidate and practice your knowledge of lists, dictionaries, tuples, sets, and date times. You'll see their relevance in working with lots of real data and how to leverage several of them in concert to solve multistep problems, including an extended case  study using Chicago metropolitan area transit data. You'll also learn how to use many of the objects in the Python Collections module, which will allow you to store and manipulate your data for a variety of Data Scientific purposes. After taking this course, you'll be ready to tackle many Data Science challenges Pythonically.
This chapter will introduce you to the fundamental Python data types - lists, sets, and tuples.  These data containers are critical as they provide the basis for storing and looping over ordered data. To make things interesting, you'll apply what you learn about these types to answer questions about the New York Baby Names dataset!
At the root of all things Python is a dictionary. Herein, you'll learn how to use them to safely handle data that can viewed in a variety of ways to answer even more questions about the New York Baby Names dataset. You'll explore how to loop through data in a dictionary, access nested data, add new data, and come to appreciate all of the wonderful capabilities of Python dictionaries.
The collections module is part of Python's standard library and holds some more advanced data containers. You'll learn how to use the Counter, defaultdict, OrderedDict and namedtuple in the context of answering questions about the Chicago transit dataset.
Handling times can seem daunting at time, but here, you'll dig in and learn how to create datetime objects, print them, look to the past and to the future. Additionally, you'll learn about some third party modules that can make all of this easier. You'll continue to use the Chicago Transit dataset to answer questions about transit times.
Time for a case study to reinforce all of your learning so far! You'll use all the containers and data types you've learned about to answer several real world questions about a dataset containing information about crime in Chicago. Have fun!",[],"['Jason Myers', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Baby names', 'https://assets.datacamp.com/production/repositories/906/datasets/8043b235dab7ca9b3667df9195459bc6bf754c2a/baby_names.csv'), ('Chicago crime', 'https://assets.datacamp.com/production/repositories/906/datasets/7fe0304955dbf05e3a0d57c8959578dcef479e81/crime_sampler.csv'), ('CTA daily station totals', 'https://assets.datacamp.com/production/repositories/906/datasets/b7806a5db41c23931fd1adf02af54ac10c15e61c/cta_daily_station_totals.csv'), ('CTA daily summary totals', 'https://assets.datacamp.com/production/repositories/906/datasets/0c8af86b914fd9edfd3d907b6006fefaadaf827b/cta_daily_summary_totals.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/data-types-for-data-science,Programming,Python
61,Data Visualization in R,4,15,60,"33,694","5,250",Data Visualization in R,"Data Visualization in R

This chapter gives a brief overview of some of the things you can do with base graphics in R. This graphics system is one of four available in R and it forms the basis for this course because it is both the easiest to learn and extremely useful both in preparing exploratory data visualizations to help you see what's in a dataset and in preparing explanatory data visualizations to help others see what we have found.
This chapter introduces several Base R supported plot types that are particularly useful for visualizing important features in a dataset. We start with simple tools like histograms and density plots for characterizing one variable at a time, move on to scatter plots and other useful tools for showing how two variables relate, and finally introduce some tools for visualizing more complex relationships in our dataset.
Most base R graphics functions support many optional arguments and parameters that allow us to customize our plots to get exactly what we want. In this chapter, we will learn how to modify point shapes and sizes, line types and widths, add points and lines to plots, add explanatory text and generate multiple plot arrays.
As we have seen, base R graphics provides tremendous flexibility in creating plots with multiple lines, points of different shapes and sizes, and added text, along with arrays of multiple plots. If we attempt to add too many details to a plot or too many plots to an array, however, the result can become too complicated to be useful. This chapter focuses on how to manage this visual complexity so the results remain useful to ourselves and to others.
This final chapter introduces a number of important topics, including the use of numerical plot details returned invisibly by functions like barplot() to enhance our plots, and saving plots to external files so they don't vanish when we end our current R session. This chapter also offers some guidelines for using color effectively in data visualizations, and it concludes with a brief introduction to the other three graphics systems in R.",['Data Visualization with R'],"['Ronald Pearson', 'Nick Carchedi', 'Tom Jeon']",[],['Introduction to R'],https://www.datacamp.com/courses/data-visualization-in-r,Data Visualization,R
62,Data Visualization in R with lattice,4,17,60,"3,276","4,950",Data Visualization in R lattice,"Data Visualization in R with lattice
Visualization is an essential component of interactive data analysis in R. Traditional (base) graphics is powerful, but limited in its ability to deal with multivariate data. Trellis graphics is the natural successor to traditional graphics, extending its simple philosophy to gracefully handle common multivariable data visualization tasks. This course introduces the lattice package, which implements Trellis graphics for R, and illustrates its basic use.
Introduction to some basic plotting functions in lattice. Draw histograms, scatter plots, density plots, and box and whisker plots.
These exercises will teach you to create ""conditioned"" plots  consisting of multiple panels using the formula interface.
Learn how to control and customize axis limits and visual appearance.
Learn to use panel and prepanel functions to enhance existing displays or create new ones.
The lattice package is not just meant to be used as a standalone collection of plotting functions. Rather, it is a framework that is used as a base by many other packages. Some of these are very specialized and beyond the scope of this course. Here we give a brief survey of extensions that are generally useful to enhance displays or create new ones.",['Data Visualization with R'],"['Deepayan Sarkar', 'Tom Jeon', 'Sascha Mayr']",[],"['Introduction to R', 'Data Visualization in R']",https://www.datacamp.com/courses/data-visualization-in-r-with-lattice,Data Visualization,R
63,Data Visualization in Spreadsheets,4,16,55,"3,159","4,700",Data Visualization in Spreadsheets,"Data Visualization in Spreadsheets
A picture can tell a thousand words - but only if you use the right picture! This course teaches you the fundamentals of data visualization with Google Sheets. You'll learn how to create common chart types like bar charts, histograms, and scatter charts, as well as more advanced types, such as sparkline and candlestick charts. You will look at how to prepare your data and use Data Validation and VLookup formulas to target specific data to chart. You'll learn how to use Conditional Formatting to apply a format to a cell or a range of cells based on certain criteria, and finally, how to create a dashboard showing plots and data together. Along the way, you'll use data from the Olympics, sharks attacks, and Marine Technology from the ASX.
Learn about business intelligence and dashboards for analyzing information in todays data-driven world.  Create a basic dashboard and master setting up your data to get the most out of it.
Create and format a column chart to showcase data and learn a few smart tricks along the way.  Look at using named ranges to refer to cells in your worksheet, making them user-friendly and easy to work with.
A dashboard is like a control panel.  Look at ways to allow a user to use this control panel to get different results from your dashboard.
A picture paints a thousand words.  Look at what types of charts to use in what situation to showcase your data.
Learn how to use rules based on criteria you set to format certain cells on your dashboard. See the formatting change as the values in the cells change.",[],"['Raina Hawley', 'Sascha Mayr', 'Amy Peterson']",[],['Intermediate Spreadsheets for Data Science'],https://www.datacamp.com/courses/data-visualization-in-spreadsheets,Data Visualization,Spreadsheets
64,Data Visualization with Seaborn,4,13,50,"7,286","4,200",Data Visualization Seaborn,"Data Visualization with Seaborn
Do you want to make beautiful, informative visualizations with ease? If so, then you must learn seaborn! Seaborn is a visualization library that is an essential part of the python data science toolkit. In this course, you will learn how to use seaborn's sophisticated visualization tools to analyze multiple real world datasets including the American Housing Survey, college tuition data, and guests from the popular television series, The Daily Show. Following this course, you will be able to use seaborn functions to visualize your data in several different formats and customize seaborn plots for your unique needs.
Introduction to the Seaborn library and where it fits in the Python visualization landscape.
Overview of functions for customizing the display of Seaborn plots.
Overview of more complex plot types included in Seaborn.
Using Seaborn to draw multiple plots in a single figure.",[],"['Chris Moffitt', 'Kara Woo', 'Becca Robins', 'Sara Snell']","[('US Housing and Urban Development FY 2018 Fair Market Rent', 'https://assets.datacamp.com/production/repositories/2210/datasets/a1fb97d60bfbcf0661e320a35a4615f4e8661a68/FY18_4050_FMRs.csv'), ('Washington DC Bike Share', 'https://assets.datacamp.com/production/repositories/2210/datasets/fb4f2c1039e3df2c2e2624a8c95de5a1980861c6/bike_share.csv'), ('2018 College Scorecard Tuition', 'https://assets.datacamp.com/production/repositories/2210/datasets/794e0759b73a2d80baa5d8fb88636a47965139d3/college_datav3.csv'), ('Daily Show Guests', 'https://assets.datacamp.com/production/repositories/2210/datasets/4eead0f82a80136cdc0068cfb54b97fe47c23c15/daily_show_guests_cleaned.csv'), ('Automobile Insurance Premiums', 'https://assets.datacamp.com/production/repositories/2210/datasets/1a8176dc594fc0a13a9f1a7b207d30ed312f2e4a/insurance_premiums.csv'), ('2010 US School Improvement Grants', 'https://assets.datacamp.com/production/repositories/2210/datasets/205443d734f177d36dad2f0bdf821a57b2c4cc13/schoolimprovement2010grants.csv')]","['pandas Foundations', 'Introduction to Python']",https://www.datacamp.com/courses/data-visualization-with-seaborn,Data Visualization,Python
65,Data Visualization with ggplot2 (Part 1),5,14,62,"122,997","5,250",Data Visualization ggplot2,"Data Visualization with ggplot2 (Part 1)
The ability to produce meaningful and beautiful data visualizations is an essential part of your skill set as a data scientist. This course, the first R data visualization tutorial in the series, introduces you to the principles of good visualizations and the grammar of graphics plotting concepts implemented in the ggplot2 package. ggplot2 has become the go-to tool for flexible and professional plots in R. Here, we’ll examine the first three essential layers for making a plot - Data, Aesthetics and Geometries. By the end of the course you will be able to make complex exploratory plots.
In this chapter we’ll get you into the right frame of mind for developing meaningful visualizations with R. You’ll understand that as a communications tool, visualizations require you to think about your audience first. You’ll also be introduced to the basics of ggplot2 - the 7 different grammatical elements (layers) and aesthetic mappings.
The structure of your data will dictate how you construct plots in ggplot2. In this chapter, you’ll explore the iris dataset from several different perspectives to showcase this concept. You’ll see that making your data conform to a structure that matches the plot in mind will make the task of visualization much easier through several R data visualization examples.
Aesthetic mappings are the cornerstone of the grammar of graphics plotting concept. This is where the magic happens - converting continuous and categorical data into visual scales that provide access to a large amount of information in a very short time. In this chapter you’ll understand how to choose the best aesthetic mappings for your data.
A plot’s geometry dictates what visual elements will be used. In this chapter, we’ll familiarize you with the geometries used in the three most common plot types you’ll encounter - scatter plots, bar charts and line plots. We’ll look at a variety of different ways to construct these plots.
In this chapter you'll learn about qplot; it is a quick and dirty form of ggplot2. It’s not as intuitive as the full-fledged ggplot() function but may be useful in specific instances. This chapter also features a wrap-up video and corresponding data visualization exercises.","['Data Analyst with R', 'Data Scientist with R', 'Data Visualization with R']","['Rick Scavetta', 'Vincent Vankrunkelsven', 'Filip Schouwenaars']","[('Subset of 1,000 diamonds', 'https://assets.datacamp.com/production/repositories/236/datasets/20c77eaab1d045693bdc3e6b3c9e72ad2db53746/diamonds.RData'), ('Fish datasets', 'https://assets.datacamp.com/production/repositories/236/datasets/eb4457a6db78d48de3720bb10b47e5c740a21234/fish.RData'), ('Iris datasets', 'https://assets.datacamp.com/production/repositories/236/datasets/7f714f993f1ad4c3d26412ae1e537ce6355b1b54/iris.RData'), ('Recession', 'https://assets.datacamp.com/production/repositories/236/datasets/9f738e79062e6a207c3981533c3cab060f348ebd/recess.RData')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/data-visualization-with-ggplot2-1,Data Visualization,R
66,Data Visualization with ggplot2 (Part 2),5,11,55,"45,129","4,750",Data Visualization ggplot2,"Data Visualization with ggplot2 (Part 2)
This ggplot2 tutorial builds on your knowledge from the first course to produce meaningful explanatory plots. We'll explore the last four optional layers. Statistics will be calculated on the fly and we’ll see how Coordinates and Facets aid in communication. Publication quality plots will be produced directly in R using the Themes layer. We’ll also discuss details on data visualization best practices with ggplot2 to help make sure you have a sound understanding of what works and why. By the end of the course, you’ll have all the tools needed to make a custom plotting function to explore a large data set, combining statistics and excellent visuals.
In this chapter, we’ll delve into how to use R ggplot2 as a tool for graphical data analysis, progressing from just plotting data to applying a variety of statistical methods. This includes a variety of linear models, descriptive and inferential statistics (mean, standard deviation and confidence intervals) and custom functions.
The Coordinates and Facets layers offer specific and very useful tools for efficiently and accurately communicating data. In this chapter we’ll look at the various ways of effectively using these two layers.
Now that you’ve built high-quality plots, it’s time to make them pretty. This is the last step in the data viz process. The Themes layer will enable you to make publication quality plots directly in R.
Once you have the technical skill to make great visualizations, it’s important that you make them as meaningful as possible. In this chapter we’ll go over three plot types that are mostly discouraged in the data viz community - heat maps, pie charts and dynamite plots. We’ll understand what the problems are with these plots and what the alternatives are.
In this case study, we’ll explore the large, publicly available California Health Interview Survey dataset from 2009. We’ll go step-by-step through the development of a new plotting method - a mosaic plot - combining statistics and flexible visuals. At the end, we’ll generalize our new plotting method to use on a variety of datasets we’ve seen throughout the first two courses.",['Data Visualization with R'],"['Rick Scavetta', 'Vincent Vankrunkelsven', 'Filip Schouwenaars']","[('CHIS adult-response dataset, 2009', 'https://assets.datacamp.com/production/repositories/235/datasets/3b6fc2923b599058584b57d8c605c6bef454d273/CHIS2009_reduced_2.Rdata')]","['Introduction to R', 'Intermediate R', 'Data Visualization with ggplot2 (Part 1)']",https://www.datacamp.com/courses/data-visualization-with-ggplot2-2,Data Visualization,R
67,Data Visualization with ggplot2 (Part 3),6,19,86,"12,529","7,550",Data Visualization ggplot2 (Part 3),"Data Visualization with ggplot2 (Part 3)
In this third ggplot2 course, we'll dive into some advanced topics including geoms commonly used in maths and sciences, strategies for handling large data sets, a variety of specialty plots, and some useful features of ggplot2 internals.
Actually, all the plots you've explored in the first two ggplot2 courses can be considered 'statistical plots'. Here, however, you'll consider those that are intended for a specialist audience that is familiar with the data: box plots and density plots.
In this chapter, you'll explore useful specialty plots for specific data types such as ternary plots, networks and maps. You'll also look at how to use ggplot2 to convert typical base package plots that are used to evaluate the results of statistical methods. Finally, you'll take a look at a couple ways in which you can make and appropriately use animations.
In this chapter, we'll continue our discussion of plots for specific data types by diving into the world of maps. You'll also have a look at animations to make your data come to life!
In this chapter, you'll delve into ggplot2 internals, exploring the grid package and ggproto. You'll learn how to use these tools to create unique plots.
In this chapter, you'll draw on some of the many tools for effective data visualization that we've covered over the three ggplot2 courses and combine them with some data munging techniques.",[],"['Rick Scavetta', 'Filip Schouwenaars']","[('Movies (subset of 10000 observations)', 'https://assets.datacamp.com/production/repositories/414/datasets/a8e67e7190bc3a7ddc7a34a76bdef0fe136adcfb/ch1_movies_small.RDS'), ('Test datasets', 'https://assets.datacamp.com/production/repositories/414/datasets/9f0326fb6c2c53d97b49e8977c1d7126ca3d9586/test_datasets.RData'), ('Mammals', 'https://assets.datacamp.com/production/repositories/414/datasets/26c594b09095fc5e29b28b74b1faf48fa63cdc62/mammals.RDS'), ('Africa', 'https://assets.datacamp.com/production/repositories/414/datasets/8eaf914265a420d7e240bde1ba9e949a4498e5bb/africa.RData'), ('US Cities', 'https://assets.datacamp.com/production/repositories/414/datasets/24739149e0dbdbdc84dcbf275b68616cb2481005/US_Cities.txt'), ('US States', 'https://assets.datacamp.com/production/repositories/414/datasets/7eef36579d107fefbcb38d0c314c963e608c9609/US_States.txt'), ('Germany unemployment data', 'https://assets.datacamp.com/production/repositories/414/datasets/bdedafb52d7060a90f9bf320cf11a274ce02bcfd/germany_unemployment.txt'), ('Population of Japan', 'https://assets.datacamp.com/production/repositories/414/datasets/f2efc9d1f2f07a22843aabef510094c6e5474616/japanPOP.txt'), ('Shape files', 'https://assets.datacamp.com/production/repositories/414/datasets/1e3d8c75d1c8085a0ed893a4a5b4f49e3311fde2/shape_files.zip'), ('Paris weather data', 'https://assets.datacamp.com/production/repositories/414/datasets/df61e885cc58b88db51968a13ca7827897b098e8/FRPARIS.txt'), ('Reykavik weather data', 'https://assets.datacamp.com/production/repositories/414/datasets/45d984ccc4d2afa2023b7139824116040aac3a54/ILREYKJV.txt'), ('New York weather data', 'https://assets.datacamp.com/production/repositories/414/datasets/c37fafe15bfa05a338f8c835e79ee5e242400438/NYNEWYOR.txt'), ('London weather data', 'https://assets.datacamp.com/production/repositories/414/datasets/89250a654c2f83331a90e6538f89d501aa966181/UKLONDON.txt')]","['Introduction to R', 'Intermediate R', 'Data Visualization with ggplot2 (Part 1)', 'Data Visualization with ggplot2 (Part 2)']",https://www.datacamp.com/courses/data-visualization-with-ggplot2-part-3,Data Visualization,R
68,Data-Driven Decision Making in SQL,4,15,54,"3,801","4,550",Data-Driven Decision Making in SQL,"Data-Driven Decision Making in SQL
In this course, you will learn how to use SQL to support decision making. It is based on a case study about an online movie rental company with a database about customer information, movie ratings, background information on actors and more. You will learn to apply SQL queries to study for example customer preferences, customer engagement, and sales development. This course also covers SQL extensions for online analytical processing (OLAP), which makes it easier to obtain key insights from multidimensional aggregated data.
The first chapter is an introduction to the use case of an online movie rental company, called MovieNow and focuses on using simple SQL queries to extract and aggregated data from its database.
More complex queries with GROUP BY, LEFT JOIN and sub-queries are used to gain insight into customer preferences.
The concept of nested queries and correlated nested queries is introduced and the functions EXISTS and UNION  are used to categorize customers, movies, actors, and more.
The OLAP extensions in SQL are introduced and applied to aggregated data on multiple levels.  These extensions are the CUBE, ROLLUP and GROUPING SETS operators.",[],"['Irene Ortner', 'Tim Verdonck', 'Bart Baesens', 'Hadrien Lacroix', 'Mona Khalil']","[('MovieNow', 'https://assets.datacamp.com/production/repositories/4068/datasets/6abeae4810d472a18df091e19ed36373ebed410e/MovieNow.sql')]",['Intermediate SQL'],https://www.datacamp.com/courses/data-driven-decision-making-with-sql,Reporting,SQL
69,Dealing With Missing Data in R,4,14,52,"2,958","4,350",Dealing With Missing Data in R,"Dealing With Missing Data in R
Missing data is part of any real world data analysis. It can crop up in unexpected places, making analyses challenging to understand. In this course, you will learn how to use tidyverse tools and the naniar R package to visualize missing values. You'll tidy missing values so they can be used in analysis and explore missing values to find bias in the data. Lastly, you'll reveal other underlying patterns of missingness. You will also learn how to ""fill in the blanks"" of missing values with imputation models, and how to visualize, assess, and make decisions based on these imputed datasets.
Chapter 1 introduces you to missing data, explaining what missing values are, their behavior in R, how to detect them, and how to count them. We then introduce missing data summaries and how to summarise missingness across cases, variables, and how to explore across groups within the data. Finally, we discuss missing data visualizations, how to produce overview visualizations for the entire dataset and over variables, cases, and other summaries, and how to explore these across groups.
In chapter two, you will learn how to uncover hidden missing values like ""missing"" or ""N/A"" and replace them with `NA`. You will learn how to efficiently handle implicit missing values - those values implied to be missing, but not explicitly listed. We also cover how to explore missing data dependence, discussing Missing Completely at Random (MCAR), Missing At Random (MAR), Missing Not At Random (MNAR), and what they mean for your data analysis.
In this chapter, you will learn about workflows for working with missing data. We introduce special data structures, the shadow matrix, and nabular data, and demonstrate how to use them in workflows for exploring missing data so that you can link summaries of missingness back to values in the data. You will learn how to use ggplot to explore and visualize how values changes as other variables go missing. Finally, you learn how to visualize missingness across two variables, and how and why to visualize missings in a scatterplot.
In this chapter, you will learn about filling in the missing values in your data, which is called imputation. You will learn how to impute and track missing values, and what the good and bad features of imputations are so that you can explore, visualise, and evaluate the imputed data against the original values. You will learn how to use, evaluate, and compare different imputation models, and explore how different imputation models affect the inferences you can draw from the models.",['Intermediate Tidyverse Toolbox'],"['Nicholas Tierney', 'David Campos', 'Shon Inouye', 'Chester Ismay', 'Sascha Mayr']",[],"['Introduction to R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/dealing-with-missing-data-in-r,Importing & Cleaning Data,R
70,Dealing with Missing Data in Python,4,14,46,103,"3,800",Dealing Missing Data,"Dealing with Missing Data in Python
Tired of working with messy data? Did you know that most of a data scientist's time is spent in finding, cleaning and reorganizing data?! Well turns out you can clean your data in a smart way! In this course Dealing with Missing Data in Python, you'll do just that!  You'll learn to address missing values for numerical, and categorical data as well as time-series data. You'll learn to see the patterns the missing data exhibits! While working with air quality and diabetes data, you'll also learn to analyze, impute and evaluate the effects of imputing the data.
Get familiar with missing data and how it impacts your analysis! Learn about different null value operations in your dataset, how to find missing data and summarizing missingness in your data.
Analyzing the type of missingness in your dataset is a very important step towards treating missing values. In this chapter, you'll learn in detail how to establish patterns in your missing and non-missing data, and how to appropriately treat the missingness using simple techniques such as listwise deletion.
Embark on the world of data imputation! In this chapter, you will apply basic imputation techniques to fill in missing data and visualize your imputations to be able to evaluate your imputations' performance.
Finally, go beyond simple imputation techniques and make the most of your dataset by using advanced imputation techniques that rely on machine learning models, to be able to accurately impute and evaluate your missing data. You will be using methods such as KNN and MICE in order to get the most out of your missing data!",[],"['Suraj Donthi', 'Adel Nehme']","[('Diabetes', 'https://assets.datacamp.com/production/repositories/4584/datasets/459359643874ba6411189d4a5251204b6142dd7d/pima-indians-diabetes data.csv'), ('Air Quality', 'https://assets.datacamp.com/production/repositories/4584/datasets/2f7155541f430d0e0e94d30a0becabcdc7dabded/air-quality.csv')]","['pandas Foundations', 'Supervised Learning with scikit-learn', 'Introduction to Data Visualization with Python']",https://www.datacamp.com/courses/dealing-with-missing-data-in-python,Data Manipulation,Python
71,Deep Learning in Python,4,17,50,"133,108","3,500",Deep Learning,"Deep Learning in Python
Deep learning is the machine learning technique behind the most exciting capabilities in diverse areas like robotics, natural language processing, image recognition, and artificial intelligence, including the famous AlphaGo. In this course, you'll gain hands-on, practical knowledge of how to use deep learning with Keras 2.0, the latest version of a cutting-edge library for deep learning in Python.
In this chapter, you'll become familiar with the fundamental concepts and terminology used in deep learning, and understand why deep learning techniques are so powerful today. You'll build simple neural networks and generate predictions with them.
Learn how to optimize the predictions generated by your neural networks. You'll use a method called backward propagation, which is one of the most important techniques in deep learning. Understanding how it works will give you a strong foundation to build on in the second half of the course.
In this chapter, you'll use the Keras library to build deep learning models for both regression and classification. You'll learn about the Specify-Compile-Fit workflow that you can use to make predictions, and by the end of the chapter, you'll have all the tools necessary to build deep neural networks.
Learn how to optimize your deep learning models in Keras. Start by learning how to validate your models, then understand the concept of model capacity, and finally, experiment with wider and deeper networks.","['Data Scientist with Python', 'Machine Learning with Python']","['Dan Becker', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Hourly wages', 'https://assets.datacamp.com/production/repositories/654/datasets/8a57adcdb5bfb3e603dad7d3c61682dfe63082b8/hourly_wages.csv'), ('MNIST', 'https://assets.datacamp.com/production/repositories/654/datasets/24769dae9dc51a77b9baa785d42ea42e3f8f7538/mnist.csv'), ('Titanic', 'https://assets.datacamp.com/production/repositories/654/datasets/92b75b9bc0c0a8a30999d76f4a1ee786ef072a9c/titanic_all_numeric.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/deep-learning-in-python,Machine Learning,Python
72,Deep Learning with Keras in Python,4,15,59,"1,425","4,950",Deep Learning Keras,"Deep Learning with Keras in Python
Deep learning is here to stay! It's the go-to technique to solve complex problems that arise with unstructured data and an incredible tool for innovation. Keras is one of the frameworks that make it easier to start developing deep learning models, and it's versatile enough to build industry-ready models in no time. In this course, you will learn regression and save the earth by predicting asteroid trajectories, apply binary classification to distinguish between real and fake dollar bills, use multiclass classification to decide who threw which dart at a dart board, learn to use neural networks to reconstruct noisy images and much more. Additionally, you will learn how to better control your models during training and how to tune them to boost their performance.
In this first chapter, you will get introduced to neural networks, understand what kind of problems they can solve, and when to use them. You will also build several networks and save the earth by training a regression model that approximates the orbit of a meteor that is approaching us!
By the end of this chapter, you will know how to solve binary, multi-class, and multi-label problems with neural networks. All of this by solving problems like detecting fake dollar bills, deciding who threw which dart at a board, and building an intelligent system to water your farm. You will also be able to plot model training metrics and to stop training and save your models when they no longer improve.
In the previous chapters, you've trained a lot of models! You will now learn how to interpret learning curves to understand your models as they train. You will also visualize the effects of activation functions, batch-sizes, and batch-normalization. Finally, you will learn how to perform automatic hyperparameter optimization to your Keras models using sklearn.
It's time to get introduced to more advanced architectures! You will create an autoencoder to reconstruct noisy images, visualize convolutional neural network activations, use deep pre-trained models to classify images and learn more about recurrent neural networks and working with text as you build a network that predicts the next word in a sentence.",[],"['Miguel Esteban', 'Hillary Green-Lerman', 'Sara Billen']","[('Darts', 'https://assets.datacamp.com/production/repositories/4335/datasets/a6f91a00c922a4fa7204787a583461831437d647/darts.csv'), ('Banknotes', 'https://assets.datacamp.com/production/repositories/4335/datasets/40eb98aaa7c03af87689d363a3e08ab59e38077c/banknotes.csv'), ('MNIST', 'https://assets.datacamp.com/production/repositories/4335/datasets/1c42fb4e5245742f7c3ed188682e2f7e2275f459/MNIST.zip'), ('Irrigation Machine', 'https://assets.datacamp.com/production/repositories/4335/datasets/e8e07e4d8969b5fb8f1d2eae9615feaa2ff5f319/irrigation_machine.csv'), ('Digits', 'https://assets.datacamp.com/production/repositories/4335/datasets/01772a23927623e41fbaaab2ab456e00ba4fcb92/Digits.zip')]",['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/deep-learning-with-keras-in-python,Machine Learning,Python
73,Deep Learning with PyTorch,4,17,53,"1,522","4,300",Deep Learning PyTorch,"Deep Learning with PyTorch
Neural networks have been at the forefront of Artificial Intelligence research during the last few years, and have provided solutions to many difficult problems like image classification, language translation or Alpha Go. PyTorch is one of the leading deep learning frameworks, being at the same time both powerful and easy to use. In this course you will use PyTorch to first learn about the basic concepts of neural networks, before building your first neural network to predict digits from MNIST dataset. You will then learn about convolutional neural networks, and use them to build much more powerful models which give more accurate results. You will evaluate the results and use different techniques to improve them. Following the course, you will be able to delve deeper into neural networks and start your career in this fascinating field.
In this first chapter, we introduce basic concepts of neural networks and deep learning using PyTorch library.
In this second chapter, we delve deeper into Artificial Neural Networks, learning how to train them with real datasets.
In this third chapter, we introduce convolutional neural networks, learning how to train them and how to use them to make predictions.
In this last chapter, we learn how to make neural networks work well in practice, using concepts like regularization, batch-normalization and transfer learning.",[],"['Ismail Elezi', 'Hadrien Lacroix', 'Hillary Green-Lerman']",[],"['Supervised Learning with scikit-learn', 'Object-Oriented Programming in Python']",https://www.datacamp.com/courses/deep-learning-with-pytorch,Machine Learning,Python
74,Defensive R Programming,4,16,51,861,"3,400",Defensive R Programming,"Defensive R Programming
Writing R scripts is easy. Writing good R code is hard. In this course, we'll discuss defensive programming - a set of standard techniques that will help reduce bugs and aid working in teams. We examine techniques for avoiding common errors and also how to handle the inevitable error that arises in our code. The course will conclude looking at when to make the transition from script to project to package.
In this first chapter, you'll learn what defensive programming is,  and how to use existing packages for increased efficiency. You will then learn to manage the packages loaded in your environment and the potential conflicts that may arise.
Programming is simpler when you get feedback on your code execution. In R, we use messages, warnings and errors to signal to keep the user informed. This chapter will discuss when and where you should use these communication tools.
We can avoid making mistakes using a consistent programming approach. In this chapter, we will introduce you to R best practices.
Creating a script is nice, but working on a project with several scripts and assets requires structure. This final chapter will teach you good organization practices, so you can go from script to package with an optimal workflow.",[],"['Colin Gillespie', 'Hadrien Lacroix', 'Sascha Mayr']",[],['Intermediate R'],https://www.datacamp.com/courses/defensive-r-programming,Programming,R
75,Designing Machine Learning Workflows in Python,4,16,51,"1,873","4,200",Designing Machine Learning Workflows,"Designing Machine Learning Workflows in Python
Deploying machine learning models in production seems easy with modern tools, but often ends in disappointment as the model performs worse in production than in development. This course will give you four superpowers that will make you stand out from the data science crowd and build pipelines that stand the test of time: how to exhaustively tune every aspect of your model in development; how to make the best possible use of available domain expertise; how to monitor your model in performance and deal with any performance deterioration; and finally how to deal with poorly or scarcely labelled data. Digging deep into the cutting edge of sklearn, and dealing with real-life datasets from hot areas like personalized healthcare and cybersecurity, this course reveals a view of machine learning from the frontline.
In this chapter, you will be reminded of the basics of a supervised learning workflow, complete with model fitting, tuning and selection, feature engineering and selection, and data splitting techniques. You will understand how these steps in a workflow depend on each other, and recognize how they can all contribute to, or fight against overfitting: the data scientist's worst enemy. By the end of the chapter, you will already be fluent in supervised learning, and ready to take the dive towards more advanced material in later chapters.
In the previous chapter, you perfected your knowledge of the standard supervised learning workflows. In this chapter, you will critically examine the ways in which expert knowledge is incorporated in supervised learning. This is done through the identification of the appropriate unit of analysis which might require feature engineering across multiple data sources, through the sometimes imperfect process of labeling examples, and through the specification of a loss function that captures the true business value of errors made by your machine learning model.
In the previous chapter, you employed different ways of incorporating feedback from experts in your workflow, and evaluating it in ways that are aligned with business value. Now it is time for you to practice the skills needed to productize your model and ensure it continues to perform well thereafter by iteratively improving it. You will also learn to diagnose dataset shift and mitigate the effect that a changing environment can have on your model's accuracy.
In the previous chapters you established a solid foundation in supervised learning, complete with knowledge of deploying models in production but always assumed you a labeled dataset would be available for your analysis. In this chapter, you take on the challenge of modeling data without any, or with very few, labels. This takes you into a journey into anomaly detection, a kind of unsupervised modeling, as well as distance-based learning, where beliefs about what constitutes similarity between two examples can be used in place of labels to help you achieve levels of accuracy comparable to a supervised workflow. Upon completing this chapter, you will clearly stand out from the crowd of data scientists in confidently knowing what tools to use to modify your workflow in order to overcome common real-world challenges.",[],"['Christoforos Anagnostopoulos', 'Chester Ismay', 'Sara Billen']","[('Credit', 'https://assets.datacamp.com/production/repositories/3554/datasets/e02f7e59fc8b6cbd9fc7032fe595038f4171ef16/credit.csv'), ('Flows', 'https://assets.datacamp.com/production/repositories/3554/datasets/18a574bfeef99241c2fe45db6314fdaeb4b288fe/lanl_flows.csv'), ('Attacks', 'https://assets.datacamp.com/production/repositories/3554/datasets/e52ee3093aee49f8599dd30dc0acada35cbc2873/redteam.csv'), ('Hepatitis', 'https://assets.datacamp.com/production/repositories/3554/datasets/7a8662884e2157642c3eb287bee39346040c8bef/hep.csv'), ('Proteins', 'https://assets.datacamp.com/production/repositories/3554/datasets/76399b36f4b8a83a3a441f39cf1cc1171171db5c/proteins_exercises.csv'), ('Arrhythmia', 'https://assets.datacamp.com/production/repositories/3554/datasets/eb59119dbc87d95d89b446b825cb38854a59411e/arrh.csv')]","['Python Data Science Toolbox (Part 2)', 'Supervised Learning with scikit-learn', 'Unsupervised Learning in Python']",https://www.datacamp.com/courses/designing-machine-learning-workflows-in-python,Machine Learning,Python
76,Designing and Analyzing Clinical Trials in R,4,15,48,"1,482","4,000",Designing and Analyzing Clinical Trials in R,"Designing and Analyzing Clinical Trials in R
Clinical trials are scientific experiments that are conducted to assess whether treatments are effective and safe. They are used by a variety of organizations, including pharmaceutical companies for drug development. Biostatisticians play a key role in ensuring the success of a clinical trial. In this course you will gain an overview of the important principles and a practical introduction to commonly used statistical analyses. This course would be valuable for data analysts, medical students, clinicians, medical researchers and others interested in learning about the design and analysis of clinical trials.
In this chapter you will be introduced to the important principles of clinical trials.
In this chapter you will be introduced to randomization methods and different types of trial designs.
By the end of this chapter you will be able to calculate the numbers of patients needed for a clinical trial under a range of scenarios.
In this chapter we will explore additional statistical techniques that are commonly used to analyze data from clinical trials.",[],"['Tamuno Alfred', 'Sascha Mayr', 'David Campos', 'Shon Inouye']","[('Acupuncture dataset', 'https://assets.datacamp.com/production/repositories/1956/datasets/4e5e58dcff952229111ee184bb8a1823f6fa3c7a/Ex1_1_1.Rds'), ('Fact dataset', 'https://assets.datacamp.com/production/repositories/1956/datasets/b23c39b7e793e05d89bc33becd78f0d858287b2b/fact.data.Rds'), ('PK dataset', 'https://assets.datacamp.com/production/repositories/1956/datasets/5d31c5d48a9384dd5dd401e018af6fb452476aaf/PKData.Rds')]","['Introduction to R', 'Introduction to Data', 'Exploratory Data Analysis']",https://www.datacamp.com/courses/designing-and-analyzing-clinical-trials-in-r,Case Studies,R
77,Developing R Packages,4,16,56,"1,946","4,200",Developing R Packages,"Developing R Packages
In this course, you will learn the end-to-end process for creating an R package from scratch.  You will start off by  creating the basic structure for your package, and adding in important details like functions and metadata.  Once the basic components of your package are in place, you will learn about how to document your package, and why this is important for creating quality packages that other people - as well as your future self - can use with ease.  Once you have created the components of your package, you will learn how to test they work properly, by creating tests, running checks, and building your package.  By the end of this course you can expect to have all the necessary skills to create and share your own R packages.
In this chapter, you will learn the basics of creating an R package.  You will learn about the structure of R packages, set up a package, and write a function and include it in your package.  You will also learn about the metadata stored in the DESCRIPTION and NAMESPACE files.
In this chapter, you will learn how to document your package.  You will learn why documentation is important, and how to provide documentation for your package, its functions, and other components.  You will also learn about what it means to export a function and how to implement this in your package.
In this chapter, you will learn about how to run checks to ensure that your R package is correctly structured and can be installed.  You will learn how to correct common problems, and get your package ready to be built so it can be shared with others.
In the final chapter, you will learn how to add tests to your package to ensure your code runs as expected if the package is updated or changes. You will look at how to test functions to ensure they produce expected values, and also how to test for other aspects of functionality such as expected errors.  Once you've written tests for your functions, you'll finally learn how to run your tests and what to do in the case of a failing test.",[],"['Aimee Gott', 'Nic Crane', 'Richie Cotton', 'Sumedh Panchadhar', 'Eunkyung Park']",[],['Writing Functions in R'],https://www.datacamp.com/courses/developing-r-packages,Programming,R
78,Differential Expression Analysis in R with limma,4,15,47,"1,477","3,900",Differential Expression Analysis in R limma,"Differential Expression Analysis in R with limma
Functional genomic technologies like microarrays, sequencing, and mass spectrometry enable scientists to gather unbiased measurements of gene expression levels on a genome-wide scale. Whether you are generating your own data or want to explore the large number of publicly available data sets, you will first need to learn how to analyze these types of experiments. In this course, you will be taught how to use the versatile R/Bioconductor package limma to perform a differential expression analysis on the most common experimental designs. Furthermore, you will learn how to pre-process the data, identify and correct for batch effects, visually assess the results, and perform enrichment testing. After completing this course, you will have general analysis strategies for gaining insight from any functional genomics study.
To begin, you'll review the goals of differential expression analysis, manage gene expression data using R and Bioconductor, and run your first differential expression analysis with limma.
In this chapter, you'll learn how to construct linear models to test for differential expression for common experimental designs.
Now that you've learned how to perform differential expression tests, next you'll learn how to normalize and filter the feature data, check for technical batch effects, and assess the results.
In this final chapter, you'll use your new skills to perform an end-to-end differential expression analysis of a study that uses a factorial design to assess the impact of the cancer drug doxorubicin on the hearts of mice with different genetic backgrounds.",[],"['John Blischak', 'Richie Cotton', 'David Campos', 'Shon Inouye']","[('Doxorubicin dataset', 'https://assets.datacamp.com/production/repositories/1626/datasets/bb773b0ece1e325dc23933f8e492ef4d1a17cddd/dox.rds'), ('Leukemia dataset', 'https://assets.datacamp.com/production/repositories/1626/datasets/0decef2850200efcf87b107b080959b31ec681ba/cll-eset.rds'), ('Hypoxia dataset', 'https://assets.datacamp.com/production/repositories/1626/datasets/db8dbd1c9889333384a3a78a30c745b4251e6c06/stem-eset.rds')]","['Introduction to R', 'Introduction to Data']",https://www.datacamp.com/courses/differential-expression-analysis-in-r-with-limma,Other,R
79,Dimensionality Reduction in Python,4,16,58,"2,283","4,700",Dimensionality Reduction,"Dimensionality Reduction in Python
High-dimensional datasets can be overwhelming and leave you not knowing where to start. Typically, you’d visually explore a new dataset first, but when you have too many dimensions the classical approaches will seem insufficient. Fortunately, there are visualization techniques designed specifically for high dimensional data and you’ll be introduced to these in this course. After exploring the data, you’ll often find that many features hold little information because they don’t show any variance or because they are duplicates of other features. You’ll learn how to detect these features and drop them from the dataset so that you can focus on the informative ones. In a next step, you might want to build a model on these features, and it may turn out that some don’t have any effect on the thing you’re trying to predict. You’ll learn how to detect and drop these irrelevant features too, in order to reduce dimensionality and thus complexity. Finally, you’ll learn how feature extraction techniques can reduce dimensionality for you through the calculation of uncorrelated principal components.
You'll be introduced to the concept of dimensionality reduction and will learn when an why this is important. You'll learn the difference between feature selection and feature extraction and will apply both techniques for data exploration.  The chapter ends with a lesson on t-SNE, a powerful feature extraction technique that will allow you to visualize a high-dimensional dataset.
In this first out of two chapters on feature selection, you'll learn about the curse of dimensionality and how dimensionality reduction can help you overcome it. You'll be introduced to a number of techniques to detect and remove features that bring little added value to the dataset. Either because they have little variance, too many missing values, or because they are strongly correlated to other features.
In this second chapter on feature selection, you'll learn how to let models help you find the most important features in a dataset for predicting a particular target feature. In the final lesson of this chapter, you'll combine the advice of multiple, different, models to decide on which features are worth keeping.
This chapter is a deep-dive on the most frequently used dimensionality reduction algorithm, Principal Component Analysis (PCA). You'll build intuition on how and why this algorithm is so powerful and will apply it both for data exploration and data pre-processing in a modeling pipeline.  You'll end with a cool image compression use case.",[],"['Jeroen Boeye', 'Aleksandra Vercauteren', 'Hadrien Lacroix', 'Hillary Green-Lerman', 'Chester Ismay']","[('ANSUR Female', 'https://assets.datacamp.com/production/repositories/3515/datasets/802fc5cdbe3a29248483e496a966627ea9629e7a/ANSUR_II_FEMALE.csv'), ('ANSUR Male', 'https://assets.datacamp.com/production/repositories/3515/datasets/28edd853c0a6aa7316b0d84a21f8e0d821e5010d/ANSUR_II_MALE.csv'), ('Diabetes', 'https://assets.datacamp.com/production/repositories/3515/datasets/87ced33d5371cdc13f9301ecb99ead36a63c8197/PimaIndians.csv'), ('Grocery store sales', 'https://assets.datacamp.com/production/repositories/3515/datasets/236dfa1d124bf01147dd5b3da595066fcf84a1a4/grocery_sales.csv'), ('Boston Public Schools', 'https://assets.datacamp.com/production/repositories/3515/datasets/8d23ca278dcc6c6b59629a47e1474afd93ad960c/Public_Schools2.csv'), ('Pokemon', 'https://assets.datacamp.com/production/repositories/3515/datasets/9b0682ecacc5a3429f62947794d1adbeecbd5a11/pokemon.csv')]",['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/dimensionality-reduction-in-python,Machine Learning,Python
80,Dimensionality Reduction in R,4,14,46,"2,141","3,450",Dimensionality Reduction in R,"Dimensionality Reduction in R
Real-world datasets often include values for dozens, hundreds, or even thousands of variables. Our minds cannot efficiently process such high-dimensional datasets to come up with useful, actionable insights. How do you deal with these multi-dimensional swarms of data points? How do you uncover and visualize hidden patterns in the data? In this course, you'll learn how to answer these questions by mastering three fundamental dimensionality reduction techniques - Principal component analysis (PCA), non-negative matrix factorisation (NNMF), and exploratory factor analysis (EFA).
As a data scientist, you'll frequently have to deal with messy and high-dimensional datasets. In this chapter, you'll learn how to use Principal Component Analysis (PCA) to effectively reduce the dimensionality of such datasets so that it becomes easier to extract actionable insights from them.
Here, you'll build on your knowledge of PCA by tackling more advanced applications, such as dealing with missing data. You'll also become familiar with another essential dimensionality reduction technique called Non-negative matrix factorization (NNMF) and how to use it in R.
Become familiar with exploratory factor analysis (EFA), another dimensionality reduction technique that is a natural extension to PCA.
Round out your mastery of dimensionality reduction in R by extending your knowledge of EFA to cover more advanced applications.",[],"['Alexandros Tantos', 'Yashas Roy', 'Richie Cotton', 'Benjamin  Feder']","[('BBC dataset', 'https://assets.datacamp.com/production/course_4249/datasets/bbc_res.rds'), ('Humor Styles Questionnaire dataset', 'https://assets.datacamp.com/production/course_4249/datasets/humor_dataset.csv'), ('Short Dark Triad dataset', 'https://assets.datacamp.com/production/course_4249/datasets/SD3.RDS')]",['Unsupervised Learning in R'],https://www.datacamp.com/courses/dimensionality-reduction-in-r,Machine Learning,R
81,Ensemble Methods in Python,4,15,52,850,"4,050",Ensemble Methods,"Ensemble Methods in Python
Continue your machine learning journey by diving into the wonderful world of ensemble learning methods! These are an exciting class of machine learning techniques that combine multiple individual algorithms to boost performance and solve complex problems at scale across different industries. Ensemble techniques regularly win online machine learning
competitions as well! In this course, you’ll learn all about these advanced ensemble techniques, such as bagging, boosting, and stacking. You’ll apply them to real-world datasets using cutting edge Python machine learning libraries such as scikit-learn, XGBoost, CatBoost, and mlxtend.
Do you struggle to determine which of the models you built is the best for your problem? You should give up on that, and use them all instead! In this chapter, you'll learn how to combine multiple models into one using ""Voting"" and ""Averaging"". You'll use these to predict the ratings of apps on the Google Play Store, whether or not a Pokémon is legendary, and which characters are going to die in Game of Thrones!
Bagging is the ensemble method behind powerful machine learning algorithms such as random forests. In this chapter you'll learn the theory behind this technique and build your own bagging models using scikit-learn.
Boosting is class of ensemble learning algorithms that includes award-winning models such as AdaBoost. In this chapter, you'll learn about this award-winning model, and use it to predict the revenue of award-winning movies! You'll also learn about gradient boosting algorithms such as CatBoost and XGBoost.
Get ready to see how things stack up! In this final chapter you'll learn about the stacking ensemble method. You'll learn how to implement it from scratch as well as using the mlxtend library! You'll apply stacking to predict the edibility of North American mushrooms, and revisit the ratings of Google apps with this more advanced approach.",[],"['Román de las Heras', 'Hillary Green-Lerman', 'Yashas Roy']","[('App ratings', 'https://assets.datacamp.com/production/repositories/4024/datasets/f29456ea573c318fa53362fdf91871d0c7849bb2/googleplaystore.csv'), ('App reviews', 'https://assets.datacamp.com/production/repositories/4024/datasets/be1aeb4c05850973c671d689575b6613fd8c8553/googleplaystore_user_reviews.csv'), ('Game of Thrones', 'https://assets.datacamp.com/production/repositories/4024/datasets/02627e1959ac37b28bde9ec9d28400d776dbc123/character-predictions.csv'), ('Pokémon', 'https://assets.datacamp.com/production/repositories/4024/datasets/2dd4cab3c792e2755e7dafe355a14bdb06973c5d/Pokemon.csv'), ('SECOM (Semiconductor Manufacturing)', 'https://assets.datacamp.com/production/repositories/4024/datasets/68204a108133375b21076bdd7cb560d4bb7ce4b8/uci-secom.csv'), ('TMDb (The Movie Database)', 'https://assets.datacamp.com/production/repositories/4024/datasets/f3b1b3b8ee260b447b146f156b9fbc72e51f2131/tmdb_5000_movies.csv')]","['Linear Classifiers in Python', 'Machine Learning with Tree-Based Models in Python']",https://www.datacamp.com/courses/ensemble-methods-in-python,Machine Learning,Python
82,Equity Valuation in R,4,16,58,"3,821","4,750",Equity Valuation in R,"Equity Valuation in R
How do we know when a stock is cheap or expensive? To do this, we need to compare the stock's price with its value. The price of the stock can be obtained by looking at various public sources, such as Yahoo Finance or Google Finance. The value of the stock though is much harder to identify. Every investor has to form his or her valuation of the stock. In this course, you will learn the fundamentals of valuing stocks using present value approaches, such as free cash flow to equity and dividend discount models, and valuation multiples. By the end of this course, you will be able to build your own valuation models.
Many individuals and institutions invest in equities. To do so effectively, the investor must have a solid understanding of how the value of the equity compares to the stock price. In this course, we focus on fundamental concepts of equity valuation. We begin with a discussion of time value of money and then move on to the first of two discounted cash flow methods we will discuss - the free cash flow to equity valuation model.
One of the critical components of free cash flow to equity valuation is using reliable projections. In the first part of this chapter, we will discuss ways to analyze the projections to help us identify the right questions to ask. In the second part of this chapter, we will go through the second of our discounted cash flow models - the dividend discount model. In this approach, we discount expected dividends instead of free cash flows.
To be able to discount cash flows, we need a discount rate. For the free cash flow to equity and dividend discount model, the cost of equity is the appropriate discount rate. In this chapter, we will discuss how each of the components of the cost of equity are calculated.
Relative valuation allows us to use the valuation of comparable companies to infer the value of our subject firm. In this chapter, we discuss how to identify comparable companies and how to calculate valuation multiples. We also show how to analyze the determinants of multiples.
This chapter combines the lessons from Chapters 1 to 4 in a series of exercises. You will be asked to inspect the data and to value the firm using discounted cash flow and relative valuation approaches. At the end, you will combine the results in a summary table.",[],"['Clifford Ang', 'Lore Dirick', 'Sumedh Panchadhar']","[('Historical returns', 'https://assets.datacamp.com/production/repositories/941/datasets/47503ad99e5539567fc9211b5df956a2260f9305/damodaran_histret.rda'), ('US Treasury data', 'https://assets.datacamp.com/production/repositories/941/datasets/338824497e458d5bd1bdd68ca301565eba3c9de7/fred_10yr.rda'), ('S&P 400 Midcap Index', 'https://assets.datacamp.com/production/repositories/941/datasets/552725670438351d1e704e69cb3e566e38fa330e/midcap400.rda'), ('Mylan prices', 'https://assets.datacamp.com/production/repositories/941/datasets/949f5196c681dd0abe9fa8a752c40a4711fa1a75/myl_spy_prices.rda')]","['Introduction to R for Finance', 'Intermediate R for Finance', 'Importing and Managing Financial Data in R']",https://www.datacamp.com/courses/equity-valuation-in-r,Applied Finance,R
83,Error and Uncertainty in Spreadsheets,4,15,61,38,"4,950",Error and Uncertainty in Spreadsheets,"Error and Uncertainty in Spreadsheets
You rely on predictions every day: you might check the weather app before choosing your outfit or peek at the traffic before starting your commute. Perhaps you are responsible for setting your organization’s strategy in the future. Do you find yourself wondering how accurate predictions are, how you can see into the future, and why the weatherman always seems to be wrong? In our Error and Uncertainty course, you’ll make some predictions yourself, learn to distinguish real differences from random noise, and explore psychological crutches we use that interfere with our rational decision making. You will uncover patterns in Seattle crime data, predict students’ final grades, prevent Nashville traffic accidents, and determine whether a bakery’s menu needs to change. Join us! We’re certain you’ll enjoy learning about error and uncertainty.
The first chapter presents common terminology, introduces methods for determining significant differences between groups, and outlines the kinds of error and uncertainty involved. We will specifically look at Seattle crime data and evaluate crime rate differences between precincts and neighborhoods. This chapter will equip learners to identify threats to the validity and accuracy of their conclusions.
The second chapter outlines both rudimentary (e.g., moving average, seasonal average, yearly average) and more complicated methods (e.g., linear regression) for making predictions and outlines the kinds of error and uncertainty involved. We will specifically look at anonymized student grades data and evaluate the accuracy of our predictions for given students. Throughout the chapter, we will identify threats to the validity and accuracy of our predictions.
Chapter 3 encourages learners to test the assumptions of their predictions using data on car crashes. Specifically, they will determine how to allocate resources to reduce injuries and fatalities from auto accidents. Learners will discuss the impact of outliers in prediction accuracy, evaluate the importance of normally distributed data in making predictions, employ consequence-likelihood matrices in risk management, and adapt psychological heuristics to discussions of numerical uncertainty and risk.
The final chapter integrates all the previous lessons into a constructed-world scenario. Learners are tasked with updating the menu at their small business: the Risky Business Bakery. They need to figure out whether to add or drop menu items based on whether there are significant differences in sales by baked good; whether their predicted sales figures from their accountant are accurate.",[],"['Evan Kramer', 'Chester Ismay', 'Becca Robins', 'Ruanne Van Der Walt']","[('Seattle Crime Data', 'https://assets.datacamp.com/production/repositories/4311/datasets/dea1de7f70b77c0dc0bdad4de5154ef4f6d5ceaa/1_seattle_crime.csv'), ('Student Math Scores', 'https://assets.datacamp.com/production/repositories/4311/datasets/036f0e2199b2670d6da2fbe8fa799ce70787ee96/2_math_scores.csv'), ('Risky Business Bakery', 'https://assets.datacamp.com/production/repositories/4311/datasets/a983a410dc7065058970013c8ebdf1963735dd7f/4_bakery_sales.csv')]",['Data Analysis with Spreadsheets'],https://www.datacamp.com/courses/error-and-uncertainty-in-spreadsheets,Probability & Statistics,Spreadsheets
84,Experimental Design in Python,4,16,53,293,"4,400",Experimental Design,"Experimental Design in Python
Data is all around us and can help us to understand many things. Making a pretty graph is great, but how can we tell the difference between a few outliers on a graph and a real, reliable effect? Is a trend that we see on a graph a reliable result or just random chance playing tricks? In this course, you will learn how to interrogate datasets in a rigorous way, giving clear answers to your questions. You will learn a range of statistical tests, how to apply them, how to understand their results, and how to deal with their shortcomings. Along the way, you will explore Olympic athlete data and the differences between populations of continents.
In this chapter, you will learn how to explore your data and ask meaningful questions. Then, you will discover how to answer these question by using your first statistical hypothesis tests: the t-test, the Chi-Square test, the Fisher exact test, and the Pearson correlation test.
In this chapter, you will learn how to examine and multiple factors at once, controlling for the effect of confounding variables and examining interactions between variables. You will learn how to use randomization and blocking to build robust tests and how to use the powerful ANOVA method.
In this chapter, you will focus on ways to avoid drawing false conclusions, whether false positives (type I errors) or false negatives (type II errors). Central to avoiding false negatives is understanding the interplay between sample size, power analysis, and effect size.
In this final chapter, you will examine the assumptions underlying statistical tests and learn about how that influences your experimental design. This will include learning whether a variable follows a normal distribution and when you should use non-parametric statistical tests like the Wilcoxon rank-sum test and the Spearman correlation test.",[],"['Luke Hayden', 'Chester Ismay', 'Amy Peterson']","[('Olympic dataset', 'https://assets.datacamp.com/production/repositories/4371/datasets/8fd0a14bfbc5f13719d92334eaf77b23f2e914d6/olyathswim.csv'), ('UN dataset', 'https://assets.datacamp.com/production/repositories/4371/datasets/f5c1016b818f97ec200236fb161ae711944fb2cb/undata_country_profile_variables.csv')]",['Introduction to Python'],https://www.datacamp.com/courses/experimental-design-in-python,Probability & Statistics,Python
85,Experimental Design in R,4,12,52,"3,890","4,400",Experimental Design in R,"Experimental Design in R
Experimental design is a crucial part of data analysis in any field, whether you work in business, health or tech. If you want to use data to answer a question, you need to design an experiment! In this course you will learn about basic experimental design, including block and factorial designs, and commonly used statistical tests, such as the t-tests and ANOVAs. You will use built-in R data and real world datasets including the CDC NHANES survey, SAT Scores from NY Public Schools, and Lending Club Loan Data. Following the course, you will be able to design and analyze your own experiments!
An introduction to key parts of experimental design plus some power and sample size calculations.
Explore the Lending Club dataset plus build and validate basic experiments, including an A/B test.
Use the NHANES data to build a RCBD and BIBD experiment, including model validation and design tips to make sure the BIBD is valid.
Evaluate the NYC SAT scores data and deal with its missing values, then evaluate Latin Square, Graeco-Latin Square, and Factorial experiments.",['Statistics Fundamentals with R'],"['kaelen medeiros', 'Sascha Mayr', 'Becca Robins']","[('sample of Lending Club data', 'https://assets.datacamp.com/production/repositories/1793/datasets/e14dbe91a0840393e86e4fb9a7ec1b958842ae39/lendclub.csv'), ('NHANES Body Measures', 'https://assets.datacamp.com/production/repositories/1793/datasets/ee832ef6c2fa7036704c53e90dc1e710a3b50dbc/nhanes_bodymeasures.csv'), ('NHANES Demographics', 'https://assets.datacamp.com/production/repositories/1793/datasets/2be5ca94453a63e825bc30ccefd1429b7683c19c/nhanes_demo.csv'), ('NHANES final combined dataset', 'https://assets.datacamp.com/production/repositories/1793/datasets/c74d60f37456fd0bbf0323d6ef88ff6ca91366a3/nhanes_final.csv'), ('NHANES Medical Conditions', 'https://assets.datacamp.com/production/repositories/1793/datasets/d34921a9255422617cdc42f6a3fbcd189f51c19d/nhanes_medicalconditions.csv'), ('NYC SAT Scores', 'https://assets.datacamp.com/production/repositories/1793/datasets/6eee2fcc47c8c8dbb2e9d4670cf2eabeda52b705/nyc_scores.csv')]","['Introduction to Data', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/experimental-design-in-r,Probability & Statistics,R
86,Exploratory Data Analysis,4,15,54,"36,879","3,950",Exploratory Data Analysis,"Exploratory Data Analysis
When your dataset is represented as a table or a database, it's difficult to observe much about it beyond its size and the types of variables it contains. In this course, you'll learn how to use graphical and numerical techniques to begin uncovering the structure of your data. Which variables suggest interesting relationships? Which observations are unusual? By the end of the course, you'll be able to answer these questions and more, while generating graphics that are both insightful and beautiful.
In this chapter, you will learn how to create graphical and numerical summaries of two categorical variables.
In this chapter, you will learn how to graphically summarize numerical data.
Now that we've looked at exploring categorical and numerical data, you'll learn some useful statistics for describing distributions of data.
Apply what you've learned to explore and summarize a real world dataset in this case study of email spam.","['Data Analyst with R', 'Data Scientist with R', 'Statistics Fundamentals with R']","['Andrew Bray', 'Nick Carchedi', 'Tom Jeon']","[('Cars data', 'https://assets.datacamp.com/production/repositories/537/datasets/c0366d5da5ee8dce49919a5443685cf2e50c6a96/cars04.csv'), ('Comics data', 'https://assets.datacamp.com/production/repositories/537/datasets/8860af2c0ef67fc77a8c704a73bbb93a395debcf/comics.csv'), ('Immigration data', 'https://assets.datacamp.com/production/repositories/537/datasets/d6b811836c453d2afaaf76c6d62b592e673e93ae/immigration.csv'), ('Raw life expectancy data', 'https://assets.datacamp.com/production/repositories/537/datasets/e079a96a639aa10afc478359da45f2f75f7efd2e/life_exp_raw.csv'), ('Names data', 'https://assets.datacamp.com/production/repositories/537/datasets/7dc95cdac26db11e7dd46542741435dbb09fb613/names.txt'), ('Raw U.S. income data', 'https://assets.datacamp.com/production/repositories/537/datasets/813eb74f670b7dd1c7806375bc9607472fe976db/us_income_raw.csv')]","['Introduction to R', 'Introduction to Data']",https://www.datacamp.com/courses/exploratory-data-analysis,Probability & Statistics,R
87,Exploratory Data Analysis in Python,4,16,52,"2,438","4,150",Exploratory Data Analysis,"Exploratory Data Analysis in Python
How do we get from data to answers?  Exploratory data analysis is a process for exploring datasets, answering questions, and visualizing results.  This course presents the tools you need to clean and validate data, to visualize distributions and relationships between variables, and to use regression models to predict and explain.  You'll explore data related to demographics and health, including the National Survey of Family Growth and the General Social Survey.  But the methods you learn apply to all areas of science, engineering, and business. You'll use Pandas, a powerful library for working with data, and other core Python libraries including NumPy and SciPy, StatsModels for regression, and Matplotlib for visualization.  With these tools and skills, you will be prepared to work with real data, make discoveries, and present compelling results.
The first step of almost any data project is to read the data, check for errors and special cases, and prepare data for analysis. This is exactly what you'll do in this chapter, while working with a dataset obtained from the National Survey of Family Growth.
In the first chapter, having cleaned and validated your data, you began exploring it by using histograms to visualize distributions. In this chapter, you'll learn how to represent distributions using Probability Mass Functions (PMFs) and Cumulative Distribution Functions (CDFs). You'll learn when to use each of them, and why, while working with a new dataset obtained from the General Social Survey.
Up until this point, you've only looked at one variable at a time. In this chapter, you'll explore relationships between variables two at a time, using scatter plots and other visualizations to extract insights from a new dataset obtained from the Behavioral Risk Factor Surveillance Survey (BRFSS). You'll also learn how to quantify those relationships using correlation and simple regression.
Explore multivariate relationships using multiple regression to describe non-linear relationships and logistic regression to explain and predict binary variables.",[],"['Allen Downey', 'Chester Ismay', 'Yashas Roy']","[('National Survey of Family Growth (NSFG)', 'https://assets.datacamp.com/production/repositories/4025/datasets/513eca1637050a1fa75874dc5ceabfe89e9d2668/nsfg.hdf5'), ('General Social Survey (GSS)', 'https://assets.datacamp.com/production/repositories/4025/datasets/01de76fde7ef43c629a7dbfb11ce91cde0210417/gss.hdf5'), ('Behavioral Risk Factor Surveillance System (BRFSS)', 'https://assets.datacamp.com/production/repositories/4025/datasets/0bfd1b5298cbaf58f3b4dc2c035120a8b6156d73/brfss.hdf5')]",['Python Data Science Toolbox (Part 2)'],https://www.datacamp.com/courses/exploratory-data-analysis-in-python,Case Studies,Python
88,Exploratory Data Analysis in R: Case Study,4,15,58,"24,605","4,800",Exploratory Data Analysis in R: Case Study,"Exploratory Data Analysis in R: Case Study
Once you've started learning tools for data manipulation and visualization like dplyr and ggplot2, this course gives you a chance to use them in action on a real dataset. You'll explore the historical voting of the United Nations General Assembly, including analyzing differences in voting between countries, across time, and among international issues. In the process you'll gain more practice with the dplyr and ggplot2 packages, learn about the broom package for tidying model output, and experience the kind of start-to-finish exploratory analysis common in data science.
The best way to learn data wrangling skills is to apply them to a specific case study. Here you'll learn how to clean and filter the United Nations voting dataset using the dplyr package, and how to summarize it into smaller, interpretable units.
Once you've cleaned and summarized data, you'll want to visualize them to understand trends and extract insights. Here you'll use the ggplot2 package to explore trends in United Nations voting within each country over time.
While visualization helps you understand one country at a time, statistical modeling lets you quantify trends across many countries and interpret them together. Here you'll learn to use the tidyr, purrr, and broom packages to fit linear models to each country, and understand and compare their outputs.
In this chapter, you'll learn to combine multiple related datasets, such as incorporating information about each resolution's topic into your vote analysis. You'll also learn how to turn untidy data into tidy data, and see how tidy data can guide your exploration of topics and countries over time.","['Data Analyst with R', 'Data Manipulation with R', 'Data Scientist with R']","['David Robinson', 'Nick Carchedi', 'Tom Jeon']","[('United Nations voting dataset', 'https://assets.datacamp.com/production/repositories/420/datasets/ddfa750d993c73026f621376f3c187f276bf0e2a/votes.rds'), ('Topic information for each country (Descriptions)', 'https://assets.datacamp.com/production/repositories/420/datasets/a438432333a31a6f4aba2d5507df9a44e513b518/descriptions.rds')]","['Introduction to R', 'Data Visualization with ggplot2 (Part 1)']",https://www.datacamp.com/courses/exploratory-data-analysis-in-r-case-study,Case Studies,R
89,Exploring Pitch Data with R,4,14,69,"7,427","5,750",Exploring Pitch Data R,"Exploring Pitch Data with R

Velocity is a key component in the arsenal of many pitchers. In this chapter, you will examine whether there was an uptick in Zack Greinke's velocity during his impressive July in 2015. The chapter will introduce how to deal with dates, plotting distributions with histograms, and using the very handy tapply() function.
Pitchers throw various types of pitches with different velocities and trajectories in order to make it more difficult for the batter to hit the ball. This chapter will introduce pitch types and make heavy use of tables to examine changes to pitch type choices by Greinke in July, as well as in other important situations.
As with velocity and pitch type, pitch location can play a key role in pitching success. This chapter leverages the rich information about location provided in the MLB Statcast data to visualize changes in Greinke's pitch location choice in July and in different ball-strike counts. You will also make use of the very important for loop in the context of plotting data.
In this chapter, you'll bring it all together. Minimizing damage on each pitch is the key to run prevention by the pitcher. Therefore, you will look closely at outcomes from pitches thrown by Greinke in different months. We'll also introduce the ggplot2 package to create high quality visualizations of hitter exit speed when Greinke throws to different locations.",[],"['Brian M. Mills', 'Nick Carchedi', 'Tom Jeon', 'Jeff Paadre']","[('greinke2015', 'https://assets.datacamp.com/production/course_943/datasets/greinke2015.csv')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/exploring-pitch-data-with-r,Case Studies,R
90,Extreme Gradient Boosting with XGBoost,4,16,49,"14,448","3,750",Extreme Gradient Boosting XGBoost,"Extreme Gradient Boosting with XGBoost
Do you know the basics of supervised learning and want to use state-of-the-art models on real-world datasets? Gradient boosting is currently one of the most popular techniques for efficient modeling of tabular datasets of all sizes. XGboost is a very fast, scalable implementation of gradient boosting, with models using XGBoost regularly winning online data science competitions and being used at scale across different industries. In this course, you'll learn how to use this powerful library alongside pandas and scikit-learn to build and tune supervised learning models. You'll work with real-world datasets to solve classification and regression problems.
This chapter will introduce you to the fundamental idea behind XGBoost—boosted learners. Once you understand how XGBoost works, you'll apply it to solve a common classification problem found in industry: predicting whether a customer will stop being a customer at some point in the future.
After a brief review of supervised regression, you'll apply XGBoost to the regression task of predicting house prices in Ames, Iowa. You'll learn about the two kinds of base learners that XGboost can use as its weak learners, and review how to evaluate the quality of your regression models.
This chapter will teach you how to make your XGBoost models as performant as possible. You'll learn about the variety of parameters that can be adjusted to alter the behavior of XGBoost and how to tune them efficiently so that you can supercharge the performance of your models.
Take your XGBoost skills to the next level by incorporating your models into two end-to-end machine learning pipelines. You'll learn how to tune the most important XGBoost hyperparameters efficiently within a pipeline, and get an introduction to some more advanced preprocessing techniques.",[],"['Sergey Fogelson', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Ames housing prices (preprocessed)', 'https://assets.datacamp.com/production/repositories/943/datasets/4dbcaee889ef06fb0763e4a8652a4c1f268359b2/ames_housing_trimmed_processed.csv'), ('Ames housing prices (original)', 'https://assets.datacamp.com/production/repositories/943/datasets/17a7c5c0acd7bfa253827ea53646cf0db7d39649/ames_unprocessed_data.csv'), ('Chronic kidney disease', 'https://assets.datacamp.com/production/repositories/943/datasets/82c231cd41f92325cf33b78aaa360824e6b599b9/chronic_kidney_disease.csv')]","['Supervised Learning with scikit-learn', 'Machine Learning with the Experts: School Budgets']",https://www.datacamp.com/courses/extreme-gradient-boosting-with-xgboost,Machine Learning,Python
91,Factor Analysis in R,4,13,45,"3,208","3,600",Factor Analysis in R,"Factor Analysis in R
The world is full of unobservable variables that can't be directly measured. You might be interested in a construct such as math ability, personality traits, or workplace climate. When investigating constructs like these, it's critically important to have a model that matches your theories and data. This course will help you understand dimensionality and show you how to conduct exploratory and confirmatory factor analyses. With these statistical techniques in your toolkit, you'll be able to develop, refine, and share your measures. These analyses are foundational for diverse fields including psychology, education, political science, economics, and linguistics.
In Chapter 1, you will learn how to conduct an EFA to examine the statistical properties of a measure designed around one construct.
This chapter will show you how to extend the single-factor EFA you learned in Chapter 1 to multidimensional data.
This chapter will cover conducting CFAs with the sem package. Both theory-driven and EFA-driven CFA structures will be covered.
This chapter will reinforce the difference between EFAs and CFAs and offer suggestions for improving your model and/or measure.",['Unsupervised Machine Learning with R'],"['Jennifer Brussow', 'Chester Ismay', 'Becca Robins']","[('Generic Conspiracist Beliefs Scale (GCBS) dataset', 'https://assets.datacamp.com/production/repositories/2136/datasets/869615371e66021e97829feb7e19e38037ed0c14/GCBS_data.rds')]","['Intermediate R', 'Foundations of Inference']",https://www.datacamp.com/courses/factor-analysis-in-r,Probability & Statistics,R
92,Feature Engineering for Machine Learning in Python,4,16,53,"1,946","4,350",Feature Engineering Machine Learning,"Feature Engineering for Machine Learning in Python
Every day you read about the amazing breakthroughs in how the newest applications of machine learning are changing the world. Often this reporting glosses over the fact that a huge amount of data munging and feature engineering must be done before any of these fancy models can be used. In this course, you will learn how to do just that. You will work with Stack Overflow Developers survey, and historic US presidential inauguration addresses, to understand how best to preprocess and engineer features from categorical, continuous, and unstructured data. This course will give you hands-on experience on how to prepare any data for your own machine learning models.
In this chapter, you will explore what feature engineering is and how to get started with applying it to real-world data. You will load, explore and visualize a survey response dataset, and in doing so you will learn about its underlying data types and why they have an influence on how you should engineer your features. Using the pandas package you will create new features from both categorical and continuous columns.
This chapter introduces you to the reality of messy and incomplete data. You will learn how to find where your data has missing values and explore multiple approaches on how to deal with them. You will also use string manipulation techniques to deal with unwanted characters in your dataset.
In this chapter, you will focus on analyzing the underlying distribution of your data and whether it will impact your machine learning pipeline. You will learn how to deal with skewed data and situations where outliers may be negatively impacting your analysis.
Finally, in this chapter, you will work with unstructured text data, understanding ways in which you can engineer columnar features out of a text corpus. You will compare how different approaches may impact how much context is being extracted from a text, and how to balance the need for context, without too many features being created.",[],"[""Robert O'Callaghan"", 'Sumedh Panchadhar', 'Hillary Green-Lerman']","[('Stack Overflow Survey Responses (Modified)', 'https://assets.datacamp.com/production/repositories/3752/datasets/19699a2441073ad6459bf5e3e17690e2cae86cf1/Combined_DS_v10.csv'), ('US Presidential Inauguration Addresses', 'https://assets.datacamp.com/production/repositories/3752/datasets/cdc15798dd6698003ee33c6af185242faf896187/inaugural_speeches.csv')]","['pandas Foundations', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/feature-engineering-for-machine-learning-in-python,Machine Learning,Python
93,Feature Engineering for NLP in Python,4,15,52,"2,379","4,200",Feature Engineering NLP,"Feature Engineering for NLP in Python
In this course, you will learn techniques that will allow you to extract useful information from text and process them into a format suitable for applying ML models. More specifically, you will learn about POS tagging, named entity recognition, readability scores, the n-gram and tf-idf models, and how to implement them using scikit-learn and spaCy. You will also learn to compute how similar two documents are to each other.  In the process, you will predict the sentiment of movie reviews and build movie and Ted Talk recommenders. Following the course, you will be able to engineer critical features out of any text and solve some of the most challenging problems in data science!
Learn to compute basic features such as number of words, number of characters, average word length and number of special characters (such as Twitter hashtags and mentions).  You will also learn to compute readability scores and determine the amount of education required to comprehend a piece of text.
In this chapter, you will learn about tokenization and lemmatization. You will then learn how to perform text cleaning, part-of-speech tagging, and named entity recognition using the spaCy library. Upon mastering these concepts, you will proceed to make the Gettysburg address machine-friendly, analyze noun usage in fake news, and identify people mentioned in a TechCrunch article.
Learn about n-gram modeling and use it to perform sentiment analysis on movie reviews.
Learn how to compute tf-idf weights and the cosine similarity score between two vectors. You will use these concepts to build a movie and a TED Talk recommender. Finally, you will also learn about word embeddings and using word vector representations, you will compute similarities between various Pink Floyd songs.",[],"['Rounak Banik', 'Hillary Green-Lerman', 'Adrián Soto']","[('Russian Troll Tweets', 'https://assets.datacamp.com/production/repositories/4375/datasets/f67c0cd351c8431bde5ac9724f9031102e38edb3/russian_tweets.csv'), ('Movie Overviews and Taglines', 'https://assets.datacamp.com/production/repositories/4375/datasets/83f27c4ad045c098d3db5596154316e4ee0a28a8/movie_overviews.csv'), ('Preprocessed Movie Reviews', 'https://assets.datacamp.com/production/repositories/4375/datasets/4281f3352173b69c17965c8f5261603cc18c7d0b/movie_reviews_clean.csv'), ('TED Talk Transcripts', 'https://assets.datacamp.com/production/repositories/4375/datasets/923cfcdab7e4297c2e3c4c859a5add798ae51d3b/ted.csv'), ('Real and Fake News Headlines', 'https://assets.datacamp.com/production/repositories/4375/datasets/dd0cbaa4d6df483b6cb8fb8365152f5e3d743990/fakenews.csv')]","['pandas Foundations', 'Natural Language Processing Fundamentals in Python', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/feature-engineering-for-nlp-in-python,Machine Learning,Python
94,Feature Engineering in R,4,13,44,"1,671","3,500",Feature Engineering in R,"Feature Engineering in R
Feature engineering helps you uncover useful insights from your machine learning models. The model building process is iterative and requires creating new features using existing variables that make your model more efficient. In this course, you will explore different data sets and apply a variety of feature engineering techniques to both continuous and discrete variables.
In this chapter, you will learn how to change categorical features into numerical representations that models can interpret.  You'll learn about one-hot encoding and using binning for categorical features.
In this chapter, you will learn how to manipulate numerical features to create meaningful features that can give better insights into your model.  You will also learn how to work with dates in the context of feature engineering.
In this chapter, you will learn about using transformation techniques, like Box-Cox and Yeo-Johnson, to address issues with non-normally distributed features. You'll also learn about methods to scale features, including mean centering and z-score standardization.
In the final chapter, we will use feature crossing to create features from two or more variables. We will also discuss principal component analysis, and methods to explore and visualize those results.",[],"['Jose Hernandez', 'Chester Ismay', 'Amy Peterson']",[],[],https://www.datacamp.com/courses/feature-engineering-in-r,Machine Learning,R
95,Feature Engineering with PySpark,4,16,60,"3,549","5,000",Feature Engineering PySpark,"Feature Engineering with PySpark
The real world is messy and your job is to make sense of it. Toy datasets like MTCars and Iris are the result of careful curation and cleaning, even so the data needs to be transformed for it to be useful for powerful machine learning algorithms to extract meaning, forecast, classify or cluster. This course will cover the gritty details that data scientists are spending 70-80% of their time on; data wrangling and feature engineering. With size of datasets now becoming ever larger, let's use PySpark to cut this Big Data problem down to size!
Get to know a bit about your problem before you dive in! Then learn how to statistically and visually inspect your dataset!
Real data is rarely clean and ready for analysis. In this chapter learn to remove unneeded information, handle missing values and add additional data to your analysis.
In this chapter learn how to create new features for your machine learning model to learn from. We'll look at generating them by combining fields, extracting values from messy columns or encoding them for better results.
In this chapter we'll learn how  to choose which type of model we want. Then we will learn how to apply our data to the model and evaluate it. Lastly,  we'll learn how to interpret the results and save the model for later!",[],"['John Hogue', 'Adrián Soto', 'Nick Solomon']","[('2017 St Paul MN Real Estate Dataset', 'https://assets.datacamp.com/production/repositories/1704/datasets/d26c25f46746882d0a0f474cc6709c629f69872c/2017_StPaul_MN_Real_Estate.csv')]","['Supervised Learning with scikit-learn', 'Introduction to PySpark']",https://www.datacamp.com/courses/feature-engineering-with-pyspark,Data Manipulation,Python
96,Financial Analytics in R,4,17,59,"3,369","4,750",Financial Analytics in R,"Financial Analytics in R
This course is an introduction to the world of finance where cash is king and time is money. In this course, you will learn how to use R to quantify the value of projects, opportunities, and actions and drive decision-making. Students will use the R language to explore cashflow statements, compute profitability metrics, apply decision rules, and compare alternatives. You will end this case-motivated course with an understanding of key financial concepts and the skills needed to conceptualize an communicate the value of you or your teams' projects in a corporate setting.
Introducing the motivation for and basic concepts of discounted cashflow valuations analysis.
An overview of time-value of money and related concepts.
Understanding different ways to summarize cashflow output.
Piecing it altogether with sensitivty and scenario analysis.",[],"['Emily Riederer', 'Sascha Mayr', 'David Campos', 'Shon Inouye']",[],"['Introduction to R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/financial-analytics-in-r,Applied Finance,R
97,Financial Analytics in Spreadsheets,4,15,56,"5,718","4,650",Financial Analytics in Spreadsheets,"Financial Analytics in Spreadsheets
Monitoring the evolution of traded assets is key in finance. In this course, you will learn how to build a graphical dashboard with spreadsheets to track the performance of financial securities. You will focus on historical prices and dividends of the hypothetical stock ABC. You will learn how to visualize its prices, how to measure essential reward and risk indicators, and see if your investment in ABC outperformed a benchmark index. At the end of the course, you should be able to use spreadsheets to build great monitoring tools used by traders and financial analysts in their day-to-day business life!
In the first chapter, you’ll be introduced to the problem: you have a time series of monthly (historical) prices for the hypothetical stock ABC from which you have to extract some meaningful information. You’ll be given some definitions (what is a stock? what are dividends?), and at the end of the chapter, you’ll be able to graphically represent the evolution of a stock price over a specific period.
In this chapter, the core of the analysis will switch from historical prices to historical returns. You’ll learn (and compute) the main performance indicators of past returns, both in terms of reward and risk. Finally, you’ll be introduced to risk-adjusted performance measures: indicators that take into account both reward and risk.
In this chapter, you'll look at the full distribution of historical returns. First, you’ll learn how to build a histogram to describe the distribution of historical returns. Second, you’ll be introduced to the Gaussian distribution, a commonly used model for stock returns. You'll visually inspect if the Gaussian model is reasonable for the ABC stock returns. Finally, you'll understand potential flaws with the Gaussian model.
In this final chapter, you’ll benchmark ABC stock against a market index and verify whether ABC outperformed the benchmark or not. The comparison process will be done through several steps/metrics. First, you’ll analyze the cumulative wealth. Next, you’ll extend the comparison using different indicators such as Sharpe Ratio and Drawdown. Finally, you’ll examine the linear relation between ABC stock and the benchmark through the correlation coefficient. At the end of the chapter, you’ll be introduced to more powerful and advanced spreadsheet features that introduce interactivity in your analysis.",[],"['David Ardia', 'Riccardo Mancini', 'Chester Ismay', 'Sara Billen']","[('Stock ABC', 'https://assets.datacamp.com/production/repositories/3915/datasets/51f1898dae27f03a058601c2a7585f4775a1afe9/Dataset.csv')]",['Intermediate Spreadsheets for Data Science'],https://www.datacamp.com/courses/financial-analytics-in-spreadsheets,Applied Finance,Spreadsheets
98,Financial Forecasting in Python,4,12,49,"2,492","4,050",Financial Forecasting,"Financial Forecasting in Python
In Financial Forecasting in Python, you will step into the role of CFO and learn how to advise a board of directors on key metrics while building a financial forecast, the basics of income statements and balance sheets, and cleaning messy financial data. During the course, you will examine real-life datasets from Netflix, Tesla, and Ford, using the pandas package. Following the course, you will be able to calculate financial metrics, work with assumptions and variances, and build your own forecast in Python!
In this chapter, we will learn the basics of financial statements, with a specific focus on the income statement, which provides details on our sales, costs, and profits. We will learn how to calculate profitability metrics and finish off what we have learned by building our profit forecast for Tesla!
In this chapter, we will learn a bit more about the balance sheet, covering assets and liabilities and specific ratios to help evaluate the financial health and efficiency of a company, as well as how these ratios can assist us in building a great forecast.
We have gotten a basic understanding of income statements and balance sheets. However, consolidating data for forecasting is complex, so in this chapter, we will look at some basic tools to help solve some of the complexities specifically relating to finance - working with dates and different financial periods, and formatting our raw data into the correct format for financial forecasting.
In this chapter, we will be exploring two more aspects to creating a good forecast. First, we will look at assumptions, what drives them and what happens when an assumption changes? Next, we will look at variances, as a forecast is built at one point in time, but what happens when the actual results do not correspond to our forecast? We need to build a sensitive forecast that can be sensitive to changes in both assumptions and take into account variances, and this is what we will explore in this chapter.",[],"['Victoria Clark', 'Becca Robins', 'Sara Snell']","[('Ford Balance Sheet', 'https://assets.datacamp.com/production/repositories/1882/datasets/9f3f116318e2471b55d9f0a6c5c709d4cbfb94b7/F-Balance-Sheet.csv'), ('Netflix Forecast', 'https://assets.datacamp.com/production/repositories/1882/datasets/21336aacbe41c511358594c5baead41b7673f89b/Netflix.csv'), ('Tesla Income Statement', 'https://assets.datacamp.com/production/repositories/1882/datasets/c87f9f462d0a8e04b1595ac86b2fa2fbfde75737/TSLA-Income-Statement.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/financial-forecasting-in-python,Applied Finance,Python
99,Financial Modeling in Spreadsheets,4,13,52,862,"4,550",Financial Modeling in Spreadsheets,"Financial Modeling in Spreadsheets
Have you ever wanted to plan for retirement, understand the stock market, or create a cash flow for your business? In this course, you will learn how to build business and financial models in Sheets. Google Sheets is an excellent technology for business models! You can create a framework for your goal, like understanding the growth of investments, and then update that framework based on current data. You will learn the basics of business modeling focusing on cash flows, investments, annuities, loan amortization, and saving for retirement. By the end of the course, you will have gained referencing and function skills in Sheets that you can apply to all sorts of models.
An introduction to modeling financial statements in Sheets focusing on balance and income statements, which help create cash flow models.
Learn Sheet's financial model functions by creating investment models with the fv, pv, pmt, and nper functions. You will also learn how to pay off debts in a loan amortization table.
Saving for retirement is tricky, but in this chapter, you will learn how to create models that help you plan to save and use your money after retirement.
Stock prices go up and down but can we model them? Learn about volatility and simulating stock prices in this final chapter.",[],"['Erin Buchanan', 'Chester Ismay', 'Amy Peterson']",[],"['Spreadsheet Basics', 'Data Analysis with Spreadsheets', 'Intermediate Spreadsheets for Data Science']",https://www.datacamp.com/courses/financial-modeling-in-spreadsheets,Applied Finance,Spreadsheets
100,Financial Trading in R,5,20,65,"14,094","5,050",Financial Trading in R,"Financial Trading in R
This course will cover the basics on financial trading and will give you an overview of how to use quantstrat to build signal-based trading strategies in R. It will teach you how to set up a quantstrat strategy, apply transformations of market data called indicators, create signals based on the interactions of those indicators, and even simulate orders. Lastly, it will explain how to analyze your results both from statistical and visual perspectives.
In this chapter, you will learn the definition of trading, the philosophies of trading, and the pitfalls that exist in trading. This chapter covers both momentum and oscillation trading, along with some phrases to identify these types of philosophies. You will learn about overfitting and how to avoid it, obtaining and plotting financial data, and using a well-known indicator in trading.
Before building a strategy, the quantstrat package requires you to initialize some settings. In this chapter you will learn how this is done. You will cover a series of functions that deal with initializing a time zone, currency, the instruments you'll be working with, along with quantstrat's various frameworks that will allow it to perform analytics. Once this is done, you will have the knowledge to set up a quantstrat initialization file, and know how to change it.
Indicators are crucial for your trading strategy. They are transformations of market data that allow a clearer understanding of its overall behavior, usually in exchange for lagging the market behavior. Here, you will be working with both trend types of indicators as well as oscillation indicators. You will also learn how to use pre-programmed indicators available in other libraries as well as implement one of your own.
When constructing a quantstrat strategy, you want to see how the market interacts with indicators and how indicators interact with each other. In this chapter you'll learn how indicators can generate signals in quantstrat. Signals are interactions of market data with indicators, or indicators with other indicators. There are four types of signals in quantstrat: sigComparison, sigCrossover, sigThreshold, and sigFormula. By the end of this chapter, you'll know all about these signals, what they do, and how to use them.
In this chapter, you'll learn how to shape your trading transaction once you decide to execute on a signal. This chapter will cover a basic primer on rules, and how to enter and exit positions. You'll also learn how to send inputs to order-sizing functions. By the end of this chapter, you'll learn the gist of how rules function, and where you can continue learning about them.
After a quantstrat strategy has been constructed, it's vital to know how to actually analyze the strategy's performance. This chapter details just that. You will learn how to read vital trade statistics, and view the performance of your trading strategy over time. You will also learn how to get a reward to risk ratio called the Sharpe ratio in two different ways. This is the last chapter.","['Applied Finance with R', 'Quantitative Analyst with R']","['Ilya  Kipnis', 'Lore Dirick']","[('SPY data from 2000 through 2016', 'https://assets.datacamp.com/production/repositories/378/datasets/add0628410cb0ca07efffaf6756517c455186eb5/spy_000101_160630.RData')]","['Introduction to R for Finance', 'Intermediate R for Finance']",https://www.datacamp.com/courses/financial-trading-in-r,Applied Finance,R
101,Forecasting Product Demand in R,4,13,50,"4,083","4,200",Forecasting Product Demand in R,"Forecasting Product Demand in R
Accurately predicting demand for products allows a company to stay ahead of the market. By knowing what things shape demand, you can drive behaviors around your products better. This course unlocks the process of predicting product demand through the use of R. You will learn how to identify important drivers of demand, look at seasonal effects, and predict demand for a hierarchy of products from a real world example. By the end of the course you will be able to predict demand for multiple products across a region of a state in the US. Then you will roll up these predictions across many different regions of the same state to form a complete hierarchical forecasting system.
When it comes to forecasting, time series modeling is a great place to start! You need to forecast out the future values of sales demand and a good baseline approach would be ARIMA models. In this chapter you'll learn how to quickly implement ARIMA models and get good initial forecasts for future product demand.
Economic theory has a lot to say about predicting values of demand. Obviously, external factors like price, seasonality, and timing of promotions will drive some aspects of product demand. In this chapter you'll learn about the basics around price elasticity models and how to incorporate seasonality and promotion timing factors into our product demand forecasts.
Time series models and pricing regressions don't have to be thought of as separate approaches to product demand forecasting. They can be combined! In this chapter you'll learn about two ways of ""combining"" the information gained in both modeling approaches - transfer functions and forecast ensembling.
Everything up until this point deals with making individual models for forecasting product demand. However, we haven't taken advantage of the fact that all of these products form a product hierarchy of sales. Products make up regions and regions make up states. How can we ensure that our forecasts reconcile correctly up and down the hierarchy? In this chapter you'll learn about hierarchical forecasting and how to use it to your advantage in forecasting product demand.",[],"['Aric LaBarr', 'Yashas Roy', 'Richie Cotton']","[('Beverage producer sales', 'https://assets.datacamp.com/production/course_6021/datasets/Bev.csv')]",['Intermediate R'],https://www.datacamp.com/courses/forecasting-product-demand-in-r,Probability & Statistics,R
102,Forecasting Using ARIMA Models in Python,4,15,57,"1,716","4,850",Forecasting Using ARIMA Models,"Forecasting Using ARIMA Models in Python
Have you ever tried to predict the future? What lies ahead is a mystery which is usually only solved by waiting. In this course, you will stop waiting and learn to use the powerful ARIMA  class models to forecast the future. You will learn how to use the statsmodels package to analyze time series, to build tailored models, and to forecast under uncertainty. How will the stock market move in the next 24 hours? How will the levels of CO2 change in the next decade? How many earthquakes will there be next year? You will learn to solve all these problems and more.
Dive straight in and learn about the most important properties of time series. You'll learn about stationarity and how this is important for ARMA models. You'll learn how to test for stationarity by eye and with a standard statistical test. Finally, you'll learn the basic structure of ARMA models and use this to generate some ARMA data and fit an ARMA model.
What lies ahead in this chapter is you predicting what lies ahead in your data. You'll learn how to use the elegant statsmodels package to fit ARMA, ARIMA and ARMAX models.  Then you'll use your models to predict the uncertain future of stock prices!
In this chapter, you will become a modeler of discerning taste. You'll learn how to identify promising model orders from the data itself, then, once the most promising models have been trained, you'll learn how to choose the best model from this fitted selection. You'll also learn a great framework for structuring your time series projects.
In this final chapter, you'll learn how to use seasonal ARIMA models to fit more complex data. You'll learn how to decompose this data into seasonal and non-seasonal parts and then you'll get the chance to utilize all your ARIMA tools on one last global forecast challenge.",[],"['James Fulton', 'Chester Ismay', 'Adel Nehme']","[('US Monthly Candy Production', 'https://assets.datacamp.com/production/repositories/4567/datasets/0707fe926ef5f110ed889fcd2a09c9417e2ffbb6/candy_production.csv'), ('Monthly Record of CO2', 'https://assets.datacamp.com/production/repositories/4567/datasets/d358460aae958f23ba20968aba924cd3eea2e969/co2.csv'), ('Amazon Daily Closing Stock Price', 'https://assets.datacamp.com/production/repositories/4567/datasets/4543d63de229cec637e58f90973b64417e5dc24c/amazon_close.csv'), ('Monthly Milk Production', 'https://assets.datacamp.com/production/repositories/4567/datasets/1213fc15035051ef7fe5a0dac44176df7223a93a/milk_production.csv'), ('Yearly Earthquakes', 'https://assets.datacamp.com/production/repositories/4567/datasets/96dadbe9fcb8985ff2f89c5c9f5ada3d4180e65a/earthquakes.csv')]",['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/forecasting-using-arima-models-in-python,Machine Learning,Python
103,Forecasting Using R,5,18,55,"24,397","4,450",Forecasting Using R,"Forecasting Using R
Forecasting involves making predictions about the future. It is required in many situations: deciding whether to build another power generation plant in the next ten years requires forecasts of future demand; scheduling staff in a call centre next week requires forecasts of call volumes; stocking an inventory requires forecasts of stock requirements. Forecasts can be required several years in advance (for the case of capital investments), or only a few minutes beforehand (for telecommunication routing). Whatever the circumstances or time horizons involved, forecasting is an important aid to effective and efficient planning. This course provides an introduction to time series forecasting using R. 
The first thing to do in any data analysis task is to plot the data. Graphs enable many features of the data to be visualized, including patterns, unusual observations, and changes over time. The features that are seen in plots of the data must then be incorporated, as far as possible, into the forecasting methods to be used.
In this chapter, you will learn general tools that are useful for many different forecasting situations. It will describe some methods for benchmark forecasting, methods for checking whether a forecasting method has adequately utilized the available information, and methods for measuring forecast accuracy. Each of the tools discussed in this chapter will be used repeatedly in subsequent chapters as you develop and explore a range of forecasting methods.
Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation, the higher the associated weight. This framework generates reliable forecasts quickly and for a wide range of time series, which is a great advantage and of major importance to applications in business.
ARIMA models provide another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely-used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.
The time series models in the previous chapters work well for many time series, but they are often not good for weekly or hourly data, and they do not allow for the inclusion of other information such as the effects of holidays, competitor activity, changes in the law, etc. In this chapter, you will look at some methods that handle more complicated seasonality, and you consider how to extend ARIMA models in order to allow other information to be included in the them.","['Quantitative Analyst with R', 'Time Series with R']","['Rob J. Hyndman', 'Lore Dirick', 'Davis Vaughan']","[('Excelfile in the first exercise', 'https://assets.datacamp.com/production/repositories/684/datasets/d46ad7146f174e01407d01b7a8ef906f0bb7cdd6/exercise1.xlsx')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/forecasting-using-r,Probability & Statistics,R
104,Foundations of Functional Programming with purrr,4,13,44,"3,081","3,750",Functional Programming purrr,"Foundations of Functional Programming with purrr
Lists can be difficult to both understand and manipulate, but they can pack a ton of information and are very powerful. In this course, you will learn to easily extract, summarize, and manipulate lists and how to export the data to your desired object, be it another list, a vector, or even something else! Throughout the course, you will work with the purrr package and a variety of datasets from the repurrrsive package, including data from Star Wars and Wes Anderson films and data collected about GitHub users and GitHub repos. Following this course, your list skills will be purrrfect!
Iteration is a powerful way to make the computer do the work for you. It can also be an area of coding where it is easy to make lots of typos and simple mistakes. The purrr package helps simplify iteration so you can focus on the next step, instead of finding typos.
purrr is much more than a for loop; it works well with pipes, we can use it to run models and simulate data, and make nested loops!
Like anything in R, understanding how to troubleshoot issues is an important skill set. This can be particularly important with lists, where finding the problem can be tricky.
Now that you have the building blocks, we will start tackling some more complex data problems with purrr.",['Intermediate Tidyverse Toolbox'],"['DataCamp Content Creator', 'Chester Ismay', 'Becca Robins']","[('Simulated data 1990-2005', 'https://assets.datacamp.com/production/repositories/1858/datasets/24e986c962c2acc48ee76ec01363e23ab73a4319/simulated_data_from_1990_to_2005.zip')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/foundations-of-functional-programming-with-purrr,Programming,R
105,Foundations of Inference,4,17,58,"15,644","4,350",Inference,"Foundations of Inference
One of the foundational aspects of statistical analysis is inference, or the process of drawing conclusions about a larger population from a sample of data. Although counter intuitive, the standard practice is to attempt to disprove a research claim that is not of interest. For example, to show that one medical treatment is better than another, we can assume that the two treatments lead to equal survival rates only to then be disproved by the data. Additionally, we introduce the idea of a p-value, or the degree of disagreement between the data and the hypothesis. We also dive into confidence intervals, which measure the magnitude of the effect of interest (e.g. how much better one treatment is than another).
In this chapter, you will investigate how repeated samples taken from a population can vary.  It is the variability in samples that allow you to make claims about the population of interest.  It is important to remember that the research claims of interest focus on the population while the information available comes only from the sample data.
In this chapter, you will gain the tools and knowledge to complete a full hypothesis test.  That is, given a dataset, you will know whether or not is appropriate to reject the null hypothesis in favor of the research claim of interest.
You will continue learning about hypothesis testing with a new example and the same structure of randomization tests.  In this chapter, however, the focus will be on different errors (type I and type II), how they are made, when one is worse than another, and how things like sample size and effect size impact the error rates.
As a complement to hypothesis testing, confidence intervals allow you to estimate a population parameter.  Recall that your interest is always in some characteristic of the population, but you only have incomplete information to estimate the parameter using sample data.  Here, the parameter is the true proportion of successes in a population.  Bootstrapping is used to estimate the variability needed to form the confidence interval.",['Statistical Inference with R'],"['Jo Hardin', 'Nick Carchedi', 'Tom Jeon']","[('All polls', 'https://assets.datacamp.com/production/repositories/538/datasets/9737cf05b3899a5057110feb8dd27aa5dfe107b8/all_polls.rds'), ('Polling data', 'https://assets.datacamp.com/production/repositories/538/datasets/b1071ca5cb72143820e33fd7c6605dc4b3f11b7a/all_polls.RData'), ('Big discrimination dataset', 'https://assets.datacamp.com/production/repositories/538/datasets/f03da8fc4a2ae50a3ddf775324f4df90c96f7f26/disc_big.rds'), ('New discrimination dataset', 'https://assets.datacamp.com/production/repositories/538/datasets/60566129b391ef827ea9c8a9846608dee24ce34a/disc_new.rds'), ('Small discrimination dataset', 'https://assets.datacamp.com/production/repositories/538/datasets/543fa990550c61f6a2cc175b0a0414528f8094c0/disc_small.rds')]","['Introduction to R', 'Introduction to Data', 'Exploratory Data Analysis', 'Correlation and Regression']",https://www.datacamp.com/courses/foundations-of-inference,Probability & Statistics,R
106,Foundations of Predictive Analytics in Python (Part 1),4,14,52,"4,732","4,100",Predictive Analytics,"Foundations of Predictive Analytics in Python (Part 1)
In this course, you will learn how to build a logistic regression model with meaningful variables. You will also learn how to use this model to make predictions and how to present it and its performance to business stakeholders.
In this Chapter, you'll learn the basics of logistic regression: how can you predict a binary target with continuous variables and, how should you interpret this model and use it to make predictions for new examples?
In this chapter you'll learn why variable selection is crucial for building a useful model. You'll also learn how to implement forward stepwise variable selection for logistic regression and how to decide on the number of variables to include in your final model.
Now that you know how to build a good model, you should convince stakeholders to use it by creating appropriate graphs. You will learn how to construct and interpret the cumulative gains curve and lift graph.
In a business context, it is often important to explain the intuition behind the model you built. Indeed, if the model and its variables do not make sense, the model might not be used. In this chapter you'll learn how to explain the relationship between the variables in the model and the target by means of predictor insight graphs.",[],"['Nele Verbiest', 'Lore Dirick', 'Nick Solomon', 'Hadrien Lacroix']","[('Example basetable', 'https://assets.datacamp.com/production/repositories/1441/datasets/7abb677ec52631679b467c90f3b649eb4f8c00b2/basetable_ex2_4.csv')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/foundations-of-predictive-analytics-in-python-part-1,Machine Learning,Python
107,Foundations of Predictive Analytics in Python (Part 2),4,15,56,"1,313","4,350",Predictive Analytics,"Foundations of Predictive Analytics in Python (Part 2)
Building good models only succeeds if you have a decent base table to start with. In this course you will learn how to construct a good base table, create variables and prepare your data for modeling. We finish with advanced topics on the matter. If you have not already, you should take Foundations of Predictive Analytics in Python (Part 1) first.
In this chapter you will learn how to construct the foundations of your base table, namely the population and the target.
You will learn how to add variables to the base table that you can use to predict the target.
Once you derived variables from the raw data, it is time to clean the data and prepare it for modeling. In this Chapter we discuss the steps that need to be taken to make your data modeling-ready.
In some cases, the target or variables change heavily with the seasons. You will learn how you can deal with seasonality by adding different snapshots to the base table",[],"['Nele Verbiest', 'Hadrien Lacroix', 'Nick Solomon', 'Lore Dirick']","[('Donor IDs', 'https://assets.datacamp.com/production/repositories/1602/datasets/a83c1416e5a3ee2a7b8286aa72de97d2dd8eab45/basetable.csv'), ('Basetable with countries and age', 'https://assets.datacamp.com/production/repositories/1602/datasets/8d94f1d90fcc065416296e29cd1b3fef13cdbd16/basetable_interactions.csv'), ('Basetable used in Ex 2.13', 'https://assets.datacamp.com/production/repositories/1602/datasets/1066b658e4359261e54bb2f303812f4f6e3b6cf9/basetable_ex_2_13.csv'), ('Living place of donors', 'https://assets.datacamp.com/production/repositories/1602/datasets/cc4f3b53a8818e584bed85b75173b217e645216a/living_places.csv'), ('Donations', 'https://assets.datacamp.com/production/repositories/1602/datasets/e828af7f273445328bbe8648f0fc318a6d7741a5/gifts.csv')]","['Intermediate Python for Data Science', 'Foundations of Predictive Analytics in Python (Part 1)']",https://www.datacamp.com/courses/foundations-of-predictive-analytics-in-python-part-2,Machine Learning,Python
108,Foundations of Probability in Python,5,16,61,"1,053","5,050",Probability,"Foundations of Probability in Python
Probability is the study of regularities that emerge in the outcomes of random experiments. In this course, you'll learn about fundamental probability concepts like random variables (starting with the classic coin flip example) and how to calculate mean and variance, probability distributions, and conditional probability. We'll also explore two very important results in probability: the law of large numbers and the central limit theorem. Since probability is at the core of data science and machine learning, these concepts will help you understand and apply models more robustly. Chances are everywhere, and the study of probability will change the way you see the world. Let’s get random!
A coin flip is the classic example of a random experiment. The possible outcomes are heads or tails. This type of experiment, known as a Bernoulli or binomial trial, allows us to study problems with two possible outcomes, like “yes” or “no” and “vote” or “no vote.” This chapter introduces Bernoulli experiments, binomial distributions to model multiple Bernoulli trials, and probability simulations with the scipy library.
In this chapter you'll learn to calculate various kinds of probabilities, such as the probability of the intersection of two events and the sum of probabilities of two events, and to simulate those situations. You'll also learn about conditional probability and how to apply Bayes' rule.
Until now we've been working with binomial distributions, but there are many probability distributions a random variable can take. In this chapter we'll introduce three more that are related to the binomial distribution: the normal, Poisson, and geometric distributions.
No that you know how to calculate probabilities and important properties of probability distributions, we'll introduce two important results: the law of large numbers and the central limit theorem. This will expand your understanding on how the sample mean converges to the population mean as more data is available and how the sum of random variables behaves under certain conditions.

We will also explore connections between linear and logistic regressions as applications of probability and statistics in data science.",[],"['Alexander A. Ramírez M.', 'Hillary Green-Lerman', 'Adrián Soto']",[],"['Intermediate Python for Data Science', 'Statistical Thinking in Python (Part 1)']",https://www.datacamp.com/courses/foundations-of-probability-in-python,Probability & Statistics,Python
109,Foundations of Probability in R,4,13,54,"11,746","4,350",Probability in R,"Foundations of Probability in R
Probability is the study of making predictions about random phenomena. In this course, you'll learn about the concepts of random variables, distributions, and conditioning, using the example of coin flips. You'll also gain intuition for how to solve probability problems through random simulation. These principles will help you understand statistical inference and can be applied to draw conclusions from data.
One of the simplest and most common examples of a random phenomenon is a coin flip: an event that is either ""yes"" or ""no"" with some probability. Here you'll learn about the binomial distribution, which describes the behavior of a combination of yes/no trials and how to predict and simulate its behavior.
In this chapter you'll learn to combine multiple probabilities, such as the probability two events both happen or that at least one happens, and confirm each with random simulations. You'll also learn some of the properties of adding and multiplying random variables.
Bayesian statistics is a mathematically rigorous method for updating your beliefs based on evidence. In this chapter, you'll learn to apply Bayes' theorem to draw conclusions about whether a coin is fair or biased, and back it up with simulations.
So far we've been talking about the binomial distribution, but this is one of many probability distributions a random variable can take. In this chapter we'll introduce three more that are related to the binomial: the normal, the Poisson, and the geometric.",['Probability and Distributions with R'],"['David Robinson', 'Nick Carchedi', 'Tom Jeon', 'Nick Solomon']",[],['Introduction to R'],https://www.datacamp.com/courses/foundations-of-probability-in-r,Probability & Statistics,R
110,Fraud Detection in Python,4,16,57,"5,263","4,800",Fraud Detection,"Fraud Detection in Python
A typical organization loses an estimated 5% of its yearly revenue to fraud. In this course, you will learn how to fight fraud by using data. For example, you'll learn how to apply supervised learning algorithms to detect fraudulent behavior similar to past ones, as well as unsupervised learning methods to discover new types of fraud activities. Moreover, in fraud analytics you often deal with highly imbalanced datasets when classifying fraud versus non-fraud, and during this course you will pick up some techniques on how to deal with that. The course provides a mix of technical and theoretical insights and shows you hands-on how to practically implement fraud detection models. In addition, you will get tips and advice from real-life experience to help you prevent making common mistakes in fraud analytics.
In this chapter, you''ll learn about the typical challenges associated with fraud detection, and will learn how to resample your data in a smart way, to tackle problems with imbalanced data.
Now that you're familiar with the main challenges of fraud detection, you're about to learn how to flag fraudulent transactions with supervised learning. You will use classifiers, adjust them and compare them to find the most efficient fraud detection model.
This chapter focuses on using unsupervised learning techniques to detect fraud. You will segment customers, use K-means clustering  and other clustering algorithms to find suspicious occurrences in your data.
In this final chapter, you will use text data, text mining and topic modeling to detect fraudulent behavior.",[],"['Charlotte Werger', 'Hadrien Lacroix', 'Mari Nazary']","[('Chapter 1 datasets', 'https://assets.datacamp.com/production/repositories/2162/datasets/cc3a36b722c0806e4a7df2634e345975a0724958/chapter_1.zip'), ('Chapter 2 datasets', 'https://assets.datacamp.com/production/repositories/2162/datasets/4fb6199be9b89626dcd6b36c235cbf60cf4c1631/chapter_2.zip'), ('Chapter 3 datasets', 'https://assets.datacamp.com/production/repositories/2162/datasets/08cfcd4158b3a758e72e9bd077a9e44fec9f773b/chapter_3.zip'), ('Chapter 4 datasets', 'https://assets.datacamp.com/production/repositories/2162/datasets/94f2356652dc9ea8f0654b5e9c29645115b6e77f/chapter_4.zip')]","['Supervised Learning with scikit-learn', 'Unsupervised Learning in Python']",https://www.datacamp.com/courses/fraud-detection-in-python,Machine Learning,Python
111,Fraud Detection in R,4,16,49,"2,973","3,900",Fraud Detection in R,"Fraud Detection in R
The Association of Certified Fraud Examiners estimates that fraud costs organizations worldwide $3.7 trillion a year and that a typical company loses five percent of annual revenue due to fraud. Fraud attempts are expected to even increase further in future, making fraud detection highly necessary in most industries. This course will show how learning fraud patterns from historical data can be used to fight fraud. Some techniques from robust statistics and digit analysis are presented to detect unusual observations that are likely associated with fraud. Two main challenges when building a supervised tool for fraud detection are the imbalance or skewness of the data and the various costs for different types of misclassification. We present techniques to solve these issues and focus on artificial and real datasets from a wide variety of fraud applications.
This chapter will first give a formal definition of fraud. You will then learn how to detect anomalies in the type of payment methods used or the time these payments are made to flag suspicious transactions.
In the second chapter, you will learn how to use networks to fight fraud. You will visualize networks and use a sociology concept called homophily to detect fraudulent transactions and catch fraudsters.
Fortunately, fraud occurrences are rare. However, this means that you're working with imbalanced data, which if left as is will bias your detection models. In this chapter, you will tackle imbalance using over and under-sampling methods.
In this final chapter, you will learn about a surprising mathematical law used to detect suspicious occurrences. You will then use robust statistics to make your models even more bulletproof.",[],"['Bart Baesens', 'Sebastiaan Höppner', 'Tim Verdonck', 'Hadrien Lacroix', 'Sara Billen', 'Chester Ismay']","[('Chapter 1 datasets', 'https://assets.datacamp.com/production/repositories/2913/datasets/df95c1b620b0496b485557220a39222788491cb1/chapter_1.zip'), ('Chapter 2 datasets', 'https://assets.datacamp.com/production/repositories/2913/datasets/cd4fb1a9ddaf3c2c6ef3a1e8f3542fa1f10cdf5a/chapter_2.zip'), ('Chapter 3 datasets', 'https://assets.datacamp.com/production/repositories/2913/datasets/1885873fd937a3fa2c94c3581dd8309b81b1e091/chapter_3.zip'), ('Chapter 4 datasets', 'https://assets.datacamp.com/production/repositories/2913/datasets/70e2b476999f68e1b74b4ee321aa30830727817c/chapter_4.zip')]","['Introduction to the Tidyverse', 'Multiple and Logistic Regression', 'Unsupervised Learning in R']",https://www.datacamp.com/courses/fraud-detection-in-r,Machine Learning,R
112,Fundamentals of AI,4,14,49,120,"3,350",Fundamentals of AI,"Fundamentals of AI
So what is all this AI fuss about? Machine Learning, Deep Learning, Predictive Analytics -- what is the reality behind the hype? How do machines actually learn and what are their limits? How can we use Machine Learning to recognize written digits, predict customer churn and find structure in Elon Musk's tweets? All this -- and much more -- is the topic of this course, which will introduce you to the world of AI in a gentle, but firm and very practical manner.
Understand the definition of AI ( “general” and “narrow”), the relationship between AI and Machine Learning, and will the robots take over the world - soon?
Learn about supervised learning, work with labeled data and train regression models.
Learn about unsupervised learning, divide data into clusters, detect anomalies and select the right model for the job.
Learn about deep learning, create your first neural networks, and train a model to recognize digits.",[],"['Nemanja Radojković', 'Hadrien Lacroix', 'Hillary Green-Lerman']","[('Customer Churn', 'https://assets.datacamp.com/production/repositories/3866/datasets/252c7d50740da7988d71174d15184247463d975c/WA_Fn-UseC_-Telco-Customer-Churn.csv'), ('MNIST', 'https://assets.datacamp.com/production/repositories/3866/datasets/28eb967447024b20ba4071bebc1bf2e855ac3ceb/MNIST_5k.csv')]",[],https://www.datacamp.com/courses/fundamentals-of-ai,Machine Learning,Python
113,Fundamentals of Bayesian Data Analysis in R,4,23,58,"8,311","4,450",Fundamentals of Bayesian Data Analysis in R,"Fundamentals of Bayesian Data Analysis in R
Bayesian data analysis is an approach to statistical modeling and machine learning that is becoming more and more popular. It provides a uniform framework to build problem specific models that can be used for both statistical inference and for prediction. This course will introduce you to Bayesian data analysis: What it is, how it works, and why it is a useful tool to have in your data science toolbox.
This chapter will introduce you to Bayesian data analysis and give you a feel for how it works.
In this chapter we will take a detailed look at the foundations of Bayesian inference.
This chapter will show you four reasons why Bayesian data analysis is a useful tool to have in your data science tool belt.
Learn what Bayes theorem is all about and how to use it for statistical inference.
Learn about using the Normal distribution to analyze continuous data and try out a tool for practical Bayesian analysis in R.",[],"['Rasmus Bååth', 'Chester Ismay', 'Nick Solomon']",[],['Introduction to R'],https://www.datacamp.com/courses/fundamentals-of-bayesian-data-analysis-in-r,Probability & Statistics,R
114,GARCH Models in R,4,16,60,"2,093","4,550",GARCH Models in R,"GARCH Models in R
Are you curious about the rhythm of the financial market's heartbeat? Do you want to know when a stable market becomes turbulent? In this course on GARCH models you will learn the forward looking approach to balancing risk and reward in financial decision making. The course gradually moves from the standard normal GARCH(1,1) model to more advanced volatility models with a leverage effect, GARCH-in-mean specification and the use of the skewed student t distribution for modelling asset returns. Applications on stock and exchange rate returns include portfolio optimization, rolling sample forecast evaluation, value-at-risk forecasting and studying dynamic covariances.
We start off by making our hands dirty. A rolling window analysis of daily stock returns shows that its standard deviation changes massively through time. Looking back at the past, we thus have clear evidence of time-varying volatility. Looking forward, we need to estimate the volatility of future returns. This is essentially what a GARCH model does! In this chapter, you will learn the basics of using the rugarch package for specifying and estimating the workhorse GARCH(1,1) model in R. We end by showing its usefulness in tactical asset allocation.
Markets take the stairs up and the elevator down. This Wallstreet wisdom has important consequences for specifying a realistic volatility model. It requires to give up the assumption of normality, as well as the symmetric response of volatility to shocks. In this chapter, you will learn about GARCH models with a leverage effect and skewed student t innovations. At the end, you will be able to use GARCH models for estimating over ten thousand different GARCH model specifications.
GARCH models yield volatility forecasts which serve as input for financial decision making. Their use in practice requires to first evaluate the goodness of the volatility forecast. In this chapter, you will learn about the analysis of statistical significance of the estimated GARCH parameters, the properties of standardized returns, the interpretation of information criteria and the use of rolling GARCH estimation and mean squared prediction errors to analyze the accuracy of the volatility forecast.
At this stage, you master the standard specification, estimation and validation of GARCH models in the rugarch package. This chapter introduces specific rugarch functionality for making value-at-risk estimates, for using the GARCH model in production and for simulating GARCH returns. You will also discover that the presence of GARCH dynamics in the variance has implications for simulating log-returns, the estimation of the beta of a stock and finding the minimum variance portfolio.",[],"['Kris Boudt', 'Hadrien Lacroix', 'Sara Billen', 'Chester Ismay']","[('Daily EUR/USD returns', 'https://assets.datacamp.com/production/repositories/3066/datasets/661d985976cb697abc44abcc2a34170086813dd6/EURUSDret.Rdata'), ('Daily Microsoft returns', 'https://assets.datacamp.com/production/repositories/3066/datasets/5a9a26d972a80e17d6ca316632d36781e6119fc0/msftret.Rdata'), ('S&P 500 prices', 'https://assets.datacamp.com/production/repositories/3066/datasets/39bd6105e3d2f79bb679f9f95426807335f0fd19/sp500prices.Rdata'), ('S&P 500 returns', 'https://assets.datacamp.com/production/repositories/3066/datasets/c3d1811c6fd860f6a9eb3fa97a553d8db855a457/sp500ret.Rdata'), ('Simulated return data', 'https://assets.datacamp.com/production/repositories/3066/datasets/a3261cd3c152d9124c9c0542aabb0c4bd729165d/ret.Rdata')]","['Introduction to Time Series Analysis', 'Manipulating Time Series Data in R with xts & zoo']",https://www.datacamp.com/courses/garch-models-in-r,Applied Finance,R
115,Generalized Linear Models in Python,5,16,59,"1,056","4,950",Generalized Linear Models,"Generalized Linear Models in Python
Imagine being able to handle data where the response variable is either binary, count, or approximately normal, all under one single framework. Well, you don't have to imagine. Enter the Generalized Linear Models in Python course! In this course you will extend your regression toolbox with the logistic and Poisson models, by learning how to fit, understand, assess model performance and finally use the model to make predictions on new data. You will practice using data from real world studies such the largest population poisoning in world's history, nesting of horseshoe crabs and counting the bike crossings on the bridges in New York City.
Review linear models and learn how GLMs are an extension of the linear model given different types of response variables. You will also learn the building blocks of GLMs and the technical process of fitting a GLM in Python.
This chapter focuses on logistic regression. You'll learn about the structure of binary data, the logit link function, model fitting, as well as how to interpret model coefficients, model inference, and how to assess model performance.
Here you'll learn about Poisson regression, including the discussion on count data, Poisson distribution and the interpretation of the model fit. You'll also learn how to overcome problems with overdispersion. Finally, you'll get hands-on experience with the process of model visualization.
In this final chapter you'll learn how to increase the complexity of your model by adding more than one explanatory variable. You'll practice with the problem of multicollinearity, and with treating categorical and interaction terms in your model.",[],"['Ita Cirovic Donev', 'Chester Ismay', 'Adrián Soto']","[('Well switch due to arsenic poisoning', 'https://assets.datacamp.com/production/repositories/4047/datasets/8d608ed3e4e960e9e5d4f1730cb354154faa374f/wells.csv'), ('Nesting of the female horseshoe crab', 'https://assets.datacamp.com/production/repositories/4047/datasets/3dabb99855f48ca92bd8bf123a2cfacfea3ef273/crab.csv'), ('Credit default', 'https://assets.datacamp.com/production/repositories/4047/datasets/a0614614e3917196f66b29ce26d4d5244b85188b/default.csv'), ('Level of salary and years of work experience', 'https://assets.datacamp.com/production/repositories/4047/datasets/7b1962e80528b839bf82e9ed9f1a65968f9aa087/salary.csv'), ('Medical costs per person given age and BMI', 'https://assets.datacamp.com/production/repositories/4047/datasets/9ce36c042c7db33260ef69a27b1918dfc08e7cab/insurance.csv'), ('Bike crossings in New York City', 'https://assets.datacamp.com/production/repositories/4047/datasets/8d069534eea69e7a946dc2f8f9d4f8b594f62d37/bike.csv')]","['Statistical Thinking in Python (Part 2)', 'Introduction to Linear Modeling in Python']",https://www.datacamp.com/courses/generalized-linear-models-in-python,Machine Learning,Python
116,Generalized Linear Models in R,4,14,51,"3,810","4,050",Generalized Linear Models in R,"Generalized Linear Models in R
Linear regression serves as a workhorse of statistics, but cannot handle some types of complex data. A generalized linear model (GLM) expands upon linear regression to include non-normal distributions including binomial and count data. Throughout this course, you will expand your data science toolkit to include GLMs in R. As part of learning about GLMs, you will learn how to fit model binomial data with logistic regression and count data with Poisson regression. You will also learn how to understand these results and plot them with ggplot2.
This chapter teaches you how generalized linear models are an extension of other models in your data science toolbox. The chapter also uses Poisson regression to introduce generalize linear models.
This chapter covers running a logistic regression and examining the model outputs.
This chapter teaches you about interpreting GLM coefficients  and plotting GLMs using ggplot2.
In this chapter, you will learn how to do multiple regression with GLMs in R.",[],"['Richard Erickson', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Bus Commuter dataset', 'https://assets.datacamp.com/production/repositories/2698/datasets/e368234a66bbabc19b8da1fb42d3e1027508d710/busData.csv')]",['Multiple and Logistic Regression'],https://www.datacamp.com/courses/generalized-linear-models-in-r,Probability & Statistics,R
117,HR Analytics in Python: Predicting Employee Churn,4,14,44,"3,489","3,500",HR Analytics : Predicting Employee Churn,"HR Analytics in Python: Predicting Employee Churn
Among all of the business domains, HR is still the least disrupted. However, the latest developments in data collection and analysis tools and technologies allow for data driven decision-making in all dimensions, including HR. This course will provide a solid basis for dealing with employee data and developing a predictive model to analyze employee turnover.
In this chapter you will learn about the problems addressed by HR analytics, as well as will explore a sample HR dataset that will further be analyzed. You will describe and visualize some of the key variables, transform and manipulate the dataset to make it ready for analytics.
This chapter introduces one of the most popular classification techniques: the Decision Tree. You will use it to develop an algorithm that predicts employee turnover.
Here, you will learn how to evaluate a model and understand how ""good"" it is. You will compare different trees to choose the best among them.
In this final chapter, you will learn how to use cross-validation to avoid overfitting the training data. You will also learn how to know which features are impactful, and which are negligible. Finally, you will use these newly acquired skills to build a better performing Decision Tree!",[],"['Hrant Davtyan', 'Lore Dirick', 'Nick Solomon']","[('Employee turnover data', 'https://assets.datacamp.com/production/repositories/1765/datasets/ae888d00f9b36dd7d50a4afbc112761e2db766d2/turnover.csv')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/hr-analytics-in-python-predicting-employee-churn,Machine Learning,Python
118,Hierarchical and Mixed Effects Models,4,14,54,"5,422","4,600",Hierarchical and Mixed Effects Models,"Hierarchical and Mixed Effects Models
This course begins by reviewing slopes and intercepts in linear regressions before moving on to random-effects. You'll learn what a random effect is and how to use one to model your data. Next, the course covers linear mixed-effect regressions. These powerful models will allow you to explore data with a more complicated structure than a standard linear regression. The course then teaches generalized linear mixed-effect regressions. Generalized linear mixed-effects models allow you to model more kinds of data, including binary responses and count data. Lastly, the course goes over repeated-measures analysis as a special case of mixed-effect modeling. This kind of data appears when subjects are followed over time and measurements are collected at intervals. Throughout the course you'll work with real data to answer interesting questions using mixed-effects models.
The first chapter provides an example of when to use a mixed-effect and also describes the parts of a regression. The chapter also examines a a student test-score dataset with a nested structure to demonstrate mixed-effects.
This chapter providers an introduction to linear mixed-effects models. It covers different types of random-effects, describes how to understand the results for linear mixed-effects models, and goes over different methods for statistical inference with mixed-effects models using crime data from Maryland.
This chapter extends linear mixed-effects models to include non-normal error terms using generalized linear mixed-effects models. By altering the model to include a non-normal error term, you are able to model more kinds of data with non-linear responses. After reviewing generalized linear models, the chapter examines binomial data and count data in the context of mixed-effects models.
This chapter shows how repeated-measures analysis is a special case of mixed-effect modeling. The chapter begins by reviewing paired t-tests and repeated measures ANOVA. Next, the chapter uses a linear mixed-effect model to examine sleep study data. Lastly, the chapter uses a generalized linear mixed-effect model to examine hate crime data from New York state through time.",[],"['Richard Erickson', 'Chester Ismay', 'Nick Solomon']","[('Illinois chlamydia data', 'https://assets.datacamp.com/production/repositories/1803/datasets/612bd6490500636efa74132bfbc37817f250cb5a/ILdata.csv'), ('Maryland crime data', 'https://assets.datacamp.com/production/repositories/1803/datasets/e5e076efd3c3b7665a3180da9f95aaaf671f6a61/MDcrime.csv'), ('Classroom data', 'https://assets.datacamp.com/production/repositories/1803/datasets/975fe2b0190804d854a5da90083364629fb6af2e/classroom.csv'), ('Birth rate data', 'https://assets.datacamp.com/production/repositories/1803/datasets/eb95cb6973afa56c38ba53cfd8058c72f768322f/countyBirthsDataUse.csv'), ('New York hate crime data', 'https://assets.datacamp.com/production/repositories/1803/datasets/45e88fe1bc8d1d76d140e69cb873da9eddb7008e/hateNY.csv')]",['Generalized Linear Models in R'],https://www.datacamp.com/courses/hierarchical-and-mixed-effects-models,Probability & Statistics,R
119,Hierarchical and Recursive Queries in SQL Server,4,13,47,415,"3,800",Hierarchical and Recursive Queries in SQL Server,"Hierarchical and Recursive Queries in SQL Server
Do you want to query complex data structures in an iterative way? Do you have access to hierarchical data structures that need to be queried? This course will teach you the tools required to solve these questions. You will learn how to write recursive queries and query hierarchical data structures. To do this, you will use Common Table Expressions (CTE) and the recursion principle on a wide variety of datasets. You will, for example, dig into a flight plan dataset and learn how to find the best and cheapest connection between two airports. After completing this course, you will understand the principle of recursion, and be able to identify and create hierarchical data models.
In this chapter, you will learn about recursion and why it is beneficial to apply this technique.  You will also refresh your knowledge about Common Expression Tables (CTE).
In this chapter, you will learn about recursive CTEs,  how to query hierarchical datasets,  and finally, how to apply recursive CTEs on hierarchical data.
In this chapter, you will learn how to create and modify database tables. You will learn about relational and hierarchical data models, how they differ, and when each model should be used.
In this chapter, you will practice your learnings about hierarchical and recursive querying on real-world problems, such as finding possible flight routes, assembling a car, and modeling a power grid.",[],"['Dominik Egarter', 'Mona Khalil', 'Sara Billen']",[],['Intermediate SQL Server'],https://www.datacamp.com/courses/hierarchical-and-recursive-queries-in-sql-server,Reporting,SQL
120,Human Resources Analytics in R: Exploring Employee Data,5,16,60,"5,614","4,750",Human Resources Analytics in R: Exploring Employee Data,"Human Resources Analytics in R: Exploring Employee Data
HR analytics, people analytics, workforce analytics -- whatever you call it, businesses are increasingly counting on their human resources departments to answer questions, provide insights, and make recommendations using data about their employees. In this course, you'll learn how to manipulate, visualize, and perform statistical tests on HR data through a series of HR analytics case studies.
In this chapter, you will get an introduction to how data science is used in a human resources context. Then you will dive into a case study where you'll analyze and visualize recruiting data to determine which source of new candidates ultimately produces the best new hires. The dataset you'll use in this and the other chapters in this course is synthetic, to maintain the privacy of actual employees.
Gallup defines engaged employees as those who are involved in, enthusiastic about and committed to their work and workplace.  There is disagreement about the strength of the connection between employee engagement and business outcomes, but the idea is that employees that are more engaged will be more productive and stay with the organization longer. In this chapter, you'll  look into potential reasons that one department's engagement scores are lower than the rest.
When employers make a new hire, they must determine what the new employee will be paid. If the employer is not careful, the new hires can come in with a higher salary than the employees that currently work at the same job, which can cause  employee turnover and dissatisfaction. In this chapter, you will check whether new hires are really getting paid more than current employees, and how to double-check your initial observations.
Performance management helps an organization keep track of which employees are providing extra value, or below-average value, and compensating them accordingly. Whether performance is a rating or the result of a questionnaire, whether employees are rated each year or more often than that, the process is somewhat subjective. An organization should check that ratings are being given with regard to performance, and not individual managers' preferences, or even biases (conscious or subconscious).
In many industries, workplace safety is a critical consideration. Maintaining a safe workplace provides employees with confidence and reduces costs for workers' compensation and legal liabilities. In this chapter, you'll look for  explanations for an increase in workplace accidents.",[],"['Ben Teusch', 'Richie Cotton', 'Sumedh Panchadhar']","[('Recruitment data', 'https://assets.datacamp.com/production/course_5977/datasets/recruitment_data.csv'), ('Survey data', 'https://assets.datacamp.com/production/course_5977/datasets/survey_data.csv'), ('Fair pay data', 'https://assets.datacamp.com/production/course_5977/datasets/fair_pay_data.csv'), ('Performance data', 'https://assets.datacamp.com/production/course_5977/datasets/performance_data.csv'), ('HR data', 'https://assets.datacamp.com/production/course_5977/datasets/hr_data.csv'), ('Accident data', 'https://assets.datacamp.com/production/course_5977/datasets/accident_data.csv'), ('HR data (2)', 'https://assets.datacamp.com/production/course_5977/datasets/hr_data_2.csv'), ('Survey data (2)', 'https://assets.datacamp.com/production/course_5977/datasets/survey_data_2.csv')]","['Introduction to the Tidyverse', 'Correlation and Regression']",https://www.datacamp.com/courses/human-resources-analytics-in-r-exploring-employee-data,Case Studies,R
121,Human Resources Analytics in R: Predicting Employee Churn,4,14,50,"1,917","4,000",Human Resources Analytics in R: Predicting Employee Churn,"Human Resources Analytics in R: Predicting Employee Churn
Organizational growth largely depends on staff retention. Losing employees frequently impacts the morale of the organization and hiring new employees is more expensive than retaining existing ones. Good news is that organizations can increase employee retention using data-driven intervention strategies. This course focuses on data acquisition from multiple HR sources, exploring and deriving new features, building and validating a logistic regression model, and finally, show how to calculate ROI for a potential retention strategy. 
This chapter begins with a general introduction to employee churn/turnover and reasons for turnover as shared by employees. You will learn how to calculate turnover rate and explore turnover rate across different dimensions. You will also identify talent segments for your analysis and bring together relevant data from multiple HR data sources to derive more useful insights.
In this chapter, you will create new variables from existing data to explain employee turnover. You will analyze compensation data and create compa-ratio to measure pay equity of all employees. To identify the most important variables influencing turnover, you will use the concept of Information Value (IV).
In this chapter, you will build a logistic regression model to predict turnover by taking into account multicollinearity among variables.
In this chapter, you will calculate the accuracy of your model and categorize employees into specific risk buckets. You will then formulate an intervention strategy and calculate the ROI for this strategy.",[],"['Abhishek Trehan', 'Anurag Gupta', 'Richie Cotton', 'Sumedh Panchadhar']","[('Employee data', 'https://assets.datacamp.com/production/repositories/1746/datasets/ed764d8978ecdf6d91d2d3f0b5f1efcffe5cb7ec/employee_data.zip')]",['Human Resources Analytics in R: Exploring Employee Data'],https://www.datacamp.com/courses/human-resources-analytics-in-r-predicting-employee-churn,,
122,Hyperparameter Tuning in Python,4,13,44,653,"3,400",Hyperparameter Tuning,"Hyperparameter Tuning in Python
Building powerful machine learning models depends heavily on the set of hyperparameters used. But with increasingly complex models with lots of options, how do you efficiently find the best settings for your particular problem? In this course you will get practical experience in using some common methodologies for automated hyperparameter tuning in Python using Scikit Learn. These include Grid Search, Random Search & advanced optimization methodologies including Bayesian & Genetic algorithms . You will use a dataset predicting credit card defaults as you build skills to dramatically increase the efficiency and effectiveness of your machine learning model building.
In this introductory chapter you will learn the difference between hyperparameters and parameters. You will practice extracting and analyzing parameters, setting hyperparameter values for several popular machine learning algorithms. Along the way you will learn some best practice tips & tricks for choosing which hyperparameters to tune and what values to set & build learning curves to analyze your hyperparameter choices.
This chapter introduces you to a popular automated hyperparameter tuning methodology called Grid Search. You will learn what it is, how it works and practice undertaking a Grid Search using Scikit Learn. You will then learn how to analyze the output of a Grid Search & gain practical experience doing this.
In this chapter you will be introduced to another popular automated hyperparameter tuning methodology called Random Search. You will learn what it is, how it works and importantly how it differs from grid search. You will learn some advantages and disadvantages of this method and when to choose this method compared to Grid Search. You will practice undertaking a Random Search with Scikit Learn as well as visualizing & interpreting the output.
In this final chapter you will be given a taste of more advanced hyperparameter tuning methodologies known as ''informed search''. This includes a methodology known as Coarse To Fine as well as Bayesian & Genetic hyperparameter tuning algorithms. You will learn how informed search differs from uninformed search and gain practical skills with each of the mentioned methodologies, comparing and contrasting them as you go.",[],"['Alex Scriven', 'Hadrien Lacroix', 'Chester Ismay']","[('Credit Card Defaults', 'https://assets.datacamp.com/production/repositories/3983/datasets/bb158f1c76682286f938e02d71de21a3e5389cbf/credit-card-full.csv')]","['Intermediate Python for Data Science', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/hyperparameter-tuning-in-python,Machine Learning,Python
123,Hyperparameter Tuning in R,4,14,47,"2,055","3,500",Hyperparameter Tuning in R,"Hyperparameter Tuning in R
For many machine learning problems, simply running a model out-of-the-box and getting a prediction is not enough; you want the best model with the most accurate prediction. One way to perfect your model is with hyperparameter tuning, which means optimizing the settings for that specific model. In this course, you will work with the caret, mlr and h2o packages to find the optimal combination of hyperparameters in an efficient manner using grid search, random search, adaptive resampling and automatic machine learning (AutoML). Furthermore, you will work with different datasets and tune different supervised learning models, such as random forests, gradient boosting machines, support vector machines, and even neural nets. Get ready to tune!
Why do we use the strange word ""hyperparameter""? What makes it hyper? Here, you will understand what model parameters are, and why they are different from hyperparameters in machine learning. You will then see why we would want to tune them and how the default setting of caret automatically includes hyperparameter tuning.
In this chapter, you will learn how to tune hyperparameters with a Cartesian grid. Then, you will implement faster and more efficient approaches. You will use Random Search and adaptive resampling to tune the parameter grid, in a way that concentrates on values in the neighborhood of the optimal settings.
Here, you will use another package for machine learning that has very convenient hyperparameter tuning functions. You will define a Cartesian grid or perform Random Search, as well as advanced techniques. You will also learn different ways to plot and evaluate models with different hyperparameters.
In this final chapter, you will use h2o, another package for machine learning with very convenient hyperparameter tuning functions. You will use it to train different models and define a Cartesian grid. Then, You will implement a Random Search use stopping criteria. Finally, you will learn AutoML, an  h2o interface which allows for very fast and convenient model and hyperparameter tuning with just one function.",[],"['Shirin  Elsinghorst (formerly Glander)', 'Chester Ismay', 'Hadrien Lacroix']","[('Bc train data', 'https://assets.datacamp.com/production/course_6650/datasets/bc_train_data.csv'), ('Breast cancer data', 'https://assets.datacamp.com/production/course_6650/datasets/breast_cancer_data.csv')]","['Introduction to the Tidyverse', 'Supervised Learning in R: Classification', 'Machine Learning Toolbox']",https://www.datacamp.com/courses/hyperparameter-tuning-in-r,Machine Learning,R
124,Image Processing in Python,4,16,54,562,"4,450",Image Processing,"Image Processing in Python
Images are everywhere! We live in a time where images contain lots of information, which is sometimes difficult to obtain. This is why image pre-processing has become a highly valuable skill, applicable in many use cases. In this course, you will learn to process, transform, and manipulate images at your will, even when they come in thousands. You will also learn to restore damaged images, perform noise reduction, smart-resize images, count the number of dots on a dice, apply facial detection, and much more, using scikit-image. After completing this course, you will be able to apply your knowledge to different domains such as machine learning and artificial intelligence, machine and robotic vision, space and medical image analysis, retailing, and many more. Take the step and dive into the wonderful world that is computer vision!
Jump into digital image structures and learn to process them! Extract data, transform and analyze images using NumPy and Scikit-image. 

With just a few lines of code, you will convert RGB images to grayscale,  get data from them, obtain histograms containing very useful information, and separate objects from the background!
You will learn to detect object shapes using edge detection filters, improve medical images with contrast enhancement and even enlarge pictures to five times its original size! 

You will also apply morphology to make thresholding more accurate when segmenting images and go to the next level of processing images with Python.
So far, you have done some very cool things with your image processing skills!
 
In this chapter, you will apply image restoration to remove objects, logos, text, or damaged areas in pictures! 
You will also learn how to apply noise, use segmentation to speed up processing, and find elements in images by their contours.
After completing this chapter, you will have a deeper knowledge of image processing as you will be able to detect edges, corners, and even faces!  You will learn how to detect not just front faces but also face profiles, cat, or dogs. You will apply your skills to more complex real-world applications.
Learn to master several widely used image processing techniques with very few lines of code!",[],"['Rebeca Saraí González Guerra', 'Hillary Green-Lerman', 'Sara Billen']","[('Images', 'https://assets.datacamp.com/production/repositories/4470/datasets/44adb5b3c76caece2225b30f7660c5e50508d2ee/Image Processing with Python course exercise dataset.zip')]",['Python Data Science Toolbox (Part 2)'],https://www.datacamp.com/courses/image-processing-in-python,Data Visualization,Python
125,Importing & Managing Financial Data in Python,5,16,53,"19,853","4,350",Importing & Managing Financial Data,"Importing & Managing Financial Data in Python
If you want to apply your new 'Python for Data Science' skills to real-world financial data, then this course will give you some very valuable tools.
First, you will learn how to get data out of Excel into pandas and back. Then, you will learn how to pull stock prices from various online APIs like
Google or Yahoo! Finance, macro data from the Federal Reserve, and exchange rates from OANDA. Finally, you will learn how to calculate returns for various time horizons,
analyze stock performance by sector for IPOs, and calculate and summarize correlations.

In this chapter, you will learn how to import, clean and combine data from Excel workbook sheets into a pandas DataFrame. You will also practice grouping data, summarizing information for categories, and visualizing the result using subplots and heatmaps.
You will use data on companies listed on the stock exchanges NASDAQ, NYSE, and AMEX with information on company name, stock symbol, last market capitalization and price, sector or industry group, and IPO year. In Chapter 2, you will build on this data to download and analyze stock price history for some of these companies.
This chapter introduces online data access to Google Finance and the Federal Reserve Data Service through the `pandas` `DataReader`. You will pull data, perform basic manipulations, combine data series, and visualize the results.
In this chapter, you will learn how to capture key characteristics of individual variables in simple metrics. As a result, it will be easier to understand the distribution of the variables in your data set: Which values are central to, or typical of your data? Is your data widely dispersed, or rather narrowly distributed around some mid point? Are there outliers? What does the overall distribution look like?
This chapter introduces the ability to group data by one or more categorical variables, and to calculate and visualize summary statistics for each caategory. In the process, you will learn to compare company statistics for different sectors and IPO vintages, analyze the global income distribution over time, and learn how to create various statistical charts from the seaborn library.",[],"['Stefan Jansen', 'Lore Dirick']","[('Amex listings .csv file', 'https://assets.datacamp.com/production/repositories/993/datasets/2bd8d6c19608fc6f3facbac31021d26a2fcac42f/amex-listings.csv'), ('Income growth .csv file', 'https://assets.datacamp.com/production/repositories/993/datasets/5c79f58382e658b649ed45070976f3c815b69307/income_growth.csv'), ('Listings .xlsx file', 'https://assets.datacamp.com/production/repositories/993/datasets/2dad49608ef2966bcd0fb209bafb3b365271c7c2/listings.xlsx'), ('Nasdaq listings .csv file', 'https://assets.datacamp.com/production/repositories/993/datasets/3e432c89aa85b5782fb16bdc8f16de01699d9885/nasdaq-listings.csv'), ('Per capita income .csv file', 'https://assets.datacamp.com/production/repositories/993/datasets/488a0764add121948fcdc683d29659425f39bfa4/per_capita_income.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/importing-managing-financial-data-in-python,Applied Finance,Python
126,Importing Data in Python (Part 1),3,15,54,"117,438","4,150",Importing Data,"Importing Data in Python (Part 1)
As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In this course, you'll learn the many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL.
In this chapter, you'll learn how to import data into Python from all types of flat files, which are a simple and prevalent form of data storage. You've previously learned how to use NumPy and pandas—you will learn how to use these packages to import flat files and customize your imports.
You've learned how to import flat files, but there are many other file types you will potentially have to work with as a data scientist. In this chapter, you'll learn how to import data into Python from a wide array of important file types. These include pickled files, Excel spreadsheets, SAS and Stata files, HDF5 files, a file type for storing large quantities of numerical data, and MATLAB files.
In this chapter, you'll learn how to extract meaningful data from relational databases, an essential skill for any data scientist. You will learn about relational models, how to create SQL queries, how to filter and order your SQL records, and how to perform advanced queries by joining database tables.","['Data Analyst with Python', 'Data Scientist with Python', 'Importing & Cleaning Data with Python', 'Python Programmer']","['Hugo Bowne-Anderson', 'Francisco Castro']","[('Chinook (SQLite)', 'https://assets.datacamp.com/production/repositories/487/datasets/ec8aa8bc9ffea6b4e2729e1a0a2d4aea2f300b3a/Chinook.sqlite'), ('LIGO (HDF5)', 'https://assets.datacamp.com/production/repositories/487/datasets/ab9107b749b832daada36bfaa718d9a591a0d69c/L-L1_LOSC_4_V1-1126259446-32.hdf5'), ('Battledeath (XLSX)', 'https://assets.datacamp.com/production/repositories/487/datasets/5e8897e4624f8577ed0d33aeafbe7bd88bfc424b/battledeath.xlsx'), ('Extent of infectious diseases (DTA)', 'https://assets.datacamp.com/production/repositories/487/datasets/c4129edae533cf2683d8995f6dcdbcf5f41520ba/disarea.dta'), ('Gene expressions (MATLAB)', 'https://assets.datacamp.com/production/repositories/487/datasets/2fc0beea2d8cc7c93d79e79344a6e9e66f65d1fe/ja_data2.mat'), ('MNIST', 'https://assets.datacamp.com/production/repositories/487/datasets/d6d1b84ef06151ff913b4173e2eca8e6d5fa959b/mnist_kaggle_some_rows.csv'), ('Sales (SAS7BDAT)', 'https://assets.datacamp.com/production/repositories/487/datasets/0300d44b3ac77accc4b9706af86e33037bda6861/sales.sas7bdat'), ('Seaslugs', 'https://assets.datacamp.com/production/repositories/487/datasets/07cd090cb965782011a76af72c16b400a5ca5cc0/seaslug.txt'), ('Titanic', 'https://assets.datacamp.com/production/repositories/487/datasets/be79810c4288801167cfb31dbedd396559816ade/titanic_sub.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/importing-data-in-python-part-1,Importing & Cleaning Data,Python
127,Importing Data in Python (Part 2),2,7,29,"68,072","2,400",Importing Data,"Importing Data in Python (Part 2)
As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In the prequel to this course, you learned many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL. In this course, you'll extend this knowledge base by learning to import data from the web and by pulling data from Application Programming Interfaces— APIs—such as the Twitter streaming API, which allows us to stream real-time tweets.
The web is a rich source of data from which you can extract various types of insights and findings. In this chapter, you will learn how to get data from the web, whether it is stored in files or in HTML. You'll also learn the basics of scraping and parsing web data.
In this chapter, you will gain a deeper understanding of how to import data from the web. You will learn the basics of extracting data from APIs, gain insight on the importance of APIs, and practice extracting data by diving into the OMDB and Library of Congress APIs.
In this chapter, you will consolidate your knowledge of interacting with APIs in a deep dive into the Twitter streaming API. You'll learn how to stream real-time Twitter data, and how to analyze and visualize it.","['Data Analyst with Python', 'Data Scientist with Python', 'Importing & Cleaning Data with Python', 'Python Programmer']","['Hugo Bowne-Anderson', 'Francisco Castro']","[('Latitudes (XLS)', 'https://assets.datacamp.com/production/repositories/488/datasets/b422ace2fceada7b569e0ba3e8d833fddc684c4d/latitude.xls'), ('Tweets', 'https://assets.datacamp.com/production/repositories/488/datasets/3ef452f83a91556ea4284624b969392c0506fb33/tweets3.txt'), ('Red wine quality', 'https://assets.datacamp.com/production/repositories/488/datasets/013936d2700e2d00207ec42100d448c23692eb6f/winequality-red.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Importing Data in Python (Part 1)']",https://www.datacamp.com/courses/importing-data-in-python-part-2,Importing & Cleaning Data,Python
128,Importing Data in R (Part 1),3,11,42,"92,765","3,550",Importing Data in R,"Importing Data in R (Part 1)
Importing data into R should be the easiest step in your analysis. Unfortunately, that is almost never the case. Data can come in many formats, ranging from .csv and text files, to statistical software files, to databases and HTML data. Knowing which approach to use is key to getting started with the actual analysis.
In this course, you’ll start by learning how to read .csv and text files in R. You will then cover the readr and data.table packages to easily and efficiently import flat file data. After that, you will learn how to read .xls files in R using readxl and gdata.
A lot of data comes in the form of flat files: simple tabular text files. Learn how to import the common formats of flat file data with base R functions.
In addition to base R, there are dedicated packages to easily and efficiently import flat file data. We'll talk about two such packages: readr and data.table.
Excel is a widely used data analysis tool. If you prefer to do your analyses in R, though, you'll need an understanding of how to import  .csv data into R. This chapter will show you how to use readxl and gdata to do so.
Beyond importing data from Excel, you can take things one step further with XLConnect. Learn all about it and bridge the gap between R and Excel.","['Data Analyst with R', 'Data Scientist with R', 'Importing & Cleaning Data with R']",['Filip Schouwenaars'],"[('Hotdogs', 'https://assets.datacamp.com/production/repositories/453/datasets/3e5a732b4467c1cbed6a8e8e7a1c9eec3fc86c58/hotdogs.txt'), ('Potatoes (CSV)', 'https://assets.datacamp.com/production/repositories/453/datasets/b47d250de5379914100e28075556fb24e55ca2cd/potatoes.csv'), ('Potatoes (TSV)', 'https://assets.datacamp.com/production/repositories/453/datasets/d78f476c64cf9bc91d4467ff64769afd64d4b450/potatoes.txt'), ('Swimming pools', 'https://assets.datacamp.com/production/repositories/453/datasets/0badb39b50c7daf000698efbca476716db7c1a6f/swimming_pools.csv'), ('Urban population (XLS)', 'https://assets.datacamp.com/production/repositories/453/datasets/ae595b67772d71e79ea9c25897192ba49dcb2b81/urbanpop.xls'), ('Urban population (XLSX)', 'https://assets.datacamp.com/production/repositories/453/datasets/775623dcd2ee9b07bff5b034edba3137bb24b748/urbanpop.xlsx')]",['Introduction to R'],https://www.datacamp.com/courses/importing-data-in-r-part-1,Importing & Cleaning Data,R
129,Importing Data in R (Part 2),3,10,48,"43,872","3,950",Importing Data in R,"Importing Data in R (Part 2)

Many companies store their information in relational databases. The R community has also developed R packages to get data from these architectures. You'll learn how to connect to a database and how to retrieve data from it.
Importing an entire table from a database while you might only need a tiny bit of information seems like a lot of unncessary work. In this chapter, you'll learn about SQL queries, which will help you make things more efficient by performing some computations on the database side.
More and more of the information that data scientists are using resides on the web. Importing this data into R requires an understanding of the protocols used on the web. In this chapter, you'll get a crash course in HTTP and learn to perform your own HTTP requests from inside R.
Importing data from the web is one thing; actually being able to extract useful information is another. Learn more about the JSON format to get one step closer to web domination.
Next to R, there are also other commonly used statistical software packages: SAS, STATA and SPSS. Each of them has their own file format. Learn how to use the haven and foreign packages to get them into R with remarkable ease!","['Data Analyst with R', 'Data Scientist with R', 'Importing & Cleaning Data with R']",['Filip Schouwenaars'],"[('Education equality data', 'https://assets.datacamp.com/production/repositories/454/datasets/c326824a049fa32779b2e06a0b3cab25c0055716/edequality.dta'), ('Employee data', 'https://assets.datacamp.com/production/repositories/454/datasets/7d9358f29b6b1a50c641ca11192d7ca383f7a19f/employee.sav'), ('Florida election data', 'https://assets.datacamp.com/production/repositories/454/datasets/3d5db3972c085c8f9bb99239ddd78f60aeff8300/florida.dta'), ('International socio-economic data', 'https://assets.datacamp.com/production/repositories/454/datasets/9a7178d07a670ab0dd88aa1f4d9d806948acdd43/international.sav'), ('Latitude (XLS)', 'https://assets.datacamp.com/production/repositories/454/datasets/b422ace2fceada7b569e0ba3e8d833fddc684c4d/latitude.xls'), ('Latitude (XLSX)', 'https://assets.datacamp.com/production/repositories/454/datasets/257641a69f9f56700a11be661315e285b6e61091/latitude.xlsx'), ('Big Five data', 'https://assets.datacamp.com/production/repositories/454/datasets/8919ed67a6692ad4474df6414a39f9749b24278e/person.sav'), ('Potatoes', 'https://assets.datacamp.com/production/repositories/454/datasets/3c295cdad28103efca12907eddda0acb15d2a2b8/potatoes.txt'), ('Sales data', 'https://assets.datacamp.com/production/repositories/454/datasets/1ce18d1211c51ef3d083d4e2881c9c056eada5ed/sales.sas7bdat'), ('Swimming pools', 'https://assets.datacamp.com/production/repositories/454/datasets/0badb39b50c7daf000698efbca476716db7c1a6f/swimming_pools.csv'), ('Sugar import data', 'https://assets.datacamp.com/production/repositories/454/datasets/fe0bdbfa768a4dc8ee6414fc40139bf47b60a7fb/trade.dta'), ('Water data', 'https://assets.datacamp.com/production/repositories/454/datasets/c189c407928639e85031c42483743f7edd2d6111/water.csv'), ('Wine data', 'https://assets.datacamp.com/production/repositories/454/datasets/f62786f0dab58bedeefe6af6ee9250a8cd8daa35/wine.RData')]",['Importing Data in R (Part 1)'],https://www.datacamp.com/courses/importing-data-in-r-part-2,Importing & Cleaning Data,R
130,Importing and Managing Financial Data in R,5,15,57,"9,985","4,850",Importing and Managing Financial Data in R,"Importing and Managing Financial Data in R
If you've ever done anything with financial or economic time series, you know the data come in various shapes, sizes, and periodicities. Getting the data into R can be stressful and time-consuming, especially when you need to merge data from several different sources into one data set. This course will cover importing data from local files as well as from internet sources.
A wealth of financial and economic data are available online. Learn how getSymbols() and Quandl() make it easy to access data from a variety of sources.
You've learned how to import data from online sources, now it's time to see how to extract columns from the imported data. After you've learned how to extract columns from a single object, you will explore how to import, transform, and extract data from multiple instruments.
Learn how to simplify and streamline your workflow by taking advantage of the ability to customize default arguments to `getSymbols()`. You will see how to customize defaults by data source, and then how to customize defaults by symbol. You will also learn how to handle problematic instrument symbols.
You've learned how to import, extract, and transform data from multiple data sources. You often have to manipulate data from different sources in order to combine them into a single data set. First, you will learn how to convert sparse, irregular data into a regular series.  Then you will review how to aggregate dense data to a lower frequency. Finally, you will learn how to handle issues with intra-day data.
You've learned the core workflow of importing and manipulating financial data. Now you will see how to import data from text files of various formats. Then you will learn how to check data for weirdness and handle missing values. Finally, you will learn how to adjust stock prices for splits and dividends.","['Finance Basics with R', 'Quantitative Analyst with R']","['Joshua Ulrich', 'Lore Dirick', 'Davis Vaughan']","[('Amazon CSV file', 'https://assets.datacamp.com/production/repositories/389/datasets/ce26cee08d14cb53379495add3045ed98b5e3c66/AMZN.csv'), ('DC data', 'https://assets.datacamp.com/production/repositories/389/datasets/04adfd61735fae4293df91302c83e4fa77ee1a59/DC.RData'), ('UNE CSV file', 'https://assets.datacamp.com/production/repositories/389/datasets/69be01cc2dc9342b822c554dddc07d29c270a41f/UNE.csv'), ('two_symbols CSV file', 'https://assets.datacamp.com/production/repositories/389/datasets/6656e8b791f96452761dc709ceca5c0484994ca9/two_symbols.csv')]","['Introduction to R for Finance', 'Intermediate R for Finance', 'Manipulating Time Series Data in R with xts & zoo']",https://www.datacamp.com/courses/importing-and-managing-financial-data-in-r,Applied Finance,R
131,Improving Query Performance in PostgreSQL,4,15,53,854,"4,300",Improving Query Performance in PostgreSQL,"Improving Query Performance in PostgreSQL
Losing time on slow queries? Hesitant to share your queries with more seasoned coworkers? In this course, you will learn how to structure your PostgreSQL to run in a fraction of the time. Exploring intertwined data relating Olympic participation, country climate, and gross domestic product, you will experience firsthand how changes in filtering method and using subqueries impact query performance. You will learn the properties of a row oriented database while also seeing how Hawaii's volcanos impact air quality. Restructuring your queries with the query planner and the SQL order of operations, you will soon be dazzling your coworkers with your effortless efficiency.
Bundle up as you dive into the Winter Olympics! You will learn how to join, subquery, and create temporary tables while finding which Olympic athletes brave sub-freezing temperatures to train. You will also learn about the query planner and how its functionality can guide your SQL structure to faster queries.
Dig up those past algebra memories while learning the SQL order of operations. Find which countries ""should"" have the most athletes by looking at population and gross domestic product (GDP) while learning the best way to filter. You will also learn when your query aggregates (sums, counts, etc.) and how you can structure your query to optimize this process.
Zero in on the properties that improve database performance. Discover when your table is not a table but a view. Learn how your database's storage structure (row or column oriented) impacts your query structure. You will explore volcanic smog while using partitions and indexes to speed your queries.
Learn the lingo of the Query Lifecycle and dive into the query planner. Explore how the query planner creates and optimizes the query plan. Find your next vacation locale by looking for countries with recent population growth while also seeing how a join impacts the query steps. Fine tune your optimization techniques by seeing how different filters speed your query times.",[],"['Amy McCarty', 'Mona Khalil', 'Becca Robins']","[('GDP', 'https://assets.datacamp.com/production/repositories/4297/datasets/f7b2dc67088b46263792d6358b67b2ac6cee1432/population_gdp_transposed.csv'), ('Olympic Athletes', 'https://assets.datacamp.com/production/repositories/4297/datasets/199f66ce5b9d899a2609284547607078f4908990/olympic_athletes_2016_14.csv'), ('Olympic Regions', 'https://assets.datacamp.com/production/repositories/4297/datasets/64e4e1c14554cbb8cd115485d4301f92f1cbbd17/olympic_regions.csv'), ('AQI', 'https://assets.datacamp.com/production/repositories/4297/datasets/ae89ea124b77507cefe318f6499318571e63a88f/annual_aqi_by_county_2018.csv')]","['Joining Data in SQL', 'Intermediate SQL']",https://www.datacamp.com/courses/improving-query-performance-in-postgresql,Data Manipulation,SQL
132,Improving Query Performance in SQL Server,4,16,58,"1,388","4,450",Improving Query Performance in SQL Server,"Improving Query Performance in SQL Server
A mission critical assignment is depending on your SQL coding skills. You’ve been given some code to fix.  It is giving the results you need but it’s running too slow, and it’s poorly formatted making it hard to read.  The deadline is tomorrow.  You’ll need to reformat the code and try different methods to improve performance.  The pressure is on!!!  In this course we’ll be using SQL on real world datasets, from sports and geoscience, to look at good coding practices and different ways how we can can improve the performance of queries to achieve the same outcome.
In this chapter, students will learn how SQL code formatting, commenting, and aliasing is used to make queries easy to read and understand.  Students will also be introduced to query processing order in the database versus the order of the SQL syntax in a query.
This chapter introduces filtering with WHERE and HAVING and some best practices for how (and how not) to use these keywords. Next, it explains the methods used to interrogate data and the effects these may have on performance.  Finally, the chapter goes over the roles of DISTINCT() and UNION in removing duplicates and their potential effects on performance.
This chapter is an introduction to sub-queries and their potential impacts on query performance.  It also examines the different methods used to determine if the data in one table is present, or absent, in a related table.
Students are introduced to how STATISTICS TIME, STATISTICS IO, indexes, and executions plans can be used in SQL Server to help analyze and tune query performance.",[],"['Dean Smith', 'Mona Khalil', 'Becca Robins', 'Marianna Lamnina']","[('Orders dataset', 'https://assets.datacamp.com/production/repositories/4005/datasets/751fbc814728455952b3f12df8d4bd90abf4696b/Orders.csv'), ('NBAPlayers dataset', 'https://assets.datacamp.com/production/repositories/4005/datasets/f7dc8389514bc1d366e380d76bffc6bfc9be179b/NBAPlayers.csv'), ('NBATeams dataset', 'https://assets.datacamp.com/production/repositories/4005/datasets/c70021a4a78c360198ada1231b45a8521aced7f5/NBATeams.csv'), ('NBAPlayersStatistics dataset', 'https://assets.datacamp.com/production/repositories/4005/datasets/de2a75358e326167eec3a0077dafac33b0f204f5/NBAPlayerStatistics.csv')]",['Intermediate SQL Server'],https://www.datacamp.com/courses/improving-query-performance-in-sql-server,Data Manipulation,SQL
133,Improving Your Data Visualizations in Python,4,15,54,"3,100","4,650",Improving Your Data Visualizations,"Improving Your Data Visualizations in Python
Great data visualization is the cornerstone of impactful data science. Visualization helps you to both find insight in your data and share those insights with your audience. Everyone learns how to make a basic scatter plot or bar chart on their journey to becoming a data scientist, but the true potential of data visualization is realized when you take a step back and think about what, why, and how you are visualizing your data. In this course you will learn how to construct compelling and attractive visualizations that help you communicate the results of your analyses efficiently and effectively. We will cover comparing data, the ins and outs of color, showing uncertainty, and how to build the right visualization for your given audience through the investigation of a datasets on air pollution around the US and farmer's markets. We will finish the course by examining open-access farmers market data to build a polished and impactful visual report.
How do you show all of your data while making sure that viewers don't miss an important point or points? Here we discuss how to guide your viewer through the data with color-based highlights and text. We also introduce a dataset on common pollutant values across the United States.
Color is a powerful tool for encoded values in data visualization. However, with this power comes danger. In this chapter, we talk about how to choose an appropriate color palette for your visualization based upon the type of data it is showing.
Uncertainty occurs everywhere in data science, but it's frequently left out of visualizations where it should be included. Here, we review what a confidence interval is and how to visualize them for both single estimates and continuous functions. Additionally, we discuss the bootstrap resampling technique for assessing uncertainty and how to visualize it properly.
Often visualization is taught in isolation, with best practices only discussed in a general way. In reality, you will need to bend the rules for different scenarios.  From messy exploratory visualizations to polishing the font sizes of your final product; in this chapter, we dive into how to optimize your visualizations at each step of a data science workflow.",['Data Visualization with Python'],"['Nicholas Strayer', 'Hillary Green-Lerman', 'Becca Robins']","[('State populations dataset', 'https://assets.datacamp.com/production/repositories/3841/datasets/f0dbd061f3851ac130cf2f8ad6b3f28f1d19c1fd/census-state-populations.csv'), (""U.S. farmer's markets dataset"", 'https://assets.datacamp.com/production/repositories/3841/datasets/efdbc5d7c7b734f0b091d924605c4ad2664ef830/markets_cleaned.csv'), ('Pollution dataset', 'https://assets.datacamp.com/production/repositories/3841/datasets/a6b11493e11dd47f3e03e0b96e2a2dbc51f03cb2/pollution_wide.csv')]","['Python Data Science Toolbox (Part 1)', 'Python Data Science Toolbox (Part 2)', 'Introduction to Data Visualization with Python', 'Data Visualization with Seaborn']",https://www.datacamp.com/courses/improving-your-data-visualizations-in-python,Data Visualization,Python
134,Inference for Categorical Data,4,14,53,"1,549","4,000",Inference Categorical Data,"Inference for Categorical Data
Categorical data is all around us. It's in the latest opinion polling numbers, in the data that lead to new breakthroughs in genomics, and in the troves of data that internet companies collect to sell products to you. In this course you'll learn techniques for parsing the signal from the noise; tools for identifying when structure in this data represents interesting phenomena and when it is just random noise.
In this chapter you will learn how to perform statistical inference on a single parameter that describes categorical data. This includes both resampling based methods and approximation based methods for a single proportion.
This chapter dives deeper into performing hypothesis tests and creating confidence intervals for a single parameter. Then, you'll learn how to perform inference on a difference between two proportions. Finally, this chapter wraps up with an exploration of what happens when you know the null hypothesis is true.
This part of the course will teach you how to use both resampling methods and classical methods to test for the indepence of two categorical variables. This chapter covers how to perform a Chi-squared test.
The course wraps up with two case studies using election data. Here, you'll learn how to use a Chi-squared test to check goodness-of-fit. You'll study election results from Iran and Iowa and test if Benford's law applies to these datasets.",['Statistical Inference with R'],"['Andrew Bray', 'Nick Solomon', 'Benjamin  Feder', 'Jonathan Ng']","[('GSS data', 'https://assets.datacamp.com/production/repositories/1703/datasets/622fb3f93aa52cac9da874699feb95911eba8abd/gss.RData'), ('Iowa election data', 'https://assets.datacamp.com/production/repositories/1703/datasets/3e73a6c4432671bff5e6f05d340ac1ee41f2ba76/iowa.csv'), ('Iran election data', 'https://assets.datacamp.com/production/repositories/1703/datasets/a777b2366f4e576da5d58fda42f8337332acd3ae/iran.csv')]",['Foundations of Inference'],https://www.datacamp.com/courses/inference-for-categorical-data,Probability & Statistics,R
135,Inference for Linear Regression,4,15,59,"4,145","4,650",Inference Linear Regression,"Inference for Linear Regression
Previously, you learned the fundamentals of both statistical inference and linear models; now, the next step is to put them together.  This course gives you a chance to think about how different samples can produce different linear models, where your goal is to understand the underlying population model.  From the estimated linear model, you will learn how to create interval estimates for the effect size as well as how to determine if the effect is significant.  Prediction intervals for the response variable will be contrasted with estimates of the average response. Throughout the course, you'll gain more practice with the dplyr and ggplot2 packages, and you will learn about the broom package for tidying models; all three packages are invaluable in data science.
In the first chapter, you will understand how and why to perform inferential (instead of descriptive only) analysis on a regression model.
In this chapter you will learn about the ideas of the sampling distribution using simulation methods for regression models.
In this chapter you will learn about how to use the t-distribution to perform inference in linear regression models. You will also learn about how to create prediction intervals for the response variable.
Additionally, you will consider the technical conditions that are important when using linear models to make claims about a larger population.
This chapter covers topics that build on the basic ideas of inference in linear models, including multicollinearity and inference for multiple regression models.",['Statistical Inference with R'],"['Jo Hardin', 'Nick Carchedi', 'Nick Solomon']","[('LA home price data', 'https://assets.datacamp.com/production/repositories/848/datasets/96a4003545f7eb48e1c14b855df9a97ab8c84b1d/LAhomes.csv'), ('NYC restaurant data', 'https://assets.datacamp.com/production/repositories/848/datasets/4ff34a40bd4e636556494f83cf40bdc10c33d49e/restNYC.csv'), ('Twin data', 'https://assets.datacamp.com/production/repositories/848/datasets/84f9e42a9041695d790dfe2b5e1b6e22fc3f0118/twins.csv')]","['Foundations of Inference', 'Multiple and Logistic Regression']",https://www.datacamp.com/courses/inference-for-linear-regression,Probability & Statistics,R
136,Inference for Numerical Data,4,15,49,"3,360","3,650",Inference Numerical Data,"Inference for Numerical Data
In this course, you'll learn how to use statistical techniques to make inferences and estimations using numerical data. This course uses two approaches to these common tasks. The first makes use of bootstrapping and permutation to create resample based tests and confidence intervals. The second uses theoretical results and the t-distribution to achieve the same result. You'll learn how (and when) to perform a t-test, create a confidence interval, and do an ANOVA!
In this chapter you'll use bootstrapping techniques to estimate a single parameter from a numerical distribution.
In this chapter you'll use Central Limit Theorem based techniques to estimate a single parameter from a numerical distribution. You will do this using the t-distribution.
In this chapter you'll extend what you have learned so far to use both simulation and CLT based techniques for inference on the difference between two parameters from two independent numerical distributions.
In this chapter you will use ANOVA (analysis of variance) to test for a difference in means across many groups.",['Statistical Inference with R'],"['Mine Cetinkaya-Rundel', 'Nick Carchedi', 'Nick Solomon']","[('Chp1-vid1-boot-dist-noaxes-parantheses', 'https://assets.datacamp.com/production/repositories/846/datasets/dc24f53d92a90863666f2e47827049caff156ccd/chp1-vid1-boot-dist-noaxes-parantheses.png'), ('Chp1-vid1-bootsamp-bootpop.001', 'https://assets.datacamp.com/production/repositories/846/datasets/641ae10cf7121130f50eb499f8daf2a7412c608d/chp1-vid1-bootsamp-bootpop.001.png'), ('Chp1-vid1-manhattan-rents', 'https://assets.datacamp.com/production/repositories/846/datasets/5b08f701debd264bf33d50ca7771617547516948/chp1-vid1-manhattan-rents.png'), ('Chp1-vid2-boot-dist-withaxes', 'https://assets.datacamp.com/production/repositories/846/datasets/28abc3cfc4421c4c000b460783cbadf28b695fa7/chp1-vid2-boot-dist-withaxes.png'), ('Chp1-vid2-perc-method.001', 'https://assets.datacamp.com/production/repositories/846/datasets/b56d4018323b33273a967acf0e0c6c56ad00a10e/chp1-vid2-perc-method.001.png'), ('Chp1-vid2-perc-method.002', 'https://assets.datacamp.com/production/repositories/846/datasets/36b43cc273e9fe611f2dd1a30973e9eda65fa861/chp1-vid2-perc-method.002.png'), ('Chp1-vid3-boot-test.001', 'https://assets.datacamp.com/production/repositories/846/datasets/4b5aa2f1d1d29d48ceb141df373313d2d26854c0/chp1-vid3-boot-test.001.png'), ('Chp3-vid3-hrly-rate-citizen-smaller', 'https://assets.datacamp.com/production/repositories/846/datasets/53b1c749b8fb60eeb1efc53fbbed5ac92d4a2e23/chp3-vid3-hrly-rate-citizen-smaller.png'), ('Chp3-vid3-hrly-rate-citizen', 'https://assets.datacamp.com/production/repositories/846/datasets/53b1c749b8fb60eeb1efc53fbbed5ac92d4a2e23/chp3-vid3-hrly-rate-citizen.png'), ('Chp4-vid1-class-bar', 'https://assets.datacamp.com/production/repositories/846/datasets/88de2cfbf76339e941b124df6bff1ad656f4bca8/chp4-vid1-class-bar.png'), ('Chp4-vid1-wodrsum-hist', 'https://assets.datacamp.com/production/repositories/846/datasets/5f946e0df3d682b4b92db4044cacd7bb08177409/chp4-vid1-wodrsum-hist.png'), ('Gss moredays', 'https://assets.datacamp.com/production/repositories/846/datasets/408f7effbe5b0ef743439636d9aae9aa27a149aa/gss_moredays.csv'), ('GSS data', 'https://assets.datacamp.com/production/repositories/846/datasets/1c0f04aae31ed37d453234a2b373315609492a7e/gss_wordsum_class.csv'), ('Manhattan rent data', 'https://assets.datacamp.com/production/repositories/846/datasets/bd62fb71666052ffe398d85e628eae9d0339c9c4/manhattan.csv'), ('Runners.001', 'https://assets.datacamp.com/production/repositories/846/datasets/7903500ef451067b1df953ec4f340d21beb55e92/runners.001.png'), ('Tdistcomparetonormaldist', 'https://assets.datacamp.com/production/repositories/846/datasets/9ef15957c776618902282d81ba7b9612d8cbbb72/tDistCompareToNormalDist.png')]",['Foundations of Inference'],https://www.datacamp.com/courses/inference-for-numerical-data,Probability & Statistics,R
137,Interactive Data Visualization with Bokeh,4,17,63,"34,569","5,100",Interactive Data Visualization Bokeh,"Interactive Data Visualization with Bokeh
Bokeh is an interactive data visualization library for Python—and other languages—that targets modern web browsers for presentation. It can create versatile, data-driven graphics and connect the full power of the entire Python data science stack to create rich, interactive visualizations.
This chapter provides an introduction to basic plotting with Bokeh. You will create your first plots, learn about different data formats Bokeh understands, and make visual customizations for selections and mouse hovering.
Learn how to combine multiple Bokeh plots into different kinds of layouts on a page, how to easily link different plots together, and how to add annotations such as legends and hover tooltips.
Bokeh server applications allow you to connect all of the powerful Python libraries for data science and analytics, such as NumPy and pandas to create rich, interactive Bokeh visualizations. Learn about Bokeh's built-in widgets, how to add them to Bokeh documents alongside plots, and how to connect everything to real Python code using the Bokeh server.
In this final chapter, you'll build a more sophisticated Bokeh data exploration application from the ground up based on the famous Gapminder dataset.","['Data Scientist with Python', 'Data Visualization with Python']","['Team Anaconda', 'Yashas Roy', 'Hugo Bowne-Anderson']","[('AAPL stock', 'https://assets.datacamp.com/production/repositories/401/datasets/313eb985cce85923756a128e49d7260a24ce6469/aapl.csv'), ('Automobile miles per gallon', 'https://assets.datacamp.com/production/repositories/401/datasets/2a776ae9ef4afc3f3f3d396560288229e160b830/auto-mpg.csv'), ('Gapminder', 'https://assets.datacamp.com/production/repositories/401/datasets/09378cc53faec573bcb802dce03b01318108a880/gapminder_tidy.csv'), ('Blood glucose levels', 'https://assets.datacamp.com/production/repositories/401/datasets/edcedae3825e0483a15987248f63f05a674244a6/glucose.csv'), ('Female literacy and birth rate', 'https://assets.datacamp.com/production/repositories/401/datasets/5aae6591ddd4819dec17e562f206b7840a272151/literacy_birth_rate.csv'), ('Olympic medals (100m sprint)', 'https://assets.datacamp.com/production/repositories/401/datasets/68b7a450b34d1a331d4ebfba22069ce87bb5625d/sprint.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/interactive-data-visualization-with-bokeh,Data Visualization,Python
138,Interactive Data Visualization with plotly in R,4,15,54,"3,085","4,600",Interactive Data Visualization plotly in R,"Interactive Data Visualization with plotly in R
Interactive graphics allow you to manipulate plotted data to gain further insight. As an example, an interactive graphic would allow you to zoom in on a subset of your data without the need to create a new plot. In this course, you will learn how to create and customize interactive graphics in plotly using the R programming language. Along the way, you will review data visualization best practices and be introduced to new plot types such as scatterplot matrices and binned scatterplots.
In this chapter, you will receive an introduction to basic graphics with plotly. You will create your first interactive graphics, displaying both univariate and bivariate distributions. Additionally, you will discover how to easily convert ggplot2 graphics to interactive plotly graphics.
In this chapter, you will learn how to customize the appearance of your graphics and use opacity, symbol, and color to clarify your message. You will also learn how to transform axes, label your axes, and customize the hover information of your graphs.
In this chapter, you move past basic plotly charts to explore more-complex relationships and larger datasets. You will learn how to layer traces, create faceted charts and scatterplot matrices, and create binned scatterplots.
In the final chapter, you use your plotly toolkit to explore the results of the 2018 United States midterm elections, learning how to create maps in plotly along the way.",['Interactive Data Visualization in R'],"['Adam Loy', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Video game sales and ratings dataset', 'https://assets.datacamp.com/production/repositories/1792/datasets/2396f3f587e31ea726911e5d8974c5f98db5eee1/vgsales.csv'), ('Wine datasets', 'https://assets.datacamp.com/production/repositories/1792/datasets/df77160d2b3c71dded411ea6ab0910ca0be93045/wine_data.zip'), ('Midterm election datasets', 'https://assets.datacamp.com/production/repositories/1792/datasets/235e75c27821684690bb0ad9f3461b4d7ba89740/election_data.zip')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/interactive-data-visualization-with-plotly-in-r,Data Visualization,R
139,Interactive Data Visualization with rbokeh,4,12,47,"1,212","4,000",Interactive Data Visualization rbokeh,"Interactive Data Visualization with rbokeh
Data visualization is an integral part of the data analysis process. This course will get you introduced to rbokeh: a visualization library for interactive web-based plots. You will learn how to use rbokeh layers and options to create effective visualizations that carry your message and emphasize your ideas. We will focus on the two main pieces of data visualization: wrangling data in the appropriate format as well as employing the appropriate visualization tools, charts and options from rbokeh.
In this chapter we get introduced to rbokeh layers. You will learn how to specify data and arguments to create the desired plot and how to combine multiple layers in one figure.
In this chapter you will learn how to customize your rbokeh figures using aesthetic attributes and figure options. You will see how aesthetic attributes such as color, transparancy and shape can serve a purpose and add more info to your visualizations. In addition, you will learn how to activate the tooltip and specify the hover info in your figures.
In this chapter, you will learn how to put your data in the right format to fit the desired figure. And how to transform between the wide and long formats. You will also see how to combine normal layers with regression lines. In addition you will learn how to customize the interaction tools that appear with each figure.
In this chapter you will learn how to combine multiple plots in one layout using grid plots. In addition, you will learn how to create interactive maps.",['Interactive Data Visualization in R'],"['Omayma Said', 'David Campos', 'Shon Inouye']","[('Human Development Index dataset', 'https://assets.datacamp.com/production/repositories/2062/datasets/25d3cc40fcd74d60135d47242462188054e7e6a1/hdi_data.csv'), ('Corruption Perception Index dataset', 'https://assets.datacamp.com/production/repositories/2062/datasets/8487d81d372e11c6e62373984f98d7c86360a059/hdi_cpi_2015.csv'), ('Tuberculosis Cases dataset', 'https://assets.datacamp.com/production/repositories/2062/datasets/77f871007000492983828f077b8f2d2566eb31c4/tb_tidy.csv'), ('New York Citi Bike Trips dataset', 'https://assets.datacamp.com/production/repositories/2062/datasets/de34f1073c85cae62cea5c887a3f927671828029/ny_bikedata.csv')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/interactive-data-visualization-with-rbokeh,Data Visualization,R
140,Interactive Maps with leaflet in R,4,16,55,"4,859","4,500",Interactive Maps leaflet in R,"Interactive Maps with leaflet in R
Get ready to have some fun with maps! Interactive Maps with leaflet in R will give you the tools to make attractive and interactive web maps using spatial data and the tidyverse. In this course, you will create maps using the IPEDS dataset, which contains data on U.S. colleges and universities. Along the way, you will customize our maps using labels, popups, and custom markers, and add layers to enhance interactivity. Following the course, you will be able to create and customize your own interactive web maps to reveal patterns in your data.
Chapter 1 will introduce students to the htmlwidgets package and the leaflet package. Following this introduction, students will build their first interactive web map using leaflet. Through the process of creating this first map students will be introduced to many of the core features of the leaflet package, including adding different map tiles, setting the center point and zoom level, plotting single points based on latitude and longitude coordinates, and storing leaflet maps as objects. Chapter 1 will conclude with students geocoding DataCamp’s headquarters, and creating a leaflet map that plots the headquarters and displays a popup describing the location.
In chapter 2 students will build on the leaflet map they created in chapter 1 to create an interactive web map of every four year college in California. After plotting hundreds of points on an interactive leaflet map, students will learn to customize the markers on their leaflet map. This chapter will also how to color code markers based on a factor variable.
In chapter 3 students will expand on their map of all four year colleges in California to create a map of all American colleges. First, in section 3.1 students will review and build on the material from Chapter 2 to create a map of all American colleges. Then students will re-plot the colleges on their leaflet map by sector (public, private, or for-profit) using groups to enable users to toggle the colleges that are displayed on the map. In section 3.3 students will learn to add multiple base maps so that users can toggle between multiple map tiles.
In Chapter 4 students will learn to map polygons, which can be used to define geographic regions (e.g., zip codes, states, countries, etc.).  Chapter 4 will start by plotting the zip codes in North Carolina that fall in the top quartile of mean family incomes. Students will learn to customize the polygons with color palettes and labels. Chapter 4 will conclude with adding a new layer to the map of every college in America that displays every zip code with a mean income of $200,000 or more during the 2015 tax year. Through the process of mapping zip codes students will learn about spatial data generally, geoJSON data, the @ symbol, and the addPolygons() function. Furthermore, students will have an opportunity to practice applying many of the options that they learned about in the previous chapters, such as popups and labels, as well as new ways to customize their maps, such as the highlight option in addPolygons().","['Interactive Data Visualization in R', 'Spatial Data with R']","['Rich Majerus', 'Chester Ismay', 'Becca Robins']","[('IPEDS All 4-Year Colleges', 'https://assets.datacamp.com/production/repositories/1942/datasets/18a000cf70d2fe999c6a6f2b28a7dc9813730e74/ipeds.csv'), ('NC Zipcode Income data', 'https://assets.datacamp.com/production/repositories/1942/datasets/09d53d484e4979a41a51a427a59a49d2654feb5d/mean_income_by_zip_nc.csv'), ('NC Zipcode Polygons', 'https://assets.datacamp.com/production/repositories/1942/datasets/fe567316eb621bf19798df15b1ed4a84a9aa4832/nc_zips.Rda'), (""America's Wealthiest Zipcodes"", 'https://assets.datacamp.com/production/repositories/1942/datasets/ecce5259c642b7b259bcf030b212e8c7f34786fa/wealthiest_zips.Rda')]","['Introduction to R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/interactive-maps-with-leaflet-in-r,Data Visualization,R
141,Intermediate Functional Programming with purrr,4,17,49,"1,477","3,850",Intermediate Functional Programming purrr,"Intermediate Functional Programming with purrr
Have you ever been wondering what the purrr description (“A functional programming toolkit for R”) refers to? Then, you’ve come to the right place! This course will walk you through the functional programming part of purrr - in other words, you will learn how to take full advantage of the flexibility offered by the .f in map(.x, .f) to iterate other lists, vectors and data.frame with a robust, clean, and easy to maintain code. During this course, you will learn how to write your own mappers (or lambda functions), and how to use predicates and adverbs. Finally, this new knowledge will be applied to a use case, so that you’ll be able to see how you can use this newly acquired knowledge on a concrete example of a simple nested list, how to extract, keep or discard elements, how to compose functions to manipulate and parse results from this list, how to integrate purrr workflow inside other functions, how to avoid copy and pasting with purrr functional tools.
Do lambda functions, mappers, and predicates sound scary to you? Fear no more! After refreshing your purrr memory, we will dive into functional programming 101, discover anonymous functions and predicates, and see how we can use them to clean and explore data.
Ready to go deeper with functional programming and purrr? In this chapter, we'll discover the concept of functional programming, explore error handling using including safely() and possibly(), and introduce the function compact() for cleaning your code.
In this chapter, we'll use purrr to write code that is clearer, cleaner, and easier to maintain. We'll learn how to write clean functions with compose() and negate().  We'll also use partial() to compose functions by ""prefilling"" arguments from existing functions. Lastly, we'll introduce list-columns, which are a convenient data structure that helps us write clean code using the Tidyverse.
We'll wrap up everything we know about purrr in a case study. Here, we'll use purrr to analyze data that has been scraped from Twitter. We'll use clean code to organize the data and then we'll identify Twitter influencers from the 2018 RStudio conference.",['Intermediate Tidyverse Toolbox'],"['Colin FAY', 'Chester Ismay', 'Becca Robins']",[],"['Introduction to the Tidyverse', 'Foundations of Functional Programming with purrr']",https://www.datacamp.com/courses/intermediate-functional-programming-with-purrr,Programming,R
142,Intermediate Interactive Data Visualization with plotly in R,4,15,54,937,"4,400",Intermediate Interactive Data Visualization plotly in R,"Intermediate Interactive Data Visualization with plotly in R
The plotly package enables the construction of interactive and animated graphics entirely within R. This goes beyond basic interactivity such as panning, zooming, and tooltips. In this course, you will extend your understanding of plotly to create animated and linked interactive graphics, which will enable you to communicate multivariate stories quickly and effectively. Along the way, you will review the basics of plotly, learn how to wrangle your data in new ways to facilitate cumulative animations, and learn how to add filters to your graphics without using Shiny.
A review of key plotly commands. You will review how to create multiple plot types in plotly and how to polish your charts. Additionally, you will create static versions of the bubble and line charts that you will animate in the next chapter.
In this chapter, you will learn how to implement keyframe animation in plotly. You will explore how to create animations, such as Hans Rosling's bubble charts, as well as cumulative animations, such as an animation of a stock's valuation over time.
When you are exploring unexpected structure in your graphics, it's useful to have selections made on one chart update the other. For example, if you are exploring clusters observed on a scatterplot, it is useful to have the selected cluster update some chart of group membership, such as a jittered scatterplot or sets of bar charts. In this chapter, you will learn how to link your plotly charts to enable linked brushing. Along the way, you will also learn how to add dropdown menus, checkboxes, and sliders to your plotly charts, without the need for Shiny.
In the final chapter, you will use your expanded plotly toolkit to explore orbital space launches between 1957 and 2018. Along the way, you'll learn how to wrangle data to enable cumulative animations without common starting points, and hone your understanding of the crosstalk package.",['Interactive Data Visualization in R'],"['Adam Loy', 'Chester Ismay', 'David Campos']","[('Economic indicators for the 50 states and Washington, D.C. from 1997 to 2017', 'https://assets.datacamp.com/production/repositories/2166/datasets/1367560ab66f0b7006da2075a5a97a99b5184bf7/state_economic_data.csv'), ('Complete list of all orbital space launches between 1957 and 2018', 'https://assets.datacamp.com/production/repositories/2166/datasets/c09b75e6d503e5253c80bcfcdfb8f95a606d9793/launches.csv')]",['Interactive Data Visualization with plotly in R'],https://www.datacamp.com/courses/intermediate-interactive-data-visualization-with-plotly-in-r,,
143,Intermediate Portfolio Analysis in R,5,12,42,"5,093","3,250",Intermediate Portfolio Analysis in R,"Intermediate Portfolio Analysis in R
This course builds on the fundamental concepts from Introduction to Portfolio Analysis in R and explores advanced concepts in the portfolio optimization process. It is critical for an analyst or portfolio manager to understand all aspects of the portfolio optimization problem to make informed decisions. In this course, you will learn a quantitative approach to apply the principles of modern portfolio theory to specify a portfolio, define constraints and objectives, solve the problem, and analyze the results. This course will use the R package PortfolioAnalytics to solve portfolio optimization problems with complex constraints and objectives that mirror real world problems.
This chapter will give you a brief review of Modern Portfolio Theory and introduce you to the PortfolioAnalytics package by solving a couple portfolio optimization problems.
The focus of this chapter is a detailed overview of the recommended workflow for solving portfolio optimization problems with PortfolioAnalytics. You will learn how to create a portfolio specification, add constraints, objectives, run the optimization, and analyze the results of the optimization output.
In this chapter, you will learn about estimating moments, characteristics of the distribution of asset returns, as well as custom objective functions.
In the final chapter of the course, you will solve a portfolio optimization problem that mimics a real world real world example of constructing a portfolio of hedge fund strategy with different style definitions.","['Applied Finance with R', 'Quantitative Analyst with R']","['Ross Bennett', 'Lore Dirick', 'Davis Vaughan']","[('Portfolio specifications object I', 'https://assets.datacamp.com/production/repositories/484/datasets/b88fab92460b3085545ce05d863b6d3431cad69a/port_spec_fi_lo_ret.rds'), ('Portfolio specifications object II', 'https://assets.datacamp.com/production/repositories/484/datasets/326375e59f26728e82bc83d2b2cda2a9713102ef/port_spec_ws_lo_ret_ri_ribud.rds'), ('Set of random portfolios I', 'https://assets.datacamp.com/production/repositories/484/datasets/e9eca7c9ab5cefdca301b6617b6764e300fcc3c3/rp_fi_lo_ret.rds'), ('Set of random portfolios II', 'https://assets.datacamp.com/production/repositories/484/datasets/9e5f72ca2e7411905ddb94424f21e625f978b3a2/rp_ws_lo_ret_ri_ribud.rds')]","['Introduction to R for Finance', 'Intermediate R for Finance', 'Introduction to Portfolio Analysis in R']",https://www.datacamp.com/courses/intermediate-portfolio-analysis-in-r,Applied Finance,R
144,Intermediate Python for Data Science,4,18,87,"365,486","7,400",Intermediate Python Data Science,"Intermediate Python for Data Science
Intermediate Python for Data Science is crucial for any aspiring data science practitioner learning Python. Learn to visualize real data with Matplotlib's functions and get acquainted with data structures such as the dictionary and the pandas DataFrame. After covering key concepts such as boolean logic, control flow, and loops in Python, you'll be ready to blend together everything you've learned to solve a case study using hacker statistics.
Data visualization is a key skill for aspiring data scientists. Matplotlib makes it easy to create meaningful and insightful plots. In this chapter, you’ll learn how to build various types of plots, and customize them to be more visually appealing and interpretable.
Learn about the dictionary, an alternative to the Python list, and the pandas DataFrame, the de facto standard to work with tabular data in Python. You will get hands-on practice with creating and manipulating datasets, and you’ll learn how to access the information you need from these data structures.
Boolean logic is the foundation of decision-making in Python programs. Learn about different comparison operators, how to combine them with Boolean operators, and how to use the Boolean outcomes in control structures. You'll also learn to filter data in pandas DataFrames using logic.
There are several techniques you can use to repeatedly execute Python code. While loops are like repeated if statements, the for loop iterates over all kinds of data structures. Learn all about them in this chapter.
This chapter will allow you to apply all the concepts you've learned in this course. You will use hacker statistics to calculate your chances of winning a bet. Use random number generators, loops, and Matplotlib to gain a competitive edge!","['Data Analyst with Python', 'Data Scientist with Python', 'Python Programmer', 'Python Programming']","['Filip Schouwenaars', 'Vincent Vankrunkelsven', 'Patrick Varilly', 'Florian Goossens']","[('Gapminder', 'https://assets.datacamp.com/production/repositories/287/datasets/5b1e4356f9fa5b5ce32e9bd2b75c777284819cca/gapminder.csv'), ('Cars', 'https://assets.datacamp.com/production/repositories/287/datasets/79b3c22c47a2f45a800c62cae39035ff2ea4e609/cars.csv'), ('BRICS', 'https://assets.datacamp.com/production/repositories/287/datasets/b60fb5bdbeb4e4ab0545c485d351e6ff5428a155/brics.csv')]",['Introduction to Python'],https://www.datacamp.com/courses/intermediate-python-for-data-science,Programming,Python
145,Intermediate R,6,14,81,"324,271","6,950",Intermediate R,"Intermediate R
Intermediate R is the next stop on your journey in mastering the R programming language. In this R training, you will learn about conditional statements, loops, and functions to power your own R scripts. Next, make your R code more efficient and readable using the apply functions. Finally, the utilities chapter gets you up to speed with regular expressions in R, data structure manipulations, and times and dates. This course will allow you to take the next step in advancing your overall knowledge and capabilities while programming in R.
In this chapter, you'll learn about relational operators for comparing R objects, and logical operators like ""and"" and ""or"" for combining TRUE and FALSE values.  Then, you'll use this knowledge to build conditional statements.
Loops can come in handy on numerous occasions. While loops are like repeated if statements, the for loop is designed to iterate over all elements in a sequence. Learn about them in this chapter.
Functions are an extremely important concept in almost every programming language, and R is no different. Learn what functions are and how to use them—then take charge by writing your own functions.
Whenever you're using a for loop, you may want to revise your code to see whether you can use the lapply function instead. Learn all about this intuitive way of applying a function over a list or a vector, and how to use its variants, sapply and vapply.
Mastering R programming is not only about understanding its programming concepts. Having a solid understanding of a wide range of R functions is also important. This chapter introduces you to many useful functions for data structure manipulation, regular expressions, and working with times and dates.","['Data Analyst with R', 'Data Scientist with R', 'R Programmer', 'R Programming']",['Filip Schouwenaars'],[],['Introduction to R'],https://www.datacamp.com/courses/intermediate-r,Programming,R
146,Intermediate R for Finance,5,15,59,"14,629","5,050",Intermediate R Finance,"Intermediate R for Finance
If you enjoyed the Introduction to R for Finance course, then you will love Intermediate R for Finance. Here, you will first learn the basics about how dates work in R, an important skill for the rest of the course. Your next step will be to explore the world of if statements, loops, and functions. These are powerful ideas that are essential to any financial data scientist's toolkit. Finally, we will spend some time working with the family of apply functions as a vectorized alternative to loops. And of course, all examples will be finance related! Enjoy!
Welcome! Before we go deeper into the world of R, it will be nice to have an understanding of how dates and times are created. This chapter will teach you enough to begin working with dates, but only scratches the surface of what you can do with them.
Imagine you own stock in a company. If the stock goes above a certain price, you might want to sell. If the stock drops below a certain price, you might want to buy it while it's cheap! This kind of thinking can be implemented using operators and if statements. In this chapter, you will learn all about them, and create a program that tells you to buy or sell a stock.
Loops can be useful for doing the same operation to each element of your data structure. In this chapter you will learn all about repeat, while, and for loops!
If data structures like data frames and vectors are how you hold your data, functions are how you tell R what to do with your data. In this chapter, you will learn about using built-in functions, creating your own unique functions, and you will finish off with a brief introduction to packages.
A popular alternative to loops in R are the apply functions. These are often more readable than loops, and are incredibly useful for scaling the data science workflow to perform a complicated calculation on any number of observations. Learn about them here!","['Finance Basics with R', 'Quantitative Analyst with R']","['Lore Dirick', 'Davis Vaughan']",[],['Introduction to R for Finance'],https://www.datacamp.com/courses/intermediate-r-for-finance,Applied Finance,R
147,Intermediate R: Practice,4,0,52,"55,678","4,800",Intermediate R: Practice,"Intermediate R: Practice
This follow-up course on Intermediate R does not cover new programming concepts. Instead, you will strengthen your knowledge of the topics in Intermediate R with a bunch of new and fun exercises.
If conditionals are your thing, these exercises will be a walk in the park. Else, let the feedback guide you and add these vital elements of R to your toolkit!
Looping through data structures is something you'll often do. While and for loops help you do this. Get more practice on them by analyzing the log data from a chemical plant.
Functions make R powerful: you can isolate chunks of code, wrap them in a function and use them whenever you want. In this set of exercises, you'll practice more on using functions and writing your own functions.
lapply, sapply and vapply are all members of R's apply family: they provide a fast and intuitive alternative to the while and for loops you've learned about before. Become an apply pro with some more practice!
To finish off these supplementary exercises, you can exercise some more with often-used functions in R, regular expressions and manipulating dates and times.",[],['Filip Schouwenaars'],"[('The 1912 Titanic ship disaster', 'https://assets.datacamp.com/production/repositories/239/datasets/ea08b483790c2a7bc9b95b0f923526f8e60eae44/titanic.csv'), ('Chemical company log files', 'https://assets.datacamp.com/production/course_7747/datasets/logs.rds')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/intermediate-r-practice,Other,R
148,Intermediate SQL,4,15,55,"17,588","4,700",Intermediate SQL,"Intermediate SQL
So you've learned how to aggregate and join data from tables in your database—now what? How do you manipulate, transform, and make the most sense of your data? This intermediate-level course will teach you several key functions necessary to wrangle, filter, and categorize information in a relational database, expand your SQL toolkit, and answer complex questions. You will learn the robust use of CASE statements, subqueries, and window functions—all while discovering some interesting facts about soccer using the European Soccer Database.
In this chapter, you will learn how to use the CASE WHEN statement to create categorical variables, aggregate data into a single column with multiple filtering conditions, and calculate counts and percentages.
In this chapter, you will learn about subqueries in the SELECT, FROM, and WHERE clauses. You will gain an understanding of when subqueries are necessary to construct your dataset and where to best include them in your queries.
In this chapter, you will learn how to use nested and correlated subqueries to extract more complex data from a relational database. You will also learn about common table expressions and how to best construct queries using multiple common table expressions.
You will learn about window functions and how to pass aggregate functions along a dataset. You will also learn how to calculate running totals and partitioned averages.",['SQL Fundamentals'],"['Mona Khalil', 'Hillary Green-Lerman', 'Sumedh Panchadhar']",[],['Joining Data in SQL'],https://www.datacamp.com/courses/intermediate-sql,Data Manipulation,SQL
149,Intermediate SQL Server,4,14,47,"12,394","3,850",Intermediate SQL Server,"Intermediate SQL Server
A majority of data is stored in databases and knowing the necessary tools needed to analyze and clean data directly in databases is indispensable. This course focuses on T-SQL, the version of SQL used in Microsoft SQL Server, needed for data analysis. You will learn several concepts in this course such as dealing with missing data, working with dates, and calculating summary statistics using advanced queries. After completing this course, you will have the skills needed to analyze data and provide insights quickly and easily.
One of the first steps in data analysis is examining data through aggregations. This chapter explores how to create aggregations in SQL Server, a common first step in data exploration. You will also clean missing data and categorize data into bins with CASE statements.
This chapter explores essential math operations such as rounding numbers, calculating squares and square roots, and counting records. You will also work with dates in this chapter!
In this chapter, you will create variables and write while loops to process data. You will also write complex queries by using derived tables and common table expressions.
In the final chapter of this course, you will work with partitions of data and window functions to calculate several summary stats and see how easy it is to create running totals and compute the mode of numeric columns.",['SQL Server Fundamentals'],"['Ginger Grant', 'Richie Cotton', 'Sumedh Panchadhar']","[('Incidents', 'https://assets.datacamp.com/production/repositories/1611/datasets/d34780ca1f1bf7578939a2fea4398809e0160d1f/Incidents.csv'), ('Shipments', 'https://assets.datacamp.com/production/repositories/1611/datasets/3222b2ba724c7fc672b77a88d07ec1b51eb5cc22/MixData.csv'), ('Kidney', 'https://assets.datacamp.com/production/repositories/1611/datasets/e974b5ed6baeda8ab34b73bd1c36105f0735be47/ChronicKidneyDisease.csv'), ('Orders', 'https://assets.datacamp.com/production/repositories/1611/datasets/cc02f651accbb545cd5b37bb98236ce7da0f1fb2/Orders.csv')]","['Intro to SQL for Data Science', 'Joining Data in SQL']",https://www.datacamp.com/courses/intermediate-t-sql,Programming,SQL
150,Intermediate Spreadsheets for Data Science,4,12,48,"6,925","4,150",Intermediate Spreadsheets Data Science,"Intermediate Spreadsheets for Data Science
This course will expand your Google Sheets vocabulary. You'll dive deeper into data types, practice manipulating numeric and logical data, explore missing data and error types, and calculate some summary statistics. As you go, you'll explore datasets on 100m sprint world records, asteroid close encounters, benefit claims, and butterflies.
In which you learn to interrogate cells to determine the data type of their contents, and to convert between data types.
In which you learn to apply log and square root transformations to numbers, round them up and down, and generate random numbers.
In which you learn how to work with logical data consisting of TRUE and FALSE values, and how to handle missing values and errors.
In which you learn about cell addresses, advanced matching, sorting and filtering, and simple imputation.",[],['Richie Cotton'],[],[],https://www.datacamp.com/courses/intermediate-spreadsheets-for-data-science,Programming,Spreadsheets
151,Intro to Financial Concepts using Python,4,13,50,"8,192","4,200",Intro Financial Concepts using Python,"Intro to Financial Concepts using Python
Understanding the basic principles of finance is essential for making important financial decisions ranging from taking out a student loan to constructing an investment portfolio. Combining basic financial knowledge with Python will allow you to construct some very powerful tools. You'll come out of this course understanding the time value of money, how to compare potential projects and how to make rational, data-driven financial decisions.
Learn about fundamental financial concepts like the time value of money, growth and rate of return, discount factors, depreciation, and inflation.
In this chapter, you will act as the CEO of a company, making important data-driven financial decisions about projects and financing using measures such as IRR and NPV.
You just got married, and you're looking for a new home in Hoboken, New Jersey. You will build a mortgage payment simulator to estimate your mortgage payments and analyze different possible economic scenarios.
You just got a new job as a data scientist in San Francisco, and you're looking for an apartment. In this chapter, you'll be building your own budgeting application to plan out your financial future.",[],"['Dakota Wixom', 'Lore Dirick', 'Sumedh Panchadhar']",[],[],https://www.datacamp.com/courses/intro-to-financial-concepts-using-python,Applied Finance,Python
152,Intro to Portfolio Risk Management in Python,4,13,51,"4,574","4,250",Intro Portfolio Risk Management,"Intro to Portfolio Risk Management in Python
This course will teach you how to evaluate basic portfolio risk and returns like a quantitative analyst on Wall Street. This is the most critical step towards being able to fully automate your portfolio construction and management processes. Discover what factors are driving your portfolio returns, construct market-cap weighted equity portfolios, and learn how to forecast and hedge market risk via scenario generation.
Learn about the fundamentals of investment risk and financial return distributions.
Level up your understanding of investing by constructing portfolios of assets to enhance your risk-adjusted returns.
Learn about the main factors that influence the returns of your portfolios and how to quantify your portfolio's exposure to these factors.
In this chapter, you will learn two different methods to estimate the probability of sustaining losses and the expected values of those losses for a given asset or portfolio of assets.",[],"['Dakota Wixom', 'Lore Dirick', 'Sumedh Panchadhar', 'Eunkyung Park']","[('All returns (2017)', 'https://assets.datacamp.com/production/repositories/1546/datasets/fb7165b7270a3721f69abf9ff09b85938d9d1068/Big9Returns2017.csv'), ('Efficient Frontier Portfolios', 'https://assets.datacamp.com/production/repositories/1546/datasets/85e2663a50d3445cbc2c2d30ac81abbaae6a7f56/EfficientFrontierPortfoliosSlim.csv'), ('Fama-French factors', 'https://assets.datacamp.com/production/repositories/1546/datasets/3d9b734fea954b629d2477ef48c36525dfecf6e0/FamaFrenchFactors.csv'), ('Microsoft prices', 'https://assets.datacamp.com/production/repositories/1546/datasets/0f1a004a8aa693163fa55f277513309f710b700d/MSFTPrices.csv'), ('ETF of oil prices (UFO)', 'https://assets.datacamp.com/production/repositories/1546/datasets/dfe9da08c986709d59943d1d5c0106537a8c608a/USO.csv')]","['Intro to Financial Concepts using Python', 'Manipulating Time Series Data in Python']",https://www.datacamp.com/courses/intro-to-portfolio-risk-management-in-python,Applied Finance,Python
153,Intro to Python for Finance,4,14,55,"9,409","4,650",Intro Python Finance,"Intro to Python for Finance
The financial industry is increasingly adopting Python for general-purpose programming and quantitative analysis, ranging from understanding trading dynamics to risk management systems. This course focuses specifically on introducing Python for financial analysis. Using practical examples, you will learn the fundamentals of Python data structures such as lists and arrays and learn powerful ways to store and manipulate financial data to identify trends.
This chapter is an introduction to basics in Python, including how to name variables and various data types in Python.
This chapter introduces lists in Python and how they can be used to work with data.
This chapter introduces packages in Python, specifically the NumPy package and how it can be efficiently used to manipulate arrays.
In this chapter, you will be introduced to the Matplotlib package for creating line plots, scatter plots, and histograms.
In this chapter, you will get a chance to apply all the techniques you learned in the course on the S&P 100 data.",[],"['Adina Howe', 'Lore Dirick', 'Eunkyung Park', 'Sumedh Panchadhar']","[('Stocks data (I)', 'https://assets.datacamp.com/production/repositories/1715/datasets/2623c8037df0505d619c87a09131af9105e5883d/stock_data.csv'), ('Stocks data (II)', 'https://assets.datacamp.com/production/repositories/1715/datasets/d96bf818f1f6f52af429edcaaf9dd96d37ab7b0a/stock_data2.csv'), ('S&P 100 data', 'https://assets.datacamp.com/production/repositories/1715/datasets/0ef2a37a04b12d12368f060efd02b93cd110bd29/sector.txt')]",[],https://www.datacamp.com/courses/intro-to-python-for-finance,Applied Finance,Python
154,Introduction to AWS Boto in Python,4,15,54,300,"4,550",Introduction AWS Boto,"Introduction to AWS Boto in Python
What if you were no longer constrained by the capabilities of your laptop? What if you could get an SMS when a city garbage truck camera spots a missing a cat? This is all possible with cloud technology. This course will teach you how to integrate Amazon Web Services (AWS) into your data workflow. You’ll learn how to upload data to S3, AWS cloud storage. You’ll use triggers from your analysis to send text messages with AWS SNS. You will use Rekognition to detect objects in an image. And you will use Comprehend to decide if a piece of feedback is negative. By the time you’re done, you will learn how to build a pipeline, subscribe people to it, and send them text messages when an image contains a cat!
Embark on the world of cloud technology! From learning how AWS works to creating S3 buckets and uploading files to them. You will master the basics of setting up AWS and uploading files to the cloud!
Continue your journey in mastering AWS by learning how to upload and share files securely. You will learn how set files to be public or private, and cap off what you learned by generating web-based reports!
Next, you will learn how to automate sharing your findings with the world by building notification triggers for your analysis! You will learn how to harness AWS to send SMS and email notifications to users and cap off what you learned by making custom notifications depending on a user's needs.
Finally, you will go beyond uploading, sharing and notifying into rekognizing using AWS Rekognition and other AWS machine learning services to recognize cats, translate language and detect sentiment. You will be capping off your learning journey by applying a real-world use case that mixes everything you've learned!",[],"['Maksim Pecherskiy', 'Hillary Green-Lerman', 'Adel Nehme']","[('Get It Done Requests', 'https://assets.datacamp.com/production/repositories/4607/datasets/77f70071e5e5ea42aa31d5384640bee6931a5d50/get_it_done_2019_requests_datasd.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Python Data Science Toolbox (Part 1)', 'Python Data Science Toolbox (Part 2)']",https://www.datacamp.com/courses/introduction-to-aws-boto-in-python,Programming,Python
155,Introduction to Bioconductor,4,14,54,"1,442","4,050",Introduction Bioconductor,"Introduction to Bioconductor
Much of the biological research, from medicine to biotech, is moving toward sequence analysis. We are now generating targeted and whole genome big data, which needs to be analyzed to answer biological questions. To help you get started, you will be introduced to The Bioconductor project. Bioconductor is and builds the infrastructure to share software tools (packages), workflows and datasets for the analysis and comprehension of genomic data. Bioconductor is a great platform accessible to you, and it is a community developed open software resource. By the end of this course, you will be able to use essential Bioconductor packages and get a grasp of its infrastructure and some built-in datasets. Using BSgenome, Biostrings, IRanges, GenomicRanges, TxDB, ShortRead and Rqc with real datasets from different species is going to be an exceptional experience!
In this chapter you will get hands-on with Bioconductor. Bioconductor is the specialized  repository for bioinformatics software, developed and maintained by the R community. You will learn how to install and use bioconductor packages. You will be introduced to S4 objects and functions, because most packages within Bioconductor inherit from S4. Additionally, you will use a real genomic dataset of a fungus to explore the BSgenome package.
Biostrings are memory efficient string containers. Biostring has matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences. How efficient you can become by using the right containers for your sequences? You will learn about alphabets, and sequence manipulation by using the tiny genome of a virus.
The IRanges and GenomicRanges packages are also containers for storing and manipulating genomic intervals  and variables defined along a genome.  These packages provide infrastructure and support to many other Bioconductor packages because of their enriching features. You will learn how to use these containers and their associated metadata, for manipulation of your sequences. The dataset you will be looking at is a special gene of interest in the human genome.
ShortRead is the package for input, manipulation and assessment of fasta and fastq files. You can subset, trim and filter the sequences of interest, and even do a report of quality. An extra bonus towards the last exercises will give you the tools for parallel quality assessment, wink, wink Rqc. Exciting enough, for this you will use plant genome sequences!",[],"['Paula Martinez', 'Sascha Mayr', 'David Campos', 'Shon Inouye']","[('Zika Genomic DNA dataset', 'https://assets.datacamp.com/production/repositories/1641/datasets/790618555a5e420bbda36fd93effe01182896e1f/zika_genomic.fa.txt'), ('A. Thaliana Short Reads with Quality dataset', 'https://assets.datacamp.com/production/repositories/1641/datasets/0b92c84cc116f3c838b709fccd9cbede96f7fe1e/small_SRR1971253.fastq'), ('Human Gene & Transcript ID dataset', 'https://assets.datacamp.com/production/repositories/1641/datasets/7d0e830ab73ed1b85fbc2f9149c244eee1dfe4d7/gene_id_tx_id.txt'), ('Yeast Genome dataset', 'https://assets.datacamp.com/production/repositories/1641/datasets/4870f7b72822ef33b987e40d5f6aed21a54de858/sacCer3.fasta.gz')]","['Introduction to R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/introduction-to-bioconductor,Other,R
156,Introduction to Data,4,15,46,"60,570","3,200",Introduction Data,"Introduction to Data
Scientists seek to answer questions using rigorous methods and careful observations. These observations—collected from the likes of field notes, surveys, and experiments—form the backbone of a statistical investigation and are called data. Statistics is the study of how best to collect, analyze, and draw conclusions from data. It is helpful to put statistics in the context of a general process of investigation: 1) identify a question or problem; 2) collect relevant data on the topic; 3) analyze the data; and 4) form a conclusion. In this course, you'll focus on the first two steps of the process.
This chapter introduces terminology of datasets and data frames in R.
In this chapter, you will learn about observational studies and experiments, scope of inference, and Simpson's paradox.
This chapter defines various sampling strategies and their benefits/drawbacks as well as principles of experimental design.
Apply terminology, principles, and R code learned in the first three chapters of this course to a case study looking at how the physical appearance of instructors impacts their students' course evaluations.","['Data Analyst with R', 'Data Scientist with R', 'Statistics Fundamentals with R']","['Mine Cetinkaya-Rundel', 'Nick Carchedi', 'Tom Jeon']","[('Course evaluation', 'https://assets.datacamp.com/production/repositories/539/datasets/e4bb6dc2496e3a50208dccb81dcbcb62faf5b122/evals.RData'), ('UC Berkeley admissions', 'https://assets.datacamp.com/production/repositories/539/datasets/312d8ff0bad2cd9d567adce0181435a99892c5f8/ucb_admit.RData'), ('US state regions', 'https://assets.datacamp.com/production/repositories/539/datasets/5a549cee71a2347201fb145e25312eaa426ec9be/us_regions.RData')]",['Introduction to R'],https://www.datacamp.com/courses/introduction-to-data,Probability & Statistics,R
157,Introduction to Data Engineering,4,15,57,147,"4,100",Introduction Data Engineering,"Introduction to Data Engineering
Have you heard people talk about data engineers and wonder what it is they do? Do you know what data engineers do but you're not sure how to become one yourself? This course is the perfect introduction. It touches upon all things you need to know to streamline your data processing. This introductory course will give you enough context to start exploring the world of data engineering. It's perfect for people who work at a company with several data sources and don't have a clear idea of how to use all those data sources in a scalable way. Be the first one to introduce these techniques to your company and become the company star employee.
In this first chapter, you will be exposed to the world of data engineering! Explore the differences between a data engineer and a data scientist, get an overview of the various tools data engineers use and expand your understanding of how cloud technology plays a role in data engineering.
Now that you know the primary differences between a data engineer and a data scientist, get ready to explore the data engineer's toolbox! Learn in detail about different types of databases data engineers use, how parallel computing is a cornerstone of the data engineer's toolkit, and how to schedule data processing jobs using scheduling frameworks.
Having been exposed to the toolbox of data engineers, it's now time to jump into the bread and butter of a data engineer's workflow! With ETL, you will learn how to extract raw data from various sources, transform this raw data into actionable insights, and load it into relevant databases ready for consumption!
Cap off all that you've learned in the previous three chapters by completing a real-world data engineering use case from DataCamp! You will perform and schedule an ETL process that transforms raw course rating data, into actionable course recommendations for DataCamp students!",[],"['Vincent Vankrunkelsven', 'Adel Nehme']",[],"['Introduction to Python', 'Intermediate Python for Data Science', 'Intro to SQL for Data Science']",https://www.datacamp.com/courses/introduction-to-data-engineering,Programming,Python
158,Introduction to Data Science in Python,4,13,44,"20,105","3,700",Introduction Data Science,"Introduction to Data Science in Python
Begin your journey into Data Science! Even if you've never written a line of code in your life, you'll be able to follow this course and witness the power of Python to perform Data Science. You'll use data to solve the mystery of Bayes, the kidnapped Golden Retriever, and along the way you'll become familiar with basic Python syntax and popular Data Science modules like Matplotlib (for charts and graphs) and Pandas (for tabular data).
Welcome to the wonderful world of Data Analysis in Python! In this chapter, you'll learn the basics of Python syntax, load your first Python modules, and use functions to get a suspect list for the kidnapping of Bayes, DataCamp's prize-winning Golden Retriever.
In this chapter, you'll learn a powerful Python libary: pandas.  Pandas lets you read, modify, and search tabular datasets (like spreadsheets and database tables).  You'll examine credit card records for the suspects and see if any of them made suspicious purchases.
Get ready to visualize your data! You'll create line plots with another Python module: matplotlib.  Using line plots, you'll analyze the letter frequencies from the ransom note and several handwriting samples to determine the kidnapper.
In this final chapter, you'll learn how to create three new plot types: scatter plots, bar plots, and histograms.  You'll use these tools to locate where the kidnapper is hiding and rescue Bayes, the Golden Retriever.",['Data Analyst with Python'],"['Hillary Green-Lerman', 'Mona Khalil']",[],[],https://www.datacamp.com/courses/introduction-to-data-science-in-python,Programming,Python
159,Introduction to Data Visualization with Python,4,14,58,"87,388","5,000",Introduction Data Visualization Python,"Introduction to Data Visualization with Python
This course extends Intermediate Python for Data Science to provide a stronger foundation in data visualization in Python. You’ll get a  broader coverage of the Matplotlib library and an overview of seaborn, a package for statistical graphics. Topics covered include customizing graphics, plotting two-dimensional arrays (like pseudocolor plots, contour plots, and images), statistical graphics (like visualizing distributions and regressions), and working with time series and image data.
Following a review of basic plotting with Matplotlib, this chapter delves into customizing plots using Matplotlib. This includes overlaying plots, making subplots, controlling axes, adding legends and annotations, and using different plot styles.
This chapter showcases various techniques for visualizing two-dimensional arrays. This includes the use, presentation, and orientation of grids for representing two-variable functions followed by discussions of pseudocolor plots, contour plots, color maps, two-dimensional histograms, and images.
This is a high-level tour of the seaborn plotting library for producing statistical graphics in Python. We’ll cover seaborn tools for computing and visualizing linear regressions, as well as tools for visualizing univariate distributions (like strip, swarm, and violin plots) and multivariate distributions (like joint plots, pair plots, and heatmaps). We’ll also discuss grouping categories in plots.
This chapter ties together the skills gained so far through examining time series data and images. You’ll customize plots of stock data, generate histograms of image pixel intensities, and enhance image contrast through histogram equalization.","['Data Analyst with Python', 'Data Scientist with Python']","['Team Anaconda', 'Yashas Roy', 'Hugo Bowne-Anderson']","[('Automobile miles per gallon', 'https://assets.datacamp.com/production/repositories/558/datasets/1a03987ad77b38d61fc4c692bf64454ddf345fbe/auto-mpg.csv'), (""Percentage of bachelor's degrees awarded to women in the USA"", 'https://assets.datacamp.com/production/repositories/558/datasets/5f4f1a9bab95fba4d7fea1ad3c30dcab8f5b9c96/percent-bachelors-degrees-women-usa.csv'), ('Stocks', 'https://assets.datacamp.com/production/repositories/558/datasets/8dd58ff003e399765cdf348305783b842ff1d7eb/stocks.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/introduction-to-data-visualization-with-python,Data Visualization,Python
160,Introduction to Data Visualization with ggplot2,4,14,52,304,"4,300",Introduction Data Visualization ggplot2,"Introduction to Data Visualization with ggplot2
The ability to produce meaningful and beautiful data visualizations is an essential part of your skill set as a data scientist. This course, the first R data visualization tutorial in the series, introduces you to the principles of good visualizations and the grammar of graphics plotting concepts implemented in the ggplot2 package. ggplot2 has become the go-to tool for flexible and professional plots in R. Here, we’ll examine the first three essential layers for making a plot - Data, Aesthetics and Geometries. By the end of the course you will be able to make complex exploratory plots.
In this chapter we’ll get you into the right frame of mind for developing meaningful visualizations with R. You’ll understand that as a communications tool, visualizations require you to think about your audience first. You’ll also be introduced to the basics of ggplot2 - the 7 different grammatical elements (layers) and aesthetic mappings.
Aesthetic mappings are the cornerstone of the grammar of graphics plotting concept. This is where the magic happens - converting continuous and categorical data into visual scales that provide access to a large amount of information in a very short time. In this chapter you’ll understand how to choose the best aesthetic mappings for your data.
A plot’s geometry dictates what visual elements will be used. In this chapter, we’ll familiarize you with the geometries used in the three most common plot types you’ll encounter - scatter plots, bar charts and line plots. We’ll look at a variety of different ways to construct these plots.
In this chapter, we’ll explore how understanding the structure of your data makes data visualization much easier. Plus, it’s time to make our plots pretty. This is the last step in the data viz process. The Themes layer will enable you to make publication quality plots directly in R. In the next course we'll look at some extra layers to add more variables to your plots.",[],"['Rick Scavetta', 'Richie Cotton', 'Jonathan Ng', 'Shon Inouye']","[('Diamonds', 'https://assets.datacamp.com/production/repositories/5171/datasets/ca796c3f86ee883c59e31bd6272ee81841cb7f54/diamonds.RData'), ('Iris', 'https://assets.datacamp.com/production/repositories/5171/datasets/ffb99ce839b696b5a8102c8015b6db7060b360fe/iris.RData'), ('Recession', 'https://assets.datacamp.com/production/repositories/5171/datasets/98df1f3a8e599cbabd16ea2a21d8f74c0d02290d/recess.RData'), ('Fish', 'https://assets.datacamp.com/production/repositories/5171/datasets/fd66a8c2408f8cccc24df8ce2668e0e195519532/fish.RData')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/introduction-to-data-visualization-with-ggplot2,Data Visualization,R
161,Introduction to Databases in Python,4,20,66,"54,675","5,550",Introduction Databases,"Introduction to Databases in Python
In this course, you'll learn the basics of using SQL with Python. This will be useful because databases are ubiquitous and data scientists, analysts, and engineers must interact with them constantly. The Python SQL toolkit SQLAlchemy provides an accessible and intuitive way to query, build, and write to essential databases, including SQLite, MySQL, and PostgreSQL.
In this chapter, you’ll get acquainted with the fundamentals of relational databases and the relational model for database management. You will learn how to connect to a database and interact with it by writing basic SQL queries, both in raw SQL as well as SQLAlchemy, which provides a Pythonic way of interacting with databases.
In this chapter, you will build on your database knowledge by writing more nuanced queries that allow you to filter, order, and count your data—all within the Pythonic framework provided by SQLAlchemy.
In this chapter, you will learn to perform advanced—and incredibly useful—queries that enable you to interact with your data in powerful ways.
In the previous chapters, you interacted with existing databases and queried them in different ways. Now, you will learn how to build your own databases and keep them updated.
Bring together all of the skills you acquired in the previous chapters to work on a real-life project. From connecting to a database and populating it, to reading and querying it.",['Importing & Cleaning Data with Python'],"['Jason Myers', 'Hugo Bowne-Anderson', 'Vincent Lan']","[('Census (CSV)', 'https://assets.datacamp.com/production/repositories/274/datasets/7a5a4567430ee737c70994d1c4747f252e0fd527/census.csv'), ('Census (SQLite)', 'https://assets.datacamp.com/production/repositories/274/datasets/f6eda83e7fb90ac06a22af4132a355933763785c/census.sqlite'), ('Employees (SQLite)', 'https://assets.datacamp.com/production/repositories/274/datasets/af705f788c225cad7e6ef405ed5490db36ed03bf/employees.sqlite')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/introduction-to-relational-databases-in-python,Data Manipulation,Python
162,Introduction to Function Writing in R,4,14,52,308,"4,350",Introduction Function Writing in R,"Introduction to Function Writing in R
Being able to write your own functions makes your analyses more readable, with fewer errors, and more reusable from project to project. Function writing will increase your productivity more than any other skill!  In this course you'll the basics of function writing, focusing on the arguments going into the function and the return values. You'll be writing useful data science functions, and using real-world data on Wyoming tourism, stock price/earnings ratios, and grain yields.
Learn why writing your own functions is useful, how to convert a script into a function, and what order you should include the arguments.
Learn how to set defaults for arguments, how to pass arguments between functions, and how to check that users specified arguments correctly.
Learn how to return early from a function, how to return multiple values, and understand how R decides which variables exist.
Apply your function writing skills to a case study involving data preparation, visualization, and modeling.",[],"['Richie Cotton', 'Marianna Lamnina']","[('Snake River visits', 'https://assets.datacamp.com/production/repositories/5028/datasets/a55843f83746968c7f118d82ed727db9c71e891f/snake_river_visits.rds'), ('Standard & Poor 500 price/earnings ratios', 'https://assets.datacamp.com/production/repositories/5028/datasets/675d6348dbcc81bcb4bb9d25e827f5b63d034771/std_and_poor500_with_pe_2019-06-21.rds'), ('NASS corn yields', 'https://assets.datacamp.com/production/repositories/5028/datasets/495f9e5fa1ae333cd013568412df4e7c663c2192/nass.corn.rds'), ('NASS wheat yields', 'https://assets.datacamp.com/production/repositories/5028/datasets/8dde0453e7b53c630546e5b9723ce279ac6e4901/nass.wheat.rds'), ('NASS barley yields', 'https://assets.datacamp.com/production/repositories/5028/datasets/a5eddd39a47abc8efbdb419c54882112dc28785b/nass.barley.rds')]","['Introduction to the Tidyverse', 'Intermediate R']",https://www.datacamp.com/courses/introduction-to-function-writing-in-r,Programming,R
163,Introduction to Git for Data Science,4,0,46,"58,953","3,650",Introduction Git Data Science,"Introduction to Git for Data Science
Version control is one of the power tools of programming. It allows you to keep track of what you did when, undo any changes you decide you don't want, and collaborate at scale with other people. This course will introduce you to Git, a modern version control tool that is very popular with data scientists and software developers, and show you how to use it to get more done in less time and with less pain.
This chapter explains what version control is and why you should use it, and introduces the most common steps in a common Git workflow.
This chapter digs a little deeper into how Git stores information and how you can explore a repository's history.
Since Git saves all the changes you've made to your files, you can use it to undo those changes. This chapter shows you several ways to do that.
Branching is one of Git's most powerful features, since it allows you to work on several things at once. This chapter shows you how to create and manage branches.
This chapter showcases how Git allows you to share changes between repositories to collaborate at scale.",[],"['Greg Wilson', 'Filip Schouwenaars']",[],[],https://www.datacamp.com/courses/introduction-to-git-for-data-science,Programming,Git
164,Introduction to Linear Modeling in Python,4,16,59,"4,500","5,050",Introduction Linear Modeling,"Introduction to Linear Modeling in Python
One of the primary goals of any scientist is to find patterns in data and build models to describe, predict, and extract insight from those patterns. The most fundamental of these patterns is a linear relationship between two variables. This course provides an introduction to exploring, quantifying, and modeling linear relationships in data, by demonstrating techniques such as least-squares, linear regression, estimatation, and bootstrap resampling. Here you will apply the most powerful modeling tools in the python data science ecosystem, including scipy, statsmodels, and scikit-learn, to build and evaluate linear models. By exploring the concepts and applications of linear models with python, this course serves as both a practical introduction to modeling, and as a foundation for learning more advanced modeling techniques and tools in statistics and machine learning.
We start the course with an initial exploration of linear relationships, including some motivating examples of how linear models are used, and demonstrations of data visualization methods from matplotlib. We then use descriptive statistics to quantify the shape of our data and use correlation to quantify the strength of linear relationships between two variables.
Here we look at the parts that go into building a linear model. Using the concept of a Taylor Series, we focus on the parameters slope and intercept, how they define the model, and how to interpret the them in several applied contexts. We apply a variety of python modules to find the model that best fits the data, by computing the optimal values of slope and intercept, using least-squares, numpy, statsmodels, and scikit-learn.
Next we will apply models to real data and make predictions. We will explore some of the most common pit-falls and limitations of predictions, and we evaluate and compare models by quantifying and contrasting several measures of goodness-of-fit, including RMSE and R-squared.
In our final chapter, we introduce concepts from inferential statistics, and use them to explore how maximum likelihood estimation and bootstrap resampling can be used to estimate linear model parameters. We then apply these methods to make probabilistic statements about our confidence in the model parameters.",['Statistics Fundamentals with Python'],"['Jason Vestuto', 'Nick Solomon', 'Adrián Soto']","[('Femur length versus body height', 'https://assets.datacamp.com/production/repositories/1480/datasets/a1871736d7829e85ec4ead2212d621df69bb3977/femur_data.csv'), ('Distance hiked versus hike duration', 'https://assets.datacamp.com/production/repositories/1480/datasets/2a0748a53d9f54544d63e451318b44cf438c01c2/hiking_data.csv'), ('Galaxy distances versus recession velocities', 'https://assets.datacamp.com/production/repositories/1480/datasets/fffe3864d24c1864a5df0f0985a11d07bc1cd526/hubble_data.csv'), ('Sea surface height versus year', 'https://assets.datacamp.com/production/repositories/1480/datasets/0c1231fa1e777d3d7dd2d0ab877158a0dd219a5b/sea_level_data.csv'), ('Mass versus volume of solution', 'https://assets.datacamp.com/production/repositories/1480/datasets/e4c4cdd076de27d3c1bccfe1d0019d279c08d2fc/solution_data.csv')]","['Statistical Thinking in Python (Part 1)', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/introduction-to-linear-modeling-in-python,Probability & Statistics,Python
165,Introduction to Machine Learning,6,15,81,"82,075","6,500",Introduction Machine Learning,"Introduction to Machine Learning
This online machine learning course is perfect for those who have a solid basis in R and statistics, but are complete beginners with machine learning. After a broad overview of the discipline's most common techniques and applications, you'll gain more insight into the assessment and training of different machine learning models. The rest of the course is dedicated to a first reconnaissance with three of the most basic machine learning tasks: classification, regression and clustering.
In this first chapter, you get your first intro to machine learning. After learning the true fundamentals of machine learning, you'll experiment with the techniques that are explained in more detail in future chapters.
You'll learn how to assess the performance of both supervised and unsupervised learning algorithms. Next, you'll learn why and how you should split your data in a training set and a test set. Finally, the concepts of bias and variance are explained.
You'll gradually take your first steps to correctly perform classification, one of the most important tasks in machine learning today. By the end of this chapter, you'll be able to learn and build a decision tree and to classify unseen observations with k-Nearest Neighbors.
Although a traditional subject in classical statistics, you can also consider regression from a machine learning point of view. You'll learn more about the predictive capabilities and performance of regression algorithms. At the end of this chapter you'll be acquainted with simple linear regression, multi-linear regression and k-Nearest Neighbors regression.
As an unsupervised learning technique, clustering requires a different approach than the ones you have seen in the previous chapters. How can you cluster? When is a clustering any good? All these questions will be answered; you'll also learn about k-means clustering and hierarchical clustering along the way. At the end of this chapter and our machine learning video tutorials, you’ll have a basic understanding of all the main principles.",[],"['Gilles Inghelbrecht', 'Vincent Vankrunkelsven', 'Filip Schouwenaars']","[('Cars', 'https://assets.datacamp.com/production/repositories/248/datasets/67f1336bf09c9fe377ea122819e6389670a80e0b/cars.csv'), ('Emails', 'https://assets.datacamp.com/production/repositories/248/datasets/00ec717b6fdbdd7eb472e9d4e5f0183a79af5aee/emails.RData'), ('Titanic', 'https://assets.datacamp.com/production/repositories/248/datasets/1e4773b17af6546a965b69a6d4f003238f34ccc2/titanic.csv'), ('Air', 'https://assets.datacamp.com/production/repositories/248/datasets/da2123725e81e36e4b749f4c2a717b8ac9769c8d/air.csv'), ('Seeds', 'https://assets.datacamp.com/production/repositories/248/datasets/89aa0ee629a8dadb2e49383f6a062761fcc83263/seeds.csv'), ('Income', 'https://assets.datacamp.com/production/repositories/248/datasets/d6c8a65d29855a493fa84e64d252fad5ee6737da/income.RData'), ('Kangoroos', 'https://assets.datacamp.com/production/repositories/248/datasets/29d26a0047b418dde99b04693b9c1a443ed02214/kangoroos.csv'), ('World Bank data', 'https://assets.datacamp.com/production/repositories/248/datasets/beaeff0287c7bd9d9d1b87f408aaaa3782a487cd/world_bank_train.csv'), ('School results', 'https://assets.datacamp.com/production/repositories/248/datasets/f80a7294c73c6fca3fcd8e323f86a8f9ab871c54/school_result.csv'), ('Olympic run records', 'https://assets.datacamp.com/production/repositories/248/datasets/938ee8e578c1c7aa62986772c6f8f00ddde6170d/run_record.csv'), ('Crime data', 'https://assets.datacamp.com/production/repositories/248/datasets/fbee5e33af1244d2cf858b56690b9fcb762e764e/crime_data.csv')]","['Introduction to R', 'Intermediate R', 'Correlation and Regression']",https://www.datacamp.com/courses/introduction-to-machine-learning-with-r,Machine Learning,R
166,Introduction to Matplotlib,4,14,44,"6,583","3,600",Introduction Matplotlib,"Introduction to Matplotlib
Visualizing data in plots and figures exposes the underlying patterns in the data and provides insights. Good visualizations also help you communicate your data to others, and are useful to data analysts and other consumers of the data. In this course, you will learn how to use Matplotlib, a powerful Python data visualization library. Matplotlib provides the building blocks to create rich visualizations of many different kinds of datasets. You will learn how to create visualizations for different kinds of data and how to customize, automate, and share these visualizations.
This chapter introduces the Matplotlib visualization library and demonstrates how to use it with data.
Time series data is data that is recorded. Visualizing this type of data helps clarify trends and illuminates relationships between data.
Visualizations can be used to compare data in a quantitative manner. This chapter explains several methods for quantitative visualizations.
This chapter shows you how to share your visualizations with others: how to save your figures as files, how to adjust their look and feel, and how to automate their creation based on input data.",['Data Visualization with Python'],"['Ariel Rokem', 'Chester Ismay', 'Amy Peterson']",[],['Introduction to Python'],https://www.datacamp.com/courses/introduction-to-matplotlib,Data Visualization,Python
167,Introduction to MongoDB in Python,4,16,60,"3,779","4,450",Introduction MongoDB,"Introduction to MongoDB in Python
MongoDB is a tool to explore data structured as you see fit. As a NoSQL database, it doesn't follow the strict relational format imposed by SQL. By providing capabilities that typically require adding layers to SQL, it collapses complexity. With dynamic schema, you can handle vastly different data together and consolidate analytics. The flexibility of MongoDB empowers you to keep improving and fix issues as your requirements evolve. In this course, you will learn the MongoDB language and apply it to search and analytics. Working with unprocessed data from the official nobelprize.org API, you will explore and answer questions about Nobel Laureates and prizes.
This chapter is about getting a bird's-eye view of the Nobel Prize data's structure. You will relate MongoDB documents, collections, and databases to JSON and Python types. You'll then use filters, operators, and dot notation to explore substructure.
Now you have a sense of the data's structure. This chapter is about dipping your toes into the pools of values for various fields. You'll collect distinct values, test for membership in sets, and match values to patterns.
You can now query collections with ease and collect documents to examine and analyze with Python. But this process is sometimes slow and onerous for large collections and documents. This chapter is about various ways to speed up and simplify that process.
You've used projection, sorting, indexing, and limits to speed up data fetching. But there are still annoying performance bottlenecks in your analysis pipelines. You still need to fetch a ton of data. Thus, network bandwidth and downstream processing and memory capacity still impact performance. This chapter is about using MongoDB to perform aggregations for you on the server.",[],"['Donny Winston', 'Hadrien Lacroix', 'Mari Nazary', 'Greg Wilson']","[('Laureates dataset', 'https://assets.datacamp.com/production/repositories/1838/datasets/f402fa7be837b9cd4890f4e1c59a7377693ba36c/laureates.json'), ('Prizes dataset', 'https://assets.datacamp.com/production/repositories/1838/datasets/3fde64719bc3226b593a1c261f715566ea6284b2/prizes.json')]","['Python Data Science Toolbox (Part 2)', 'Data Types for Data Science']",https://www.datacamp.com/courses/introduction-to-using-mongodb-for-data-science-with-python,Data Manipulation,Python
168,Introduction to Portfolio Analysis in Python,4,15,52,467,"4,200",Introduction Portfolio Analysis,"Introduction to Portfolio Analysis in Python
Have you ever had wondered whether an investment fund is actually a good investment? Or compared two investment options and asked what the difference between the two is? What does the risk indicator of these funds even mean? Or do you frequently work with financial data in your daily job and you want to get an edge? In this course, you’re going to get familiar with the exciting world of investing, by learning about portfolios, risk and return, and how to critically analyze them. By working on actual historical stock data, you’ll learn how to calculate meaningful measures of risk, how to break-down performance, and how to calculate an optimal portfolio for the desired risk and return trade-off. After this course, you’ll be able to make data-driven decisions when it comes to investing and have a better understanding of investment portfolios.
In the first chapter, you’ll learn how a portfolio is build up out of individual assets and corresponding weights. The chapter also covers how to calculate the main characteristics of a portfolio: returns and risk.
Chapter 2 goes deeper into how to measure returns and risk accurately. The two most important measures of return, annualized returns, and risk-adjusted returns, are covered in the first part of the chapter. In the second part, you’ll learn how to look at risk from different perspectives. This part focuses on skewness and kurtosis of a distribution, as well as downside risk.
In chapter 3, you’ll learn about investment factors and how they play a role in driving risk and return. You’ll learn about the Fama French factor model, and use that to break down portfolio returns into explainable, common factors. This chapter also covers how to use Pyfolio, a public portfolio analysis tool.
In this last chapter, you learn how to create optimal portfolio weights, using Markowitz’ portfolio optimization framework. You’ll learn how to find the optimal weights for the desired level of risk or return. Lastly, you’ll learn alternative ways to calculate expected risk and return, using the most recent data only.",[],"['Charlotte Werger', 'Hillary Green-Lerman', 'Ruanne Van Der Walt']","[('Small portfolio', 'https://assets.datacamp.com/production/repositories/4745/datasets/da4863e3ea40482a56929e595f1a7c6e1fdaf614/small_portfolio.csv'), ('S&P500', 'https://assets.datacamp.com/production/repositories/4745/datasets/870eef8aa8b72b4790e12cd3179aa555e1055682/sp500.csv'), ('Large portfolio', 'https://assets.datacamp.com/production/repositories/4745/datasets/f61b6a5210dfa02b8299f96380e500f59cbe38c5/large_pf.csv'), ('Factors for portfolio returns', 'https://assets.datacamp.com/production/repositories/4745/datasets/cd4884fa086d67c85d89e879e3fc2d95269f16bd/factors_pf_returns.csv'), ('Portfolio factors', 'https://assets.datacamp.com/production/repositories/4745/datasets/9a9c0ae96a31e2502fb60882eb539c61c0c1230d/pf_factors.csv')]","['Manipulating Time Series Data in Python', 'Intro to Python for Finance', 'pandas Foundations']",https://www.datacamp.com/courses/introduction-to-portfolio-analysis-in-python,Applied Finance,Python
169,Introduction to Portfolio Analysis in R,5,14,57,"19,759","4,400",Introduction Portfolio Analysis in R,"Introduction to Portfolio Analysis in R
A golden rule in investing is to always test the portfolio strategy on historical data, and, once you are trading the strategy, to constantly monitor its performance. In this course, you will learn this by critically analyzing portfolio returns using the package PerformanceAnalytics. The course also shows how to estimate the portfolio weights that optimally balance risk and return. This is a data-driven course that combines portfolio theory with the practice in R, illustrated on real-life examples of equity portfolios and asset allocation problems.  If you'd like to continue exploring the data after you've finished this course, the data used in the first three chapters can be obtained using the tseries-package. The code to get them can be found here. The data used in chapter 4 can be downloaded here.
Asset returns and portfolio weights; those are the building blocks of a portfolio return. This chapter is about computing those portfolio weights and returns in R.
The history of portfolio returns reveals valuable information about how much the investor can expect to gain or lose. This chapter introduces the R functionality to analyze the investment performance based on a statisical analysis of the portfolio returns. It includes graphical analysis and the calculation of performance statistics expressing average return, risk and risk-adjusted return over rolling estimation samples.
In addition to studying portfolio performance based on the observed portfolio return series, it is relevant to find out how individual (expected) returns, volatilities and correlations interact to determine the total portfolio performance.
We have up to now considered the portfolio weights as given. In this chapter you learn how to determine in R the portfolio weights that are optimal in terms of achieving a target return with minimum variance, while satisfying constraints on the portfolio weights.","['Applied Finance with R', 'Quantitative Analyst with R']","['Kris Boudt', 'Lore Dirick']","[('Stock prices for Apple and Microsoft', 'https://assets.datacamp.com/production/repositories/156/datasets/19b8706d185f4a46536ede60b2aab77457d139cf/aapl_msft.RData'), ('Bonds prices', 'https://assets.datacamp.com/production/repositories/156/datasets/16a33b7cf90c561d6b7118778e74b34f96478174/bond_prices.RData'), ('Commodities prices', 'https://assets.datacamp.com/production/repositories/156/datasets/34c3822b17f6b911c1725da49e90207964509738/comm_prices.RData'), ('Equities prices', 'https://assets.datacamp.com/production/repositories/156/datasets/0b39b863d740fa2cd39f408a463cd10eb6c617e6/eq_prices.RData'), ('Stock prices for DIJA', 'https://assets.datacamp.com/production/repositories/156/datasets/f1b7df924abf7f11f7b01284b8874d8fda609f2f/prices.rds'), ('Real estate prices', 'https://assets.datacamp.com/production/repositories/156/datasets/40978acd3fd7efa00815a5dceaf3dcf8cddb5331/re_prices.RData'), ('Daily prices in S&P 500', 'https://assets.datacamp.com/production/repositories/156/datasets/df69bb807d3c6bec45af9ef4d7708970f2a0760a/sp500.RData')]","['Introduction to R for Finance', 'Intermediate R for Finance']",https://www.datacamp.com/courses/introduction-to-portfolio-analysis-in-r,Applied Finance,R
170,Introduction to PySpark,4,0,45,"30,053","3,850",Introduction PySpark,"Introduction to PySpark
In this course, you'll learn how to use Spark from Python! Spark is a tool for doing parallel computation with large datasets and it integrates well with Python. PySpark is the Python package that makes the magic happen. You'll use this package to work with data about flights from Portland and Seattle. You'll learn to wrangle this data and build a whole machine learning pipeline to predict whether or not flights will be delayed. Get ready to put some Spark in your Python code and dive into the world of high-performance machine learning!
In this chapter, you'll learn how Spark manages data and how can you read and write tables from Python.
In this chapter, you'll learn about the pyspark.sql module, which provides optimized data queries to your Spark session.
PySpark has built-in, cutting-edge machine learning routines, along with utilities to create full machine learning pipelines. You'll learn about them in this chapter.
In this last chapter, you'll apply what you've learned to create a model that predicts which flights will be delayed.",[],"['Lore Dirick', 'Nick Solomon', 'Colin Ricardo']","[('Airports', 'https://assets.datacamp.com/production/repositories/1237/datasets/6e5c4ac2a4799338ba7e13d54ce1fa918da644ba/airports.csv'), ('Flights', 'https://assets.datacamp.com/production/repositories/1237/datasets/fa47bb54e83abd422831cbd4f441bd30fd18bd15/flights_small.csv'), ('Planes', 'https://assets.datacamp.com/production/repositories/1237/datasets/231480a2696c55fde829ce76d936596123f12c0c/planes.csv')]",['Introduction to Python'],https://www.datacamp.com/courses/introduction-to-pyspark,Other,Python
171,Introduction to R for Finance,4,14,62,"36,577","5,300",Introduction R Finance,"Introduction to R for Finance
In this finance-oriented introduction to R, you will learn essential data structures such as lists and data frames and have the chance to apply that knowledge to real-world financial examples. By the end of the course, you will be comfortable with the basics of manipulating your data to perform financial analysis in R.
Get comfortable with the very basics of R and learn how to use it as a calculator. Also, create your first variables in R and explore some of the base data types such as numerics and characters.
In this chapter, you will learn all about vectors and matrices using historical stock prices for companies like Apple and IBM. You will then be able to feel confident creating, naming, manipulating, and selecting from vectors and matrices.
Arguably the most important data structure in R, the data frame is what most of your data will take the form of. It combines the structure of a matrix with the flexibility of having different types of data in each column.
Questions with answers that fall into a limited number of categories can be classified as factors. In this chapter, you will use bond credit ratings to learn all about creating, ordering, and subsetting factors.
Wouldn't it be nice if there was a way to hold related vectors, matrices, or data frames together in R? In this final chapter, you will explore lists and many of their interesting features by building a small portfolio of stocks.","['Finance Basics with R', 'Quantitative Analyst with R']","['Lore Dirick', 'Davis Vaughan']",[],[],https://www.datacamp.com/courses/introduction-to-r-for-finance,Applied Finance,R
172,Introduction to Relational Databases in SQL,4,13,45,"19,740","3,600",Introduction Relational Databases in SQL,"Introduction to Relational Databases in SQL
You’ve already used SQL to query data from databases. But did you know that there's a lot more you can do with databases? You can model different phenomena in your data, as well as the relationships between them. This gives your data structure and consistency, which results in better data quality. In this course, you'll experience this firsthand by working with a real-life dataset that was used to investigate questionable university affiliations. Column by column, table by table, you'll get to unlock and admire the full potential of databases. You'll learn how to create tables and specify their relationships, as well as how to enforce data integrity. You'll also discover other unique features of database systems, such as constraints.
In this chapter, you'll create your very first database with a set of simple SQL commands. Next, you'll migrate data from existing flat tables into that database. You'll also learn how meta-information about a database can be queried.
After building a simple database, it's now time to make use of the features. You'll specify data types in columns, enforce column uniqueness, and disallow NULL values in this chapter.
Now let’s get into the best practices of database engineering. It's time to add primary and foreign keys to the tables. These are two of the most important concepts in databases, and are the building blocks you’ll use to establish relationships between tables.
In the final chapter, you'll leverage foreign keys to connect tables and establish relationships that will greatly benefit your data quality. And you'll run ad hoc analyses on your new database.","['Data Analyst with Python', 'Data Scientist with R', 'Data Scientist with Python', 'Python Programmer', 'SQL Server Fundamentals']","['Timo Grossenbacher', 'Mona Khalil', 'Chester Ismay', 'Sumedh Panchadhar']",[],['Intro to SQL for Data Science'],https://www.datacamp.com/courses/introduction-to-relational-databases-in-sql,Programming,SQL
173,Introduction to SQL Server,4,13,46,"6,174","3,850",Introduction SQL Server,"Introduction to SQL Server
SQL is an essential skill for data scientists, and Microsoft SQL Server is one of the world's most popular database systems. In this course, you'll start with simple SELECT statements, and refine these queries with ORDER BY and WHERE clauses. You'll learn how to group and aggregate your results, and also how to work with strings.  You'll also cover the most common way to join tables, how to create tables, and inserting and updating data. You'll mainly work with the Chinook digital media database, representing sales of various artists and tracks. The music theme continues with the classic rock and Eurovision datasets, while we also look at trends in power outages in the US. Overall, you'll become proficient in the most common data manipulation tasks in SQL Server and build a solid foundation for the upcoming T-SQL courses here on DataCamp!
Hit the ground running by learning the basics of SELECT  statements to retrieve data from one or more columns. 
You'll also learn how to apply filters to both numeric and text data, and sort the results.
Herein, you'll learn how to use important SQL Server aggregate functions such as SUM, COUNT, MIN, MAX, and AVG. Following that, you'll learn how to manipulate text fields. To round out the chapter, you'll power up your queries using GROUP BY and HAVING, which will enable you to perform more meaningful aggregations.
This chapter covers the basics of joining tables, using INNER, LEFT, and RIGHT joins,  so that you can confidently retrieve data from multiple sources.
In this final chapter,  you'll get really hands on! You've worked with existing tables, but in this chapter, you'll get to CREATE and INSERT data into them.  You'll also UPDATE existing records and practice DELETE  statements in a safe environment. This chapter ensures the course gives you a thorough introduction to the key aspects of working with data in SQL Server.",['SQL Server Fundamentals'],"['John MacKintosh', 'Mona Khalil', 'Yashas Roy']","[('Eurovision Song Contest', 'https://assets.datacamp.com/production/repositories/3633/datasets/609c953592798d7f61051376e75a2978e08a015d/eurovis.csv'), ('Power outages in USA', 'https://assets.datacamp.com/production/repositories/3633/datasets/013e3a1acf0c13d5c23e0d629b48ed8a648fe6c3/grid.csv'), ('Classic rock radio station songs', 'https://assets.datacamp.com/production/repositories/3633/datasets/a6edd56b896aa5a2692cba03622acd6b80a59ed2/songlist.csv'), ('Chinook Database', 'https://assets.datacamp.com/production/repositories/3633/datasets/e2649c4f08c28cd55f17f83a4f4290decda6ba00/Chinook.sql')]",[],https://www.datacamp.com/courses/introduction-to-sql-server,Programming,SQL
174,Introduction to Seaborn,4,14,44,"1,976","3,700",Introduction Seaborn,"Introduction to Seaborn
Seaborn is a powerful Python library that makes it easy to create informative and attractive visualizations. This course provides an introduction to Seaborn and teaches you how to visualize your data using plots such as scatter plots, box plots, and bar plots. You’ll do this while exploring survey responses about student hobbies and the factors that are associated with academic success. You’ll also learn about some of Seaborn’s advantages as a statistical visualization tool, such as how it automatically calculates confidence intervals. By the end of the course, you will be able to use Seaborn in a variety of situations to explore your data and effectively communicate the results of your data analyses to others.
What is Seaborn, and when should you use it? In this chapter, you will find out! Plus, you will learn how to create scatter plots and count plots with both lists of data and pandas DataFrames. You will also be introduced to one of the big advantages of using Seaborn - the ability to easily add a third variable to your plots by using color to represent different subgroups.
In this chapter, you will create and customize plots that visualize the relationship between two quantitative variables. To do this, you will use scatter plots and line plots to explore how the level of air pollution in a city changes over the course of a day and how horsepower relates to fuel efficiency in cars. You will also see another big advantage of using Seaborn - the ability to easily create subplots in a single figure!
Categorical variables are present in nearly every dataset, but they are especially prominent in survey data. In this chapter, you will learn how to create and customize categorical plots such as box plots, bar plots, count plots, and point plots. Along the way, you will explore survey data from young people about their interests, students about their study habits, and adult men about their feelings about masculinity.
In this final chapter, you will learn how to add informative plot titles and axis labels, which are one of the most important parts of any data visualization! You will also learn how to customize the style of your visualizations in order to more quickly orient your audience to the key takeaways. Then, you will put everything you have learned together for the final exercises of the course!",['Data Visualization with Python'],"['Erin Case', 'Mona Khalil', 'Yashas Roy']","[('Countries', 'https://assets.datacamp.com/production/repositories/3996/datasets/cd320cdd281edc2d2ce24058565d3d7090aa9708/countries-of-the-world.csv'), ('Mileage per gallon', 'https://assets.datacamp.com/production/repositories/3996/datasets/e0b285b89bdbfbbe8d81123e64727ff150d544e0/mpg.csv'), ('Students', 'https://assets.datacamp.com/production/repositories/3996/datasets/61e08004fef1a1b02b62620e3cd2533834239c90/student-alcohol-consumption.csv'), ('Survey responses', 'https://assets.datacamp.com/production/repositories/3996/datasets/ab13162732ae9ca1a9a27e2efd3da923ed6a4e7b/young-people-survey-responses.csv')]",['Introduction to Data Science in Python'],https://www.datacamp.com/courses/introduction-to-seaborn,Data Visualization,Python
175,Introduction to Shell for Data Science,4,0,55,"34,472","4,650",Introduction Shell Data Science,"Introduction to Shell for Data Science
The Unix command line has survived and thrived for almost 50 years because it lets people do complex things with just a few keystrokes. Sometimes called ""the universal glue of programming,"" it helps users combine existing programs in new ways, automate repetitive tasks, and run programs on clusters and clouds that may be halfway around the world. This course will introduce its key elements and show you how to use them efficiently.
This chapter is a brief introduction to the Unix shell. You'll learn why it is still in use after almost 50 years, how it compares to the graphical tools you may be more familiar with, how to move around in the shell, and how to create, modify, and delete files and folders.
The commands you saw in the previous chapter allowed you to move things around in the filesystem. This chapter will show you how to work with the data in those files. The tools we’ll use are fairly simple, but are solid building blocks.
The real power of the Unix shell lies not in the individual commands, but in how easily they can be combined to do new things. This chapter will show you how to use this power to select the data you want, and introduce commands for sorting values and removing duplicates.
Most shell commands will process many files at once. This chapter shows you how to make your own pipelines do that. Along the way, you will see how the shell uses variables to store information.
History lets you repeat things with just a few keystrokes, and pipes let you combine existing commands to create new ones. In this chapter, you will see how to go one step further and create new commands of your own.","['Data Scientist with R', 'Data Scientist with Python', 'Python Programmer', 'R Programmer']","['Greg Wilson', 'Filip Schouwenaars']",[],[],https://www.datacamp.com/courses/introduction-to-shell-for-data-science,Programming,Shell
176,Introduction to Spark SQL with Python,4,15,52,"1,237","4,200",Introduction Spark SQL Python,"Introduction to Spark SQL with Python
You're familiar with SQL, and have heard great things about Apache Spark. Then this course is for you! Apache Spark is a computing framework for processing big data. Spark SQL is a component of Apache Spark that works with tabular data. Window functions are an advanced feature of SQL that take Spark to a new level of usefulness.  You will use Spark SQL to analyze time series.  You will extract the most common sequences of words from a text document. You will create feature sets from natural language text and use them to predict the last word in a sentence using logistic regression. Spark combines the power of distributed computing with the ease of use of Python and SQL.

The course uses a natural language text dataset that is easy to understand. Sentences are sequences of words. Window functions are very suitable for manipulating sequence data. The same techniques taught here can be applied to sequences of song identifiers, video ids, or podcast ids.  Exercises include discovering frequent word sequences, and converting word sequences into machine learning feature set data for training a text classifier.
In this chapter you will learn how to create and query a SQL table in Spark.  Spark SQL brings the expressiveness of SQL to Spark.  You will also learn how to use SQL window functions in Spark.  Window functions perform a calculation across rows that are related to the current row.  They greatly simplify achieving results that are difficult to express using only joins and traditional aggregations.  We'll use window functions to perform running sums, running differences, and other operations that are challenging to perform in basic SQL.
In this chapter you will be loading natural language text.  Then you will apply a moving window analysis to find frequent word sequences.
In the previous chapters you learned how to use the expressiveness of window function SQL.  However, this expressiveness now makes it important that you understand how to properly cache dataframes and cache SQL tables.  It is also important to know how to evaluate your application.  You learn how to do do this using the Spark UI.  You'll also learn a best practice for logging in Spark. Spark SQL brings with it another useful tool for tuning query performance issues, the query execution plan. You will learn how to use the execution plan for evaluating the provenance of a dataframe.
Previous chapters provided you with the tools for loading raw text, tokenizing it, and extracting word sequences. This is already very useful for analysis, but it is also useful for machine learning. What you've learned now comes together by using logistic regression to classify text.  By the conclusion of this chapter, you will have loaded raw natural language text data and used it to train a text classifier.",[],"['Mark Plutowski', 'Hadrien Lacroix', 'Hillary Green-Lerman']","[('Sherlock (parquet file)', 'https://assets.datacamp.com/production/repositories/3937/datasets/de0a90e8f132c1f2846e70d7b5eec250923318d5/sherlock.parquet'), ('Sherlock (txt file)', 'https://assets.datacamp.com/production/repositories/3937/datasets/213ca262bf6af12428d42842848464565f3d5504/sherlock.txt'), ('Train schedule', 'https://assets.datacamp.com/production/repositories/3937/datasets/a367f6f461f670a364ab2a59afc25bc2e3fab157/trainsched.txt')]","['Introduction to PySpark', 'Intermediate SQL', 'Python Data Science Toolbox (Part 2)']",https://www.datacamp.com/courses/introduction-to-spark-sql,Data Manipulation,Python
177,Introduction to Spark in R using sparklyr,4,5,51,"11,620","4,700",Introduction Spark in R using sparklyr,"Introduction to Spark in R using sparklyr
R is mostly optimized to help you write data analysis code quickly and readably. Apache Spark is designed to analyze huge datasets quickly. The sparklyr package lets you write dplyr R code that runs on a Spark cluster, giving you the best of both worlds. This course teaches you how to manipulate Spark DataFrames using both the dplyr interface and the native interface to Spark, as well as trying machine learning techniques. Throughout the course, you'll explore the Million Song Dataset.
In which you learn how Spark and R complement each other, how to get data to and from Spark, and how to manipulate Spark data frames using dplyr syntax.
In which you learn more about using the dplyr interface to Spark, including advanced field selection, calculating groupwise statistics, and joining data frames.
In which you learn about Spark's machine learning data transformation features, and functionality for manipulating native DataFrames.
A case study in which you learn to use sparklyr's machine learning routines, by predicting the year in which a song was released.",['Big Data with R'],"['Richie Cotton', 'Tom Jeon', 'Sumedh Panchadhar']","[('Anti-join', 'https://assets.datacamp.com/production/course_3309/datasets/anti-join.png'), ('Both-model-responses', 'https://assets.datacamp.com/production/course_3309/datasets/both-model-responses.rds'), ('Gbt-model-responses', 'https://assets.datacamp.com/production/course_3309/datasets/gbt-model-responses.rds'), ('Inner-join', 'https://assets.datacamp.com/production/course_3309/datasets/inner-join.png'), ('Left-join', 'https://assets.datacamp.com/production/course_3309/datasets/left-join.png'), ('Predicted vs actual', 'https://assets.datacamp.com/production/course_3309/datasets/predicted_vs_actual.png'), ('Residual density', 'https://assets.datacamp.com/production/course_3309/datasets/residual_density.png'), ('Semi-join', 'https://assets.datacamp.com/production/course_3309/datasets/semi-join.png'), ('Timbre', 'https://assets.datacamp.com/production/course_3309/datasets/timbre.rds'), ('Timbre parquet', 'https://assets.datacamp.com/production/course_3309/datasets/timbre_parquet.zip'), ('Title text parquet', 'https://assets.datacamp.com/production/course_3309/datasets/title_text_parquet.zip'), ('Track data parquet', 'https://assets.datacamp.com/production/course_3309/datasets/track_data_parquet.zip'), ('Track data to model parquet', 'https://assets.datacamp.com/production/course_3309/datasets/track_data_to_model_parquet.zip'), ('Track data to predict parquet', 'https://assets.datacamp.com/production/course_3309/datasets/track_data_to_predict_parquet.zip'), ('Track metadata', 'https://assets.datacamp.com/production/course_3309/datasets/track_metadata.rds')]","['Introduction to R', 'Intermediate R', 'Data Manipulation in R with dplyr']",https://www.datacamp.com/courses/introduction-to-spark-in-r-using-sparklyr,Other,R
178,Introduction to TensorFlow in Python,4,15,51,"5,178","4,300",Introduction TensorFlow,"Introduction to TensorFlow in Python
Not long ago, cutting-edge computer vision algorithms couldn’t differentiate between images of cats and dogs. Today, a skilled data scientist equipped with nothing more than a laptop can classify tens of thousands of objects with greater accuracy than the human eye. In this course, you will use TensorFlow 2.0 to develop, train, and make predictions with the models that have powered major advances in recommendation systems, image classification, and FinTech. You will learn both high-level APIs, which will enable you to design and train deep learning models in 15 lines of code, and low-level APIs, which will allow you to move beyond off-the-shelf routines. You will also learn to accurately predict housing prices, credit card borrower defaults, and images of sign language gestures.
Before you can build advanced models in TensorFlow 2.0, you will first need to understand the basics. In this chapter, you’ll learn how to define constants and variables, perform tensor addition and multiplication, and compute derivatives. Knowledge of linear algebra will be helpful, but not necessary.
In this chapter, you will learn how to build, solve, and make predictions with models in TensorFlow 2.0. You will focus on a simple class of models – the linear regression model – and will try to predict housing prices. By the end of the chapter, you will know how to load and manipulate data, construct loss functions, perform minimization, make predictions, and reduce resource use with batch training.
The previous chapters taught you how to build models in TensorFlow 2.0. In this chapter, you will apply those same tools to build, train, and make predictions with neural networks. You will learn how to define dense layers, apply activation functions, select an optimizer, and apply regularization to reduce overfitting. You will take advantage of TensorFlow's flexibility by using both low-level linear algebra and high-level Keras API operations to define and train models.
In the final chapter, you'll use high-level APIs in TensorFlow 2.0 to train a sign language letter classifier. You will use both the sequential and functional Keras APIs to train, validate, make predictions with, and evaluate models. You will also learn how to use the Estimators API to streamline the model definition and training process, and to avoid errors.",[],"['Isaiah Hull', 'Mona Khalil', 'Sara Billen']","[('King County House Sales', 'https://assets.datacamp.com/production/repositories/3953/datasets/818cd72f8f36fb710e52074a41b0cfd322915eaa/kc_house_data.csv'), ('UCI Credit Card Default', 'https://assets.datacamp.com/production/repositories/3953/datasets/692b2c9bd91688f9f96882f10f0edf691ae32ec9/uci_credit_card.csv'), ('Sign Language MNIST', 'https://assets.datacamp.com/production/repositories/3953/datasets/556e17437b93b8cfada1f4c29ea5fb5833393dfc/slmnist.csv')]",['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/introduction-to-tensorflow-in-python,Machine Learning,Python
179,Introduction to Text Analysis in R,4,15,46,"3,745","3,850",Introduction Text Analysis in R,"Introduction to Text Analysis in R
From social media to product reviews, text is an increasingly important type of data across applications, including marketing analytics. In many instances, text is replacing other forms of unstructured data due to how inexpensive and current it is. However, to take advantage of everything that text has to offer, you need to know how to think about, clean, summarize, and model text. In this course, you will use the latest tidy tools to quickly and easily get started with text. You will learn how to wrangle and visualize text, perform sentiment analysis, and run and interpret topic models.
Since text is unstructured data, a certain amount of wrangling is required to get it into a form where you can analyze it. In this chapter, you will learn how to add structure to text by tokenizing, cleaning, and treating text as categorical data.
While counts are nice, visualizations are better. In this chapter, you will learn how to apply what you know from ggplot2 to tidy text data.
While word counts and visualizations suggest something about the content, we can do more. In this chapter, we move beyond word counts alone to analyze the sentiment or emotional valence of text.
In this final chapter, we move beyond word counts to uncover the underlying topics in a collection of documents. We will use a standard topic model known as latent Dirichlet allocation.","['Marketing Analytics with R', 'Text Mining with R']","['Marc Dotson', 'Chester Ismay', 'Sumedh Panchadhar']","[('Airline tweets', 'https://assets.datacamp.com/production/repositories/3741/datasets/dd0c82eb551b79b21841f05d29a17bece628374d/ch_1_twitter_data.rds'), ('Roomba reviews', 'https://assets.datacamp.com/production/repositories/3741/datasets/574700d2f56584039d50b7781337181c56855fb3/Roomba Reviews.csv')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/introduction-to-text-analysis-in-r,Data Manipulation,R
180,Introduction to Time Series Analysis,4,16,58,"29,381","4,600",Introduction Time Series Analysis,"Introduction to Time Series Analysis
Many phenomena in our day-to-day lives, such as the movement of stock prices, are measured in intervals over a period of time. Time series analysis methods are extremely useful for analyzing these special data types. In this course, you will be introduced to some core time series analysis concepts and techniques.
This chapter will give you insights on how to organize and visualize time series data in R. You will learn several simplifying assumptions that are widely used in time series analysis, and common characteristics of financial time series.
In this chapter, you will conduct some trend spotting, and learn the white noise (WN) model, the random walk (RW) model, and the definition of stationary processes.
In this chapter, you will review the correlation coefficient, use it to compare two time series, and also apply it to compare a time series with its past, as an autocorrelation. You will discover the autocorrelation function (ACF) and practice estimating and visualizing autocorrelations for time series data.
In this chapter, you will learn the autoregressive (AR) model and several of its basic properties. You will also practice simulating and estimating the AR model in R, and compare the AR model with the random walk (RW) model.
In this chapter, you will learn the simple moving average (MA) model and several of its basic properties. You will also practice simulating and estimating the MA model in R, and compare the MA model with the autoregressive (AR) model.","['Quantitative Analyst with R', 'Time Series with R']","['David S. Matteson', 'Lore Dirick', 'Matt Isaacs']",[],"['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/introduction-to-time-series-analysis,Probability & Statistics,R
181,Introduction to Time Series Analysis in Python,4,17,59,"18,765","4,850",Introduction Time Series Analysis,"Introduction to Time Series Analysis in Python
From stock prices to climate data, time series data are found in a wide variety of domains, and being able to effectively work with such data is an increasingly important skill for data scientists. This course will introduce you to time series analysis in Python. After learning about what a time series is, you'll learn about several time series models ranging from autoregressive and moving average models to cointegration models. Along the way, you'll learn how to estimate, forecast, and simulate these models using statistical libraries in Python. You'll see numerous examples of how these models are used, with a particular emphasis on applications in finance.
In this chapter you'll be introduced to the ideas of correlation and autocorrelation for time series. Correlation describes the relationship between two time series and autocorrelation describes the relationship of a time series with its past values.
In this chapter you'll learn about some simple time series models. These include white noise and a random walk.
In this chapter you'll learn about autoregressive, or AR, models for time series. These models use past values of the series to predict the current value.
In this chapter you'll learn about another kind of model, the moving average, or MA, model. You will also see how to combine AR and MA models into a powerful ARMA model.
This chapter will show you how to model two series jointly using cointegration models. Then you'll wrap up with a case study where you look at a time series of temperature data from New York City.",[],"['Rob Reider', 'Lore Dirick', 'Nick Solomon']","[('Financial time series datasets', 'https://assets.datacamp.com/production/repositories/1120/datasets/557c7532b976d021b8447b1dc365060fab79e609/data.zip'), ('UFO sightings', 'https://assets.datacamp.com/production/repositories/1120/datasets/c89d2158693af7d555a008198629d93aa059236e/UFO.csv'), ('New York temperature data', 'https://assets.datacamp.com/production/repositories/1120/datasets/850e5614f8de4b134f40b4eeca27c7d0b91aeed2/NOAA_TAVG.csv')]",['pandas Foundations'],https://www.datacamp.com/courses/introduction-to-time-series-analysis-in-python,Probability & Statistics,Python
182,Introduction to the Tidyverse,4,16,50,"79,086","4,150",Introduction Tidyverse,"Introduction to the Tidyverse
This is an introduction to the programming language R, focused on a powerful set of tools known as the Tidyverse. You'll learn the intertwined processes of data manipulation and visualization using the tools dplyr and ggplot2. You'll learn to manipulate data by filtering, sorting, and summarizing a real dataset of historical country data in order to answer exploratory questions. You'll then learn to turn this processed data into informative line plots, bar plots, histograms, and more with the ggplot2 package. You’ll get a taste of the value of exploratory data analysis and the power of Tidyverse tools. This is a suitable introduction for those who have no previous experience in R and are interested in performing data analysis.
In this chapter, you'll learn to do three things with a table: filter for particular observations, arrange the observations in a desired order, and mutate to add or change a column. You'll see how each of these steps allows you to answer questions about your data.
Often a better way to understand and present data as a graph. In this chapter, you'll learn the essential skills of data visualization using the ggplot2 package, and you'll see how the dplyr and ggplot2 packages work closely together to create informative graphs.
So far you've been answering questions about individual country-year pairs, but you may be interested in aggregations of the data, such as the average life expectancy of all countries within each year. Here you'll learn to use the group by and summarize verbs, which collapse large datasets into manageable summaries.
In this chapter, you'll learn how to create line plots, bar plots, histograms, and boxplots. You'll see how each plot requires different methods of data manipulation and preparation, and you’ll understand how each of these plot types plays a different role in data analysis.","['Data Analyst with R', 'Data Scientist with R', 'R Programmer', 'Tidyverse Fundamentals with R']","['David Robinson', 'Yashas Roy', 'Chester Ismay']","[('Gapminder', 'https://assets.datacamp.com/production/repositories/1323/datasets/578a87c7d1e46b61ec2cfc0aaa1e42acb3d03a11/gapminder.tsv')]",[],https://www.datacamp.com/courses/introduction-to-the-tidyverse,Programming,R
183,Joining Data in R with data.table,4,13,47,"2,486","3,950",Joining Data in R data.table,"Joining Data in R with data.table
In the real world, data sets typically come split across many tables while most data analysis functions in R are designed to work with single tables of data. In this course, you'll learn how to effectively combine data sets into single tables using data.table. You'll learn how to add columns from one table to another table,
how to filter a table based on observations in another table, and how to identify records across multiple tables matching complex criteria. Along the way, you'll learn how to troubleshoot failed join operations and best practices for working with complex data sets. After completing this course you'll be well on your way to be a data.table master!

This chapter will show you how to perform simple joins that will enable you to combine information spread across multiple tables.
In this chapter you will perform joins using the data.table syntax, set and view data.table keys, and perform anti-joins.
This chapter will discuss common problems and errors encountered when performing data.table joins and show you how to troubleshoot and avoid them.
In the last chapter of this course you'll learn how to concatenate observations from multiple tables together, how to identify observations present in one table but not another, and how to reshape tables between long and wide formats.","['Data Analyst with R', 'Data Manipulation with R']","['Scott Ritchie', 'Sascha Mayr', 'Sumedh Panchadhar', 'Eunkyung Park']","[('IMDB', 'https://assets.datacamp.com/production/repositories/1583/datasets/49bfe9dc4e96e0b38724839b641c4dbea82b6126/imdb_ratings.csv'), ('Netflix', 'https://assets.datacamp.com/production/repositories/1583/datasets/21a33688d76aa24f5f4e8ed26e764d421fd484e6/netflix_2017.csv'), ('Australia Population', 'https://assets.datacamp.com/production/repositories/1583/datasets/e572282707469dfb2579414d6025222835354713/australia_population.zip'), ('Life Expectancy', 'https://assets.datacamp.com/production/repositories/1583/datasets/86af32a8814b486a6e96e615e59502a11876999d/life_expectancy.zip'), ('School Database', 'https://assets.datacamp.com/production/repositories/1583/datasets/4583ada08ad260084645e1071f48ce1448e0c88f/school_db.zip'), ('Heart Disease', 'https://assets.datacamp.com/production/repositories/1583/datasets/5ec46922615ba8d5e807ca67fe1c85aa2f9be9fe/heart_data.zip'), ('Ebola Cases', 'https://assets.datacamp.com/production/repositories/1583/datasets/40904e97c7229725d36641b55edde4a313fed3e5/ebola_cases.zip'), ('GDP', 'https://assets.datacamp.com/production/repositories/1583/datasets/2ecdb8244008feaf3c3d286dab24b5b8dbf04b5f/gdp.zip')]",['Data Manipulation in R with data.table'],https://www.datacamp.com/courses/joining-data-in-r-with-datatable,Data Manipulation,R
184,Joining Data in SQL,5,13,53,"87,068","4,400",Joining Data in SQL,"Joining Data in SQL
Now that you've learned the basics of SQL in our Intro to SQL for Data Science course,  it's time to supercharge your queries using joins and relational set theory. In this course, you'll learn all about the power of joining tables while exploring interesting features of countries and their cities throughout the world. You will master inner and outer joins, as well as self joins, semi joins, anti joins and cross joins—fundamental tools in any PostgreSQL wizard's toolbox. Never fear set theory again after learning all about unions, intersections, and except clauses through easy-to-understand diagrams and examples. Lastly, you'll be introduced to the challenging topic of subqueries. You will be able to visually grasp these ideas by using Venn diagrams and other linking illustrations.
In this chapter, you'll be introduced to the concept of joining tables, and will explore the different ways you can enrich your queries using inner joins and self joins. You'll also see how to use the case statement to split up a field into different categories.
In this chapter, you'll come to grips with different kinds of outer joins. You'll learn how to gain further insights into your data through left joins, right joins, and full joins. In addition to outer joins, you'll also work with cross joins.
In this chapter, you'll learn more about set theory using Venn diagrams and get an introduction to union, union all, intersect, and except clauses. You'll finish by investigating semi joins and anti joins, which provide a nice introduction to subqueries.
In this closing chapter, you'll learn how to use nested queries and you'll use what you’ve learned in this course to solve three challenge problems.","['Data Analyst with Python', 'Data Analyst with R', 'Data Scientist with R', 'Data Scientist with Python', 'SQL Fundamentals', 'SQL Server Fundamentals']","['Chester Ismay', 'Colin Ricardo', 'Filip Schouwenaars']","[('Countries', 'https://assets.datacamp.com/production/repositories/1069/datasets/578834f5908e3b2fa575429a287586d1eaeb2e54/countries2.zip'), ('Leaders', 'https://assets.datacamp.com/production/repositories/1069/datasets/5aba4b2d25e3025de97d9715a022f5c24b74f347/leaders2.zip'), ('Diagrams', 'https://assets.datacamp.com/production/repositories/1069/datasets/379b79c12b968edafe24e4bc02fae89d090a9490/diagrams.zip')]",['Intro to SQL for Data Science'],https://www.datacamp.com/courses/joining-data-in-postgresql,Data Manipulation,SQL
185,Joining Data with dplyr in R,4,13,49,206,"4,200",Joining Data dplyr in R,"Joining Data with dplyr in R
Often in data science, you'll encounter fascinating data that is spread across multiple tables. This course will teach you the skills you'll need to join multiple tables together to analyze them in combination. You'll practice your skills using a fun dataset about LEGOs from the Rebrickable website. The dataset contains information about the sets, parts, themes, and colors of LEGOs, but is spread across many tables. You'll work with the data throughout the course as you learn a total of six different joins! You'll learn four mutating joins: inner join, left join, right join, and full join, and two filtering joins: semi join and anti join. In the final chapter, you'll apply your new skills to Stack Overflow data, containing each of the almost 300,000 Stack Oveflow questions that are tagged with R, including information about their answers, the date they were asked, and their score. Get ready to take your dplyr skills to the next level!
Get started with your first joining verb: inner-join! You'll learn to join tables together to answer questions about the LEGO dataset, which contains information across many tables about the sets, parts, themes, and colors of LEGOs over time.
Learn two more mutating joins, the left and right join, which are mirror images of each other! You'll learn use cases for each type of join as you explore parts and colors of LEGO themes. Then, you'll explore how to join tables to themselves to understand the hierarchy of LEGO themes in the data.
In this chapter, you'll cover three more joining verbs: full-join, semi-join, and anti-join. You'll then use these verbs to answer questions about the similarities and differences between a variety of LEGO sets.
Put together all the types of join you learned in this course to analyze a new dataset: Stack Overflow questions, answers, and tags. This includes calculating and visualizing trends for some notable tags like dplyr and ggplot2. You'll also master one more method for combining tables, the bind_rows verb, which stacks tables on top of each other.",[],"['Chris Cardillo', 'Amy Peterson']","[('sets', 'https://assets.datacamp.com/production/repositories/5284/datasets/2e7cb938873ba685957efd822867c86f46dc6b78/sets.rds'), ('themes', 'https://assets.datacamp.com/production/repositories/5284/datasets/267bcb026359fb2104bf4b717ae166d0bd99c5e6/themes.rds'), ('parts', 'https://assets.datacamp.com/production/repositories/5284/datasets/cb649926d41ce73490a9bb710e1501a273061723/parts.rds'), ('part_categories', 'https://assets.datacamp.com/production/repositories/5284/datasets/30fc459770c89e46cce9cce99752ca95fb1d06fe/part_categories.rds'), ('inventories', 'https://assets.datacamp.com/production/repositories/5284/datasets/2b509dd7a49493ab990580be1845f21f36c46ca0/inventories.rds'), ('inventory_parts', 'https://assets.datacamp.com/production/repositories/5284/datasets/a49d7bf17fc35fdd1331c01a7f36573800e93cb4/inventory_parts.rds'), ('colors', 'https://assets.datacamp.com/production/repositories/5284/datasets/aeeda0eaafe6b04c1e42da71a4e9fed7299d096e/colors.rds'), ('questions', 'https://assets.datacamp.com/production/repositories/5284/datasets/89d5a716b4f41dbe4fcda1a7a1190f24f58f0e47/questions.rds'), ('tags', 'https://assets.datacamp.com/production/repositories/5284/datasets/207c31b235786e73496fd7e58e416779911a9d98/tags.rds'), ('question_tags', 'https://assets.datacamp.com/production/repositories/5284/datasets/966938d665c69bffd87393b345ea2837a94bab97/question_tags.rds'), ('answers', 'https://assets.datacamp.com/production/repositories/5284/datasets/6cb9c039aa8326d98de37afefa32e1c458764638/answers.rds')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/joining-data-with-dplyr-in-r,Data Manipulation,R
186,Linear Algebra for Data Science in R,4,15,56,"3,794","4,000",Linear Algebra Data Science in R,"Linear Algebra for Data Science in R
Linear algebra is one of the most important set of tools in applied mathematics and data science. In this course, you’ll learn how to work with vectors and matrices, solve matrix-vector equations, perform eigenvalue/eigenvector analyses and use principal component analysis to do dimension reduction on real-world datasets. All analyses will be performed in R, one of the world’s most-popular programming languages.
In this chapter, you will learn about the key objects in linear algebra, such as vectors and matrices. You will understand why they are important and how they interact with each other.
Many machine learning algorithms boil down to solving a matrix-vector equation.  In this chapter, you learn what matrix-vector equations are trying to accomplish and how to solve them in R.
Matrix operations are complex. Eigenvalue/eigenvector analyses allow you
to decompose these operations into simpler ones for the sake of image recognition, genomic analysis, and more!
“Big Data” is ubiquitous in data science and its applications.  However, redundancy in these datasets can be problematic.  In this chapter, we learn about principal component analysis and how it can be used in dimension reduction.",[],"['Eric Eager', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('NFL Player dataset', 'https://assets.datacamp.com/production/repositories/2654/datasets/760dae913f682ba6b2758207280138662ddedc0d/DataCampCombine.csv'), ('WNBA Massey Matrix dataset', 'https://assets.datacamp.com/production/repositories/2654/datasets/6bfadc8a2147bddbbaedafc8e21b8576cb4364ce/WNBA_Data_2017_M.csv'), ('WNBA Point Differentials dataset', 'https://assets.datacamp.com/production/repositories/2654/datasets/4e20e9adfd6514bd5b1bfb1464cd6da9fbbadfe9/WNBA_Data_2017_f.csv')]","['Introduction to R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/linear-algebra-for-data-science-in-r,Probability & Statistics,R
187,Linear Classifiers in Python,4,13,44,"9,457","3,200",Linear Classifiers,"Linear Classifiers in Python
In this course you'll learn all about using linear classifiers, specifically logistic regression and support vector machines, with scikit-learn. Once you've learned how to apply these methods, you'll dive into the ideas behind them and find out what really makes them tick. At the end of this course you'll know how to train, test, and tune these linear classifiers in Python. You'll also have a conceptual foundation for understanding many other machine learning algorithms.
In this chapter you will learn the basics of applying logistic regression and support vector machines (SVMs) to classification problems. You'll use the scikit-learn library to fit classification models to real data.
In this chapter you will discover the conceptual framework behind logistic regression and SVMs. This will let you delve deeper into the inner workings of these models.
In this chapter you will delve into the details of logistic regression. You'll learn all about regularization and how to interpret model output.
In this chapter you will learn all about the details of support vector machines. You'll learn about tuning hyperparameters for these models and using kernels to fit non-linear decision boundaries.",['Machine Learning with Python'],"['Mike Gelbart', 'Nick Solomon', 'Kara Woo']",[],['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/linear-classifiers-in-python,Machine Learning,Python
188,Loan Amortization in Spreadsheets,4,13,56,41,"4,800",Loan Amortization in Spreadsheets,"Loan Amortization in Spreadsheets
A loan amortization schedule sounds like something that's only used by bankers and financial traders, right? Wrong! In this course, we'll be looking at the key financial formulas in Google Sheets that you can use to investigate your own loans, like student loans, car loans, and mortgages. We'll build up a dashboard in Google Sheets which uses visualizations and conditional formulas to produce presentation-ready spreadsheets which will impress any finance manager!
In this first chapter, you will learn all the basic financial formulas in Google Sheets that are needed to build up your first loan amortization spreadsheet for a student loan. This chapter will introduce calculations for principal payment, interest and principal at a given point in time.
This chapter is focused on extending the payment formulas to the full length of a loan. By the end of the chapter, you will be able to create a fully functional schedule and will be able to verify the accuracy of the calculations on the schedule.
This chapter is about taking the amortization schedule that you created in Chapter 2 and converting it into a fully functional loan dashboard which can be used by end users. You will create line and bar graphs, as well as using input controls and cell protection to ensure that end users will only be able to change what you want them to change!
The final chapter introduces real-world adjustments which are made to amortization schedules. These sorts of adjustments include upfront fees and lump sum payments. The course finishes by talking about floating rate mortgages, the maximum interest rate on floating loans and negative amortization.",[],"['Brent Allen', 'Chester Ismay', 'Marianna Lamnina']",[],['Intermediate Spreadsheets for Data Science'],https://www.datacamp.com/courses/loan-amortization-in-spreadsheets,Applied Finance,Spreadsheets
189,Longitudinal Analysis in R,4,13,49,"1,200","3,950",Longitudinal Analysis in R,"Longitudinal Analysis in R
What is longitudinal data and how can you analyze it? Here you will learn all about this kind of data and the descriptive analyses that can be used to explore it! You will also learn to model continuous and binary outcome variables. Linear mixed effects models will be used as a modern approach to modeling this kind of data, taking into account the correlated nature of it. For binary outcomes, generalized estimating equations will be introduced as an alternative to the generalized linear mixed models. Visualizations are used throughout the course to interpret model results and strategies for model selection are also explored. Along the way, you will use data from a number of longitudinal studies, including the Madras and Calcium datasets.
This chapter introduces the user to longitudinal data. Exploration of what is and what isn't longitudinal data, exploration of the dependent data structure, and other numeric summaries of the data will be covered in this chapter.
Chapter 2 will model continuous longitudinal outcomes with lme4. These observed score mixed models are common in the analysis of longitudinal data.
This chapter will further explore adding additional predictors to the longitudinal model. These predictors, referred to as fixed effects, allow different trajectories based on variable characteristics.
This chapter will shift from continuous to binary outcomes. Binary outcomes are ones in which the outcome are in two categories. Special considerations for this outcome are needed to appropriately model the data and receive valid statistical results.",[],"['DataCamp Content Creator', 'Sascha Mayr', 'Amy Peterson']",[],['Working with Data in the Tidyverse'],https://www.datacamp.com/courses/longitudinal-analysis-in-r,Probability & Statistics,R
190,Machine Learning Toolbox,4,24,88,"33,362","6,200",Machine Learning Toolbox,"Machine Learning Toolbox
Machine learning is the study and application of algorithms that learn from and make predictions on data. From search results to self-driving cars, it has manifested itself in all areas of our lives and is one of the most exciting and fast growing fields of research in the world of data science. This course teaches the big ideas in machine learning: how to build and evaluate predictive models, how to tune them for optimal performance, how to preprocess data for better results, and much more. The popular caret R package, which provides a consistent interface to all of R's most powerful machine learning facilities, is used throughout the course.
In the first chapter of this course, you'll fit regression models with train() and evaluate their out-of-sample performance using cross-validation and root-mean-square error (RMSE).
In this chapter, you'll fit classification models with train() and evaluate their out-of-sample performance using cross-validation and area under the curve (AUC).
In this chapter, you will use the train() function to tweak model parameters through cross-validation and grid search.
In this chapter, you will practice using train() to preprocess data before fitting models, improving your ability to making accurate predictions.
In the final chapter of this course, you'll learn how to use resamples() to compare multiple models and select (or ensemble) the best one(s).","['Data Scientist with R', 'Machine Learning Fundamentals in R']","['Zachary Deane-Mayer', 'Max Kuhn', 'Nick Carchedi', 'Tom Jeon']","[('Diamonds', 'https://assets.datacamp.com/production/repositories/223/datasets/3c40c5115e2e4ea8f44290cbdaf41672a415b6ca/Diamonds.RData'), ('Sonar', 'https://assets.datacamp.com/production/repositories/223/datasets/8d02389dadccc3ac563dd406272b68b718fe0906/Sonar.RData'), ('Wine', 'https://assets.datacamp.com/production/repositories/223/datasets/bc9cb62d94940056f0554a307d1409274aa26d19/wine_100.RDS'), ('Overfit data', 'https://assets.datacamp.com/production/repositories/223/datasets/0bd5f7c30d9aec3e1f1fa677a19bee3af407453a/overfit.csv'), ('Breast Cancer', 'https://assets.datacamp.com/production/repositories/223/datasets/810f476d8487fc38efca3244c0db83b43e3ecfd3/BreastCancer.RData'), ('Blood-brain', 'https://assets.datacamp.com/production/repositories/223/datasets/cfe043817f5b9296ccb9f616b7ab99612c92aa96/BloodBrain.RData'), ('Churn', 'https://assets.datacamp.com/production/repositories/223/datasets/3eee876775d01dec167ff8ab1e8ea61a6065d6b7/Churn.RData')]","['Introduction to R', 'Intermediate R', 'Correlation and Regression']",https://www.datacamp.com/courses/machine-learning-toolbox,Machine Learning,R
191,Machine Learning for Finance in Python,4,15,59,"8,880","5,150",Machine Learning Finance,"Machine Learning for Finance in Python
Time series data is all around us; some examples are the weather, human behavioral patterns as consumers and members of society, and financial data. In this course, you'll learn how to calculate technical indicators from historical stock data, and how to create features and targets out of the historical stock data. You'll understand how to prepare our features for linear models, xgboost models, and neural network models. We will then use linear models, decision trees, random forests, and neural networks to predict the future price of stocks in the US markets. You will also learn how to evaluate the performance of the various models we train in order to optimize them, so our predictions have enough accuracy to make a stock trading strategy profitable.
In this chapter, we will learn how machine learning can be used in finance.  We will also explore some stock data, and prepare it for machine learning algorithms.  Finally, we will fit our first machine learning model -- a linear model, in order to predict future price changes of stocks.
Learn how to use tree-based machine learning models to predict future values of a stock's price, as well as how to use forest-based machine learning methods for regression and feature selection.
We will learn how to normalize and scale data for use in KNN and neural network methods.  Then we will learn how to use KNN and neural network regression to predict the future values of a stock's price (or any other regression problem).
In this chapter, you'll learn how to use modern portfolio theory (MPT) and the Sharpe ratio to plot and find optimal stock portfolios.  You'll also use machine learning to predict the best portfolios.  Finally, you'll evaluate performance of the ML-predicted portfolios.",[],"['Nathan George', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('NASDAQ: AAPL', 'https://assets.datacamp.com/production/repositories/2168/datasets/3748d04d5afec762dab8d26dee3844c0f8f8f540/AAPL.csv'), ('NASDAQ: AMD', 'https://assets.datacamp.com/production/repositories/2168/datasets/91af558c2465d48fb15c687a2d22a5abf2b64984/AMD.csv'), ('QQQ ETF', 'https://assets.datacamp.com/production/repositories/2168/datasets/33592b2d41dd5448a92994f63b84a68b669fadc7/QQQ.csv'), ('SPY', 'https://assets.datacamp.com/production/repositories/2168/datasets/0e72705a29abc4ad16fd29acd0304570ea27111e/SPY.csv'), ('LNG', 'https://assets.datacamp.com/production/repositories/2168/datasets/b1c089b35aa59a7290964640af811125f74fe3ec/LNG.csv'), ('SMLV', 'https://assets.datacamp.com/production/repositories/2168/datasets/ef577d4f7dc35a1222db1b63fb4525b45023b4b9/SMLV.csv')]",['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/machine-learning-for-finance-in-python,Machine Learning,Python
192,Machine Learning for Time Series Data in Python,4,13,53,"8,299","4,550",Machine Learning Time Series Data,"Machine Learning for Time Series Data in Python
Time series data is ubiquitous. Whether it be stock market fluctuations, sensor data recording climate change, or activity in the brain, any signal that changes over time can be described as a time series. Machine learning has emerged as a powerful method for leveraging complexity in data in order to generate predictions and insights into the problem one is trying to solve. This course is an intersection between these two worlds of machine learning and time series data, and covers feature engineering, spectograms, and other advanced techniques in order to classify heartbeat sounds and predict stock prices. 
This chapter is an introduction to the basics of machine learning, time series data, and the intersection between the two.
The easiest way to incorporate time series into your machine learning pipeline is to use them as features in a model. This chapter covers common features that are extracted from time series in order to do machine learning.
If you want to predict patterns from data over time, there are special considerations to take in how you choose and construct your model. This chapter covers how to gain insights into the data before fitting your model, as well as best-practices in using predictive modeling for time series data.
Once you've got a model for predicting time series data, you need to decide if it's a good or a bad model. This chapter coves the basics of generating predictions with models in order to validate them against ""test"" data.",[],"['Chris Holdgraf', 'Lore Dirick', 'Sumedh Panchadhar', 'Eunkyung Park']",[],"['Manipulating Time Series Data in Python', 'Visualizing Time Series Data in Python', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/machine-learning-for-time-series-data-in-python,Machine Learning,Python
193,Machine Learning in the Tidyverse,5,15,52,"3,824","4,300",Machine Learning in Tidyverse,"Machine Learning in the Tidyverse
This course will teach you to leverage the tools in the ""tidyverse"" to generate, explore, and evaluate machine learning models. Using a combination of tidyr and purrr packages, you will build a foundation for how to work with complex model objects in a ""tidy"" way. You will also learn how to leverage the broom package to explore your resulting models. You will then be introduced to the tools in the test-train-validate workflow, which will empower you evaluate the performance of both classification and regression models as well as provide the necessary information to optimize model performance via hyperparameter tuning.
This chapter will introduce you to the backbone of machine learning in the tidyverse, the List Column Workflow (LCW). The LCW will empower you to work with many models in one dataframe.  This chapter will also introduce you to the fundamentals of the broom package for exploring your models.
This chapter leverages the List Column Workflow to build and explore the attributes of 77 models. You will use the tools from the broom package to gain a multidimensional understanding of all of these models.
In this chapter you will learn how to use the List Column Workflow to build, tune and evaluate regression models. You will have the chance to work with two types of models: linear models and random forest models.
In this chapter you will shift gears to build, tune and evaluate classification models.","['Data Scientist with R', 'Intermediate Tidyverse Toolbox']","['Dmitriy Gorenshteyn', 'Chester Ismay', 'Sumedh Panchadhar', 'Eunkyung Park']","[('Gapminder', 'https://assets.datacamp.com/production/repositories/2524/datasets/3a9e81711749a5c094920899b88bed24c0cf8d39/gapminder.rds'), ('Attrition', 'https://assets.datacamp.com/production/repositories/2524/datasets/6d1f408987b85d0a23dbfe92ff37eee36f411244/attrition.rds')]",['Modeling with Data in the Tidyverse'],https://www.datacamp.com/courses/machine-learning-in-the-tidyverse,Machine Learning,R
194,Machine Learning with Apache Spark,4,16,56,"2,575","4,550",Machine Learning Apache Spark,"Machine Learning with Apache Spark
Spark is a powerful, general purpose tool for working with Big Data. Spark transparently handles the distribution of compute tasks across a cluster. This means that operations are fast, but it also allows you to focus on the analysis rather than worry about technical details. In this course you'll learn how to get data into Spark and then delve into the three fundamental Spark Machine Learning algorithms: Linear Regression, Logistic Regression/Classifiers, and creating pipelines. Along the way you'll analyse a large dataset of flight delays and spam text messages. With this background you'll be ready to harness the power of Spark and apply it on your own Machine Learning projects!
Spark is a framework for working with Big Data. In this chapter you'll cover some background about Spark and Machine Learning. You'll then find out how to connect to Spark using Python and load CSV data.
Now that you are familiar with getting data into Spark, you'll move onto building two types of classification model: Decision Trees and Logistic Regression. You'll also find out about a few approaches to data preparation.
Next you'll learn to create Linear Regression models. You'll also find out how to augment your data by engineering new predictors as well as a robust approach to selecting only the most relevant predictors.
Finally you'll learn how to make your models more efficient. You'll find out how to use pipelines to make your code clearer and easier to maintain. Then you'll use cross-validation to better test your models and select good model parameters. Finally you'll dabble in two types of ensemble model.",[],"['Andrew Collier', 'Hadrien Lacroix', 'Mona Khalil']","[('Flights', 'https://assets.datacamp.com/production/repositories/3918/datasets/c601f67a55e03400acb6c72ac0937c4b95906e88/flights.csv'), ('SMS', 'https://assets.datacamp.com/production/repositories/3918/datasets/9da06fbd2f1decf3b91e2dbcb08eb0c44aa4d18e/sms.csv')]","['Introduction to PySpark', 'Statistical Thinking in Python (Part 1)']",https://www.datacamp.com/courses/machine-learning-with-apache-spark,Machine Learning,Python
195,Machine Learning with Tree-Based Models in Python,5,15,57,"10,765","4,650",Machine Learning Tree-Based Models,"Machine Learning with Tree-Based Models in Python
Decision trees are supervised learning models used for problems involving classification and regression. Tree models present a high flexibility that comes at a price: on one hand, trees are able to capture complex non-linear relationships; on the other hand, they are prone to memorizing the noise present in a dataset. By aggregating the predictions of trees that are trained differently, ensemble methods take advantage of the flexibility of trees while reducing their tendency to memorize noise. Ensemble methods are used across a variety of fields and have a proven track record of winning many machine learning competitions.
In this course, you'll learn how to use Python to train decision trees and tree-based models with the user-friendly scikit-learn machine learning library. You'll understand the advantages and shortcomings of trees and demonstrate how ensembling can alleviate these shortcomings, all while practicing on real-world datasets. Finally, you'll also understand how to tune the most influential hyperparameters in order to get the most out of your models.
Classification and Regression Trees (CART) are a set of supervised learning models used for problems involving classification and regression. In this chapter, you'll be introduced to the CART algorithm.
The bias-variance tradeoff is one of the fundamental concepts in supervised machine learning. In this chapter, you'll understand how to diagnose the problems of overfitting and underfitting. You'll also be introduced to the concept of ensembling where the predictions of several models are aggregated to produce predictions that are more robust.
Bagging is an ensemble method involving training the same algorithm many times using different subsets sampled from the training data. In this chapter, you'll understand how bagging can be used to create a tree ensemble. You'll also learn how the random forests algorithm can lead to further ensemble diversity through randomization at the level of each split in the trees forming the ensemble.
Boosting refers to an ensemble method in which several models are trained sequentially with each model learning from the errors of its predecessors. In this chapter, you'll be introduced to the two boosting methods of AdaBoost and Gradient Boosting.
The hyperparameters of a machine learning model are parameters that are not learned from data. They should be set prior to fitting the model to the training set. In this chapter, you'll learn how to tune the hyperparameters of a tree-based model using grid search cross validation.",['Data Scientist with Python'],"['Elie Kawerk', 'Kara Woo', 'Eunkyung Park', 'Sumedh Panchadhar']","[('Auto-mpg', 'https://assets.datacamp.com/production/repositories/1796/datasets/3781d588cf7b04b1e376c7e9dda489b3e6c7465b/auto.csv'), ('Bike Sharing Demand', 'https://assets.datacamp.com/production/repositories/1796/datasets/594538f54a854b322d6e4c8031f3f31bc522d3e5/bikes.csv'), ('Wisconsin Breast Cancer', 'https://assets.datacamp.com/production/repositories/1796/datasets/0eb6987cb9633e4d6aa6cfd11e00993d2387caa4/wbc.csv'), ('Indian Liver Patient', 'https://assets.datacamp.com/production/repositories/1796/datasets/24126c0cd9d2bd1ca0e72446c2caa40b222193d6/indian_liver_patient.zip')]",['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/machine-learning-with-tree-based-models-in-python,Machine Learning,Python
196,Machine Learning with Tree-Based Models in R,4,20,58,"10,201","4,450",Machine Learning Tree-Based Models in R,"Machine Learning with Tree-Based Models in R
In this course you'll learn how to work with tree-based models in R. This course covers everything from using a single tree for regression or classification to more advanced ensemble methods. You'll learn to implement bagged trees, Random Forests, and boosted trees using the Gradient Boosting Machine, or GBM. These powerful techinques will allow you to create high performance regression and classification models for your data.
This chapter covers supervised machine learning with classification trees.
In this chapter you'll learn how to use a single tree for regression, instead of classification.
In this chapter, you will learn about Bagged Trees, an ensemble method, that uses a combination of trees (instead of only one).
In this chapter, you will learn about the Random Forest algorithm, another tree-based ensemble method.  Random Forest is a modified version of bagged trees with better performance. Here you'll learn how to train, tune and evaluate Random Forest models in R.
In this chapter, you will see the boosting methodology with a focus on the Gradient Boosting Machine (GBM) algorithm, another popular tree-based ensemble method. Here you'll learn how to train, tune and evaluate GBM models in R.",[],"['Erin LeDell', 'Gabriela de Queiroz', 'Nick Carchedi', 'Nick Solomon']","[('Credit data', 'https://assets.datacamp.com/production/repositories/710/datasets/b649085c43111c83ba7ab6ec172d83cdc14a2942/credit.csv'), ('Grade data', 'https://assets.datacamp.com/production/repositories/710/datasets/3d720e80d1ad70a88322c2175fa0e6041761a5f9/grade.csv')]",['Introduction to R'],https://www.datacamp.com/courses/machine-learning-with-tree-based-models-in-r,Machine Learning,R
197,Machine Learning with the Experts: School Budgets,4,15,51,"27,058","3,800",Machine Learning Experts: School Budgets,"Machine Learning with the Experts: School Budgets
Data science isn't just for predicting ad-clicks-it's also useful for social impact! This course is a case study from a machine learning competition on DrivenData. You'll explore a problem related to school district budgeting. By building a model to automatically classify items in a school's budget, it makes it easier and faster for schools to compare their spending with other schools. In this course, you'll begin by building a baseline model that is a simple, first-pass approach. In particular, you'll do some natural language processing to prepare the budgets for modeling. Next, you'll have the opportunity to try your own techniques and see how they compare to participants from the competition. Finally, you'll see how the winner was able to combine a number of expert techniques to build the most accurate model.
In this chapter, you'll be introduced to the problem you'll be solving in this course. How do you accurately classify line-items in a school budget based on what that money is being used for? You will explore the raw text and numeric values in the dataset, both quantitatively and visually. And you'll learn how to measure success when trying to predict class labels for each row of the dataset.
In this chapter, you'll build a first-pass model. You'll use numeric data only to train the model. Spoiler alert - throwing out all of the text data is bad for performance! But you'll learn how to format your predictions. Then, you'll be introduced to natural language processing (NLP) in order to start working with the large amounts of text in the data.
Here, you'll improve on your benchmark model using pipelines. Because the budget consists of both text and numeric data, you'll learn to how build pipielines that process multiple types of data. You'll also explore how the flexibility of the pipeline workflow makes testing different approaches efficient, even in complicated problems like this one!
In this chapter, you will learn the tricks used by the competition winner, and implement them yourself using scikit-learn. Enjoy!","['Data Scientist with Python', 'Machine Learning with Python']","['Peter Bull', 'Hugo Bowne-Anderson', 'Yashas Roy', 'Casey Fitzpatrick']",[],['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/machine-learning-with-the-experts-school-budgets,Case Studies,Python
198,Manipulating DataFrames with pandas,4,19,75,"58,391","6,300",Manipulating DataFrames pandas,"Manipulating DataFrames with pandas
In this course, you'll learn how to leverage pandas' extremely powerful data manipulation engine to get the most out of your data. You’ll learn how to drill into the data that really matters by extracting, filtering, and transforming data from DataFrames. The pandas library has many techniques that make this process efficient and intuitive. You will learn how to tidy, rearrange, and restructure your data by pivoting or melting and stacking or unstacking DataFrames. These are all fundamental next steps on the road to becoming a well-rounded data scientist, and you will have the chance to apply all the concepts you learn to real-world datasets.
In this chapter, you will learn how to index, slice, filter, and transform DataFrames using a variety of datasets, ranging from 2012 US election data for the state of Pennsylvania to Pittsburgh weather data.
Having learned the fundamentals of working with DataFrames, you will now move on to more advanced indexing techniques. You will learn about MultiIndexes, or hierarchical indexes, and learn how to interact with and extract data from them.
Here, you will learn how to reshape your DataFrames using techniques such as pivoting, melting, stacking, and unstacking. These are powerful techniques that allow you to tidy and rearrange your data into the optimal format for data analysis.
In this chapter, you'll learn how to identify and split DataFrames by groups or categories for further aggregation or analysis. You'll also learn how to transform and filter your data, and how to detect outliers and impute missing values. Knowing how to effectively group data in pandas can be a seriously powerful addition to your data science toolbox.
We’ll bring together everything you have learned in this course while working with data recorded from the Summer Olympic games that goes as far back as 1896! This is a rich dataset that will allow you to fully apply the data manipulation techniques you have learned. You will pivot, unstack, group, slice, and reshape your data as you explore this dataset and uncover some truly fascinating insights.","['Data Analyst with Python', 'Data Manipulation with Python', 'Data Scientist with Python', 'Python Programmer']","['Team Anaconda', 'Yashas Roy', 'Hugo Bowne-Anderson']","[('Olympic medals', 'https://assets.datacamp.com/production/repositories/502/datasets/bf22326ecc9171f68796ad805a7c1135288120b6/all_medalists.csv'), ('Gapminder', 'https://assets.datacamp.com/production/repositories/502/datasets/09378cc53faec573bcb802dce03b01318108a880/gapminder_tidy.csv'), ('2012 US election results (Pennsylvania)', 'https://assets.datacamp.com/production/repositories/502/datasets/502f4eedaf44ad1c94b3595c7691746f282e0b0a/pennsylvania2012_turnout.csv'), ('Pittsburgh weather data', 'https://assets.datacamp.com/production/repositories/502/datasets/6c4984cb81ea50971c1660434cc4535a6669a848/pittsburgh2013.csv'), ('Sales', 'https://assets.datacamp.com/production/repositories/502/datasets/4c6d3be9e8640e2d013298230c415d3a2a2162d4/sales.zip'), ('Titanic', 'https://assets.datacamp.com/production/repositories/502/datasets/e280ed94bf4539afb57d8b1cbcc14bcf660d3c63/titanic.csv'), ('Users', 'https://assets.datacamp.com/production/repositories/502/datasets/eaf29468b9fbaad454a74d3c2b59b36e5ab4558b/users.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'pandas Foundations']",https://www.datacamp.com/courses/manipulating-dataframes-with-pandas,Data Manipulation,Python
199,Manipulating Time Series Data in Python,4,16,55,"11,038","4,700",Manipulating Time Series Data,"Manipulating Time Series Data in Python
In this course you'll learn the basics of manipulating time series data. Time series data are data that are indexed by a sequence of dates or times. You'll learn how to use methods built into Pandas to work with this index. You'll also learn how resample time series to change the frequency. This course will also show you how to calculate rolling and cumulative values for times series. Finally, you'll use all your new skills to build a value-weighted stock index from actual stock data.
This chapter lays the foundations to leverage the powerful time series functionality made available by how Pandas represents dates, in particular by the DateTimeIndex. You will learn how to create and manipulate date information and time series, and how to do calculations with time-aware DataFrames to shift your data in time or create period-specific returns.
This chapter dives deeper into the essential time series functionality made available through the pandas DataTimeIndex. It introduces resampling and how to compare different time series by normalizing their start points.
This chapter will show you how to use window function to calculate time series metrics for both rolling and expanding windows.
This chapter combines the previous concepts by teaching you how to create a value-weighted index. This index uses market-cap data contained in the stock exchange listings to calculate weights and 2016 stock price information. Index performance is then compared against benchmarks to evaluate the performance of the index you created.",[],"['Stefan Jansen', 'Lore Dirick', 'Nick Solomon']","[('Air quality data', 'https://assets.datacamp.com/production/repositories/1130/datasets/416e7c98d4bfc23130d4f0ab4c1a216d479429bd/air_quality_data.zip'), ('Stock data', 'https://assets.datacamp.com/production/repositories/1130/datasets/81c22b266e69254a1cc2646c63ec73e20c7c84a3/stock_data.zip')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/manipulating-time-series-data-in-python,Data Manipulation,Python
200,Manipulating Time Series Data in R with xts & zoo,4,15,55,"25,717","4,500",Manipulating Time Series Data in R xts & zoo,"Manipulating Time Series Data in R with xts & zoo
Time series are all around us, from server logs to high frequency financial data. Managing and manipulating ordered observations is central to all time series analysis. The xts and zoo packages provide a set of powerful tools to make this task fast and mistake free. In this course, you will learn everything from the basics of xts to advanced tips and tricks for working with time series data in R.
xts and zoo are just two of the many different types of objects that exist in R. This chapter will introduce the basic objects in xts and zoo and their components, and offers examples of how to construct and examine the data.
Now that you can create basic xts objects, it's time to see how powerful they can be. This chapter will cover the basics of one of the most useful features of xts: time based subsetting. From there you'll explore additional ways to extract data using time phrases, and conclude with how to do basic operations like adding and subtracting your xts objects.
One of the most important parts of working with time series data involves creating derived time series. To do this effectively, it is critical to keep track of dates and times. In this chapter you will look at how xts handles merging new columns and rows into existing data, how to deal with the inevitable missing observations in time series, and how to shift your series in time.
Now the fun begins! A very common usage pattern for time series is to calculate values for disjoint periods of time or aggregate values from a higher frequency to a lower frequency. For most series, you'll often want to see the weekly mean of a price or measurement. You may even find yourself looking at data that has different frequencies and you need to normalize to the lowest frequency. This chapter is where it all happens. Hang tight, and lets get going!
Now that you are comfortable with most of the core features, its time to explore some of the lesser known (but powerful!) aspects of working with xts. In this final chapter you will use the internals of the index to find repeating itervals, see how xts provides intuitive time zone support, and experiment with ways to explore your data by time - including identifying frequency and coverage in time. Let's finish this course!","['Finance Basics with R', 'Quantitative Analyst with R', 'Time Series with R']","['Jeffrey Ryan', 'Lore Dirick']","[('Chicago summer temperature data', 'https://assets.datacamp.com/production/repositories/283/datasets/3914fa34c958b2b651d7586126fe8746ada0aaaf/Temps.csv'), ('Daily USD/EUR exchange rate', 'https://assets.datacamp.com/production/repositories/283/datasets/a52f05c476188854ee0cfb5ea60cf06b35ca7319/USDEUR.csv')]","['Introduction to R for Finance', 'Intermediate R for Finance']",https://www.datacamp.com/courses/manipulating-time-series-data-in-r-with-xts-zoo,Data Manipulation,R
201,Manipulating Time Series Data in R: Case Studies,4,12,50,"6,983","3,950",Manipulating Time Series Data in R: Case Studies,"Manipulating Time Series Data in R: Case Studies
This follow-up course on manipulating time series data in R does not cover new data manipulation concepts. Instead, you will strengthen your knowledge of the topics covered in Manipulating Time Series Data in R with xts & zoo using new exercises and interesting datasets.

You've been hired to understand the travel needs of tourists visiting the Boston area. As your first assignment on the job, you'll practice the  skills you've learned for time series data manipulation in R by exploring  data on flights arriving at Boston's Logan International Airport (BOS)  using xts & zoo.
In this chapter, you'll expand your time series data library to include weather data in the Boston area. Before you can conduct any analysis, you'll need to do some data manipulation, including merging multiple xts objects  and isolating certain periods of the data. It's a great opportunity for  more practice!
Now it's time to go further afield. In addition to flight delays, your client is interested in how Boston's tourism industry is affected by economic trends. You'll need to manipulate some time series data on economic indicators,  including GDP per capita and unemployment in the United States in general and  Massachusetts (MA) in particular.
Having exhausted other options, your client now believes Boston's tourism industry must be related to the success of local sports teams. In your final task on this project, your supervisor has asked you to assemble some time series data on Boston's sports teams over the past few years.","['Quantitative Analyst with R', 'Time Series with R']","['Lore Dirick', 'Matt Isaacs']","[('Flights arriving at Boston Logan airport', 'https://assets.datacamp.com/production/repositories/647/datasets/af4f7a6864df16c66b8e5d46ea9f1b39b451cbf4/flights.RData'), ('Boston monthly average visibility', 'https://assets.datacamp.com/production/repositories/647/datasets/3cee4e307ee445edcc80c780a6d83d78a55a798d/vis.RData'), ('Wind speeds in Boston', 'https://assets.datacamp.com/production/repositories/647/datasets/56cdb9eb7aebc102add00144fc4e083353a190b8/wind.RData'), ('Boston monthly temperature data', 'https://assets.datacamp.com/production/repositories/647/datasets/e094e8aa1685b83dfbeb3accc0a5bdab9bd3f652/temps_monthly.RData'), ('US and Massachusetts unemployment', 'https://assets.datacamp.com/production/repositories/647/datasets/eac1e9697aa5fc2d071edb1849e60c0c173e1258/unemployment.RData'), ('US GDP data', 'https://assets.datacamp.com/production/repositories/647/datasets/19ba9cbfe118e9fe4c20bb7878b1f28986d71d7b/us_gdp.RData'), ('Boston-area sports teams data', 'https://assets.datacamp.com/production/repositories/647/datasets/969202749bed4eccca2ecbf95e197bbb0c652bb6/sports.RData')]","['Introduction to R', 'Intermediate R', 'Manipulating Time Series Data in R with xts & zoo']",https://www.datacamp.com/courses/manipulating-time-series-data-in-r-case-studies,Case Studies,R
202,Marketing Analytics in R: Choice Modeling,4,17,54,"2,148","4,100",Marketing Analytics in R: Choice Modeling,"Marketing Analytics in R: Choice Modeling
People make choices everyday. They choose products like orange juice or a car, decide who to vote for, and choose how to get to work. Marketers, retailers, product designers, political scientists, transportation planners, sociologists, and many others want to understand what drives these choices. Choice models predict what people will choose as a function of the features of the options available and can be used to make important product design decisions. This course will teach you how to organize choice data, estimate choice models in R and present findings. This course covers both analyses of observed real-world choices and the survey-based approach called conjoint analysis.
Our goal for this chapter is to get you through the entire choice modeling process as quickly as possible, so that you get a broad understanding of what we can do with choice models and how the choice modeling process works. The main idea here is that we can use a choice model to understand how customers' product choices depend on the features of those products. Do sportscar buyers prefer manual transmissions to automatic? By how much? In order to give you an overview, we will skip over many of the details. In later chapters, we will go back and cover important issues in preparing data, specifying and interpreting models and reporting your findings, so that you are fully prepared to use these methods with your own choice data.
There are many different places to get choice data and different ways it can be formatted. In this chapter, we will take data that is provided in several alternative formats and learn how to get it into shape for choice modeling. We will also discuss how you can build a survey to collect your own choice data.
In this chapter, we take deeper dive into estimating choice models. To give you a foundation for thinking about choice models, we will focus on how the multinomial logit model converts the product features into a prediction for what the decision maker will choose. This will give you a framework for making decisions about which features to include in your model.
Different people have different tastes and preferences. This seems intuitively obvious, but there is also extensive research in marketing showing that this is true. This chapter covers choice models where we assume that different decision makers have different preferences that influence their choices. When our models recognize that different consumers have different preferences, they tend to make larger share predictions for niche products that appeal to a subset of consumers. Hierarchical models are used in most commercial choice modeling applications, so it is important to understand how they work.",['Marketing Analytics with R'],"['Elea McDonnell Feit', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Sportscar choice dataset', 'https://assets.datacamp.com/production/repositories/2160/datasets/b8de80499f7c2de2b3a048fe8fd4096ec513e3c3/sportscar_choice.zip'), ('Chocolate choice dataset', 'https://assets.datacamp.com/production/repositories/2160/datasets/be7fd252c251494a73617cdbf3374c0c11e6813d/chocolate_choice.zip')]",['Multiple and Logistic Regression'],https://www.datacamp.com/courses/marketing-analytics-in-r-choice-modeling,Probability & Statistics,R
203,Marketing Analytics in R: Statistical Modeling,4,17,60,"6,922","4,200",Marketing Analytics in R: Statistical Modeling,"Marketing Analytics in R: Statistical Modeling
This is your chance to dive into the worlds of marketing and business analytics using R. Day by day, there are a multitude of decisions that companies have to face. With the help of statistical models, you're going to be able to support the business decision-making process based on data, not your gut feeling. Let us show you what a great impact statistical modeling can have on the performance of businesses. You're going to learn about and apply strategies to communicate your results and help them make a difference.
How can you decide which customers are most valuable for your business? Learn how to model the customer lifetime value using linear regression.
Predicting if a customer will leave your business, or churn, is important for targeting valuable customers and retaining those who are at risk. Learn how to model customer churn using logistic regression.
Learn how to model the time to an event using survival analysis. This could be the time until next order or until a person churns.
CRM data can get very extensive. Each metric you collect could carry some interesting information about your customers. But handling a dataset with too many variables is difficult. Learn how to reduce the number of variables in your data using principal component analysis. Not only does this help to get a better understanding of your data. PCA also enables you to condense information to single indices and to solve multicollinearity problems in a regression analysis with many intercorrelated variables.",['Marketing Analytics with R'],"['Verena Pflieger', 'Chester Ismay', 'Nick Solomon']","[('Churn data', 'https://assets.datacamp.com/production/repositories/1861/datasets/fd6acd777bdcbb516b167472ac3d5619582779c4/churn_data.csv'), ('Sales data', 'https://assets.datacamp.com/production/repositories/1861/datasets/c18f1ab90ca134ecd8ab51ed34d285df9fc3059b/salesData.csv'), ('Sales data, months 2-4', 'https://assets.datacamp.com/production/repositories/1861/datasets/cd1d99b917950fe8598a55fec5f884f04765a3aa/salesDataMon2To4.csv'), ('Survival data', 'https://assets.datacamp.com/production/repositories/1861/datasets/284a58cc6f9c5873937a97eca0420b980b5e5b23/survivalDataExercise.csv'), ('Default data', 'https://assets.datacamp.com/production/repositories/1861/datasets/0b0772985a8676c3613e8ac2c6053f5e60a3aebd/defaultData.csv'), ('News data', 'https://assets.datacamp.com/production/repositories/1861/datasets/f91399541c79294165bb8a028db694850f21724c/newsData.RData'), ('First CLV dataset', 'https://assets.datacamp.com/production/repositories/1861/datasets/4c612c0c990fccbd16c695043096aafbb66bd266/clvData1.csv'), ('Second CLV dataset', 'https://assets.datacamp.com/production/repositories/1861/datasets/29ce8bb869c5af2ec627c2df583781025ab0b4db/clvData2.csv')]","['Introduction to the Tidyverse', 'Correlation and Regression']",https://www.datacamp.com/courses/marketing-analytics-in-r-statistical-modeling,Probability & Statistics,R
204,Marketing Analytics in Spreadsheets,4,15,56,151,"4,650",Marketing Analytics in Spreadsheets,"Marketing Analytics in Spreadsheets
Spreadsheets are an essential tool for any marketing professional, but how does one keep these spreadsheets clean and accurate - especially when multiple parties contribute data? Data validation and regular expressions are powerful tools for marketing analysts, but having clean data is only half the battle. After we learn how to clean the data, we will visualize it by building charts! Throughout the course, we will explore a dataset that includes the kind of information you will encounter in the world of digital marketing. We will spot errors in metrics using data validation, use regular expressions to aggregate campaign metrics, build charts to analyze campaign performance, and use everything we've learned to build a dynamic dashboard!
In this chapter, you will explore the data validation options that Google Sheets offers to aid in clean data entry. You will also learn about the Bing and Google Ads paid advertising data you will explore throughout the course. After this chapter, you will be able to create spreadsheets that can be used by any number of people, without having to worry about disorganization.
In the digital marketing world, naming conventions may differ among paid advertising campaigns or ad groups, which poses a problem when the user wants to analyze campaign performance. Regular expressions can help match certain strings, replace parts of strings, or extract a portion of a string. In this chapter, you will learn to use regular expressions, along with Google Sheets' built-in functions REGEXMATCH(), REGEXREPLACE(), and REGEXEXTRACT(), to reorganize and aggregate data with ease.
In this chapter, you will explore Google and Bing Ad campaigns and ad group data. In addition to a refresher on some basic charts, you will explore new ways to use these charts and experiment with the chart editor settings to create both informative and visually appealing charts. You will learn to explain paid advertising data through visualizations, which is an important task in the fast-paced digital advertising world.
In the final chapter, you will be tasked with building a paid advertising dashboard that can be dynamically filtered by both source and campaign name. After completing the chapter, you should be able to tackle almost any data mitigation or dashboard creation project that you, or your boss, may think of!",[],"['Luke Pajer', 'Chester Ismay', 'Amy Peterson']",[],['Intermediate Spreadsheets for Data Science'],https://www.datacamp.com/courses/marketing-analytics-in-spreadsheets,Case Studies,Spreadsheets
205,Merging DataFrames with pandas,4,14,56,"37,458","4,650",Merging DataFrames pandas,"Merging DataFrames with pandas
As a data scientist, you'll often find that the data you need is not in a single file. It may be spread across a number of text files, spreadsheets, or databases. You’ll want to be able to import the data you’re interested in as a collection of DataFrames and combine them to answer your central questions. This course is all about the act of combining—or merging—DataFrames, an essential part of any data scientist's toolbox. You'll hone your pandas skills by learning how to organize, reshape, and aggregate multiple datasets to answer your specific questions.
In this chapter, you'll learn about different techniques you can use to import multiple files into DataFrames. Having imported your data into individual DataFrames, you'll then learn how to share information between DataFrames using their indexes. Understanding how indexes work is essential to merging DataFrames, which you’ll learn later in the course.
You'll learn how to perform database-style operations to combine DataFrames. In particular, you'll learn about appending and concatenating DataFrames while working with a variety of real-world datasets.
You'll learn all about merging pandas DataFrames. You'll explore different techniques for merging, and learn about left joins, right joins, inner joins, and outer joins, as well as when to use which. You'll also learn about ordered merging, which is useful when you want to merge DataFrames with columns that have natural orderings, like date-time columns.
To reinforce your new skills, you'll apply them to an in-depth case study using Olympic medal data. The analysis involves integrating your multi-DataFrame skills from this course and skills you've gained in previous pandas courses. This is a rich dataset that will allow you to fully leverage your pandas data manipulation skills.","['Data Analyst with Python', 'Data Manipulation with Python', 'Data Scientist with Python', 'Python Programmer']","['Team Anaconda', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Baby names', 'https://assets.datacamp.com/production/repositories/516/datasets/43c9b6bf4c283ab024b2d7d61fbf15a0baa1e44d/Baby names.zip'), ('Summer Olympic medals', 'https://assets.datacamp.com/production/repositories/516/datasets/2d14df8d3c6a1773358fa000f203282c2e1107d6/Summer Olympic medals.zip'), ('Automobile fuel efficiency', 'https://assets.datacamp.com/production/repositories/516/datasets/2f3d8b2156d5669fb7e12137f1c2e979c3c9ce0b/automobiles.csv'), ('Exchange rates', 'https://assets.datacamp.com/production/repositories/516/datasets/e91482db6a7bae394653278e4e908e63ed9ac833/exchange.csv'), ('GDP', 'https://assets.datacamp.com/production/repositories/516/datasets/a0858a700501f88721ca9e4bdfca99b9e10b937f/GDP.zip'), ('Oil prices', 'https://assets.datacamp.com/production/repositories/516/datasets/707566cf46c4dd6290b9029f5e07a92baf3fe3f7/oil_price.csv'), ('Pittsburgh weather data', 'https://assets.datacamp.com/production/repositories/516/datasets/58c1ead59818b2451324e9e84239db7bda6b11d3/pittsburgh2013.csv'), ('Sales', 'https://assets.datacamp.com/production/repositories/516/datasets/2b89c1b00016e1ebcfd7f08a127d2c79589ce5c0/Sales.zip'), ('S&P 500 Index', 'https://assets.datacamp.com/production/repositories/516/datasets/7a9b570a02ef589891d9576a86876a616ca5f3c8/sp500.csv')]","['pandas Foundations', 'Manipulating DataFrames with pandas']",https://www.datacamp.com/courses/merging-dataframes-with-pandas,Data Manipulation,Python
206,Mixture Models in R,4,14,47,"1,809","3,600",Mixture Models in R,"Mixture Models in R
Mixture modeling is a way of representing populations when we are interested in their heterogeneity. Mixture models use familiar probability distributions (e.g. Gaussian, Poisson, Binomial) to provide a convenient yet formal statistical framework for clustering and classification. Unlike standard clustering approaches, we can estimate the probability of belonging to a cluster and make inference about the sub-populations. For example, in the context of marketing, you may want to cluster different customer groups and find their respective probabilities of purchasing specific products to better target them with custom promotions. When applying natural language processing to a large set of documents, you may want to cluster documents into different topics and understand how important each topic is across each document. In this course, you will learn what Mixture Models are, how they are estimated, and when it is appropriate to apply them!
In this chapter, you will be introduced to fundamental concepts in model-based clustering and how this approach differs from other clustering techniques. You will learn the generating process of Gaussian Mixture Models as well as how to visualize the clusters.
In this chapter, you will be introduced to the main structure of Mixture Models, how to address different data with this approach and how to estimate the parameters involved. To accomplish the estimation, you will learn an iterative method called Expectation-Maximization algorithm.
This chapter shows how to fit Gaussian Mixture Models in 1 and 2 dimensions with `flexmix` package. The data used is formed by 10.000 observations of people with their weight, height, body mass index and informed gender.
In this module, you will learn how Mixture Models extends to consider probability distributions different from the Gaussian and how these models are fitted with `flexmix`. The datasets used are handwritten digits images and the number of crimes in Chicago city. For the first dataset you will find clusters that summarize the handwritten digits and for the second dataset, you will find clusters of communities where is more or less dangerous to live in.",['Probability and Distributions with R'],"['Víctor Medina', 'Chester Ismay', 'David Campos', 'Benjamin  Feder', 'Shon Inouye']","[('Chicago Crimes dataset', 'https://assets.datacamp.com/production/repositories/2215/datasets/1774703667f9cfaca3e29ccb784a38f583dc397b/CoC_crimes.csv'), ('Digits dataset', 'https://assets.datacamp.com/production/repositories/2215/datasets/a1bc241deee405e5a74e88bbe0a7ae1341194170/digits.csv'), ('Gender dataset', 'https://assets.datacamp.com/production/repositories/2215/datasets/5658bb05c6963f4c72bae17add0d3c7e5d7a7493/gender.csv')]","['Intermediate R', 'Introduction to the Tidyverse', 'Foundations of Probability in R']",https://www.datacamp.com/courses/mixture-models-in-r,Probability & Statistics,R
207,Model Validation in Python,4,15,47,"1,776","3,700",Model Validation,"Model Validation in Python
Machine learning models are easier to implement now more than ever before. Without proper validation, the results of running new data through a model might not be as accurate as expected. Model validation allows analysts to confidently answer the question, how good is your model? We will answer this question for classification models using the complete set of tic-tac-toe endgame scenarios, and for regression models using fivethirtyeight’s ultimate Halloween candy power ranking dataset. In this course, we will cover the basics of model validation, discuss various validation techniques, and begin to develop tools for creating validated and high performing models.
Before we can validate models, we need an understanding of how to create and work with them. This chapter provides an introduction to running regression and classification models in scikit-learn. We will use this model building foundation throughout the remaining chapters.
This chapter focuses on the basics of model validation. From splitting data into training, validation, and testing datasets, to creating an understanding of the bias-variance tradeoff, we build the foundation for the techniques of K-Fold and Leave-One-Out validation practiced in chapter three.
Holdout sets are a great start to model validation. However, using a single train and test set if often not enough. Cross-validation is considered the gold standard when it comes to validating model performance and is almost always used when tuning model hyper-parameters. This chapter focuses on performing cross-validation to validate model performance.
The first three chapters focused on model validation techniques. In chapter 4 we apply these techniques, specifically cross-validation, while learning about hyperparameter tuning. After all, model validation makes tuning possible and helps us select the overall best model.",[],"['Kasey Jones', 'Chester Ismay', 'Becca Robins']","[('Candy dataset', 'https://assets.datacamp.com/production/repositories/3981/datasets/bdbcfeff5aff20449bad8a8f1e66ae0169b9a26d/candy-data.csv'), ('Tic-Tac-Toe dataset', 'https://assets.datacamp.com/production/repositories/3981/datasets/e6ee6604b9eed121a015a993bfb225ddf656cf81/tic-tac-toe.csv')]",['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/model-validation-in-python,Machine Learning,Python
208,Modeling with Data in the Tidyverse,4,17,49,"4,676","3,900",Modeling Data in Tidyverse,"Modeling with Data in the Tidyverse
In this course, you will learn to model with data. Models attempt to capture the relationship between an outcome variable of interest and a series of explanatory/predictor variables. Such models can be used for both explanatory purposes, e.g. ""Does knowing professors' ages help explain their teaching evaluation scores?"", and predictive purposes, e.g., ""How well can we predict a house's price based on its size and condition?"" You will leverage your tidyverse skills to construct and interpret such models. This course centers around the use of linear regression, one of the most commonly-used and easy to understand approaches to modeling. Such modeling and thinking is used in a wide variety of fields, including statistics, causal inference, machine learning, and artificial intelligence.
This chapter will introduce you to some background theory and terminology for modeling, in particular, the general modeling framework, the difference between modeling for explanation and modeling for prediction, and the modeling problem. Furthermore, you'll start performing your first exploratory data analysis, a crucial first step before any formal modeling.
Equipped with your understanding of the general modeling framework, in this chapter, we'll cover basic linear regression where you'll keep things simple and model the outcome variable y as a function of a single explanatory/ predictor variable x. We'll use both numerical and categorical x variables. The outcome variable of interest in this chapter will be teaching evaluation scores of instructors at the University of Texas, Austin.
In the previous chapter, you learned about basic regression using either a single numerical or a categorical predictor. But why limit ourselves to using only one variable to inform your explanations/predictions? You will now extend basic regression to multiple regression, which allows for incorporation of more than one explanatory or one predictor variable in your models. You'll be modeling house prices using a dataset of houses in the Seattle, WA metropolitan area.
In the previous chapters, you fit various models to explain or predict an outcome variable of interest. However, how do we know which models to choose? Model assessment measures allow you to assess how well an explanatory model ""fits"" a set of data or how accurate a predictive model is. Based on these measures, you'll learn about criteria for determining which models are ""best"".","['Data Analyst with R', 'Tidyverse Fundamentals with R']","['Albert Y. Kim', 'Chester Ismay', 'Sumedh Panchadhar', 'Benjamin  Feder']",[],['Working with Data in the Tidyverse'],https://www.datacamp.com/courses/modeling-with-data-in-the-tidyverse,Machine Learning,R
209,Multiple and Logistic Regression,4,19,59,"19,713","4,250",Multiple and Logistic Regression,"Multiple and Logistic Regression
In this course you'll take your skills with simple linear regression to the next level. By learning multiple and logistic regression techniques you will gain the skills to model and predict both numeric and categorical outcomes using multiple input variables. You'll also learn how to fit, visualize, and interpret these models. Then you'll apply your skills to learn about Italian restaurants in New York City!
In this chapter you'll learn about the class of linear models called ""parallel slopes models."" These include one numeric and one categorical explanatory variable.
This chapter covers model evaluation. By looking at different properties of the model, including the adjusted R-squared, you'll learn to compare models so that you can select the best one. You'll also learn about interaction terms in linear models.
This chapter will show you how to add two, three, and even more numeric explanatory variables to a linear model.
In this chapter you'll learn about using logistic regression, a generalized linear model (GLM), to predict a binary outcome and classify observations.
Explore the relationship between price and the quality of food, service, and decor for Italian restaurants in NYC.","['Data Scientist with R', 'Statistics Fundamentals with R']","['Ben Baumer', 'Nick Solomon']","[('Average SAT scores by state', 'https://assets.datacamp.com/production/repositories/845/datasets/1a12a19d2cec83ca0b58645689987e2025d91383/SAT.csv'), ('New York City Zagat restaurant reviews', 'https://assets.datacamp.com/production/repositories/845/datasets/639a7a3f9020edb51bcbc4bfdb7b71cbd8b9a70e/nyc.csv')]",['Correlation and Regression'],https://www.datacamp.com/courses/multiple-and-logistic-regression,Probability & Statistics,R
210,Multivariate Probability Distributions in R,4,15,51,"2,915","4,000",Multivariate Probability Distributions in R,"Multivariate Probability Distributions in R
When working with data that contains many variables, we are often interested in studying the relationship between these variables using multivariate statistics. In this course, you'll learn ways to analyze these datasets. You will also learn about common multivariate probability distributions, including the multivariate normal, the multivariate-t, and some multivariate skew distributions. You will then be introduced to techniques for representing high dimensional data in fewer dimensions, including principal component analysis (PCA) and multidimensional scaling (MDS).
In this introduction to multivariate data, you will learn how to read and summarize it. You will learn how to summarize multivariate data using descriptive statistics, such as the mean vector, variance-covariance, and correlation matrices. You'll then explore plotting techniques to provide insights into multivariate data.
This chapter will introduce you to the most important and widely used multivariate probability distribution, the multivariate normal. You will learn how to generate random samples from a multivariate normal distribution and how to calculate and plot the densities and probabilities under this distribution. You will also learn how to test if a dataset follows multivariate normality.
This chapter introduces a host of probability distributions to model non-normal data. In particular, you will be introduced to multivariate t-distributions, which can model heavier tails and are a generalization of the univariate Student's t-distribution. You will be introduced to various skew distributions, which are specifically designed to model data that are right or left skewed.
In the final chapter, you will be introduced to techniques for analyzing high dimensional data, including principal component analysis (PCA) and multidimensional scaling (MDS). You will also learn to implement these techniques by analyzing data.",['Probability and Distributions with R'],"['Surajit Ray', 'Chester Ismay', 'Nick Solomon', 'Amy Peterson']","[('Iris', 'https://assets.datacamp.com/production/repositories/1925/datasets/d11777fac6574637a4ec5f0effeab8542ae88b65/iris.txt'), ('Wine', 'https://assets.datacamp.com/production/repositories/1925/datasets/23b12a9fe40eea74f3a6b27a3d804207f663446d/wine.csv'), ('Birthweight', 'https://assets.datacamp.com/production/repositories/1925/datasets/e07b808b66a78f8fb278d56651aff9b4236b6d07/birthweight.csv')]","['Intermediate R', 'Foundations of Probability in R']",https://www.datacamp.com/courses/multivariate-probability-distributions-in-r,Probability & Statistics,R
211,Natural Language Processing Fundamentals in Python,4,15,51,"37,218","3,750",Natural Language Processing Fundamentals,"Natural Language Processing Fundamentals in Python
In this course, you'll learn natural language processing (NLP) basics, such as how to identify and separate words, how to extract topics in a text, and how to build your own fake news classifier. You'll also learn how to use basic libraries such as NLTK, alongside libraries which utilize deep learning to solve common NLP problems. This course will give you the foundation to process and parse text as you move forward in your Python learning.
This chapter will introduce some basic NLP concepts, such as word tokenization and regular expressions to help parse text. You'll also learn how to handle non-English text and more difficult tokenization you might find.
This chapter will introduce you to topic identification, which you can apply to any text you encounter in the wild. Using basic NLP models, you will identify topics from texts based on term frequencies. You'll experiment and compare two simple methods: bag-of-words and Tf-idf using NLTK, and a new library Gensim.
This chapter will introduce a slightly more advanced topic: named-entity recognition. You'll learn how to identify the who, what, and where of your texts using pre-trained models on English and non-English text. You'll also learn how to use some new libraries, polyglot and spaCy, to add to your NLP toolbox.
You'll apply the basics of what you've learned along with some supervised machine learning to build a ""fake news"" detector. You'll begin by learning the basics of supervised machine learning, and then move forward by choosing a few important features and testing ideas to identify and classify fake news articles.",[],"['Katharine Jarmul', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('English stopwords', 'https://assets.datacamp.com/production/repositories/932/datasets/8042ed46ae7faef4951fcda771c5acc4fc3c0bf6/english_stopwords.txt'), ('Monty Python and the Holy Grail', 'https://assets.datacamp.com/production/repositories/932/datasets/4921d0bf6a73fd645f49f528faf74a871bb3a0e9/grail.txt'), ('News articles', 'https://assets.datacamp.com/production/repositories/932/datasets/cd04303b8b2904d1025809dfb29076de696a1ffc/News articles.zip'), ('Wikipedia articles', 'https://assets.datacamp.com/production/repositories/932/datasets/37f1eb18b6b7909cd72996edbf90939f7c7b575f/Wikipedia articles.zip')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Python Data Science Toolbox (Part 1)', 'Python Data Science Toolbox (Part 2)']",https://www.datacamp.com/courses/natural-language-processing-fundamentals-in-python,Machine Learning,Python
212,Network Analysis in Python (Part 1),4,14,50,"42,176","4,100",Network Analysis,"Network Analysis in Python (Part 1)
From online social networks such as Facebook and Twitter to transportation networks such as bike sharing systems, networks are everywhere—and knowing how to analyze them will open up a new world of possibilities for you as a data scientist. This course will equip you with the skills to analyze, visualize, and make sense of networks. You'll apply the concepts you learn to real-world network data using the powerful NetworkX library. With the knowledge gained in this course, you'll develop your network thinking skills and be able to look at your data with a fresh perspective.
In this chapter, you'll be introduced to fundamental concepts in network analytics while exploring a real-world Twitter network dataset. You'll also learn about NetworkX, a library that allows you to manipulate, analyze, and model graph data. You'll learn about the different types of graphs and how to rationally visualize them.
You'll learn about ways to identify nodes that are important in a network. In doing so, you'll be introduced to more advanced concepts in network analysis as well as the basics of path-finding algorithms. The chapter concludes with a deep dive into the Twitter network dataset which will reinforce the concepts you've learned, such as degree centrality and betweenness centrality.
This chapter is all about finding interesting structures within network data. You'll learn about essential concepts such as cliques, communities, and subgraphs, which will leverage all of the skills you acquired in Chapter 2. By the end of this chapter, you'll be ready to apply the concepts you've learned to a real-world case study.
In this final chapter of the course, you'll consolidate everything you've learned through an in-depth case study of GitHub collaborator network data. This is a great example of real-world social network data, and your newly acquired skills will be fully tested. By the end of this chapter, you'll have developed your very own recommendation system to connect GitHub users who should collaborate together.",['Data Scientist with Python'],"['Eric Ma', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Twitter network', 'https://assets.datacamp.com/production/repositories/580/datasets/64cf6963a7e8005e3771ef3b256812a5797320f0/ego-twitter.p'), ('GitHub users', 'https://assets.datacamp.com/production/repositories/580/datasets/69ada08d5cce7f35f38ffefe8f2291b7cfcd6000/github_users.p')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Python Data Science Toolbox (Part 1)', 'Python Data Science Toolbox (Part 2)']",https://www.datacamp.com/courses/network-analysis-in-python-part-1,Probability & Statistics,Python
213,Network Analysis in Python (Part 2),4,13,46,"6,238","3,850",Network Analysis,"Network Analysis in Python (Part 2)
Have you taken DataCamp's Network Analysis in Python (Part 1) course and are yearning to learn more sophisticated techniques to analyze your networks, whether they be social, transportation, or biological? Then this is the course for you! Herein, you'll build on your knowledge and skills to tackle more advanced problems in network analytics! You'll gain the conceptual and practical skills to analyze evolving time series of networks, learn about bipartite graphs, and how to use bipartite graphs in product recommendation systems. You'll also learn about graph projections, why they're so useful in Data Science, and figure out the best ways to store and load graph data from files. You'll consolidate all of this knowledge in a final chapter case study, in which you'll analyze a forum dataset and come out of this course a Pythonista Network Analyst ninja!
In this chapter, you will learn about bipartite graphs and how they are used in recommendation systems. You will explore the GitHub dataset from the previous course, this time analyzing the underlying bipartite graph that was used to create the graph that you used earlier. Finally, you will get a chance to build the basic components of a recommendation system using the GitHub data!
In this chapter, you will use a famous American Revolution dataset to dive deeper into exploration of bipartite graphs. Here, you will learn how to create the unipartite projection of a bipartite graph, a very useful method for simplifying a complex network for further analysis. Additionally, you will learn how to use matrices to manipulate and analyze graphs - with many computing routines optimized for matrices, you'll be able to analyze many large graphs quickly and efficiently!
In this chapter, you will delve into the fundamental ways that you can analyze graphs that change over time. You will explore a dataset describing messaging frequency between students, and learn how to visualize important evolving graph statistics.
In this chapter, you will apply everything you've learned in the previous three chapters to a forum posting dataset. You will analyze the temporal changes in forum user connectivity patterns, and make visualizations of evolving graph statistics over time.",[],"['Eric Ma', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('American Revolution', 'https://assets.datacamp.com/production/repositories/907/datasets/96aae928f1611cf3fef4d876040ffdbd1eb9d4f3/american-revolution.csv'), ('GitHub', 'https://assets.datacamp.com/production/repositories/907/datasets/bb23b5a929d8113214ef43f6212562335fe35e11/github.p'), ('College forum messages', 'https://assets.datacamp.com/production/repositories/907/datasets/9843a2eee221996f0c5b1a3035a029e28e099e94/uci-forum.p')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Network Analysis in Python (Part 1)']",https://www.datacamp.com/courses/network-analysis-in-python-part-2,Probability & Statistics,Python
214,Network Analysis in R,4,12,50,"7,331","4,000",Network Analysis in R,"Network Analysis in R
In this course you'll learn how to work with and visualize network data. You'll use the igraph package to create networks from edgelists and adjacency matrices. You'll also learn how to plot networks and their attributes. Then, you'll learn how to identify important vertices using measures like betweenness and degree. Next, this course covers network structures, including triangles and cliques. Next, you'll learn how to identify special relationships between vertices, using metrics like assortativity. Finally, you'll see how to create interactive network plots using threejs.
In this chapter, you will be introduced to fundamental concepts in social network analysis. You will learn how to use the igraph R package to explore and analyze social network data as well as learning how to visualize networks.
In this chapter you will learn about directed networks.  You will also learn how to identify key relationships between vertices in a network as well as how to use these relationships to identify important or influential vertices. 
Throughout this chapter you will use a network of measles transmission. The data come from the German city of Hagelloch in 1861. Each directed edge of the network indicates a child becoming infected with measles after coming into contact with an infected child.
This module will show how to characterize global network structures and sub-structures. It will also introduce generating random network graphs.
This chapter will further explore the partitioning of networks into sub-networks and determining which vertices are more highly related to one another than others.  You will also develop visualization methods by creating three-dimensional visualizations.",['Network Analysis with R'],"['James Curley', 'Richie Cotton', 'Nick Solomon']","[('Friendship network data', 'https://assets.datacamp.com/production/repositories/949/datasets/9d00d982149311f5f5c49f4f4e3e5bc6e576c3d6/friends.csv'), ('Friendship network edge data', 'https://assets.datacamp.com/production/repositories/949/datasets/38efe894025e5542a3f3838edc6057e994f39696/friends1_edges.csv'), ('Friendship network node data', 'https://assets.datacamp.com/production/repositories/949/datasets/b17b8379f47e5a17dfccec64732dc17f52746654/friends1_nodes.csv'), ('Forrest Gump network data', 'https://assets.datacamp.com/production/repositories/949/datasets/479e812d5efa6df32dd2730b0118e3ac856475d5/gump.csv'), ('Measles network data', 'https://assets.datacamp.com/production/repositories/949/datasets/7d7d7303138b56e6f1d85678c1ccb9504578f2f7/measles.csv')]",['Intermediate R'],https://www.datacamp.com/courses/network-analysis-in-r,Probability & Statistics,R
215,Network Analysis in R: Case Studies,4,11,47,982,"4,150",Network Analysis in R: Case Studies,"Network Analysis in R: Case Studies
Now that you're familiar with the basics of network analysis it's time to see how to apply those concepts to large real-world data sets. You'll work through three different case studies, each building on your previous work. These case studies are working with the kinds of data you'll see in both academic and industry settings. We'll explore some of the computational and visualization challenges you'll face and how to overcome them. Your knowledge of igraph will continue to grow, but we'll also leverage other visualization libraries that will help you bring your visualizations to the web.
In this chapter you'll explore a subset of an Amazon purchase graph. You'll build on what you've already learned, finding important products and discovering what drives purchases. You'll also examine how graphs can change through time by looking at the graph during different time periods.
In this lesson you'll explore some Twitter data about R by looking at conversations using '#rstats'.  First you'll look at the raw data and think about how you want to build your graph.  There's a number of ways to do this, and we'll cover two ways: retweets and mentions.  You'll build those graphs and then compare them on a number of metrics.
In this chapter you will analyze data from a Chicago bike sharing network.  We will build on the concepts already covered in the introductory course, and add a few new ones to handle graphs with weighted edges. You will also start with data in a slightly more raw form and cover how to build your graph up from a data source you might find.
So far everything we've done has been using plotting from igraph. It provides many powerful ways to plot your graph data.  However many people prefer interacting with other plotting frameworks like ggplot2, or even interactive frameworks like d3.js. In this lesson you'll look at other plotting libraries that build on the ggplot2 framework. You'll also look at other non-""hairball"" type methods like hive plots, as well as building interactive and animated plots.",['Network Analysis with R'],"['Ted Hart', 'Chester Ismay', 'Nick Solomon', 'Benjamin  Feder']","[('Amazon graph', 'https://assets.datacamp.com/production/repositories/1576/datasets/fe80f537276ef9a229f782a6ffdee6910201c4ed/amzn_g.gml'), ('Amazon purchase graph over time', 'https://assets.datacamp.com/production/repositories/1576/datasets/d6d41923348092c7e2699efc803e32fbb42dd4e7/time_graph.rds'), ('Twitter retweet graph', 'https://assets.datacamp.com/production/repositories/1576/datasets/edd5a8859ff65d52f34340fd003ffa53efc2ffd8/rt_g.gml'), ('Twitter mention graph', 'https://assets.datacamp.com/production/repositories/1576/datasets/3616921dcf861543ac08b6da2525d40ca3af0c31/ment_g.gml'), ('Bike sharing data', 'https://assets.datacamp.com/production/repositories/1576/datasets/7a52a18bc881e56fa9f1b6c1bb1c5a8c7034feec/divvy_bike_sample.csv')]",['Network Analysis in R'],https://www.datacamp.com/courses/network-analysis-in-r-case-studies,Probability & Statistics,R
216,Network Science in R: A Tidy Approach,4,12,47,"2,412","3,950",Network Science in R: A Tidy Approach,"Network Science in R: A Tidy Approach
If you've ever wanted to understand more about social networks, information networks, or even the neural networks of our brains, then you need to know network science! It will demonstrate network analysis using several R packages, including dplyr, ggplot2, igraph, ggraph as well as visNetwork. You will take on the role of Interpol Analyst and investigate the terrorist network behind the Madrid train bombing in 2004. Following the course, you will be able to analyse any network with basic centrality and similarity measures and create beautiful and interactive network visualizations.
The challenge in this chapter is to spot the most highly connected terrorists in the network. We will first import the dataset and build the network. Then we will learn how to visualize it in different layouts using ggraph package. Later on, we will compute two basic yet important centrality measures in network science - degree and strength. We will use them to spot highly connected terrorists. We will finally touch two alternative centrality measures, betweenness and closeness.
In this chapter we will spot the most influential ties among terrorists in the network. We will use a centrality measure on ties, called betweenness, and will learn how to visualize the network highlighting connections with high betweenness centrality. Moreover, we will provide some alternative evidence regarding Mark Granovetter's theory of strength of weak ties, confirming that looser connections are crucial as demonstrated in the Madrid terrorism network.
The challenge in this chapter is to discover pairs of similar (and dissimilar) terrorists. We will introduce the adjacency matrix as a mathematical representation of a network and use it to find terrorists with similar connection patterns. We will also learn how to visualize similar and dissimilar pairs of individuals using ggraph.
In this chapter we will discover cells of similar terrorists. We will explore hierarchical clustering to find groups of similar terrorists building on the notion of similarity of connection patterns developed in the previous chapter. Furthermore, we will explore the visNetwork package to produce fulfilling interactive network visualizations.",['Network Analysis with R'],"['Massimo Franceschet', 'Chester Ismay', 'Becca Robins']","[('Nodes of the network', 'https://assets.datacamp.com/production/repositories/2048/datasets/92011bee6be869ca8dfc5a6c44a2febf4b6b73b3/nodes.csv'), ('Ties of the network', 'https://assets.datacamp.com/production/repositories/2048/datasets/afe8ca0e602ec92d73881f1bdf7a02466612c295/ties.csv')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/network-science-in-r-a-tidy-approach,Data Visualization,R
217,Nonlinear Modeling in R with GAMs,4,15,50,"2,125","4,050",Nonlinear Modeling in R GAMs,"Nonlinear Modeling in R with GAMs
Generalized Additive Models are a powerful tool for both prediction and inference.  More flexible than linear models, and more understandable than black-box methods, GAMs model relationships in data as nonlinear functions that are highly adaptable to different types of data and data science problems.  In this course, you'll learn how GAMs work and how to construct them with the popular mgcv package.  You'll learn how to interpret, explain and visualize your model results, and how to diagnose and fix model problems. You'll work with data sets that will show you how to apply GAMs to a variety of situations: automobile performance data for building mixed linear and nonlinear models, soil pollution data for building geospatial models, and consumer purchasing data for classification and prediction. By the end of this course, you'll have a toolbox for solving many data science problems.
In this chapter, you will learn how Generalized additive models work and how to use flexible, nonlinear functions to model data without over-fitting.  You will learn to use the gam() function in the mgcv package, and how to build multivariate models that mix nonlinear, linear, and categorical effects to data.
In this chapter, you will take a closer look at the models you fit in chapter 1 and learn how to interpret and explain them.  You will learn how to make plots that show how different variables affect model outcomes.  Then you will diagnose problems in models arising from under-fitting the data or hidden relationships between variables, and how to iteratively fix those problems and get better results.
In this chapter, you will extend the types of models you can fit to those with interactions of multiple variables.  You will fit models of geospatial data by using these interactions to model complex surfaces, and visualize those surfaces in 3D.  Then you will learn about interactions between smooth and categorical variables, and how to model interactions between very different variables like space and time.
In the first three chapters, you used GAMs for regression of continuous outcomes.  In this chapter, you will use GAMs for classification. You will build logistic GAMs to predict binary outcomes like customer purchasing behavior, learn to visualize this new type of model, make predictions, and learn how to explain the variables that influence each prediction.",[],"['DataCamp Content Creator', 'Chester Ismay', 'Rasmus Bååth', 'Sumedh Panchadhar', 'Eunkyung Park']","[('Insurance (csale) data', 'https://assets.datacamp.com/production/repositories/1786/datasets/3e4c79b9eb67139b5469ee87658f0b2fdca2b0f7/csale.rds')]",['Correlation and Regression'],https://www.datacamp.com/courses/nonlinear-modeling-in-r-with-gams,Probability & Statistics,R
218,Object-Oriented Programming in Python,4,14,48,"11,536","3,600",Object-Oriented Programming,"Object-Oriented Programming in Python
Object-oriented programming (OOP) is a powerful programming paradigm that reduces the complexity of systems as is employs heavy re-usability of code. OOP leverages the concept of objects and classes. Objects may contain data (known as attributes, stored as instance or class variables) as well as procedures (defined as methods). Objects are generated from blueprints known as classes, which specify the attributes and methods to be generated at the time of object instantiation. This course introduces the internals of classes and the utilization of objects, as well as important object-oriented programming fundamentals such as inheritance, polymorphism, and composition.
In this chapter we quickly review functions and data structures, as they are building blocks of object-oriented programming. Then we transition into the conceptual definition of objects and classes.
Here, we dive deep into the internals of classes, instantiation of objects, the initialization method, instance variables, class variables, overriding class variables, and methods.
We create fancier classes with sophisticated methods such as importing datasets, getting descriptive statistics, and renaming columns. Also, we cover best practices for creating and documenting classes according to PEP-8.
In this chapter we cover inheritance, which is when we create a class that employs (or 'inherits') all class variables and methods from a 'parent' class. We also cover polymorphism, which is when multiple classes inherit from a single class. Finally, we cover composition, which is when classes employ specific class functionality from other classes without necessarily inheriting from them.",[],"['Vicki Boykis', 'David Campos', 'Shon Inouye']","[('US Life Expectancy', 'https://assets.datacamp.com/production/repositories/2097/datasets/5dd3a8250688a4f08306206fa1d40f63b66bc8a9/us_life_expectancy.csv'), ('mtcars', 'https://assets.datacamp.com/production/repositories/2097/datasets/054e2f80f31a05a79db90f58068523b5a92d1ba1/mtcars.csv')]",['Python Data Science Toolbox (Part 1)'],https://www.datacamp.com/courses/object-oriented-programming-in-python,Programming,Python
219,Object-Oriented Programming in R: S3 and R6,4,18,56,"10,152","4,250",Object-Oriented Programming in R: S3 and R6,"Object-Oriented Programming in R: S3 and R6
Object-oriented programming (OOP) lets you specify relationships between functions and the objects that they can act on, helping you manage complexity in your code. This is an intermediate level course, providing an introduction to OOP, using the S3 and R6 systems. S3 is a great day-to-day R programming tool  that simplifies some of the functions that you write. R6 is especially useful  for industry-specific analyses, working with web APIs, and building GUIs. The course concludes with an interview with Winston Chang, creator of the R6 package.
Learn what object-oriented programming (OOP) consists of, when to use it, and what OOP systems are available in R.  You'll also learn how R identifies different types of variable, using classes, types, and modes.
S3 is a very simple object-oriented system that lets you define different behavior for functions, depending upon their input argument. This chapter explains how to use S3, and how generics and methods work.
Learn how to define R6 classes, and to create R6 objects.  You'll also learn about the structure of R6 classes, and how to separate the user interface from the implementation details.
Learn how to inherit from an R6 class, and how the relationship between parent and child classes works.
Complete your mastery of R6 by learning about advanced topics such as copying by reference, shared fields, cloning objects, and finalizing objects. The chapter concludes with an interview with Winston Chang, creator of the R6 package.",['R Programmer'],['Richie Cotton'],"[('Cooking times (SQLite file)', 'https://assets.datacamp.com/production/repositories/610/datasets/5070c405a88c5640048498ed6d9295f4fbe30ff4/cooking-times.sqlite')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/object-oriented-programming-in-r-s3-and-r6,Programming,R
220,Optimizing Python Code with pandas,4,14,45,"2,267","3,500",Optimizing Python Code pandas,"Optimizing Python Code with pandas
The ability to efficiently work with big datasets and extract valuable information is an indispensable tool for every aspiring data scientist. When working with a small amount of data, we often don’t realize how slow code execution can be. This course will build on your knowledge of Python and the pandas library and introduce you to efficient built-in pandas functions to perform tasks faster. Pandas’ built-in functions allow you to tackle the simplest tasks, like targeting specific entries and features from the data, to the most complex tasks, like applying functions on groups of entries, much faster than Python's usual methods. By the end of this course, you will be able to apply a function to data based on a feature value, iterate through big datasets rapidly, and manipulate data belonging to different groups efficiently. You will apply these methods on a variety of real-world datasets, such as poker hands or restaurant tips.
This chapter will give you an overview of why efficient code matters and selecting specific and random rows and columns efficiently.
This chapter shows the usage of the replace() function for replacing one or multiple values using lists and dictionaries.
This chapter presents different ways of iterating through a Pandas DataFrame and why vectorization is the most efficient way to achieve it.
This chapter describes the groupby() function and how we can use it to transform values in place, replace missing values and apply complex functions group-wise.",[],"['Leonidas Souliotis', 'Hillary Green-Lerman', 'Hadrien Lacroix']","[('Poker', 'https://assets.datacamp.com/production/repositories/3832/datasets/c715cfae17d00d26693da8e612cb5fbd64e69589/poker_hand.csv'), ('Popular Baby Names', 'https://assets.datacamp.com/production/repositories/3832/datasets/51d9e4abea83c4b10498a936d2ffbc28c0745b4f/Popular_Baby_Names.csv'), ('Restaurant', 'https://assets.datacamp.com/production/repositories/3832/datasets/5496a658cc7ea792ee295d61937f8b12659884ff/restaurant_data.csv')]",['Manipulating DataFrames with pandas'],https://www.datacamp.com/courses/optimizing-python-code-with-pandas,Programming,Python
221,Optimizing R Code with Rcpp,4,15,52,963,"4,350",Optimizing R Code Rcpp,"Optimizing R Code with Rcpp
R is a great language for data science, but sometimes the code can be slow to run. Combining the comfort of R with the speed of a compiled language
is a great way to reclaim the performance your code deserves. 

C++ is a modern, high performance language that is simple enough to learn
in the context of accelerating R code. With the help of the Rcpp package, 
C++ integrates very neatly with R. You will learn how to create and manipulate
typical R objects (vectors and lists), and write your own C++ functions
to dramatically boost the performance of your R code. 

Writing, benchmarking, and debugging your first C++ code.
Writing functions, controlling the flow with if and else, and learning to use the three kinds of loops in C++.
Manipulate and compute with Rcpp and native C++ vectors.
Use random numbers and write algorithms for applied time series models.",[],"['Team ThinkR', 'Sascha Mayr', 'Sumedh Panchadhar']",[],['Writing Functions in R'],https://www.datacamp.com/courses/optimizing-r-code-with-rcpp,Programming,R
222,Parallel Computing with Dask,4,17,58,"5,084","4,650",Parallel Computing Dask,"Parallel Computing with Dask
Python is now well established as a major platform for data analysis and data science. For many data scientists, the largest limitation of Python is that all data must fit into the resident memory of the available workstation. Further, traditionally, Python has only been able to utilize one CPU. Data scientists constantly ask, ""How can I read and process large amounts of data?"" and ""How can I make use of more computational processing resources?"" This course will introduce you to Dask, a flexible parallel computing library for analytic computing. With Dask, you will be able to take the Python workflows you currently have and easily scale them up to large datasets on your workstation without the need to migrate to a distributed computing environment.

In this chapter you'll learn how to leverage traditional Python techniques for reading and processing large datasets stored in either a single file or in multiple files. Finally, you'll learn how the Dask library can be used to execute a pipeline of Python functions in parallel with the added goal of being able to process large amounts of data on modest computational resources.
For this course, the data set sizes have been reduced so that the exercises can be completed rapidly. Many of these data sets were originally several Gigabytes in size.
In this chapter we'll explore how we can use `dask.array` to read multiple data sources and perform computations with them as a single data array. We'll learn some advanced uses of NumPy arrays when dealing with high dimensional data that also work on Dask arrays. Finally, we'll examine climate patterns in the US from monthly weather data in the US.
The Dask DataFrame is built upon the Pandas DataFrame. Dask provides the ability to scale your Pandas workflows to large data sets stored in either a single file or separated across multiple files. In this chapter you'll learn how to build a pipeline of delayed computation with Dask DataFrame, and you'll use these skills to study how much NYC taxi riders tip their drivers.
Datasets that have not already been standardized and provided as CSV files can be challenging to work with. In this chapter you'll use the Dask Bag to read raw text files and perform simple text processing workflows over large datasets in parallel. Conceptually, the Dask Bag is a parallel list that can store any Python datatype with convenient functions that map over all of the elements.
Now that you've learned how to utilize Dask to read and process large data sets in parallel, you'll put these skills together to search for correlations between flight delays and reported weather events at selected airports. You'll read files in multiple directories containing flight statistics for selected airports from the Bureau of Transportation Statistics and merge them with daily weather data from wunderground.com into a single Dask DataFrame.",[],"['Team Anaconda', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Congressional bills', 'https://assets.datacamp.com/production/repositories/990/datasets/ac6621a48b021e44eba45caeae9dd0c0745f22a6/congress.zip'), ('Flight delays', 'https://assets.datacamp.com/production/repositories/990/datasets/1f27fdf9f19d027a50f9f56a3f39f5e20ca89256/flightdelays.zip'), ('NYC taxi rides', 'https://assets.datacamp.com/production/repositories/990/datasets/4cec996e51830bb3a85de3c616c172b21a0b2a86/nyctaxi.zip'), ('Presidents (JSON)', 'https://assets.datacamp.com/production/repositories/990/datasets/faf1d936b027253be6e0b5530e0f86a3b9a2952b/presidents.json'), ('State of the Union addresses', 'https://assets.datacamp.com/production/repositories/990/datasets/d959d46b9ff5f657d429c846459f5aaa3162516e/sotu.zip'), ('Texas electricity consumption (HDF5)', 'https://assets.datacamp.com/production/repositories/990/datasets/aa27166887027e6d7f5aae70f1e61d99dd3b235c/Texas.zip'), ('WDI (World Development Indicators)', 'https://assets.datacamp.com/production/repositories/990/datasets/6434827abc41259cef581f06e7a6dbd6a04c095c/WDI.zip'), ('Weather', 'https://assets.datacamp.com/production/repositories/990/datasets/54ff980659dca4cbcecde2b50fc1d6062dc18e09/weatherdata.zip')]","['pandas Foundations', 'Manipulating DataFrames with pandas']",https://www.datacamp.com/courses/parallel-computing-with-dask,Data Manipulation,Python
223,Parallel Programming in R,4,15,55,"2,342","4,350",Parallel Programming in R,"Parallel Programming in R
With an increasing amount of data and more complex algorithms available to scientists and practitioners today, parallel processing is almost always a must, and in fact, is expected in packages implementing time-consuming methods. This course introduces you to concepts and tools available in R for parallel computing and provides solutions to a few important non-trivial issues in parallel processing like reproducibility, generating random numbers and load balancing.
In order to take advantage of parallel environment, the application needs to be split into pieces. In this introductory chapter, you will learn about different ways of partitioning and how it fits different hardware configurations. You will also be introduced to various R packages that support parallel programming.
This chapter will dive deeper into the parallel package. You'll learn about the various backends and their differences and get a deep understanding about the workhorse of the package, namely the clusterApply() function. Strategies for task segmentation including their pitfalls will also be discussed.
In this chapter, you will look at two user-contributed packages, namely foreach and future.apply, which make parallel programming in R even easier. They are built on top of the parallel and future packages. In the last lesson of this chapter, you will learn about the advantages and pitfalls of load balancing and scheduling.
Now you might ask, can I reproduce my results if the application uses random numbers? Can I generate the same results regardless of if the code runs sequentially or in parallel? This chapter will answer these questions. You will learn about a random number generator well suited to a parallel environment and how the various packages make use of it.","['Big Data with R', 'R Programmer', 'R Programming']","['Hana Sevcikova', 'Richie Cotton', 'Benjamin  Feder']","[(""Words (Jane Austen's 6 books)"", 'https://assets.datacamp.com/production/repositories/1722/datasets/9f329fad6d0e6460748030916f2c117b77267e92/jane_austen_words.RDS'), ('US migration', 'https://assets.datacamp.com/production/repositories/1722/datasets/fb0a1ccc435929f324636c6b7d130af055038032/USmigrationAR1.csv'), ('SOTU (2016)', 'https://assets.datacamp.com/production/repositories/1722/datasets/c92a8384ff7a6997a2c537122929fcde5f8fb407/obama.txt')]",['Writing Efficient R Code'],https://www.datacamp.com/courses/parallel-programming-in-r,Programming,R
224,Pivot Tables with Spreadsheets,4,13,54,"10,784","4,150",Pivot Tables Spreadsheets,"Pivot Tables with Spreadsheets
Working with large quantities of data in spreadsheets can be difficult and time-consuming. Have you ever wished there was a quick and efficient way to organize and evaluate your data within seconds? Pivot Tables are your answer! In this course we will explore the world of Pivot Tables within Google Sheets, and learn how to quickly organize thousands of datapoints with just a few clicks of the mouse. We will analyze the Average rainfall across multiple US cities, the Top 10 of the Fortune Global 500, and a selection of Films released between 2010 and 2016. You will learn techniques such as sorting, subtotaling, and filtering your data using these real world examples. By the end of the course, you will be able to create your own custom pivot tables with datasets of any size!
This chapter will discuss what a pivot table is, how it works, and will lead you through the steps of creating and setting up your first pivot table.
This chapter discuss how a pivot table works, and how it deals with the underlying data.
This chapter will discuss more advanced options that will help you to get the most of your pivot tables.
This chapter will walk you through modifying data in a pivot table, and troubleshooting various issues you may encounter.",[],"['Frank Sumanski', 'Richie Cotton', 'Amy Peterson']",[],['Intermediate Spreadsheets for Data Science'],https://www.datacamp.com/courses/pivot-tables-with-spreadsheets,Data Manipulation,Spreadsheets
225,PostgreSQL Functions for Manipulating Data,4,13,50,286,"4,200",PostgreSQL Functions Manipulating Data,"PostgreSQL Functions for Manipulating Data
This course will provide you an understanding of how to use built-in PostgreSQL functions in your SQL queries to manipulate different types of data including strings, character, numeric and date/time. We'll travel back to a time where Blockbuster video stores were on every corner and if you wanted to
watch a movie, you actually had to leave your house to rent a DVD! You'll also get an introduction into the robust full-text search capabilities which provides a powerful tool for indexing and matching keywords in a PostgreSQL document. And finally, you'll learn how to extend these features by using PostgreSQL extensions.
Learn about the properties and characteristics of common data types including strings, numerics and arrays and how to retrieve information about your database.
Explore how to manipulate and query date and time objects including how to use the current timestamp in your queries, extract subfields from existing date and time fields and what to expect when you perform date and time arithmetic.
Learn how to manipulate string and text data by transforming case, parsing and truncating text and extracting substrings from larger strings.
An introduction into some more advanced capabilities of PostgreSQL like full-text search and extensions.",[],"['Brian Piccolo', 'Marianna Lamnina']","[('Sakila schema', 'https://assets.datacamp.com/production/repositories/4471/datasets/3956cbd703315b2ebfea2dcbdfc36c8f6ce6cc8a/postgres-sakila-incremental-schema_06172019.sql')]",['Joining Data in SQL'],https://www.datacamp.com/courses/postgresql-functions-for-manipulating-data,Data Manipulation,SQL
226,Predicting Customer Churn in Python,4,13,45,"1,945","3,550",Predicting Customer Churn,"Predicting Customer Churn in Python

Churn is when a customer stops doing business or ends a relationship with a company. It’s a common problem across a variety of industries, from telecommunications to cable TV to SaaS, and a company that can predict churn can take proactive action to retain valuable customers and get ahead of the competition.  This course will provide you a roadmap to create your own customer churn models. You’ll learn how to explore and visualize your data, prepare it for modeling, make predictions using machine learning, and communicate important, actionable insights to stakeholders. By the end of the course, you’ll become comfortable using the pandas library for data analysis and the scikit-learn library for machine learning.
Begin exploring the Telco Churn Dataset using pandas to compute summary statistics and Seaborn to create attractive visualizations.
Having explored your data, it's now time to preprocess it and get it ready for machine learning. Learn the why, what, and how of preprocessing, including feature selection and feature engineering.
With your data preprocessed and ready for machine learning, it's time to predict churn! Learn how to build supervised learning machine models in Python using scikit-learn.
Learn how to improve the performance of your models using hyperparameter tuning and gain a better understanding of the drivers of customer churn that you can take back to the business.",[],"['Mark Peterson', 'Lore Dirick', 'Yashas Roy']","[('Telco Churn Dataset', 'https://assets.datacamp.com/production/repositories/1764/datasets/79c5446a4a753e728e32b4a67138344847b8f131/Churn.csv')]",['Manipulating DataFrames with pandas'],https://www.datacamp.com/courses/predicting-customer-churn-in-python,Case Studies,Python
227,Predictive Analytics using Networked Data in R,4,14,56,"1,792","4,300",Predictive Analytics using Networked Data in R,"Predictive Analytics using Networked Data in R
In this course, you will learn to perform state-of-the art predictive analytics using networked data in R.  The aim of network analytics is to predict to which class a network node belongs, such as churner or not, fraudster or not, defaulter or not, etc.  To accomplish this, we discuss how to leverage information from the network and its underlying structure in a predictive way.  More specifically, we introduce the idea of featurization such that network features can be added to non-network features as such boosting the performance of any resulting analytical model.  In this course, you will use the igraph package to generate and label a network of customers in a churn setting and learn about the foundations of network learning. Then, you will learn about homophily, dyadicity and heterophilicty, and how these can be used to get key exploratory insights in your network. Next, you will use the functionality of the igraph package to compute various network features to calculate both node-centric as well as neighbor based network features.  Furthermore, you will use the Google PageRank algorithm to compute network features and empirically validate their predictive power. Finally, we teach you how to generate a flat dataset from the network and analyze it using logistic regression and random forests.
In this chapter you will be introduced to labelled networks, network learning and the challanges that can arise.
In this chapter you will learn about homophily and how to compute the two measures that can be used to characterice it, dyadicity and heterophilicty.
In this chapter you will use the igraph package to compute various network features and add them to the network.
In this chapter you will use the network from Chapter 3 to create a flat dataset. Using standard data mining techniques, you will build predictive models and measure their performance with AUC and top decile lift.",['Network Analysis with R'],"['Maria Oskarsdottir', 'Bart Baesens', 'David Campos', 'Shon Inouye', 'Chester Ismay']","[('Student Customers dataset', 'https://assets.datacamp.com/production/repositories/2073/datasets/1a31b28a3370b665299c2d357afc081281cdafda/StudentCustomers.RData'), ('Student Edge List dataset', 'https://assets.datacamp.com/production/repositories/2073/datasets/0056ee79c1bff1eeedd6ac9b3b18d8a16286887b/StudentEdgelist.RData'), ('Student Network dataset', 'https://assets.datacamp.com/production/repositories/2073/datasets/adadd462bb8546ca3f106af9c646b15ea8d33ead/StudentNetwork.RData')]","['Network Analysis in R', 'Supervised Learning in R: Classification']",https://www.datacamp.com/courses/predictive-analytics-using-networked-data-in-r,Probability & Statistics,R
228,Preparing for Coding Interview Questions in Python,4,16,61,"2,445","5,050",Preparing Coding Interview Questions,"Preparing for Coding Interview Questions in Python
Coding interviews can be challenging. You might be asked questions to test your knowledge of a programming language. On the other side, you can be given a task to solve in order to check how you think. And when you are interviewed for a data scientist position, it's likely you can be asked on the corresponding tools available for the language. In either of the cases, to get a cool position as a data scientist, you need to do a little work to perform the best. That's why it's very important to practice in order to prove your expertise! This course serves as a guide for those who just start their path to become a professional data scientist and as a refresher for those who seek for other opportunities. We'll go through fundamental as well as advanced topics that aim to prepare you for a coding interview in Python. Since it is not a normal step-by-step course, some exercises can be quite complex. But who said that interviews are easy to pass, right?
In this chapter, we'll refresh our knowledge of the main data structures used in Python.  We'll cover how to deal with lists, tuples, sets, and dictionaries.  We'll also consider strings and how to write regular expressions to retrieve specific character sequences from a given text.
This chapter focuses on iterable objects. We'll refresh the definition of iterable objects and explain, how to identify one.  Next, we'll cover list comprehensions, which is a very special feature of Python programming language to define lists.  Then, we'll recall how to combine several iterable objects into one. Finally, we'll cover how to create custom iterable objects using generators.
This chapter will focus on the functional aspects of Python. We'll start by defining functions with a variable amount of positional as well as keyword arguments.  Next, we'll cover lambda functions and in which cases they can be helpful. Especially, we'll see how to use them with such functions as map(), filter(), and reduce(). Finally, we'll recall what is recursion and how to correctly implement one.
This chapter will cover topics on scientific computing in Python. We'll start by explaining the difference between NumPy arrays and lists. We'll define why the former ones suit better for complex calculations. Next, we'll cover some useful techniques to manipulate with pandas DataFrames. Finally, we'll do some data visualization using scatterplots, histograms, and boxplots.",[],"['Kirill Smirnov', 'Hadrien Lacroix', 'Hillary Green-Lerman']","[('Diabetes', 'https://assets.datacamp.com/production/repositories/4298/datasets/203ff6d8dd61bede3cbac6fffdd77a222177f478/diabetes.csv'), ('Exams', 'https://assets.datacamp.com/production/repositories/4298/datasets/d77902052165a763cc4bc84e2457e1f41e80d233/exams.csv'), ('Heroes', 'https://assets.datacamp.com/production/repositories/4298/datasets/58a461693fbca1f972a11024c3873004b102160c/heroes_information.csv'), ('Retinol', 'https://assets.datacamp.com/production/repositories/4298/datasets/d5ac8c46a84933e937927fba352f27d0b6b6291a/retinol.csv')]","['Python Data Science Toolbox (Part 2)', 'Manipulating DataFrames with pandas']",https://www.datacamp.com/courses/preparing-for-coding-interview-questions-in-python,Programming,Python
229,Preparing for Statistics Interview Questions in Python,4,15,46,"3,932","3,700",Preparing Statistics Interview Questions,"Preparing for Statistics Interview Questions in Python
Are you looking to land that next job or hone your statistics interview skills to stay sharp? Get ready to master classic interview concepts ranging from conditional probabilities to A/B testing to the bias-variance tradeoff, and much more! You’ll work with a diverse collection of datasets including web-based experiment results and Australian weather data. Following the course, you’ll be able to confidently walk into your next interview and tackle any statistics questions with the help of Python!
This chapter kicks the course off by reviewing conditional probabilities, Bayes' theorem, and central limit theorem. Along the way, you will learn how to handle questions that work with commonly referenced probability distributions.
In this chapter, you will prepare for statistical concepts related to exploratory data analysis. The topics include descriptive statistics, dealing with categorical variables, and relationships between variables.  The exercises will prepare you for an analytical assessment or stats-based coding question.
Prepare to dive deeper into crucial concepts regarding experiments and testing by reviewing confidence intervals, hypothesis testing, multiple tests, and the role that power and sample size play. We'll also discuss types of errors, and what they mean in practice.
Wrapping up, we'll address concepts related closely to regression and classification models. The chapter begins by reviewing fundamental machine learning algorithms and quickly ramps up to model evaluation, dealing with special cases, and the bias-variance tradeoff.",[],"['Conor Dewey', 'Mona Khalil', 'Amy Peterson']",[],"['Intermediate Python for Data Science', 'Statistical Thinking in Python (Part 1)']",https://www.datacamp.com/courses/preparing-for-statistics-interview-questions-in-python,Probability & Statistics,Python
230,Preprocessing for Machine Learning in Python,4,20,62,"6,804","4,700",Preprocessing Machine Learning,"Preprocessing for Machine Learning in Python
This course covers the basics of how and when to perform data preprocessing. This essential step in any machine learning project is when you get your data ready for modeling. Between importing and cleaning your data and fitting your machine learning model is when preprocessing comes into play. You'll learn how to standardize your data so that it's in the right form for your model, create new features to best leverage the information in your dataset, and select the best features to improve your model fit. Finally, you'll have some practice preprocessing by getting a dataset on UFO sightings ready for modeling.
In this chapter you'll learn exactly what it means to preprocess data. You'll take the first steps in any preprocessing journey, including exploring data types and dealing with missing data.
This chapter is all about standardizing data. Often a model will make some assumptions about the distribution or scale of your features. Standardization is a way to make your data fit these assumptions and improve the algorithm's performance.
In this section you'll learn about feature engineering. You'll explore different ways to create new, more useful, features from the ones already in your dataset. You'll see how to encode, aggregate, and extract information from both numerical and textual features.
This chapter goes over a few different techniques for selecting the most important features from your dataset. You'll learn how to drop redundant features, work with text vectors, and reduce the number of features in your dataset using principal component analysis (PCA).
Now that you've learned all about preprocessing you'll try these techniques out on a dataset that records information on UFO sightings.",[],"['Sarah Guido', 'Nick Solomon', 'Kara Woo']","[('Hiking data', 'https://assets.datacamp.com/production/repositories/1816/datasets/4f26c48451bdbf73db8a58e226cd3d6b45cf7bb5/hiking.json'), ('Wine data', 'https://assets.datacamp.com/production/repositories/1816/datasets/9bd5350dfdb481e0f94eeef6acf2663452a8ef8b/wine_types.csv'), ('UFO sightings data', 'https://assets.datacamp.com/production/repositories/1816/datasets/a5ebfe5d2ed194f2668867603b563963af4769e9/ufo_sightings_large.csv'), ('Volunteering data', 'https://assets.datacamp.com/production/repositories/1816/datasets/668b96955d8b252aa8439c7602d516634e3f015e/volunteer_opportunities.csv')]","['Cleaning Data in Python', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/preprocessing-for-machine-learning-in-python,Machine Learning,Python
231,Probability Puzzles in R,4,13,45,826,"3,750",Probability Puzzles in R,"Probability Puzzles in R
Do you want to take your probability skills to the next level? This course will help get you there, using problem-based learning with probability puzzles as the framework. As you are guided through their solutions, you will gain coding tools and general strategies for solving probability problems that you might encounter in many other situations. Organized by theme, the course begins with classic problems like the Birthday Problem and Monty Hall, and ends with puzzles that involve poker like Texas Hold'em and the World Series of Poker!
This chapter will introduce some basic principles that will be used throughout the course, such as writing loops and functions. Then, we dive into a couple of classic problems: the Birthday Problem, and Monty Hall.
In this chapter, we explore games in which dice are rolled, including Yahtzee, Settlers of Catan, and Craps. You will learn tools such as using built-in R functions to calculate combinatorics, and using functions such as replicate and the %in% operator.
The puzzles in this chapter were inspired by ideas encountered on the internet. In order to solve them, you will learn to combine tools such as nested for loops, and the functions round, identical, and sapply.
This chapter explores questions in poker, including the most often televised version of Texas Hold'em. We will learn to code for win probabilities with any given number of outs, and also explore a more theoretical model of poker known as the von Neumann model. We will learn to use functions such as Reduce, runif, and ifelse.",['Probability and Distributions with R'],"['Peter Chi', 'Chester Ismay', 'Amy Peterson']",[],"['Intermediate R', 'Foundations of Probability in R']",https://www.datacamp.com/courses/probability-puzzles-in-r,Probability & Statistics,R
232,Python Data Science Toolbox (Part 1),3,12,46,"142,559","3,650",Python Data Science Toolbox,"Python Data Science Toolbox (Part 1)
It's time to push forward and develop your Python chops even further. There are tons of fantastic functions in Python and its library ecosystem. However, as a data scientist, you'll constantly need to write your own functions to solve problems that are dictated by your data. You will learn the art of function writing in this first Python Data Science Toolbox course. You'll come out of this course being able to write your very own custom functions, complete with multiple parameters and multiple return values, along with default arguments and variable-length arguments. You'll gain insight into scoping in Python and be able to write lambda functions and handle errors in your function writing practice. And you'll wrap up each chapter by using your new skills to write functions that analyze Twitter DataFrames.
In this chapter, you'll learn how to write simple functions, as well as functions that accept multiple arguments and return multiple values. You'll also have the opportunity to apply these new skills to questions commonly encountered by data scientists.
In this chapter, you'll learn to write functions with default arguments so that the user doesn't always need to specify them, and variable-length arguments so they can pass an arbitrary number of arguments on to your functions. You'll also learn about the essential concept of scope.
Learn about lambda functions, which allow you to write functions quickly and on the fly. You'll also practice handling errors in your functions, which is an essential skill. Then, apply your new skills to answer data science questions.","['Data Analyst with Python', 'Data Scientist with Python', 'Python Programmer', 'Python Programming']","['Hugo Bowne-Anderson', 'Francisco Castro']","[('Tweets', 'https://assets.datacamp.com/production/repositories/463/datasets/82e9842c09ad135584521e293091c2327251121d/tweets.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/python-data-science-toolbox-part-1,Programming,Python
233,Python Data Science Toolbox (Part 2),4,12,46,"88,981","3,800",Python Data Science Toolbox,"Python Data Science Toolbox (Part 2)
In this second Python Data Science Toolbox course, you'll continue to build your Python data science skills. First, you'll learn about iterators, objects you have already encountered in the context of for loops. You'll then learn about list comprehensions, which are extremely handy tools for all data scientists working in Python. You'll end the course by working through a case study in which you'll apply all the techniques you learned in both parts of this course.
You'll learn all about iterators and iterables, which you have already worked with when writing for loops. You'll learn some handy functions that will allow you to effectively work with iterators. And you’ll finish the chapter with a use case that is pertinent to the world of data science and dealing with large amounts of data—in this case, data from Twitter that you will load in chunks using iterators.
In this chapter, you'll build on your knowledge of iterators and be introduced to list comprehensions, which allow you to create complicated lists—and lists of lists—in one line of code! List comprehensions can dramatically simplify your code and make it more efficient, and will become a vital part of your Python data science toolbox. You'll then learn about generators, which are extremely helpful when working with large sequences of data that you may not want to store in memory, but instead generate on the fly.
This chapter will allow you to apply your newly acquired skills toward wrangling and extracting meaningful information from a real-world dataset—the World Bank's World Development Indicators. You'll have the chance to write your own functions and list comprehensions as you work with iterators and generators to solidify your Python data science chops.","['Data Scientist with Python', 'Python Programmer', 'Python Programming']","['Hugo Bowne-Anderson', 'Yashas Roy', 'Francisco Castro']","[('Tweets', 'https://assets.datacamp.com/production/repositories/464/datasets/82e9842c09ad135584521e293091c2327251121d/tweets.csv'), ('World Bank World Development Indicators', 'https://assets.datacamp.com/production/repositories/464/datasets/2175fef4b3691db03449bbc7ddffb740319c1131/world_ind_pop_data.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Python Data Science Toolbox (Part 1)']",https://www.datacamp.com/courses/python-data-science-toolbox-part-2,Programming,Python
234,Python for MATLAB Users,4,15,51,"2,339","4,200",Python MATLAB Users,"Python for MATLAB Users
Python is a versatile programming language that is becoming more and more popular for doing data science. Companies worldwide are using Python to harvest insights from their data and get a competitive edge. This course focuses on helping Matlab users learn to use Python specifically for data science. You will quickly learn how to migrate from Matlab to Python for data analysis and visualization. Learn the fundamentals of Python syntax, how to use numpy arrays to store and manipulate data. You will learn how to use matplotlib to discover trends, correlations, and patterns in real datasets, including bicycle traffic in the city of Seattle and avocado prices across the United States.
This chapter gets you started moving from MATLAB to Python. You'll learn about some of the similarities and differences between MATLAB and Python, how to use methods and packages, and be introduced to the popular NumPy package.
In this chapter, you will build on your new NumPy knowledge. You will dive into NumPy arrays, the Python analog to matrices by performing mathematical operations and indexing. You will also begin to explore another important Python data structure, the list, and then round out the chapter by making customs plots of your arrays using Matplotlib.
This chapter introduces some powerful Python data structures: the dictionary and the pandas DataFrame. You will learn to create dictionaries by setting key-value pairs, and view then how to view and modify your dictionary. Then you will be introduced to one of the most important packages in the Pythonista's toolbox, pandas. Specifically, you'll focus on the pandas' structure, the DataFrame, which organizes tabular data in an easily accessible way. Lastly, you'll learn how to transform different data types into DataFrames to make your data easier to work with.
You'll finish the course by controlling your  Python flow. You will learn how to iterate through different Python data structures using for loops. You will also learn about Python contingencies using if, else, and elif and the Python comparison operators (greater than, less than, etc.) that will decide which lines of your code will be executed. Lastly, you'll circle back to NumPy arrays by using Python comparison operators to filter your arrays.",[],"['Justin Kiggins', 'Mari Nazary', 'Becca Robins']","[('FAA Wildlife Strikes', 'https://assets.datacamp.com/production/repositories/2408/datasets/355fc2a3bcb92422a58d10afab25e8e28813a7f1/FAA_wildlife_strikes_CA.csv'), ('Fremont Bridge - hourly traffic', 'https://assets.datacamp.com/production/repositories/2408/datasets/bec1ffaffb41b92ad8612a76f7d2540d3dd8b229/fremont_bridge_hourly_weekdays.csv'), ('Fremont Bridge - daily traffic', 'https://assets.datacamp.com/production/repositories/2408/datasets/9eeea64ad3021ec139ab1134d910e809e372a31a/fremont_bridge_average_daily_by_month.csv'), ('Animal taxonomy', 'https://assets.datacamp.com/production/repositories/2408/datasets/b3f8ff5a2789a262b67ddd4ed21c3fe9ef7f4ffb/anage_data.txt'), ('Historical avocado data', 'https://assets.datacamp.com/production/repositories/2408/datasets/f9b7ed631da00b07bf73d22df795fbd7f1fb3413/avocado.csv')]",[],https://www.datacamp.com/courses/python-for-matlab-users,Programming,Python
235,Python for R Users,5,15,57,"6,403","4,950",Python R Users,"Python for R Users
Python and R have seen immense growth in popularity in the ""Machine Learning Age"". They both are high-level languages that are easy to learn and write. The language you use will depend on your background and field of study and work. R is a language made by and for statisticians, whereas Python is a more general purpose programming language. Regardless of the background, there will be times when a particular algorithm is implemented in one language and not the other, a feature is better documented, or simply, the tutorial you found online uses Python instead of R.

In either case, this would require the R user to work in Python to get his/her work done, or try to understand how something is implemented in Python for it to be translated into R. This course helps you cross the R-Python language barrier.
Learn about some of the most important data types (integers, floats, strings, and booleans) and data structures (lists, dictionaries, numpy arrays, and pandas DataFrames) in Python and how they compare to the ones in R.
This chapter covers control flow statements (if-else if-else), for loops and shows you how to write your own functions in Python!
In this chapter you will learn more about one of the most important Python libraries, Pandas. In addition to DataFrames, pandas provides several data manipulation functions and methods.
You will learn about the rich ecosystem of visualization libraries in Python. This chapter covers matplotlib, the core visualization library in Python along with the pandas and seaborn libraries.
As a final capstone, you will apply your Python skills on the NYC Flights 2013 dataset.",[],"['Daniel Chen', 'Hugo Bowne-Anderson', 'Eunkyung Park', 'Sumedh Panchadhar']","[('Air quality', 'https://assets.datacamp.com/production/repositories/1361/datasets/c16448e3f4219f900f540c455fdf87b0f3da70e0/airquality.csv'), ('Country timeseries', 'https://assets.datacamp.com/production/repositories/1361/datasets/6da83b3d2017245217d35989960184234a6c4e7f/country_timeseries.csv'), ('Flights (sample)', 'https://assets.datacamp.com/production/repositories/1361/datasets/bd8ec150d4ce6996e437a64fc49bdbf708f579e2/flights_sample.csv'), ('Inflammation', 'https://assets.datacamp.com/production/repositories/1361/datasets/f7745f913c209b115aadbb83d276cc8d79fc9904/Inflammation.zip'), ('NYC Flights 2013', 'https://assets.datacamp.com/production/repositories/1361/datasets/df9b795088ce17100c881152f2031c2e09e0720e/NYC Flights 2013.zip'), ('Tips', 'https://assets.datacamp.com/production/repositories/1361/datasets/5496a658cc7ea792ee295d61937f8b12659884ff/tips.csv')]",['Writing Functions in R'],https://www.datacamp.com/courses/python-for-r-users,Programming,Python
236,Python for Spreadsheet Users,4,13,48,"11,941","3,900",Python Spreadsheet Users,"Python for Spreadsheet Users
Are you looking for a better solution than the one you’ve built in a spreadsheet? If so, then Python for Spreadsheet Users is a great introduction to the Python language, and will put you on the right path towards automating repetitive work, diving deeper into your data, and widening the scope of what you are capable of accomplishing. Throughout the course, we’ll draw parallels to common spreadsheet functions and techniques, so you’ll always have a familiar reference point as you dive head first into Python.
Let's get right into it! In this chapter, you'll become acclimated with Python syntax, loading your data into a Python session, and how to explore and edit this data to answer business questions.
The pivot table is a core tool in the savvy spreadsheet user's arsenal. In this chapter, we'll focus on simply recreating this functionality in Python using some handy DataFrame methods.
This chapter will focus on how to import and manage multiple sheets from a workbook, as well as how to join these sheets together using the Python equivalent of a VLOOKUP: the left join.
Now that you're able to import and manipulate your data in Python, let's shift our focus to visualizing this data so that our insights are easily communicable to others.",[],"['Chris Cardillo', 'Hillary Green-Lerman', 'Amy Peterson']","[('Movie Theater Sales dataset', 'https://assets.datacamp.com/production/repositories/3833/datasets/644f45ae7496de772347e34dedfd106e451b98e3/movie theater sales data.xlsx')]",['Pivot Tables with Spreadsheets'],https://www.datacamp.com/courses/python-for-spreadsheet-users,Programming,Python
237,Quantitative Risk Management in R,5,18,55,"5,665","4,350",Quantitative Risk Management in R,"Quantitative Risk Management in R
In Quantitative Risk Management (QRM), you will build models to understand the risks of financial portfolios. This is a vital task across the banking, insurance and asset management industries. The first step in the model building process is to collect data on the underlying risk factors that affect portfolio value and analyze their behavior. In this course, you will learn how to work with risk-factor return series, study the empirical properties or so-called ""stylized facts"" of these data - including their typical non-normality and volatility, and make estimates of value-at-risk for a portfolio.
In this chapter, you will learn how to form return series, aggregate them over longer periods and plot them in different ways. You will look at examples using the qrmdata package.
In this chapter, you will learn about graphical and numerical tests of normality, apply them to different datasets, and consider the alternative Student t model.
In this chapter, you will learn about volatility and how to detect it using act plots. You will learn how to apply Ljung-Box tests for serial correlation and estimate cross correlations.
In this chapter, the concept of value-at-risk and simple methods of estimating VaR based on historical simulation are introduced.",['Quantitative Analyst with R'],"['Alexander J. McNeil', 'Lore Dirick']",[],"['Introduction to R for Finance', 'Intermediate R for Finance', 'Manipulating Time Series Data in R with xts & zoo']",https://www.datacamp.com/courses/quantitative-risk-management-in-r,Applied Finance,R
238,R For SAS Users,4,15,56,535,"4,700",R For SAS Users,"R For SAS Users
If you have experience with SAS and want to learn R, this is the course for you. R is FREE (cost) and OPEN (license) and is one of the fastest growing software languages for statistics and data science. This course is a gentle introduction to the R language with every chapter providing a detailed mapping of R functions to SAS procedures highlighting similarities and differences. You will orient yourself in the R environment and discover how to wrangle, visualize, and model data plus customize your output for final presentation. Throughout the course, you will follow a consistent workflow of data quality checking and cleaning, exploring relationships, modeling, and presenting results. You will leave this course with coded examples that provide a template to use immediately with a dataset of your own.
This first chapter will get you oriented into the R programming environment. You'll learn how to get help, load a dataset, and increase functionality by adding packages. You'll begin working with the abalone dataset (through the dplyr package workflow) to get descriptive statistics and create helpful visualizations (using the ggplot2 package).
Now that you are oriented in the R environment, this chapter will advance your understanding of R's versatility working with data objects. You'll learn how to create and modify variables in the abalone data set. Using your ggplot2 visualization skills, you will discover the data errors in the abalone data and then create a final cleaned data set ready for analysis and modeling.
Once your data set has been cleaned, the next step is exploration. In chapter 3 you will learn how to compute descriptive statistics, explore associations (e.g., correlations) among the variables, and perform bi-variate statistical tests (e.g., t-tests and chi-square tests). You will also create graphical visualizations which illustrate the bi-variate associations and group comparison tests.
In this final chapter, you will learn how to work with one of the most versatile data object types in R called a list. These skills will enable you to save and manipulate your output from descriptive statistics, associations, and group comparison computations. You will also learn how to perform ANOVA (analysis of variance) and linear regression in R. All your skills are put to use in the final exercises to create the best models for predicting abalone ages from their sex, size, and weight measurements.",[],"['Melinda Higgins', 'Sascha Mayr', 'Sara Billen']","[('Abalone', 'https://assets.datacamp.com/production/repositories/2299/datasets/5a80498a0bba2da70fd4af7764c3ab4e71e32dda/abalone.csv')]",[],https://www.datacamp.com/courses/r-for-sas-users,Programming,R
239,RNA-Seq Differential Expression Analysis,4,16,44,"2,726","3,150",RNA-Seq Differential Expression Analysis,"RNA-Seq Differential Expression Analysis
RNA-Seq is an exciting next-generation sequencing method used for identifying genes and pathways underlying particular diseases or conditions. As high-throughput sequencing becomes more affordable and accessible to a wider community of researchers, the knowledge to analyze this data is becoming an increasingly valuable skill. Join us in learning about the RNA-Seq workflow and discovering how to identify which genes and biological processes may be important for your condition of interest! We will start the course with a brief overview of the RNA-Seq workflow with an emphasis on differential expression (DE) analysis. Starting with the counts for each gene, the course will cover how to prepare data for DE analysis, assess the quality of the count data, and identify outliers and detect major sources of variation in the data. The DESeq2 R package will be used to model the count data using a negative binomial model and test for differentially expressed genes. Visualization of the results with heatmaps and volcano plots will be performed and the significant differentially expressed genes will be identified and saved.
In this chapter we explore what we can do with RNA-Seq data and why it is exciting. We learn about the different steps and considerations involved in an RNA-Seq workflow.
In this chapter, we perform quality control on the RNA-Seq count data using heatmaps and principal component analysis. We explore the similarity of the samples to each other and determine whether there are any sample outliers.
In this chapter, we execute the differential expression analysis, generate results and identify the differentially expressed genes.
In this final chapter we explore the differential expression results using visualizations, such as heatmaps and volcano plots. We also review the steps in the analysis and summarize the differential expression workflow with DESeq2.",[],"['Mary Piper', 'Sascha Mayr', 'David Campos', 'Shon Inouye']","[('Fibrosis raw counts dataset', 'https://assets.datacamp.com/production/repositories/1766/datasets/bf1d0eff910f1b2cad36e5acdc2a182e95c63965/fibrosis_smoc2_rawcounts_unordered.csv')]","['Introduction to Bioconductor', 'Data Visualization with ggplot2 (Part 1)']",https://www.datacamp.com/courses/rna-seq-differential-expression-analysis,Other,R
240,Regular Expressions in Python,4,15,54,"3,157","4,650",Regular Expressions,"Regular Expressions in Python
As a data scientist, you will encounter many situations where you will need to extract key information from huge corpora of text, clean messy data containing strings, or detect and match patterns to find useful words. All of these situations are part of text mining and are an important step before applying machine learning algorithms. This course will take you through understanding compelling concepts about string manipulation and regular expressions. You will learn how to split strings, join them back together, interpolate them, as well as detect, extract, replace, and match strings using regular expressions. On the journey to master these skills, you will work with datasets containing movie reviews or streamed tweets that can be used to determine opinion, as well as with raw text scraped from the web.
Start your journey into the regular expression world! From slicing and concatenating, adjusting the case, removing spaces, to finding and replacing strings. You will learn how to master basic operation for string manipulation using a movie review dataset.
Following your journey, you will learn the main approaches that can be used to format or interpolate strings in python using a dataset containing information scraped from the web. You will explore the advantages and disadvantages of using positional formatting, embedding expressing inside string constants, and using the Template class.
Time to discover the fundamental concepts of regular expressions! In this key chapter, you will learn to understand the basic concepts of regular expression syntax. Using a real dataset with tweets meant for sentiment analysis, you will learn how to apply pattern matching using normal and special characters, and greedy and lazy quantifiers.
In the last step of your journey, you will learn more complex methods of pattern matching using parentheses to group strings together or to match the same text as matched previously. Also, you will get an idea of how you can look around expressions.",[],"['Maria Eugenia Inzaugarat', 'Hillary Green-Lerman', 'Sara Billen']","[('Movie Reviews', 'https://assets.datacamp.com/production/repositories/4510/datasets/5bf483cecc79c428638abb2acb6989bf5e140360/short_movies.csv'), ('Wikipedia Web Page', 'https://assets.datacamp.com/production/repositories/4510/datasets/77d309b724976747ab6b7de76c22ce408e3aaa4c/wikipedia.csv'), ('Sentiment140', 'https://assets.datacamp.com/production/repositories/4510/datasets/19e319827a9c77db36702cf59bf61b1ef039de43/short_tweets.csv')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/regular-expressions-in-python,Data Manipulation,Python
241,Reporting in SQL,4,15,54,"6,395","4,450",Reporting in SQL,"Reporting in SQL
Become a master at building complex reports!  In this course, you will apply all the SQL concepts and functions you have learned in previous courses to build out your very own dashboard.  By navigating through an Olympics database, you will become an expert data explorer and learn how to understand novel database quickly and effectively.  Since data is never perfect, you will gain valuable strategies to deal with real-world issues commonly found with SQL, including how to remove data duplication and how to turn messy data into clean, organized reports.  Lastly, you’ll conquer complex calculations using window functions and layered calculations, all within the same report.  This is a perfect class for anyone who will be commonly pulling data from databases and is a great complement for those who use R or Python for data science.
Before you can start building out reports to answer specific questions, you should get familiar with the data.  In this chapter, you will learn how to use E:R diagrams and data exploration techniques to get a solid understanding of the data to better answer business-related questions.
Queries can get large,  fast. It's important to take a logical approach when building more complicated queries. In this chapter, you will take a step-by-step approach to plan and build a complex query that requires you to combine tables in multiple ways and create different types of fields.
Although it would be nice, data in the real-world is rarely stored in an ideal way.  Simply put: data can get messy.  In chapter 3, you will learn how to deal with this messy data by fixing data type issues, cleaning messy strings, handling nulls, and removing duplication.
The value of reporting really shows when presenting not-so-obvious insights through complex calculations.  In this chapter, you will learn how to build more complicated fields by leveraging window functions and layered calculations. You will gain hands-on experience building two advanced calculations in particular: the percent of a total calculation and the performance index calculation.",[],"['Tyler Pernes', 'Mona Khalil', 'Sara Billen']","[('Course Database Entity Relationship Diagram', 'https://assets.datacamp.com/production/repositories/3815/datasets/ed6586166b9158f3bc66814cb40b059ace13667d/ER_diagram_pdf.png'), ('Course Database Creation Code', 'https://assets.datacamp.com/production/repositories/3815/datasets/d16b25882bebd353f4fc0cd640fcfcab32bc0ac8/Olympics.sql'), ('Athletes', 'https://assets.datacamp.com/production/repositories/3815/datasets/b68860c77542f92e108d5c09db510df7a3d28b03/athletes.csv'), ('Countries', 'https://assets.datacamp.com/production/repositories/3815/datasets/3f2822876f807bd2a430ff506712e79ba5ae48df/countries.csv'), ('Country Stats', 'https://assets.datacamp.com/production/repositories/3815/datasets/a6f42e47f0e1259ca74b11e3bd6a20bcd1d5b75a/country_stats.csv'), ('Summer Games', 'https://assets.datacamp.com/production/repositories/3815/datasets/26bb9942d1d1e750255c3eaf83383f043c8772fd/summer_games.csv'), ('Winter Games', 'https://assets.datacamp.com/production/repositories/3815/datasets/c6a559e920e509e6f0bcc66f5c278aaa38c95be8/winter_games.csv')]",['Joining Data in SQL'],https://www.datacamp.com/courses/reporting-in-sql,Reporting,SQL
242,SQL Server Functions for Manipulating Data,4,14,54,"1,080","4,600",SQL Server Functions Manipulating Data,"SQL Server Functions for Manipulating Data
In this course, you will learn how to make use of the most important functions for manipulating data provided by SQL Server.  You can use these functions for processing and transforming data to get the results you want.
In this chapter, you will learn what are the most used data types in SQL Server. You will understand the differences between implicit and explicit conversion and how each type of conversion manifests. You will also get familiar with the functions used for explicitly converting data: CAST() and CONVERT().
Date and time functions are an important topic for databases. In this chapter, you will get familiar with the most common functions for date and time manipulation. You will learn how to retrieve the current date, only parts from a date, to assemble a date from pieces and to check if an expression is a valid date or not.
Strings are one of the most commonly used data types in databases. It's important to know what you can do with them. In this chapter, you will learn how to manipulate strings, to get the results you want.
In this chapter, you will work with functions applied to numeric data. You will use aggregate functions for calculating the minimum, maximum or the sum of values from a set. You will learn how to raise a number to a power or to calculate its square root.",[],"['Ana Voicu', 'Mona Khalil', 'Sara Billen', 'Marianna Lamnina']","[('Voters dataset', 'https://assets.datacamp.com/production/repositories/4181/datasets/f86596737f108b1baef2574ea585b0ac6d3f542c/voters.csv'), ('Ratings dataset', 'https://assets.datacamp.com/production/repositories/4181/datasets/a13386f814d792ea4fec4b8212469a41df87adc4/ratings.csv')]",['Intermediate SQL Server'],https://www.datacamp.com/courses/sql-server-functions-for-manipulating-data,Data Manipulation,SQL
243,SQL for Exploratory Data Analysis,4,16,58,"9,410","4,800",SQL Exploratory Data Analysis,"SQL for Exploratory Data Analysis
You have access to a database. Now what do you do? Building on your existing skills joining tables, using basic functions, grouping data, and using subqueries, the next step in your SQL journey is learning how to explore a database and the data in it. Using data from Stack Overflow, Fortune 500 companies, and 311 help requests from Evanston, IL, you'll get familiar with numeric, character, and date/time data types. You'll use functions to aggregate, summarize, and analyze data without leaving the database. Errors and inconsistencies in the data won't stop you!  You'll learn common problems to look for and strategies to clean up messy data. By the end of this course, you'll be ready to start exploring your own PostgreSQL databases and analyzing the data in them.
Start exploring a database by identifying the tables and the foreign keys that link them.  Look for missing values, count the number of observations, and join tables to understand how they're related.  Learn about coalescing and casting data along the way.
You'll build on functions like min and max to summarize numeric data in new ways.  Add average, variance, correlation, and percentile functions to your toolkit, and learn how to truncate and round numeric values too.  Build complex queries and save your results by creating temporary tables.
Text, or character, data can get messy, but you'll learn how to deal with inconsistencies in case, spacing, and delimiters. 
Learn how to use a temporary table to recode messy categorical data to standardized values you can count and aggregate.  Extract new variables from unstructured text as you explore help requests submitted to the city of Evanston, IL.
What time is it?  In this chapter, you'll learn how to find out.  You'll aggregate date/time data by hour, day, month, or year and practice both constructing time series and finding gaps in them.",['SQL Fundamentals'],"['Christina Maimone', 'Chester Ismay', 'Mona Khalil', 'Adrián Soto']","[('Stack Overflow Question Counts', 'https://assets.datacamp.com/production/repositories/3567/datasets/1e9257c9d86e03a979124c6d99a0ff154da953fd/stackexchange.csv'), ('Fortune 500 Companies', 'https://assets.datacamp.com/production/repositories/3567/datasets/19cf8e7841e26d71feb3516c7a4b135aff8a8b4f/fortune.csv'), ('Evanston 311 Help Requests', 'https://assets.datacamp.com/production/repositories/3567/datasets/48ea25f9557bdad445f18055f13903455189359c/ev311.csv'), ('Course Database Creation Code', 'https://assets.datacamp.com/production/repositories/3567/datasets/d28e86acf2732c061fcf40b19809188aa59fd198/sql_eda_dbcreate.sql'), ('Course Database Entity Relationship Diagram', 'https://assets.datacamp.com/production/repositories/3567/datasets/63f06eacb55f42967415232093d7c25f38b29043/erdiagram.png')]",['Joining Data in SQL'],https://www.datacamp.com/courses/sql-for-exploratory-data-analysis,Data Manipulation,SQL
244,Scalable Data Processing in R,4,15,49,"3,232","3,950",Scalable Data Processing in R,"Scalable Data Processing in R
Datasets are often larger than available RAM, which causes problems for R programmers since by default all the variables are stored in memory. You’ll learn tools for processing, exploring, and analyzing data directly from disk. You’ll also implement the split-apply-combine approach and learn how to write scalable code using the bigmemory and iotools packages. In this course, you'll make use of the Federal Housing Finance Agency's data, a publicly available data set chronicling all mortgages that were held or securitized by both Federal National Mortgage Association (Fannie Mae) and Federal Home Loan Mortgage Corporation (Freddie Mac) from 2009-2015.
In this chapter, we cover the reasons you need to apply new techniques when data sets are larger than available RAM.  We show that importing and exporting data using the base R functions can be slow and some easy ways to remedy this. Finally, we introduce the bigmemory package.
Now that you've got some experience using bigmemory, we're going to go through some simple data exploration and analysis techniques. In particular, we'll see how to create tables and implement the split-apply-combine approach.
We'll use the iotools package that can process both numeric and string data, and introduce the concept of chunk-wise processing.
In the previous chapters, we've introduced the housing data and shown how to compute with data that is about as big, or bigger than, the amount of RAM on a single machine. In this chapter, we'll go through a preliminary analysis of the data, comparing various trends over time.",['Big Data with R'],"['Michael Kane', 'Simon Urbanek', 'Sascha Mayr', 'Sumedh Panchadhar']","[('Mortgage data (sample)', 'https://assets.datacamp.com/production/repositories/721/datasets/c07698e46226e841b9cdb1ad0248c664c2c20bbb/mortgage-sample.csv')]","['Intermediate R', 'Writing Efficient R Code']",https://www.datacamp.com/courses/scalable-data-processing-in-r,Programming,R
245,Sentiment Analysis in Python,4,16,60,891,"5,050",Sentiment Analysis,"Sentiment Analysis in Python
Have you left a review to express how you feel about a product or a service? And do you have a habit of checking a product’s reviews online before you buy it? This kind of information is valuable not only for you but also for companies. In this course, you will learn how to make sense of the sentiment expressed in various documents. You will use real-world datasets featuring tweets, movie and product reviews, and use Python’s nltk and scikit-learn packages. By the end of the course, you will be able to carry an end-to-end sentiment analysis task based on how US airline passengers expressed their feelings on Twitter.
Have you ever checked the reviews or ratings of a product or a service before you purchased it? Then you have very likely came face-to-face with sentiment analysis. In this chapter,  you will learn the basic structure of a sentiment analysis problem and start exploring the sentiment of movie reviews.
Imagine you are in the shoes of a company offering a variety of products. You want to know which of your products are bestsellers and most of all - why. We embark on step 1 of understanding the reviews of products, using a dataset with Amazon product reviews. To that end, we transform the text into a numeric form and consider a few complexities in the process.
This chapter continues the process of understanding product reviews. We will cover additional complexities, especially when working with sentiment analysis data from social media platforms such as Twitter.  We will also learn other ways to obtain numeric features from the text.
We employ machine learning to predict the sentiment of a review based on the words used in the review.  We use logistic regression and evaluate its performance in a few different ways. These are some solid first models!",[],"['Violeta Misheva', 'Hillary Green-Lerman', 'Chester Ismay', 'Ruanne Van Der Walt']","[('Amazon product reviews', 'https://assets.datacamp.com/production/repositories/4444/datasets/646e9378d0e44ccb55cebd81ecffdc0c755ab2a2/amazon_reviews_sample.csv'), ('IMDB movie reviews', 'https://assets.datacamp.com/production/repositories/4444/datasets/68959c0ad485c329b4990242f16f98e3dbb777c1/IMDB_sample.csv'), ('Tweets of US airline passengers', 'https://assets.datacamp.com/production/repositories/4444/datasets/935f11320d56b1e3730ba786d0cae2f16cc152a9/Tweets.csv')]",['Python Data Science Toolbox (Part 2)'],https://www.datacamp.com/courses/sentiment-analysis-in-python,Machine Learning,Python
246,Sentiment Analysis in R,4,14,52,"6,101","4,200",Sentiment Analysis in R,"Sentiment Analysis in R
Add sentiment analysis to your text mining toolkit! Sentiment analysis is used by text miners in marketing, politics, customer service and elsewhere. In this course you will learn to identify positive and negative language, specific emotional intent, and make compelling visualizations. You will end the course by applying your sentiment analysis skills to Airbnb reviews to learn what makes for a good rental.
In the first chapter, you will learn how to apply qdap's sentiment function called  polarity() .
In the second chapter you will explore 3 subjectivity lexicons from tidytext.  Then you will do an inner join to score some text.
Make compelling visuals with your sentiment output.
Is your property a good rental?  What do people look for in a good rental?",['Text Mining with R'],"['Ted Kwartler', 'Richie Cotton']","[('Line by line polarity for 4 books', 'https://assets.datacamp.com/production/repositories/591/datasets/0c20fb94d054c9b86c264efd7e3f4735f26c3587/all_book_polarity.rds'), ('4 books as a tidy data frame', 'https://assets.datacamp.com/production/repositories/591/datasets/7041f7019b83dae070bc2ded9213f8f0c611fbe3/all_books.rds'), ('4 books as DocumentTermMatrices', 'https://assets.datacamp.com/production/repositories/591/datasets/5e012008bbcfb1897be17accadf5718c9b036bb9/all_tdm.rds'), ('Polarity scores of Boston Airbnb reviews', 'https://assets.datacamp.com/production/repositories/591/datasets/8514c01e859182c749d07276a3af50b3c192eed1/bos_pol.rds'), ('Housing rental reviews from Airbnb in Boston', 'https://assets.datacamp.com/production/repositories/591/datasets/87fb7cd77a6bc8b22ba2aa4f150132b66b57483c/bos_reviews.rds')]","['Introduction to R', 'Intermediate R', 'Text Mining: Bag of Words']",https://www.datacamp.com/courses/sentiment-analysis-in-r,Probability & Statistics,R
247,Sentiment Analysis in R: The Tidy Way,4,15,53,"10,542","4,350",Sentiment Analysis in R: The Tidy Way,"Sentiment Analysis in R: The Tidy Way
Text datasets are diverse and ubiquitous, and sentiment analysis provides an approach to understand the attitudes and opinions expressed in these texts. In this course, you will develop your text mining skills using tidy data principles. You will apply these skills by performing sentiment analysis in several case studies, on text data from Twitter to TV news to Shakespeare. These case studies will allow you to practice important data handling skills, learn about the ways sentiment analysis can be applied, and extract relevant insights from real-world data.
In this chapter you will implement sentiment analysis using tidy data principles using geocoded Twitter data.
Your next real-world text exploration uses tragedies and comedies by Shakespeare to show how sentiment analysis can lead to insight into differences in word use. You will learn how to transform raw text into a tidy format for further analysis.
Text analysis using tidy principles can be applied to diverse kinds of text, and in this chapter, you will explore a dataset of closed captioning from television news. You will apply the skills you have learned so far to explore how different stations report on a topic with different words, and how sentiment changes with time.
In this final chapter on sentiment analysis using tidy principles, you will explore pop song lyrics that have topped the charts from the 1960s to today. You will apply all the techniques we have explored together so far, and use linear modeling to find what the sentiment of song lyrics can predict.",[],"['DataCamp Content Creator', 'Richie Cotton']","[('Text spoken on TV news programs', 'https://assets.datacamp.com/production/repositories/888/datasets/164d41774be67fd44956a71363b02e22a75fe01f/climate_text.rda'), ('Geocoded Twitter data', 'https://assets.datacamp.com/production/repositories/888/datasets/95651961f1d9a3b7d65b8b5e36c1a62b606f2076/geocoded_tweets.rda'), (""Six of Shakespeare's plays"", 'https://assets.datacamp.com/production/repositories/888/datasets/01670d67182ad73d6fcf62b31e0a560384f82365/shakespeare.rda'), ('Lyrics from pop songs over the last 50 years', 'https://assets.datacamp.com/production/repositories/888/datasets/73dbe822d4c5facd51fb2e88012022244d36af85/song_lyrics.rda')]","['Introduction to R', 'Data Visualization with ggplot2 (Part 1)']",https://www.datacamp.com/courses/sentiment-analysis-in-r-the-tidy-way,Probability & Statistics,R
248,Single-Cell RNA-Seq Workflows in R,4,12,50,"2,057","4,100",Single-Cell RNA-Seq Workflows in R,"Single-Cell RNA-Seq Workflows in R
Novel single-cell transcriptome sequencing assays allow researchers to measure gene expression levels at the resolution of single cells and offer the unprecedented opportunity to investigate fundamental biological questions at the cellular level, such as stem cell differentiation or the discovery and characterization of rare cell types. The majority of the computational methods to analyze single-cell RNA-Seq data are implemented in R making it a natural tool to start working with single-cell transcriptomic data. Using real single-cell datasets, this course provides a step-by-step tutorial to the methodology and associated R packages for the following four main tasks: (1) normalization, (2) dimensionality reduction, (3) clustering, (4) differential expression analysis.
In Chapter 1, you will learn what single-cell RNA-Seq is and why it is a such a powerful technique. By the end of this chapter, you'll also know how to load, create, and access single-cell datasets in R.
In Chapter 2, we go over the first steps of the workflow to analyze single-cell RNA-seq data, which include quality control and normalization. These two steps should get all the technical issues and biases out of the way so that in the next chapters we can focus on the biological signal of interest.
When studying single-cell data at the cellular level, the number of dimensions is the number of genes. The goal of dimensionality reduction is to reduce the number of dimensions to a smaller number either to visualize the data in 2 dimensions or to prepare the dataset for subsequent steps like clustering.
In Chapter 4, we cluster cells with similar gene expression profiles and then perform differential expression (DE) analysis to find genes differentially expressed between known groups of cells. We then visualize DE genes with volcano plots and heatmaps.",[],"['Fanny Perraudeau', 'Sascha Mayr', 'David Campos', 'Shon Inouye']","[('Mouse Epithelium dataset', 'https://assets.datacamp.com/production/repositories/2194/datasets/db708e91fb6931a80f481d4ed27ce66eadd72255/fletcher.rds'), ('Toy dataset', 'https://assets.datacamp.com/production/repositories/2194/datasets/2f0f44eeb82da5b72f2f2998ced5ad88590f2729/toy.rda'), ('Tung dataset', 'https://assets.datacamp.com/production/repositories/2194/datasets/8dbe2a97317db19c2367cafc9e5565b9a8a5637a/tung.rds')]","['Intermediate R', 'Introduction to Bioconductor']",https://www.datacamp.com/courses/single-cell-rna-seq-workflows-in-r,Other,R
249,Software Engineering for Data Scientists in Python,4,15,51,"4,843","4,100",Software Engineering Data Scientists,"Software Engineering for Data Scientists in Python
Data scientists can experience huge benefits by learning concepts from the field of software engineering, allowing them to more easily reutilize their code and share it with collaborators. In this course, you'll learn all about the important ideas of modularity, documentation, & automated testing, and you'll see how they can help you solve Data Science problems quicker and in a way that will make future you happy. You'll even get to use your acquired software engineering chops to write your very own Python package for performing text analytics.
Why should you as a Data Scientist care about Software Engineering concepts?  Here we'll cover specific Software Engineering concepts and how these important ideas can revolutionize your Data Science workflow!
Become a fully fledged Python package developer by writing your first package!  You'll learn how to structure and write Python code that you can be installed, used, and distributed just like famous packages such as NumPy and Pandas.
Object Oriented Programming is a staple of Python development.  By leveraging classes and inheritance your Python package will become a much more powerful tool for your users.
You've now written a fully functional Python package for text analysis!  To make maintaining your project as easy as possible we'll leverage best practices around concepts such as documentation and unit testing.",[],"['Adam Spannbauer', 'Chester Ismay', 'Adrián Soto']",[],"['Python Data Science Toolbox (Part 1)', 'Introduction to Shell for Data Science']",https://www.datacamp.com/courses/software-engineering-for-data-scientists-in-python,Programming,Python
250,Spatial Analysis in R with sf and raster,4,15,53,"4,487","4,550",Spatial Analysis in R sf and raster,"Spatial Analysis in R with sf and raster
There has never been a better time to use R for spatial analysis! The brand new sf package has made working with vector data in R a breeze and the raster package provides a set of powerful and intuitive tools to work gridded data like satellite imagery. Instead of the painful process of performing your spatial analysis in GIS systems like ArcGIS or QGIS and then shuffling your results into another system for analysis you can move your entire spatial analysis workflow into R. In this course you will learn why the sf package is rapidly taking over spatial analysis in R. You will read in spatial data, manipulate vectors using the dplyr package and learn how to work with coordinate reference systems. You'll also learn how to perform geoprocessing of vectors including buffering, spatial joins, computing intersections, simplifying and measuring distance. With rasters you will aggregate, reclassify, crop, mask and extract. The last chapter of the course is devoted to showing you how to make maps in R with the ggplot2 and tmap packages and performing a fun mini-analysis that brings together all your new skills.
An introduction to import/export, learning the formats and getting to know spatial data. Some discussion of why we're using sf rather than sp.
In this lesson you will learn how to prepare layers so that you can conduct spatial analysis. This includes ensuring that the layers all share the same coordinate reference system.
Now that you have learned about sf and raster objects and have prepared your layers for analysis we can begin conducting true spatial analysis. Both sf and raster have a suite of functions that allow you to do single-layer kinds of analysis like buffering and computing hulls as well as multi-layer operations like intersections, overlaps, masking and clipping.
You are now ready to combine your skills into a mini-analysis. The goal is to evaluate whether the average canopy density by NYC neighborhood is correlated with the number of trees by neighborhood and to create a nice plot of the result.",['Spatial Data with R'],"['Zev Ross', 'Richie Cotton']","[('Canopy', 'https://assets.datacamp.com/production/repositories/738/datasets/79cb56df0fa27272e16b366a697aba8ac1d3e923/canopy.zip'), ('Impervious', 'https://assets.datacamp.com/production/repositories/738/datasets/de17563d1cd9f04d8aa606c0cc81a55defa33711/impervious.zip'), ('Landcover', 'https://assets.datacamp.com/production/repositories/738/datasets/06f95b22b2165a012db448009a8ebcf11aad9d0f/landcover.zip'), ('Manhattan', 'https://assets.datacamp.com/production/repositories/738/datasets/30830f8ba4a60aa1711f41e9a842b22cba3204f3/manhattan.zip'), ('Neighborhoods', 'https://assets.datacamp.com/production/repositories/738/datasets/96a72364e69d872645038b3a6dc7c0dbcb1114d6/neighborhoods.zip'), ('Parks', 'https://assets.datacamp.com/production/repositories/738/datasets/09c98c73cf6db769f7ff379f19fe842c931c4fba/parks.zip'), ('Trees', 'https://assets.datacamp.com/production/repositories/738/datasets/08a3684dc4d538d59ba051a64a834166883ab5d1/trees.zip')]",['Working with Geospatial Data in R'],https://www.datacamp.com/courses/spatial-analysis-in-r-with-sf-and-raster,Probability & Statistics,R
251,Spatial Statistics in R,4,16,60,"5,203","4,950",Spatial Statistics in R,"Spatial Statistics in R
Everything happens somewhere, and increasingly the place where all these things happen is being recorded in a database. There is some truth behind the oft-repeated statement that 80% of data have a spatial component. So what can we do with this spatial data? Spatial statistics, of course! Location is an important explanatory variable in so many things - be it a disease outbreak, an animal's choice of habitat, a traffic collision, or a vein of gold in the mountains - that we would be wise to include it whenever possible. This course will start you on your journey of spatial data analysis. You'll learn what classes of statistical problems present themselves with spatial data, and the basic techniques of how to deal with them. You'll see how to look at a mess of dots on a map and bring out meaningful insights.
After a quick review of spatial statistics as a whole, you'll go through some point-pattern analysis. You'll learn how to recognize and test different types of spatial patterns.
Point Pattern Analysis answers questions about why things appear where they do. The things could be trees, disease cases, crimes, lightning strikes - anything with a point location.
So much data is collected in administrative divisions that there are specialized techniques for analyzing them. This chapter presents several methods for exploring data in areas.
Originally developed for the mining industry, geostatistics covers the analysis of location-based measurement data. It enables model-based interpolation of measurements with uncertainty estimation.",['Spatial Data with R'],"['Barry Rowlingson', 'Tom Jeon', 'Richie Cotton']","[('Canadian geological survey soil acidity', 'https://assets.datacamp.com/production/repositories/748/datasets/472befdcbd5200b8240b7ebc310bb4c4eac0acaf/ca_geo.rds'), ('Bounding region for ca_geo.rds', 'https://assets.datacamp.com/production/repositories/748/datasets/3b9492bb5cc60ada87ca069282fc66e42f369a62/ca_geo_bounds.rds'), ('Flu incidents by London borough, 2017', 'https://assets.datacamp.com/production/repositories/748/datasets/79608213208c8b7d819e5918bf631e895151baa6/london_2017_2.rds'), ('EU referendum voting by London borough, 2016', 'https://assets.datacamp.com/production/repositories/748/datasets/a984e9b27490b78c48bf9130328b0f09b88be586/london_eu.rds'), ('An OpenStreetMap map of Preston, UK', 'https://assets.datacamp.com/production/repositories/748/datasets/448e73a7e02bd962d60b34ae0e0c2ff3bc92d451/osm_preston_gray.rds'), ('Crime in Preston, UK', 'https://assets.datacamp.com/production/repositories/748/datasets/6b32a67b58072c2c181daf2dae81d5944934fbea/pcrime-spatstat.rds'), ('Ph grid', 'https://assets.datacamp.com/production/repositories/748/datasets/89cbfc7312412d5ad7402bd335bdb7bf50ea2b6b/ph_grid.rds'), ('Sasquatch sightings, 1990 to 2016', 'https://assets.datacamp.com/production/repositories/748/datasets/f4b473c6d5b4e280108123355a9c0cee63aa057c/sasquatch.rds')]","['Introduction to R', 'Intermediate R', 'Working with Geospatial Data in R']",https://www.datacamp.com/courses/spatial-statistics-in-r,Probability & Statistics,R
252,Spreadsheet Basics,2,0,23,"22,779","2,300",Spreadsheet Basics,"Spreadsheet Basics
Spreadsheet software is one of the most popular and powerful tools in data analysis. Millions of people use tools like Google Sheets or Microsoft Excel on a daily basis. Even the most experienced data scientists often started their careers with spreadsheets and still use it to test assumptions or to look at data for the first time. In this course, you will learn the basics of spreadsheets by working with rows, columns, addresses, and ranges. You will create your own formulas and learn how to use references.
In this chapter, you’ll learn how to navigate within spreadsheets using concepts like rows, columns, cells, and ranges. Then you’ll practice using an essential part of spreadsheets: formulas. Finally, you'll learn how different data types are used in Google Sheets.
In this chapter, you’ll learn how to use a powerful technique in Google Sheets: referencing. This chapter will cover concepts like absolute references, autofilling, and reactivity. After this chapter, your productivity with spreadsheets will have increased by a factor of n.",[],"['Vincent Vankrunkelsven', 'Sascha Mayr']",[],[],https://www.datacamp.com/courses/spreadsheet-basics,Programming,Spreadsheets
253,Statistical Modeling in R (Part 1),4,10,43,"19,207","3,800",Statistical Modeling in R,"Statistical Modeling in R (Part 1)
Statistical Modeling in R is a multi-part course designed to get you up to speed with the most important and powerful methodologies in statistics. In Part 1, we'll take a look at what modeling is and what it's used for, R tools for constructing models, using models for prediction (and using prediction to test models), and how to account for the combined influences of multiple variables. This course has been written from scratch, specifically for DataCamp users. As you'll see, by using computing and concepts from machine learning, we'll be able to leapfrog many of the marginal and esoteric topics encountered in traditional 'regression' courses.
This chapter explores what a statistical model is, R objects which build models, and the basic R notation, called formulas used for models.
In this chapter, you'll start building models: specifying what variables models should relate to one another and training models on the available data. You'll also provide new inputs to models to generate the corresponding outputs.
This chapter is about techniques for deciding whether an explanatory variable improves the prediction performance of a model. You'll use cross validation to compare different models.
This chapter is about constructing models to explore masses of data, for instance to generate hypotheses about what factors are important in how a system works. You'll see how the recursive partitioning model architecture, which has an internal logic for selecting explanatory variables, can be used to explore potentially complex relationships among variables. The chapter also covers the evaluation of prediction performance in models where the response variable is categorical, that is, models used for classification.
Real-world systems are complicated. To faithfully reflect that complexity, models can incorporate multiple explanatory variables. This chapter introduces the notion of covariates and how they allow you to model the effect of an explanatory variable while taking into account the effects of other variables.",[],"['Daniel Kaplan', 'Nick Carchedi', 'Tom Jeon']","[('Ran twice', 'https://assets.datacamp.com/production/repositories/482/datasets/90c859390dc363dfe0a9b96cd51bd66748ab2a3b/Ran_twice.csv'), ('100 Runners', 'https://assets.datacamp.com/production/repositories/482/datasets/af9404ae58b9e7238ae692cf36dce770d4f52c59/Runners100.csv')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/statistical-modeling-in-r-part-1,Probability & Statistics,R
254,Statistical Modeling in R (Part 2),4,10,41,"4,995","3,600",Statistical Modeling in R,"Statistical Modeling in R (Part 2)
Statistical Modeling in R is a multi-part course designed to get you up to speed with the most important and powerful methodologies in statistics. In Part 2, we'll take a look at effect size and interaction, the concepts of total and partial change, sampling variability and mathematical transforms, and the implications of something called collinearity. This course has been written from scratch, specifically for DataCamp users. As you'll see, by using computing and concepts from machine learning, we'll be able to leapfrog many of the marginal and esoteric topics encountered in traditional 'regression' courses.
Effect sizes were introduced in Part 1 of this course series as a way to quantify how each explanatory variable is connected to the response. In this chapter, you'll meet some high-level tools that make it easier to calculate and visualize effect sizes. You'll see how to extend the notion of effect size to models with a categorical response variable. And you'll start to use interactions in constructing models to reflect the way that one explanatory variable can influence the effect size of another explanatory variable on the response.
In many circumstances, an effect size tells you exactly what you need to know: how much the model output will change when one, and only one, explanatory variable changes. This is called partial change. In other situations, you will want to look at total change, which combines the effects of two or more explanatory variables. You'll also see an additional, but limited way of quantifying the extent to which the explanatory variables influence the response: R-squared. Finally, we'll describe the notion of degrees of freedom, a way of describing the complexity of a model.
This chapter examines the precision with which a model can estimate an effect size. The lack of precision comes from sampling variability, which can be quantified using resampling and bootstrapping. You'll also see some ways to improve precision using mathematical transformations of variables.
In this final chapter, you'll learn about why you'd want to avoid collinearity, a common phenomenon in statistical modeling. You'll wrap up the course by discussing some of the ways models can be improved by involving the modeler in the design of the data collecting process.",[],"['Daniel Kaplan', 'Nick Carchedi', 'Tom Jeon']","[('Used Fords', 'https://assets.datacamp.com/production/course_1586/datasets/Fords.csv')]","['Introduction to R', 'Intermediate R', 'Statistical Modeling in R (Part 1)']",https://www.datacamp.com/courses/statistical-modeling-in-r-part-2,Probability & Statistics,R
255,Statistical Simulation in Python,4,16,58,"4,660","4,800",Statistical Simulation,"Statistical Simulation in Python
Simulations are a class of computational algorithms that use the relatively simple idea of random sampling to solve increasingly complex problems. Although they have been around for ages, they have gained in popularity recently due to the rise in computational power and have seen applications in multiple domains including Artificial Intelligence, Physics, Computational Biology and Finance just to name a few. Students will use simulations to generate and analyze data over different probability distributions using the important NumPy package. This course will give students hands-on experience with simulations using simple, real-world applications.
This chapter gives you the tools required to run a simulation. We'll start with a review of random variables and probability distributions. We will then learn how to run a simulation by first looking at a simulation workflow and then recreating it in the context of a game of dice. Finally, we will learn how to use simulations for making decisions.
This chapter provides a basic introduction to probability concepts and a hands-on understanding of the data generating process. We'll look at a number of examples of modeling the data generating process and will conclude with modeling an eCommerce advertising simulation.
In this chapter, we will get a brief introduction to resampling methods and their applications. We will get a taste of bootstrap resampling, jackknife resampling, and permutation testing. After completing this chapter, students will be able to start applying simple resampling methods for data analysis.
In this chapter, students will be introduced to some basic and advanced applications of simulation to solve real-world problems. We'll work through a business planning problem, learn about Monte Carlo Integration, Power Analysis with simulation and conclude with a financial portfolio simulation. After completing this chapter, students will be ready to apply simulation to solve everyday problems.",['Statistics Fundamentals with Python'],"['Tushar Shanker', 'Lore Dirick', 'Becca Robins', 'Sara Snell']",[],"['Introduction to Python', 'Intermediate Python for Data Science', 'Statistical Thinking in Python (Part 1)']",https://www.datacamp.com/courses/statistical-simulation-in-python,Probability & Statistics,Python
256,Statistical Thinking in Python (Part 1),3,18,61,"67,390","4,550",Statistical Thinking,"Statistical Thinking in Python (Part 1)
After all of the hard work of acquiring data and getting them into a form you can work with, you ultimately want to make clear, succinct conclusions from them. This crucial last step of a data analysis pipeline hinges on the principles of statistical inference. In this course, you will start building the foundation you need to think statistically, speak the language of your data, and understand what your data is telling you. The foundations of statistical thinking took decades to build, but can be grasped much faster today with the help of computers. With the power of Python-based tools, you will rapidly get up-to-speed and begin thinking statistically by the end of this course.
Before diving into sophisticated statistical inference techniques, you should first explore your data by plotting them and computing simple summary statistics. This process, called exploratory data analysis, is a crucial first step in statistical analysis of data.
In this chapter, you will compute useful summary statistics, which serve to concisely describe salient features of a dataset with a few numbers.
Statistical inference rests upon probability. Because we can very rarely say anything meaningful with absolute certainty from data, we use probabilistic language to make quantitative statements about data. In this chapter, you will learn how to think probabilistically about discrete quantities: those that can only take certain values, like integers.
It’s time to move onto continuous variables, such as those that can take on any fractional value. Many of the principles are the same, but there are some subtleties. At the end of this final chapter, you will be speaking the probabilistic language you need to launch into the inference techniques covered in the sequel to this course.","['Data Analyst with Python', 'Data Scientist with Python', 'Statistics Fundamentals with Python']","['Justin Bois', 'Yashas Roy', 'Hugo Bowne-Anderson', 'Vincent Lan']","[('2008 election results (all states)', 'https://assets.datacamp.com/production/repositories/469/datasets/8fb59b9a99957c3b9b1c82b623aea54d8ccbcd9f/2008_all_states.csv'), ('2008 election results (swing states)', 'https://assets.datacamp.com/production/repositories/469/datasets/e079fddb581197780e1a7b7af2aeeff7242535f0/2008_swing_states.csv'), ('Belmont Stakes', 'https://assets.datacamp.com/production/repositories/469/datasets/7507bfed990379f246b4f166ea8a57ecf31c6c9d/belmont.csv'), ('Speed of light', 'https://assets.datacamp.com/production/repositories/469/datasets/df23780d215774ff90be0ea93e53f4fb5ebbade8/michelson_speed_of_light.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Python Data Science Toolbox (Part 1)']",https://www.datacamp.com/courses/statistical-thinking-in-python-part-1,Probability & Statistics,Python
257,Statistical Thinking in Python (Part 2),4,15,66,"34,181","5,350",Statistical Thinking,"Statistical Thinking in Python (Part 2)
After completing Statistical Thinking in Python (Part 1), you have the probabilistic mindset and foundational hacker stats skills to dive into data sets and extract useful information from them. In this course, you will do just that, expanding and honing your hacker stats toolbox to perform the two key tasks in statistical inference, parameter estimation and hypothesis testing. You will work with real data sets as you learn, culminating with analysis of measurements of the beaks of the Darwin's famous finches. You will emerge from this course with new knowledge and lots of practice under your belt, ready to attack your own inference problems out in the world.
When doing statistical inference, we speak the language of probability. A probability distribution that describes your data has parameters. So, a major goal of statistical inference is to estimate the values of these parameters, which allows us to concisely and unambiguously describe our data and draw conclusions from it. In this chapter, you will learn how to find the optimal parameters, those that best describe your data.
To ""pull yourself up by your bootstraps"" is a classic idiom meaning that you achieve a difficult task by yourself with no help at all. In statistical inference, you want to know what would happen if you could repeat your data acquisition an infinite number of times. This task is impossible, but can we use only the data we actually have to get close to the same result as an infinitude of experiments? The answer is yes! The technique to do it is aptly called bootstrapping. This chapter will introduce you to this extraordinarily powerful tool.
You now know how to define and estimate parameters given a model. But the question remains: how reasonable is it to observe your data if a model is true? This question is addressed by hypothesis tests. They are the icing on the inference cake. After completing this chapter, you will be able to carefully construct and test hypotheses using hacker statistics.
As you saw from the last chapter, hypothesis testing can be a bit tricky. You need to define the null hypothesis, figure out how to simulate it, and define clearly what it means to be ""more extreme"" in order to compute the p-value. Like any skill, practice makes perfect, and this chapter gives you some good practice with hypothesis tests.
Every year for the past 40-plus years, Peter and Rosemary Grant have gone to the Galápagos island of Daphne Major and collected data on Darwin's finches. Using your skills in statistical inference, you will spend this chapter with their data, and witness first hand, through data, evolution in action. It's an exhilarating way to end the course!","['Data Analyst with Python', 'Data Scientist with Python', 'Statistics Fundamentals with Python']","['Justin Bois', 'Yashas Roy', 'Hugo Bowne-Anderson', 'Vincent Lan']","[('Anscombe data', 'https://assets.datacamp.com/production/repositories/470/datasets/fe820c6cbe9bcf4060eeb9e31dd86aa04264153a/anscombe.csv'), ('Bee sperm counts', 'https://assets.datacamp.com/production/repositories/470/datasets/e427679d28d154934a6c087b2fa945bc7696db6d/bee_sperm.csv'), ('Female literacy and fertility', 'https://assets.datacamp.com/production/repositories/470/datasets/f1e7f8a98c18da5c60b625cb8af04c3217f4a5c3/female_literacy_fertility.csv'), ('Finch beaks (1975)', 'https://assets.datacamp.com/production/repositories/470/datasets/eb228490f7d823bfa6458b93db075ca5ccd3ec3d/finch_beaks_1975.csv'), ('Finch beaks (2012)', 'https://assets.datacamp.com/production/repositories/470/datasets/b28d5bf65e38460dca7b3c5c0e4d53bdfc1eb905/finch_beaks_2012.csv'), ('Fortis beak depth heredity', 'https://assets.datacamp.com/production/repositories/470/datasets/532cb2fecd1bffb006c79a28f344af2290d643f3/fortis_beak_depth_heredity.csv'), ('Frog tongue data', 'https://assets.datacamp.com/production/repositories/470/datasets/df6e0479c0f292ce9d2b951385f64df8e2a8e6ac/frog_tongue.csv'), ('Major League Baseball no-hitters', 'https://assets.datacamp.com/production/repositories/470/datasets/593c37a3588980e321b126e30873597620ca50b7/mlb_nohitters.csv'), ('Scandens beak depth heredity', 'https://assets.datacamp.com/production/repositories/470/datasets/7ff772e1f4e99ed93685296063b6e604a334236d/scandens_beak_depth_heredity.csv'), ('Sheffield Weather Station', 'https://assets.datacamp.com/production/repositories/470/datasets/129cba08c45749a82701fbe02180c5b69eb9adaf/sheffield_weather_station.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Statistical Thinking in Python (Part 1)']",https://www.datacamp.com/courses/statistical-thinking-in-python-part-2,Probability & Statistics,Python
258,Statistics in Spreadsheets,4,15,51,722,"4,350",Statistics in Spreadsheets,"Statistics in Spreadsheets
Statistics is the science that deals with the collection, analysis, and interpretation of data. Having a solid foundation in statistics will help you effectively work with your data to test hypotheses and uncover insights that can help solve your problems. This course is designed to give you that foundation in statistics. Using Spreadsheets functions, you'll dive into averages, distributions, hypothesis testing, and conclude the course by applying your newfound knowledge in a case study. Along the way, you'll work with a variety of datasets ranging from eBay auctions to train ridership to historical presidential approval ratings. Enjoy!
Begin your journey by learning why and how to summarize your data using statistics such as the mean, median, and mode. While working with a variety of datasets ranging from Amazon revenue to U.S Presidential ratings, you'll learn about the differences between each of these fundamental statistics and most importantly, when to use each.
Data visualization is one of the most important parts of any data science workflow. It leads to a deeper understanding of your dataset which in turn allows you to more effectively communicate results to stakeholders. In this chapter, you'll learn how to visualize your data in Spreadsheets using statistical plots such as the histogram, scatter plot, and bar plot.
This chapter introduces you to statistical hypothesis testing. You'll learn how to construct a hypothesis, test it using different statistical tests, and properly interpret the results.
The final stretch! Apply all of your newfound statistical knowledge and solidify everything you have learned by working through a case study consisting of online dating profile data.",[],"['Ted Kwartler', 'Chester Ismay', 'Yashas Roy']",[],['Data Analysis with Spreadsheets'],https://www.datacamp.com/courses/statistics-in-spreadsheets,Probability & Statistics,Spreadsheets
259,Streamlined Data Ingestion with pandas,4,16,53,"1,072","4,500",Streamlined Data Ingestion pandas,"Streamlined Data Ingestion with pandas
Before you can analyze data, you first have to acquire it. This course teaches you how to build pipelines to import data kept in common storage formats. You’ll use pandas, a major Python library for analytics, to get data from a variety of sources, from spreadsheets of survey responses, to a database of public service requests, to an API for a popular review site. Along the way, you’ll learn how to fine-tune imports to get only what you need and to address issues like incorrect data types. Finally, you’ll assemble a custom dataset from a mix of sources.
Practice using pandas to get just the data you want from flat files, learn how to wrangle data types and handle errors, and look into some U.S. tax data along the way.
Automate data imports from that staple of office life, Excel files. Import part or all of a workbook and ensure boolean and datetime data are properly loaded, all while learning about how other people are learning to code.
Combine pandas with the powers of SQL to find out just how many problems New Yorkers have with their housing. This chapter features introductory SQL topics like WHERE clauses, aggregate functions, and basic joins.
Learn how to work with JSON data and web APIs by exploring a public dataset and getting cafe recommendations from Yelp. End by learning some techniques to combine datasets once they have been loaded into data frames.",[],"['Amany Mahfouz', 'Hillary Green-Lerman', 'Adrián Soto']","[('Vermont tax return data by ZIP code', 'https://assets.datacamp.com/production/repositories/4412/datasets/61bb27bf939aac4344d4f446ce6da1d1bf534174/vt_tax_data_2016.csv'), ('FreeCodeCamp New Developer Survey response subset', 'https://assets.datacamp.com/production/repositories/4412/datasets/fdb113aa942a0e0ad5c155b2f04784886f0038c9/fcc-new-coder-survey.xlsx'), ('NYC weather and 311 housing complaints', 'https://assets.datacamp.com/production/repositories/4412/datasets/86d5855fd30d02afe8cb563da6057190694c6b86/data.db')]","['Intermediate Python for Data Science', 'Intro to SQL for Data Science']",https://www.datacamp.com/courses/streamlined-data-ingestion-with-pandas,Importing & Cleaning Data,Python
260,String Manipulation in R with stringr,4,17,60,"14,783","5,850",String Manipulation in R stringr,"String Manipulation in R with stringr
Character strings can turn up in all stages of a data science project. You might have to clean messy string input before analysis, extract data that is embedded in text or automatically turn numeric results into a sentence to include in a report. Perhaps the strings themselves are the data of interest, and you need to detect and match patterns within them. This course will help you master these tasks by teaching you how to pull strings apart, put them back together and use stringr to detect, extract, match and split strings using regular expressions, a powerful way to express patterns.
You'll start with some basics: how to enter strings in R, how to control how numbers are transformed to strings, and finally how to combine strings together to produce output that combines text and nicely formatted numbers.
Time to meet stringr! You'll start by learning about some stringr functions that are very similar to some base R functions, then how to detect specific patterns in strings, how to split strings apart and how to find and replace parts of strings.
In this chapter you'll learn about regular expressions, a language for describing patterns in strings.  By combining regular expressions with the stringr functions you'll greatly increase your power to manipulate strings.
Now for two advanced ways to use regular expressions along with stringr: selecting parts of a match (a.k.a capturing) and referring back to parts of a match (a.k.a back-referencing).  You'll also learn to deal with and strings or patterns that contain Unicode characters (e.g. é).
Practice your string manipulation skills on a couple of case studies.  You'll also learn a few new skills, reading strings into R and handling problems of case (e.g. A versus a).","['R Programmer', 'Text Mining with R']","['Charlotte Wickham', 'Tom Jeon', 'Richie Cotton']","[('DNA sequences from the genome of Yersinia pestis', 'https://assets.datacamp.com/production/course_2922/datasets/dna.rds'), ('Narratives', 'https://assets.datacamp.com/production/course_2922/datasets/narratives.rds'), ('Adverbs', 'https://assets.datacamp.com/production/course_2922/datasets/adverbs.rds'), ('Importance of being earnest', 'https://assets.datacamp.com/production/course_2922/datasets/importance-of-being-earnest.txt'), ('Cat-related accidents', 'https://assets.datacamp.com/production/course_2922/datasets/catcidents.rds')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/string-manipulation-in-r-with-stringr,Programming,R
261,Structural Equation Modeling with lavaan in R,4,14,45,"2,575","3,750",Structural Equation Modeling lavaan in R,"Structural Equation Modeling with lavaan in R
When working with data, we often want to create models to predict future events, but we also want an even deeper understanding of how our data is connected or structured. In this course, you will explore the connectedness of data using using structural equation modeling (SEM) with the R programming language using the lavaan package. SEM will introduce you to latent and manifest variables and how to create measurement models, assess measurement model accuracy, and fix poor fitting models. During the course, you will explore classic SEM datasets, such as the Holzinger and Swineford (1939) and Bollen (1989) datasets. You will also work through a multi-factor model case study using the Wechsler Adult Intelligence Scale. Following this course, you will be able to dive into your data and gain a much deeper understanding of how it all fits together.
In this chapter, you will dive into creating your first structural equation model with lavaan. You will learn important terminology, how to build, and run models. You will create a one-factor model of mental test abilities using the classic Holzinger and Swineford (1939) dataset.
In this chapter, you will expand your skills in lavaan to creating multi-factor models. We will improve the one-factor models from the last chapter by creating multiple latent variables in the classic Holzinger and Swineford (1939) dataset.
Structural equation models do not always run smoothly, and in this chapter, you will learn how to troubleshoot Heywood cases which are common errors. You will also learn how to diagram your model in R using the semPlot library.
This chapter examines the WAIS-III IQ Scale and its structural properties. You will use your skills from the first three chapters to create various models of the WAIS-III, troubleshoot errors in those models, and create diagrams of the final model.",[],"['Erin Buchanan', 'Chester Ismay', 'Becca Robins']","[('WAIS-III IQ Data for Hierarchical Model', 'https://assets.datacamp.com/production/repositories/1919/datasets/99d3c2592e5c972ab296cc6bd2aab772db3bb185/IQdata.csv'), ('Latent Variable Heywood Case Data', 'https://assets.datacamp.com/production/repositories/1919/datasets/c32b24e60af98f4ca20ff1efa1b43a05a4875458/badlatentdata.csv'), ('Negative Variance Heywood Case Data', 'https://assets.datacamp.com/production/repositories/1919/datasets/40fc47bf4b5895247a6c764c72403f414e607378/badvardata.csv')]","['Correlation and Regression', 'Multiple and Logistic Regression']",https://www.datacamp.com/courses/structural-equation-modeling-with-lavaan-in-r,Probability & Statistics,R
262,Supervised Learning in R: Case Studies,4,14,56,"3,039","4,400",Supervised Learning in R: Case Studies,"Supervised Learning in R: Case Studies
Predictive modeling, or supervised machine learning, is a powerful tool for using data to make predictions about the world around us. Once you understand the basic ideas of supervised machine learning, the next step is to practice your skills so you know how to apply these techniques wisely and appropriately. In this course, you will work through four case studies using data from the real world; you will gain experience in exploratory data analysis, preparing data so it is ready for predictive modeling, training supervised machine learning models, and evaluating those models.
In this first case study, you will predict fuel efficiency from a US Department of Energy data set for real cars of today.
Stack Overflow is the world's largest online community for developers, and you have probably used it to find an answer to a programming question. The second chapter uses data from the annual Stack Overflow Developer Survey to practice predictive modeling and find which developers are more likely to work remotely.
In the third case study, you will use data on attitudes and beliefs in the United States to predict voter turnout. You will apply your skills in dealing with imbalanced data and explore more resampling options.
The last case study uses an extensive survey of Catholic nuns fielded in 1967 to once more put your practical machine learning skills to use. You will predict the age of these religious women from their responses about their beliefs and attitudes.",[],"['DataCamp Content Creator', 'Chester Ismay', 'Sumedh Panchadhar']","[('Fuel efficiency of real cars (2018)', 'https://assets.datacamp.com/production/repositories/1918/datasets/a09cc1064f9f211821249efd5f1af986e20226ee/cars2018.csv'), ('Annual Stack Overflow Developer Survey (2017)', 'https://assets.datacamp.com/production/repositories/1918/datasets/b0349d1581fdc6a19d75acfcc8d79ab308bb6f98/stackoverflow.csv'), ('Survey responses of voters (2016 US Presidential elections)', 'https://assets.datacamp.com/production/repositories/1918/datasets/c4e0dde25eece582cfc0cb0b97f191c85769da48/voters.csv'), ('Survey responses of Catholic sisters (1967)', 'https://assets.datacamp.com/production/repositories/1918/datasets/d866dd23119cdc0486a9dd70ce7bc65bc1b91cb2/sisters.csv')]","['Supervised Learning in R: Regression', 'Supervised Learning in R: Classification', 'Machine Learning Toolbox', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/supervised-learning-in-r-case-studies,Machine Learning,R
263,Supervised Learning in R: Classification,4,14,55,"32,495","3,950",Supervised Learning in R: Classification,"Supervised Learning in R: Classification
This beginner-level introduction to machine learning covers four of the most common classification algorithms. You will come away with a basic understanding of how each algorithm approaches a learning task, as well as learn the R functions needed to apply these tools to your own work.
As the kNN algorithm literally ""learns by example"" it is a case in point for starting to understand supervised machine learning. This chapter will introduce classification while working through the application of kNN to self-driving vehicle road sign recognition.
Naive Bayes uses principles from the field of statistics to make predictions. This chapter will introduce the basics of Bayesian methods while exploring how to apply these techniques to iPhone-like destination suggestions.
Logistic regression involves fitting a curve to numeric data to make predictions about binary events. Arguably one of the most widely used machine learning methods, this chapter will provide an overview of the technique while illustrating how to apply it to fundraising data.
Classification trees use flowchart-like structures to make decisions. Because humans can readily understand these tree structures, classification trees are useful when transparency is needed, such as in loan approval. We'll use the Lending Club dataset to simulate this scenario.",['Machine Learning Fundamentals in R'],"['Brett Lantz', 'Nick Carchedi', 'Nick Solomon']","[('Lending Club loan data', 'https://assets.datacamp.com/production/repositories/718/datasets/7805fceacfb205470c0e8800d4ffc37c6944b30c/loans.csv'), ('Traffic sign image data', 'https://assets.datacamp.com/production/repositories/718/datasets/c274ea22cc3d7e12d7bb9fdc9c2bdabe9ab025f4/knn_traffic_signs.csv'), ('Donation data', 'https://assets.datacamp.com/production/repositories/718/datasets/9055dac929e4515286728a2a5dae9f25f0e4eff6/donors.csv'), (""Brett's location data"", 'https://assets.datacamp.com/production/repositories/718/datasets/571628c39048df59c40c9dcfba146a2cf7a4a0e3/locations.csv')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/supervised-learning-in-r-classification,Machine Learning,R
264,Supervised Learning in R: Regression,4,19,65,"12,751","5,300",Supervised Learning in R: Regression,"Supervised Learning in R: Regression
From a machine learning perspective, regression is the task of predicting numerical outcomes from various inputs. In this course, you'll learn about different regression models, how to train these models in R, how to evaluate the models you train and use them to make predictions.
In this chapter we introduce the concept of regression from a machine learning point of view. We will present the fundamental regression method: linear regression. We will show how to fit a linear regression model and to make predictions from the model.
Now that we have learned how to fit basic linear regression models, we will learn how to evaluate how well our models perform. We will review evaluating a model graphically, and look at two basic metrics for regression models. We will also learn how to train a model that will perform well in the wild, not just on training data. Although we will demonstrate these techniques using linear regression, all these concepts apply to models fit with any regression algorithm.
Before moving on to more sophisticated regression techniques, we will look at some other modeling issues: modeling with categorical inputs, interactions between variables, and when you might consider transforming inputs and outputs before modeling. While more sophisticated regression techniques manage some of these issues automatically, it's important to be aware of them, in order to understand which methods best handle various issues -- and which issues you must still manage yourself.
Now that we have mastered linear models, we will begin to look at techniques for modeling situations that don't meet the assumptions of linearity. This includes predicting probabilities and frequencies (values bounded between 0 and 1); predicting counts (nonnegative integer values, and associated rates); and responses that have a non-linear but additive relationship to the inputs. These algorithms are variations on the standard linear model.
In this chapter we will look at modeling algorithms that do not assume linearity or additivity, and that can learn limited types of interactions among input variables. These algorithms are *tree-based* methods that work by combining ensembles of *decision trees* that are learned from the training data.",['Machine Learning Fundamentals in R'],"['Nina Zumel', 'John Mount', 'Sascha Mayr', 'Sumedh Panchadhar']","[('Bikes', 'https://assets.datacamp.com/production/repositories/894/datasets/6feade63b6c9f866433b94752677845ed1b22914/Bikes.RData'), ('Blood Pressure', 'https://assets.datacamp.com/production/repositories/894/datasets/9210064536fb0cb88a265c748f65dfc4761b2292/bloodpressure.rds'), ('Cricket', 'https://assets.datacamp.com/production/repositories/894/datasets/a2b224940b225fc552d243287b564ca126aa71c5/cricket.rds'), ('House Prices', 'https://assets.datacamp.com/production/repositories/894/datasets/6f144237ef9d7da94b2c84aa8eccc519bae4b300/houseprice.rds'), ('Income', 'https://assets.datacamp.com/production/repositories/894/datasets/a3b41b96dcb08ad84c82ca6b45f27a36832700e5/Income.RData'), ('Mpg', 'https://assets.datacamp.com/production/repositories/894/datasets/a3ea08fe688ee3afdc142a2641a0e035b09ef9e0/Mpg.RData'), ('Soybean', 'https://assets.datacamp.com/production/repositories/894/datasets/1702e2cf57da42eb449515ec71aacba41bf3a6db/Soybean.RData'), ('Unemployment', 'https://assets.datacamp.com/production/repositories/894/datasets/e05279335f3b0e5a768f1fcd9b55d047a9f5faeb/unemployment.rds'), ('Sparrow', 'https://assets.datacamp.com/production/repositories/894/datasets/30d1dbf4d0788fdd293f8a2b4ac5614f109eb1c1/sparrow.rds')]","['Introduction to R', 'Correlation and Regression']",https://www.datacamp.com/courses/supervised-learning-in-r-regression,Machine Learning,R
265,Supervised Learning with scikit-learn,4,17,54,"113,837","4,200",Supervised Learning scikit-learn,"Supervised Learning with scikit-learn
Machine learning is the field that teaches machines and computers to learn from existing data to make predictions on new data: Will a tumor be benign or malignant? Which of your customers will take their business elsewhere? Is a particular email spam? In this course, you'll learn how to use Python to perform supervised learning, an essential component of machine learning. You'll learn how to build predictive models, tune their parameters, and determine how well they will perform with unseen data—all while using real world datasets. You'll be using scikit-learn, one of the most popular and user-friendly machine learning libraries for Python.
In this chapter, you will be introduced to classification problems and learn how to solve them using supervised learning techniques. And you’ll apply what you learn to a political dataset, where you classify the party affiliation of United States congressmen based on their voting records.
In the previous chapter, you used image and political datasets to predict binary and multiclass outcomes. But what if your problem requires a continuous outcome? Regression is best suited to solving such problems. You will learn about fundamental concepts in regression and apply them to predict the life expectancy in a given country using Gapminder data.
Having trained your model, your next task is to evaluate its performance. In this chapter, you will learn about some of the other metrics available in scikit-learn that will allow you to assess your model's performance in a more nuanced manner. Next, learn to optimize your classification and regression models using hyperparameter tuning.
This chapter introduces pipelines, and how scikit-learn allows for transformers and estimators to be chained together and used as a single unit. Preprocessing techniques will be introduced as a way to enhance model performance, and pipelines will tie together concepts from previous chapters.","['Data Scientist with Python', 'Machine Learning with Python']","['Hugo Bowne-Anderson', 'DataCamp Content Creator', 'Yashas Roy']","[('Automobile miles per gallon', 'https://assets.datacamp.com/production/repositories/628/datasets/3781d588cf7b04b1e376c7e9dda489b3e6c7465b/auto.csv'), ('Boston housing', 'https://assets.datacamp.com/production/repositories/628/datasets/021d4b9e98d0f9941e7bfc932a5787b362fafe3b/boston.csv'), ('Diabetes', 'https://assets.datacamp.com/production/repositories/628/datasets/444cdbf175d5fbf564b564bd36ac21740627a834/diabetes.csv'), ('Gapminder', 'https://assets.datacamp.com/production/repositories/628/datasets/a7e65287ebb197b1267b5042955f27502ec65f31/gm_2008_region.csv'), ('US Congressional Voting Records (1984)', 'https://assets.datacamp.com/production/repositories/628/datasets/35a8c54b79d559145bbeb5582de7a6169c703136/house-votes-84.csv'), ('White wine quality', 'https://assets.datacamp.com/production/repositories/628/datasets/2d9076606fb074c66420a36e06d7c7bc605459d4/white-wine.csv'), ('Red wine quality', 'https://assets.datacamp.com/production/repositories/628/datasets/013936d2700e2d00207ec42100d448c23692eb6f/winequality-red.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Statistical Thinking in Python (Part 1)']",https://www.datacamp.com/courses/supervised-learning-with-scikit-learn,Machine Learning,Python
266,Supply Chain Analytics in Python,4,16,48,"4,966","3,600",Supply Chain Analytics,"Supply Chain Analytics in Python
Supply Chain Analytics transforms supply chain activities from guessing, to ones that makes decision using data. An essential tool in Supply Chain Analytics is using optimization analysis to assist in decision making. According to Deloitte, 79% of organizations with high performing supply chains achieve revenue growth that is significantly above average. This course will introduce you to PuLP, a Linear Program optimization modeler written in Python. Using PuLP, the course will show you how to formulate and answer Supply Chain optimization questions such as where a production facility should be located, how to allocate production demand across different facilities, and more. We will explore the results of the models and their implications through sensitivity and simulation testing. This course will help you position yourself to improve the decision making of a supply chain by leveraging the power of Python and PuLP.
Linear Programming (LP) is a key technique for Supply Chain Optimization. The PuLP framework is an easy to use tool for working with LP problems and allows the programmer to focus on modeling. In this chapter we learn the basics of LP problems and start to learn how to use the PuLP framework to solve them.
In this chapter we continue to learn how to model LP and IP problems in PuLP. We touch on how to use PuLP for large scale problems. Additionally, we begin our case study example on how to solve the Capacitated Plant location model.
This chapter reviews some common mistakes made when creating constraints, and step through the process of solving the model. Once we have a solution to our LP model, how do we know if it is correct? In this chapter we also review a process for reasonableness checking or sanity checking the results. Furthermore, we continue working through our case study example on the Capacitated Plant location model by completing all the needed constraints.
In our final chapter we review sensitivity analysis of constraints through shadow prices and slack. Additionally, we look at simulation testing our LP models. These different techniques allow us to answer different business-related questions about our models, such as available capacity and incremental costs. Finally, we complete our case study exercise and focus on using sensitivity analysis and simulation testing to answer questions about our model.",[],"['Aaren Stubberfield', 'Hadrien Lacroix', 'Mari Nazary']",[],['pandas Foundations'],https://www.datacamp.com/courses/supply-chain-analytics-in-python,Machine Learning,Python
267,Support Vector Machines in R,4,13,47,"3,006","3,950",Support Vector Machines in R,"Support Vector Machines in R
This course will introduce a powerful classifier, the support vector machine (SVM) using an intuitive, visual approach. Support Vector Machines in R will help students develop an understanding of the SVM model as a classifier and gain practical experience using R’s libsvm implementation from the e1071 package. Along the way, students will gain an intuitive understanding of important concepts, such as hard and soft margins, the kernel trick, different types of kernels, and how to tune SVM parameters. Get ready to classify data with this impressive model.
This chapter introduces some key concepts of support vector machines through a simple 1-dimensional example. Students are also walked through the  creation of a linearly separable dataset that is used in the subsequent chapter.
Introduces students to the basic concepts of support vector machines by applying the svm algorithm to a dataset that is linearly separable. Key concepts are illustrated through ggplot visualisations that are built from the outputs of the algorithm and the role of the cost parameter is highlighted via a simple example.  The chapter closes with a section on how the algorithm deals with multiclass problems.
Provides an introduction to polynomial kernels via a dataset that is radially separable (i.e. has a circular decision boundary). After demonstrating the inadequacy of linear kernels for this dataset, students will see how a simple transformation renders the problem linearly separable thus motivating an intuitive discussion of the kernel trick. Students will then apply the polynomial kernel to the dataset and tune the resulting classifier.
Builds on the previous three chapters by introducing the highly flexible Radial Basis Function (RBF) kernel. Students will create a ""complex"" dataset that shows up the limitations of polynomial kernels. Then, following an intuitive motivation for the RBF kernel, students see how it addresses the shortcomings of the other kernels discussed in this course.",[],"['Kailash Awati', 'Chester Ismay', 'Becca Robins']",[],"['Introduction to R', 'Introduction to Machine Learning']",https://www.datacamp.com/courses/support-vector-machines-in-r,Machine Learning,R
268,Survey and Measurement Development in R,4,13,51,739,"4,450",Survey and Measurement Development in R,"Survey and Measurement Development in R
How can we measure something like “brand loyalty?” It’s an obvious concept of interest to marketers, but we can’t quite take a ruler to it. Instead, we can design and analyze a survey to indirectly measure such a so-called “latent construct.”
In this course, you’ll learn how to design and analyze a marketing survey to describe and even predict customers’ behavior based on how they rate items on “a scale of 1 to 5.” You’ll wrangle survey data, conduct exploratory & confirmatory factor analyses, and conduct various survey diagnostics such as checking for reliability and validity.
In this chapter we will explore the use of surveys in marketing research and the importance of reliability and validity in measurement. We will begin the scale development process and perform exploratory data analysis on freshly-collected survey data.
Now that we have cleaned and summarized our survey results, we will look for hidden patterns in the data using exploratory factor analysis. These patterns form the basis of  developing “factors” of an unobserved or latent variable. Of particular interest in this stage  of survey development is internal reliability, or whether similar items in the survey produce similar scores.
Does the data as collected agree with prior beliefs about the latent variable of interest? In this chapter, we will use confirmatory factor analysis to formally test the hypothesis that our model fits our data. We can now answer the question of construct validity, or: “are we really measuring what we are claiming to measure?”
In this chapter we will use our newly validated scale to test its relationship to demographic variables like age or spending habits. We will also confirm the reproducibility and replicability of the survey. Finally, we will explore the power of factor scores in modeling customer behavior.",[],"['George Mount', 'Chester Ismay']","[('Brand reputation survey', 'https://assets.datacamp.com/production/repositories/4494/datasets/59b5f2d717ddd647415d8c88aa40af6f89ed24df/brandrep-cleansurvey-extraitem.csv'), ('Brand quality survey', 'https://assets.datacamp.com/production/repositories/4494/datasets/f9c4329cd6efb129a70bcc357da9ec071f74dabb/brandquall11-recodedbutextraitem.csv'), ('Customer satisfaction survey', 'https://assets.datacamp.com/production/repositories/4494/datasets/e8d2b4b408bf9b463a73a86cc2c47938ea52a37f/customersatisfactionclean.csv'), ('Customer loyalty survey', 'https://assets.datacamp.com/production/repositories/4494/datasets/0bd08405d761924f47f775ea881eaeba7ca3497a/brandloyalty.csv')]","['Introduction to the Tidyverse', 'Correlation and Regression']",https://www.datacamp.com/courses/survey-and-measurement-development-in-r,Probability & Statistics,R
269,Survival Analysis in R,4,14,50,"2,992","3,650",Survival Analysis in R,"Survival Analysis in R
Do patients taking the new drug survive longer than others? How fast do people get a new job after getting unemployed? What can I do to make my friends stay on the dancefloor at my party? All these questions require the analysis of time-to-event data, for which we use special statistical methods. This course introduces basic concepts of time-to-event data analysis, also called survival analysis. Learn how to deal with time-to-event data and how to compute, visualize and interpret survivor curves as well as Weibull and Cox models.
In the first chapter, we introduce the concept of survival analysis, explain the importance of this topic, and provide a quick introduction to the theory behind survival curves. We discuss why special methods are needed when dealing with time-to-event data and introduce the concept of censoring. We also discuss how we describe the distribution of the elapsed time until an event.
In this chapter, we will look into different methods of estimating survival curves. We will discuss the Kaplan-Meier estimate and the Weibull model as tools for survival curve estimation and learn how to communicate those results through visualization.
In this chapter, we will learn how to estimate and visualize a Weibull model to learn about the effects of covariates on the time-to-event outcome.
In the last chapter, we learn how to compute and interpret Cox models to understand why they are useful and how they differ from Weibull models.",[],"['Heidi Seibold', 'Richie Cotton', 'David Campos', 'Shon Inouye']",[],['Introduction to Data'],https://www.datacamp.com/courses/survival-analysis-in-r,Probability & Statistics,R
270,Text Mining: Bag of Words,4,15,69,"31,353","5,700",Text Mining: Bag of Words,"Text Mining: Bag of Words
It is estimated that over 70% of potentially usable business information is unstructured, often in the form of text data. Text mining provides a collection of techniques that allows us to derive actionable insights from unstructured data. In this course, we explore the basics of text mining using the bag of words method. The first three chapters introduce a variety of essential topics for analyzing and visualizing text data. The final chapter allows you to apply everything you've learned in a real-world case study to extract insights from employee reviews of two major tech companies.
In this chapter, you'll learn the basics of using the bag of words method for analyzing text data.
This chapter will teach you how to visualize text data in a way that's both informative and engaging.
In this chapter, you'll learn more basic text mining techniques based on the bag of words method.
This chapter ties everything together with a case study in text mining for HR analytics.",['Text Mining with R'],"['Ted Kwartler', 'Nick Carchedi', 'Tom Jeon', 'Jeff Paadre']","[('Coffee tweets', 'https://assets.datacamp.com/production/repositories/19/datasets/27a2a8587eff17add54f4ba288e770e235ea3325/coffee.csv'), ('Chardonnay tweets', 'https://assets.datacamp.com/production/repositories/19/datasets/13ae5c66c3990397032b6428e50cc41ac6bc1ca7/chardonnay.csv'), ('Anonymous online reviews: Amazon', 'https://assets.datacamp.com/production/repositories/19/datasets/92c0a61dc0ad77799c8cd46bd6e56d9429eb5ea4/500_amzn.csv'), ('Anonymous online reviews: Google', 'https://assets.datacamp.com/production/repositories/19/datasets/c050b2c388dfe7e9a0478aa3f67dd0ba3c529d3e/500_goog.csv')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/intro-to-text-mining-bag-of-words,Case Studies,R
271,Time Series Analysis in SQL Server,5,16,60,507,"5,200",Time Series Analysis in SQL Server,"Time Series Analysis in SQL Server
SQL Server has a robust set of tools to prepare, aggregate, and query time series data. This course will show you how to build and work with dates, parse dates from strings (and deal with invalid strings), and format dates for reporting. From there, you will see how SQL Server's built-in aggregation operators and window functions can solve important business problems like calculating running totals, finding moving averages, and displaying month-over-month differences using realistic sample data sets. You will also see how taking a different perspective on your data can solve difficult problems.
This chapter covers date and time functionality in SQL Server, including building dates from component parts, formatting dates for reporting, and working with calendar tables.
Here, we'll be converting strings and other inputs to date and time data types.
In this chapter, we will learn techniques to aggregate data over time.  We will briefly review aggregation functions and statistical aggregation functions.  We will cover upsampling and downsampling of data.  Finally, we will look at the grouping operators.
In this chapter, we will learn how to use window functions to perform calculations over time, including calculating running totals and moving averages, calculating intervals, and finding the maximum levels of overlap.",[],"['Kevin Feasel', 'Mona Khalil', 'Chester Ismay']","[('Calendar', 'https://assets.datacamp.com/production/repositories/4472/datasets/5391fc5f6b8b3165066af186fca14dfe401920c0/CalendarTable.csv'), ('Day Spa Rollup', 'https://assets.datacamp.com/production/repositories/4472/datasets/66fb0b750ebb9db829469069b2fbbdd47f1e9716/DaySpaRollupTable.csv'), ('Day Spa Visit', 'https://assets.datacamp.com/production/repositories/4472/datasets/62403eef08d47d770f92f87fc0cd61fea0a1440b/DaySpaVisitTable.csv'), ('Incident Rollup', 'https://assets.datacamp.com/production/repositories/4472/datasets/b4ffce43f0c70ea3d66a0f4dfd9fffef918f346d/IncidentRollupTable.csv'), ('Incident Type', 'https://assets.datacamp.com/production/repositories/4472/datasets/4b2a284207da3fedde37cfabef9b97aed69f4800/IncidentTypeTable.csv'), ('Imported Time', 'https://assets.datacamp.com/production/repositories/4472/datasets/c838171b77873c8963d2038f9f423e48abaf430b/ImportedTimeTable.csv')]","['Intermediate SQL Server', 'Data-Driven Decision Making in SQL']",https://www.datacamp.com/courses/time-series-analysis-in-sql-server,Probability & Statistics,SQL
272,Time Series with data.table in R,4,14,52,"1,217","4,200",Time Series data.table in R,"Time Series with data.table in R
Time series data is fun, but challenging. When ordering matters, your datasets get large, and timestamp precision differences can foil your merges, building reliable data processing pipelines requires a principled approach with the right tools. Enter data.table! Its expressive syntax will make your code powerful without sacrificing readability and its support for in-place operations will make your code super fast. Learn how to master time series data in data.table with this course!
This chapter provides an overview of all the cool things that make data.table perfect for working with time series data, including its multiple column-selection options, how to modify data.tables by reference, and calling functions by taking advantage of non-standard evaluation.
Ok, so you have some time series data and you believe me that data.table is great for it. Before you can test that, you need to convert your data! In this chapter, you'll learn how to convert from popular time series data formats into data.table.
Like most other data, time series data you find in the wild are rarely suitable to directly start using in model training. In this chapter, you'll learn how to write powerful, expressive data.table code to implement a few common forms of time series feature engineering.
It's time to put it all together! In this chapter you'll consider a real-world dataset of spot metal prices from the London Metal Exchange (LME). By the end, you'll know how to write reusable functions to perform common time series feature engineering tasks and you'll have experience using those functions to build a statistical model.",[],"['James Lamb', 'Chester Ismay', 'Amy Peterson']","[('AluminumDF', 'https://assets.datacamp.com/production/repositories/2697/datasets/67928f50e5f54920ab4def5d721b085d815618c0/aluminumDF.feather'), ('CobaltDF', 'https://assets.datacamp.com/production/repositories/2697/datasets/323090e313a53345c7ec9dc7f70131a67b7994f8/cobaltDF.feather'), ('CopperDF', 'https://assets.datacamp.com/production/repositories/2697/datasets/d38498fe4a6826469b0d42ec7e0162a4eea38c9f/copperDF.feather'), ('DiagnosticDT', 'https://assets.datacamp.com/production/repositories/2697/datasets/e775f60b56a3f7df6d575ec9902697d0bd88fac9/diagnosticDT.feather'), ('DiagnosticDT2', 'https://assets.datacamp.com/production/repositories/2697/datasets/8ea94b5712d0e746c7dc5e208249c65cba25695f/diagnosticDT2.feather'), ('NickelDF', 'https://assets.datacamp.com/production/repositories/2697/datasets/1366ff50ef1a7ee4cacfc6ae78819b0b5b4e6afe/nickelDF.feather'), ('NickelXTS', 'https://assets.datacamp.com/production/repositories/2697/datasets/91957497fc6c691ac4afd601bfc52d2f0e56d455/nickelXTS.rds'), ('TinDF', 'https://assets.datacamp.com/production/repositories/2697/datasets/52b7d32368671e85013c325e2b0c37150bee41e5/tinDF.feather')]",['Data Manipulation in R with data.table'],https://www.datacamp.com/courses/time-series-with-datatable-in-r,Data Manipulation,R
273,Topic Modeling in R,4,14,49,"1,220","3,950",Topic Modeling in R,"Topic Modeling in R
This course introduces students to the areas involved in topic modeling: preparation of corpus, fitting of topic models using Latent Dirichlet Allocation algorithm (in package topicmodels), and visualizing the results using ggplot2 and wordclouds.
This chapter introduces the workflow used in topic modeling: preparation of a document-term matrix, model fitting, and visualization of results with ggplot2.
This chapter explains how to use join functions to remove or keep words in the document-term matrix, how to make wordcloud charts, and how to use some of the many control arguments.
This chapter goes into detail on how LDA topic models can be used as classifiers. It covers the importance of the Dirichlet shape parameter alpha, construction of word contexts for named entities using regex, and technical issues like corpus alignment and held-out data.
This chapter explains the basic methods used in the search for the optimal number of topics. It also covers how to use a single document as a source of data, and how topic numbering can be controlled using seed words.",[],"['Pavel Oleinikov', 'Hadrien Lacroix', 'Sascha Mayr']",[],"['Introduction to the Tidyverse', 'Text Mining: Bag of Words']",https://www.datacamp.com/courses/topic-modeling-in-r,Machine Learning,R
274,Transactions and Error Handling in SQL Server,4,14,52,80,"3,850",Transactions and Error Handling in SQL Server,"Transactions and Error Handling in SQL Server
It is critical to know how to handle errores and manage transactions when programming SQL scripts. Unhandled errores can be very harmful and can cause unexpected situations, such as inconsistent data in your database, o even worse, errors can lead you to make wrong business decisions.
In this course, you will learn how to handle errors and discover how to manage transactions in case of an error. Additionally, you will study what happens when two or more people interact at the same time with the same data. You will practice all these concepts using two datasets, one of them based on bank accounts and the other one on an electric bike store.
To begin the course, you will learn how to handle errors using the TRY...CATCH construct that provides T-SQL.  You will study the anatomy of errors, and you will learn how to use some functions that can give you information about errors.
In this chapter, you will deepen your knowledge of handling errors. You will learn how to raise errors using RAISERROR and THROW. Additionally, you will discover how to customize errors.
In this chapter, you will be introduced to the concept of transactions. You will discover how to commit and rollback them. You will finish by learning how to return the number of transactions and their state.
This chapter defines what concurrency is and how it can affect transactions. You will learn exciting concepts like dirty reads, repeatable reads, and phantom reads. To avoid or allow this reads, you will explore, one by one, the different transaction isolation levels.",[],"['Miriam Antona', 'Mona Khalil', 'Marianna Lamnina']","[('Electric bike store dataset and Bank accounts dataset', 'https://assets.datacamp.com/production/repositories/4772/datasets/bf4ec0185954908c812430ba38a1e4d6870bc35e/tehandling.sql')]",['Intro to SQL for Data Science'],https://www.datacamp.com/courses/transactions-and-error-handling-in-sql-server,Data Manipulation,SQL
275,Unit Testing for Data Science in Python,4,17,55,296,"4,200",Unit Testing Data Science,"Unit Testing for Data Science in Python
Every data science project needs unit testing. It comes with huge benefits - saving a lot of development and maintenance time, improving documentation, increasing end-user trust and reducing downtime of productive systems. As a result, unit testing has become a must-have skill in the industry, used by almost every company. This course teaches unit testing in Python using the most popular testing framework pytest. By the end of this course, you will have written a complete test suite for a data science project. In the process, you will learn to write unit tests for data preprocessors, models and visualizations, interpret test results and fix any buggy code. You will also learn advanced concepts like TDD, test organization, fixtures and mocking so that you can test your own data science projects properly.
In this chapter, you will get introduced to the pytest package and use it to write simple unit tests. You'll run the tests, interpret the test result reports and fix bugs. Throughout the chapter, we will use examples exclusively from the data preprocessing module of a linear regression project, making sure you learn unit testing in the context of data science.
In this chapter, you will write more advanced unit tests. Starting from testing complicated data types like NumPy arrays to testing exception handling, you'll do it all. Once you have mastered the science of testing, we will also focus on the arts. For example, we will learn how to find the balance between writing too many tests and too few tests.  In the last lesson, you will get introduced to a radically new programming methodology called Test Driven Development (TDD) and put it to practice. This might actually change the way you code forever!
In any data science project, you quickly reach a point when it becomes impossible to organize and manage unit tests. In this chapter, we will learn about how to structure your test suite well, how to effortlessly execute any subset of tests and how to mark problematic tests so that your test suite always stays green. The last lesson will even enable you to add the trust-inspiring build status and code coverage badges to your own project. Complete this chapter and become a unit testing wizard!
In this chapter, You will pick up advanced unit testing skills  like setup, teardown and mocking. You will also  learn  how to write sanity tests for your data science models and how to test matplotlib plots. By the end of this chapter, you will be ready to test real world data science projects!",[],"['Dibya Chakravorty', 'Hillary Green-Lerman', 'Hadrien Lacroix']",[],['Intermediate Python for Data Science'],https://www.datacamp.com/courses/unit-testing-for-data-science-in-python,Programming,Python
276,Unsupervised Learning in Python,4,13,52,"38,385","4,150",Unsupervised Learning,"Unsupervised Learning in Python
Say you have a collection of customers with a variety of characteristics such as age, location, and financial history, and you wish to discover patterns and sort them into clusters. Or perhaps you have a set of texts, such as wikipedia pages, and you wish to segment them into categories based on their content. This is the world of unsupervised learning, called as such because you are not guiding, or supervising, the pattern discovery by some prediction task, but instead uncovering hidden structure from unlabeled data. Unsupervised learning encompasses a variety of techniques in machine learning, from clustering to dimension reduction to matrix factorization. In this course, you'll learn the fundamentals of unsupervised learning and implement the essential algorithms using scikit-learn and scipy. You will learn how to cluster, transform, visualize, and extract insights from unlabeled datasets, and end the course by building a recommender system to recommend popular musical artists.
Learn how to discover the underlying groups (or ""clusters"") in a dataset.  By the end of this chapter, you'll be clustering companies using their stock market prices, and distinguishing different species by clustering their measurements.
In this chapter, you'll learn about two unsupervised learning techniques for data visualization, hierarchical clustering and t-SNE.  Hierarchical clustering merges the data samples into ever-coarser clusters, yielding a tree visualization of the resulting cluster hierarchy.  t-SNE maps the data samples into 2d space so that the proximity of the samples to one another can be visualized.
Dimension reduction summarizes a dataset using its common occuring patterns.  In this chapter, you'll learn about the most fundamental of dimension reduction techniques, ""Principal Component Analysis"" (""PCA"").  PCA is often used before supervised learning to improve model performance and generalization.  It can also be useful for unsupervised learning.  For example, you'll employ a variant of PCA will allow you to cluster Wikipedia articles by their content!
In this chapter, you'll learn about a dimension reduction technique called ""Non-negative matrix factorization"" (""NMF"") that expresses samples as combinations of interpretable parts.  For example, it expresses documents as combinations of topics, and images in terms of commonly occurring visual patterns.  You'll also learn to use NMF to build recommender systems that can find you similar articles to read, or musical artists that match your listening history!","['Data Scientist with Python', 'Machine Learning with Python']","['Benjamin Wilson', 'Yashas Roy', 'Hugo Bowne-Anderson']","[('Company stock price movements', 'https://assets.datacamp.com/production/repositories/655/datasets/1304e66b1f9799e1a5eac046ef75cf57bb1dd630/company-stock-movements-2010-2015-incl.csv'), ('Eurovision 2016', 'https://assets.datacamp.com/production/repositories/655/datasets/2a1f3ab7bcc76eef1b8e1eb29afbd54c4ebf86f2/eurovision-2016.csv'), ('Fish measurements', 'https://assets.datacamp.com/production/repositories/655/datasets/fee715f8cf2e7aad9308462fea5a26b791eb96c4/fish.csv'), ('Grains', 'https://assets.datacamp.com/production/repositories/655/datasets/bb87f0bee2ac131042a01307f7d7e3d4a38d21ec/Grains.zip'), ('LCD digits', 'https://assets.datacamp.com/production/repositories/655/datasets/effd1557b8146ab6e620a18d50c9ed82df990dce/lcd-digits.csv'), ('Musical artists', 'https://assets.datacamp.com/production/repositories/655/datasets/c974f2f2c4834958cbe5d239557fbaf4547dc8a3/Musical artists.zip'), ('Wikipedia articles', 'https://assets.datacamp.com/production/repositories/655/datasets/8e2fbb5b8240c06602336f2148f3c42e317d1fdb/Wikipedia articles.zip'), ('Wine', 'https://assets.datacamp.com/production/repositories/655/datasets/2b27d4c4bdd65801a3b5c09442be3cb0beb9eae0/wine.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Statistical Thinking in Python (Part 1)']",https://www.datacamp.com/courses/unsupervised-learning-in-python,Machine Learning,Python
277,Unsupervised Learning in R,4,16,49,"22,553","3,600",Unsupervised Learning in R,"Unsupervised Learning in R
Many times in machine learning, the goal is to find patterns in data without trying to make predictions. This is called unsupervised learning. One common use case of unsupervised learning is grouping consumers based on demographics and purchasing history to deploy targeted marketing campaigns. Another example is wanting to describe the unmeasured factors that most influence crime differences between cities. This course provides a basic introduction to clustering and dimensionality reduction in R from a machine learning perspective, so that you can get from data to insights as quickly as possible.

The k-means algorithm is one common approach to clustering. Learn how the algorithm works under the hood, implement k-means clustering in R, visualize and interpret the results, and select the number of clusters when it's not known ahead of time. By the end of the chapter, you'll have applied k-means clustering to a fun ""real-world"" dataset!
Hierarchical clustering is another popular method for clustering. The goal of this chapter is to go over how it works, how to use it, and how it compares to k-means clustering.
Principal component analysis, or PCA, is a common approach to dimensionality reduction. Learn exactly what PCA does, visualize the results of PCA with biplots and scree plots, and deal with practical issues such as centering and scaling the data before performing PCA.
The goal of this chapter is to guide you through a complete analysis using the unsupervised learning techniques covered in the first three chapters. You'll extend what you've learned by combining PCA as a preprocessing step to clustering using data that consist of measurements of cell nuclei of human breast masses.","['Data Scientist with R', 'Machine Learning Fundamentals in R', 'Unsupervised Machine Learning with R']","['Hank Roark', 'Nick Carchedi', 'Tom Jeon']","[('Pokemon data', 'https://assets.datacamp.com/production/course_6430/datasets/Pokemon.csv'), ('Wisconsin breast cancer data', 'https://assets.datacamp.com/production/course_6430/datasets/WisconsinCancer.csv')]",['Introduction to R'],https://www.datacamp.com/courses/unsupervised-learning-in-r,Machine Learning,R
278,Valuation of Life Insurance Products in R,4,17,55,"2,598","4,450",Valuation of Life Insurance Products in R,"Valuation of Life Insurance Products in R
Understanding the basic principles of life insurance products is essential for your personal financial planning, ranging from taking out a mortgage to designing your retirement plan and seeking financial protection for the risk of dying early. In this course, you'll study the time value of money and you’ll work with human mortality data to derive demographic markers (such as the life expectancy). Combining the basics of cash flow valuation with the calculation of survival and death probabilities in R will allow you to construct insightful tools to design life insurance products. You'll come out of this course understanding the valuation of life contingent claims: life annuities, which provide an income upon survival, and life insurance products, which pay a benefit upon death of the policyholder.
Learn the basics of cash flow vectors and their valuation with discount factors. You will then evaluate investments based on their net present value and build your own mortgage calculator. Finally, you will learn about fixed and variable interest rates; and annual and monthly rates.
Life tables play a vital role in life insurance products. In this chapter you will work with data sets from the Human Mortality Database by building meaningful visualizations to study the evolution of mortality data over age and time. You will also use survival and death probabilities.
You will now act as an actuary working in a life insurance company. Learn about the basics of life annuity products and their valuation by working on cases ranging from simple life annuities to retirement plans.
You will now deal with life insurance contracts. Learn how these products are relevant in your financial planning by designing whole life, temporary and endowment insurances.",[],"['Katrien Antonio', 'Roel Verbelen', 'Lore Dirick', 'Sumedh Panchadhar']","[('Life table for females in Belgium, 1999', 'https://assets.datacamp.com/production/repositories/747/datasets/f1809a4061ed4778943f94ed7bf2614065cf81cd/life_table_females_1999.csv'), ('Life table for females in Belgium, 1841 to 2015', 'https://assets.datacamp.com/production/repositories/747/datasets/ff5b3a0282a15debe07baf9e062f991552570243/life_table_females.csv')]","['Intermediate R', 'Foundations of Probability in R']",https://www.datacamp.com/courses/valuation-of-life-insurance-products-in-r,Applied Finance,R
279,Visualization Best Practices in R,4,13,49,"4,783","4,200",Visualization Best Practices in R,"Visualization Best Practices in R
This course will help you take your data visualization skills beyond the basics and hone them into a powerful member of your data science toolkit. Over the lessons we will use two interesting open datasets to cover different types of data (proportions, point-data, single distributions, and multiple distributions) and discuss the pros and cons of the most common visualizations. In addition, we will cover some less common alternatives visualizations for the data types and how to tweak default ggplot settings to most efficiently and effectively get your message across.
In this chapter, we focus on visualizing proportions of a whole; we see that pie charts really aren't so bad, along with discussing the waffle chart and stacked bars for comparing multiple proportions.
We shift our focus now to single-observation or point data and go over when bar charts are appropriate and when they are not, what to use when they are not, and general perception-based enhancements for your charts.
We now move on to visualizing distributional data, we expose the fragility of histograms, discuss when it is better to shift to a kernel density plots, and how to make both plots work best for your data.
Finishing off we take a look at comparing multiple distributions to each other. We see why the traditional box plots are very dangerous and how to easily improve them, along with investigating when you should use more advanced alternatives like the beeswarm plot and violin plots.","['Data Analyst with R', 'Data Scientist with R', 'Data Visualization with R']","['Nick Strayer', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('World Health Organization Disease Dataset', 'https://assets.datacamp.com/production/repositories/1864/datasets/71386124a72f58a50fbc07b8254f47ef9a867ebe/who_disease.csv')]","['Introduction to the Tidyverse', 'Data Visualization with ggplot2 (Part 1)']",https://www.datacamp.com/courses/visualization-best-practices-in-r,Data Visualization,R
280,Visualizing Big Data with Trelliscope,4,16,46,"2,284","3,450",Visualizing Big Data Trelliscope,"Visualizing Big Data with Trelliscope
Having honed your visualization skills by learning ggplot2, it's now time to tackle larger datasets. In this course, you will learn several techniques for visualizing big data, with particular focus on the scalable visualization technique of faceting. You will learn how to put this technique into action using the Trelliscope approach as implemented in the trelliscopejs R package. Trelliscope plugs seamlessly into standard R workflows and produces interactive visualizations that allow you to visually explore your data in detail. By the end of this course, you will be able to easily create interactive exploratory displays of large datasets that will help you and your colleagues gain new insights into your data.
Learn different strategies for plotting big data using ggplot2, including calculating and plotting summary statistics, various techniques to deal with overplotting, and principles of small multiples with faceting, which leads into Trelliscope.
In the previous chapter you saw how faceting can be used as a powerful technique for visualizing a lot of data that can be naturally partitioned in some meaningful way. Now, using the trelliscopejs package with ggplot2, you will learn how to create faceted visualizations when the number of partitions in the data becomes too large to effectively view in a single screen.
The ggplot2 + trelliscopejs interface is easy to use, but trelliscopejs also provides a faceted plotting mechanism that gives you much more flexibility in what plotting system you use and how to specify cognostics. You will learn all about that in this chapter!
The Montreal BIXI bike network provides open data for every bike ride, including the date, time, duration, and start and end stations of the ride. In this chapter, you will analyze data from over 4 million bike rides in 2017, going between 546 stations. There are many interesting exploratory questions to ask from this data and you will create exploratory visualizations ranging from summary statistics to detailed Trelliscope visualizations that will give you interesting insight into the data.","['Big Data with R', 'Interactive Data Visualization in R']","['Ryan Hafen', 'Yashas Roy', 'Richie Cotton', 'Benjamin  Feder']","[('Bikes', 'https://assets.datacamp.com/production/repositories/1165/datasets/7f0d3f4086c03ffaa6038ce1ef635a2be68c58fe/bike.rds'), ('Pokémon', 'https://assets.datacamp.com/production/repositories/1165/datasets/49699fd6c1cd6aa94238bb57364bd94b4feb2183/pokemon.Rdata'), ('Top 100 routes', 'https://assets.datacamp.com/production/repositories/1165/datasets/2ff2abb6e1034d1067b65727693b659c7689a599/route_hod.Rdata'), ('Stocks', 'https://assets.datacamp.com/production/repositories/1165/datasets/f0fdb9fbaa4e87ec0cf2162e11d19a22493c5f07/Stocks.zip'), ('Taxi rides', 'https://assets.datacamp.com/production/repositories/1165/datasets/ba85fc5a55118aa3d673baa3172f585b092bc44d/tx_sub.Rdata')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/visualizing-big-data-with-trelliscope,Data Visualization,R
281,Visualizing Geospatial Data in Python,4,14,51,"4,135","4,250",Visualizing Geospatial Data,"Visualizing Geospatial Data in Python
One of the most important tasks of a data scientist is to understand the relationships between their data's physical location and their geographical context. In this course you'll be learning to make attractive visualizations of geospatial data with the GeoPandas package. You will learn to spatially join datasets, linking data to context. Finally you will learn to overlay geospatial data to maps to add even more spatial cues to your work. You will use several datasets from the City of Nashville's open data portal to find out where the chickens are in Nashville, which neighborhood has the most public art, and more!
In this chapter, you will learn how to create a two-layer map by first plotting regions from a shapefile and then plotting location points as a scatterplot.
You'll work with GeoJSON to create polygonal plots, learn about projections and coordinate reference systems, and get practice spatially joining data in this chapter.
First you will learn to get information about the geometries in your data with three different GeoSeries attributes and methods. Then you will learn to create a street map layer using folium.
In this chapter, you will learn about a special map called a choropleth. Then you will learn and practice building choropleths using two different packages: geopandas and folium.",['Data Visualization with Python'],"['Mary van Valkenburg', 'Greg Wilson', 'Adrián Soto']","[('Building permits issued in Nashville in 2017', 'https://assets.datacamp.com/production/repositories/2409/datasets/b8781d54c145e27ce43442bc6b327ac64158ebd6/building_permits_2017.csv'), ('Council district GIS data', 'https://assets.datacamp.com/production/repositories/2409/datasets/82a12239ce1fc2fc0abf1996f87bd13f86560ed4/council_districts.geojson.zip'), ('Nashville neighborhoods GIS data', 'https://assets.datacamp.com/production/repositories/2409/datasets/a534dea1e6a99373a5a8e9c2060ad8fb6b74e13c/neighborhoods.geojson'), ('Public artworks in Nashville', 'https://assets.datacamp.com/production/repositories/2409/datasets/284dd5ef16418d161af519d30ecc8471a23210ea/public_art.csv'), ('School district GIS data', 'https://assets.datacamp.com/production/repositories/2409/datasets/9ea668811fb71fa77ad29362ea5299f05ad150af/school_districts.geojson'), ('Schools in Nashville', 'https://assets.datacamp.com/production/repositories/2409/datasets/e0ac6a8e45e284e54a37388baa1957687882172d/schools.csv')]","['Manipulating DataFrames with pandas', 'Introduction to Data Visualization with Python']",https://www.datacamp.com/courses/visualizing-geospatial-data-in-python,Data Visualization,Python
282,Visualizing Time Series Data in Python,4,17,59,"7,080","4,850",Visualizing Time Series Data,"Visualizing Time Series Data in Python
Time series data is omnipresent in the field of Data Science. Whether it is analyzing business trends, forecasting company revenue or exploring customer behavior, every data scientist is likely to encounter time series data at some point during their work. To get you started on working with time series data, this course will provide practical knowledge on visualizing time series data using Python.
You will learn how to leverage basic plottings tools in Python, and how to annotate and personalize your time series plots. By the end of this chapter, you will be able to take any static dataset and produce compelling plots of your data.
In this chapter, you will gain a deeper understanding of your time series data by computing summary statistics and plotting aggregated views of your data.
You will go beyond summary statistics by learning about autocorrelation and partial autocorrelation plots. You will also learn how to automatically detect seasonality, trend and noise in your time series data.
In the field of Data Science, it is common to be involved in projects where multiple time series need to be studied simultaneously. In this chapter, we will show you how to plot multiple time series at once, and how to discover and describe relationships between multiple time series.
This chapter will give you a chance to practice all the concepts covered in the course. You will visualize the unemployment rate in the US from 2000 to 2010.",[],"['Thomas Vincent', 'Lore Dirick', 'Sumedh Panchadhar']","[('Inventions and Scientific Discoveries', 'https://assets.datacamp.com/production/repositories/1259/datasets/580f81133e1aa0803bdd2adda1670b4efff9778a/ch1_discoveries.csv'), ('CO2 levels in Mauai Hawaii', 'https://assets.datacamp.com/production/repositories/1259/datasets/88073ba2ec84224b732314d97b9862b049930f32/ch2_co2_levels.csv'), ('Airline passengers', 'https://assets.datacamp.com/production/repositories/1259/datasets/f8f1084c73032207abfc6b8a2499cab410bcfd62/ch3_airline_passengers.csv'), ('Production of meat in USA', 'https://assets.datacamp.com/production/repositories/1259/datasets/1e1cd4384a858ebd0a47d995bb0b963bfee09afd/ch4_meat.csv'), ('Unemployment rate in USA', 'https://assets.datacamp.com/production/repositories/1259/datasets/1c6b4a977a3c14f2a00c2d74694b208d9ac86443/ch5_employment.csv')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/visualizing-time-series-data-in-python,Data Visualization,Python
283,Visualizing Time Series Data in R,4,11,45,"6,845","3,550",Visualizing Time Series Data in R,"Visualizing Time Series Data in R
As the saying goes, “A chart is worth a thousand words”. This is why visualization is the most used and powerful way to get a better understanding of your data. After this course you will have a very good overview of R time series visualisation capabilities and you will be able to better decide which model to choose for subsequent analysis. You will be able to also convey the message you want to deliver in an efficient and beautiful way.
This chapter will introduce you to basic R time series visualization tools.
Univariate plots are designed to learn as much as possible about the distribution, central tendency and spread of the data at hand. In this chapter you will be presented with some visual tools used to diagnose univariate times series.
What to do if you have to deal with multivariate time series? In this chapter, you will learn how to identify patterns in the distribution, central tendency and spread over pairs or groups of data.
Let's put everything you learned so far in practice! Imagine you already own a portfolio of stocks and you have some spare cash to invest, how can you wisely select a new stock to invest your additional cash? Analyzing the statistical properties of individual stocks vs. an existing portfolio is a good way of approaching the problem.","['Quantitative Analyst with R', 'Time Series with R']","['Arnaud Amsellem', 'Lore Dirick', 'Davis Vaughan']","[('Returns for XOM, C, MSFT, DOW, and YHOO', 'https://assets.datacamp.com/production/repositories/693/datasets/480b145d513b0d38603c1834730018b04be42d13/data_3_2.csv'), ('Existing portfolio', 'https://assets.datacamp.com/production/repositories/693/datasets/53ab351c3bb6559f4a4796a418a61a8a5452d4c3/data_4_1.csv'), ('Stock data for GS, KO, DIS, and CAT', 'https://assets.datacamp.com/production/repositories/693/datasets/f4da9eefa32e3fa6f680f758c33699e8d50fd7b1/data_4_3.csv'), ('Daily stocks for YHOO, MSFT, C, and DOW', 'https://assets.datacamp.com/production/repositories/693/datasets/ef3ee852cd87957da3cb4e65ee435c6e2f718966/dataset_1_1.csv'), ('Daily returns for Apple', 'https://assets.datacamp.com/production/repositories/693/datasets/a9ad5ea78e7c737bb58675cb41beeae08660ad92/dataset_2_1.csv'), ('Old versus new portfolio', 'https://assets.datacamp.com/production/repositories/693/datasets/393cf31248ddf4d68ed49fdcbfe0668eefe42d8b/old.vs.new.portfolio.csv')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/visualizing-time-series-data-in-r,Data Visualization,R
284,Web Scraping in Python,4,17,56,"11,344","4,500",Web Scraping,"Web Scraping in Python
The ability to build tools capable of retrieving and parsing information stored across the internet has been and continues to be valuable in many veins of data science. In this course, you will learn to navigate and parse html code, and build tools to crawl websites automatically. Although our scraping will be conducted using the versatile Python library scrapy, many of the techniques you learn in this course can be applied to other popular Python libraries as well, including BeautifulSoup and Selenium. Upon the completion of this course, you will have a strong mental model of html structure, will be able to build tools to parse html code and access desired information, and create a simple scrapy spiders to crawl the web at scale.
Learn the structure of HTML. We begin by explaining why web scraping can be a valuable addition to your data science toolbox and then delving into some basics of HTML. We end the chapter by giving a brief introduction on XPath notation, which is used to navigate the elements within HTML code.
Leverage XPath syntax to explore scrapy selectors. Both of these concepts will move you towards being able to scrape an HTML document.
Learn CSS Locator syntax and begin playing with the idea of chaining together CSS Locators with XPath. We also introduce Response objects, which behave like Selectors but give us extra tools to mobilize our scraping efforts across multiple websites.
Learn to create web crawlers with scrapy. These scrapy spiders will crawl the web through multiple pages, following links to scrape each of those pages automatically according to the procedures we've learned in the previous chapters.",[],"['Thomas Laetsch', 'David Campos', 'Mari Nazary', 'Shon Inouye']","[('DataCamp webpage HTML', 'https://assets.datacamp.com/production/repositories/2560/datasets/77f93b8128b3bdbaf2b50e7b72158738014af515/all.html')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/web-scraping-with-python,Other,Python
285,Winning a Kaggle Competition in Python,4,16,52,771,"4,200",Winning a Kaggle Competition,"Winning a Kaggle Competition in Python
Kaggle is the most famous platform for Data Science competitions. Taking part in such competitions allows you to work with real-world datasets, explore various machine learning problems, compete with other participants and, finally, get invaluable hands-on experience. In this course, you will learn how to approach and structure any Data Science competition. You will be able to select the correct local validation scheme and to avoid overfitting. Moreover, you will master advanced feature engineering together with model ensembling approaches. All these techniques will be practiced on Kaggle competitions datasets.
In this first chapter, you will get exposure to the Kaggle competition process. You will train a model and prepare a csv file ready for submission. You will learn the difference between Public and Private test splits, and how to prevent overfitting.
Now that you know the basics of Kaggle competitions, you will learn how to study the specific problem at hand. You will practice EDA and get to establish correct local validation strategies. You will also learn about data leakage.
You will now get exposure to different types of features. You will modify existing features and create new ones. Also, you will treat the missing data accordingly.
Time to bring everything together and build some models! In this last chapter, you will build a base model before tuning some hyperparameters and improving your results with ensembles. You will then get some final tips and tricks to help you compete more efficiently.",[],"['Yauhen Babakhin', 'Hillary Green-Lerman', 'Hadrien Lacroix']","[('Demand forecasting (train)', 'https://assets.datacamp.com/production/repositories/4443/datasets/795eac18dd7e3da75a93806e68ca3a483acbceb3/demand_forecasting_train_1_month.csv'), ('Demand forecasting (test)', 'https://assets.datacamp.com/production/repositories/4443/datasets/61ee170244ca7f1cf60deac07298ba4c796297b3/demand_forecasting_test.csv'), ('House prices (train)', 'https://assets.datacamp.com/production/repositories/4443/datasets/40af41a3b8739d0ac4b3f9f85ee43630ecbe7f0c/house_prices_train.csv'), ('House prices (test)', 'https://assets.datacamp.com/production/repositories/4443/datasets/c605c221cc37d16c2eee118a616f83e3479af17d/house_prices_test.csv'), ('Taxi rides (train)', 'https://assets.datacamp.com/production/repositories/4443/datasets/1abe6ab7c7c0e880a2f6febcae946a33a9ef5e31/taxi_train_chapter_4.csv'), ('Taxi rides (test)', 'https://assets.datacamp.com/production/repositories/4443/datasets/f52d553ec3d3a8b469d58d20bf70a1b401765a30/taxi_test_chapter_4.csv')]","['Manipulating DataFrames with pandas', 'Supervised Learning with scikit-learn', 'Extreme Gradient Boosting with XGBoost']",https://www.datacamp.com/courses/winning-a-kaggle-competition-in-python,Machine Learning,Python
286,Working with Data in the Tidyverse,4,17,56,"7,122","4,500",Working Data in Tidyverse,"Working with Data in the Tidyverse
In this course, you'll learn to work with data using tools from the tidyverse in R. By data, we mean your own data, other people's data, messy data, big data, small data - any data with rows and columns that comes your way! By work, we mean doing most of the things that sound hard to do with R, and that need to happen before you can analyze or visualize your data. But work doesn't mean that it is not fun - you will see why so many people love working in the tidyverse as you learn how to explore, tame, tidy, and transform your data. Throughout this course, you'll work with data from a popular television baking competition called ""The Great British Bake Off.""
You will start this course by learning how to read data into R. We'll begin with the readr package, and use it to read in data files organized in rows and columns. In the rest of the chapter, you'll learn how to explore your data using tools to help you view, summarize, and count values effectively. You'll see how each of these steps gives you more insights into your data.
In this chapter, you will learn some basics of data taming, like how to tame your variable types, names, and values.
Now that your data has been tamed, it is time to get tidy. In this chapter, you will get hands-on experience tidying data and combining multiple tidying functions together in a chain using the pipe operator.
In this chapter, you will learn how to tame specific types of variables that are known to be tricky to work with, such as dates, strings, and factors.","['Data Manipulation with R', 'Tidyverse Fundamentals with R']","['Alison Hill', 'Chester Ismay', 'Yashas Roy', 'Benjamin  Feder']","[('Bakeoff', 'https://assets.datacamp.com/production/repositories/1613/datasets/53cf6583aa659942b787897319a1ac053cbcfa5a/bakeoff.csv'), ('Bakers', 'https://assets.datacamp.com/production/repositories/1613/datasets/c92971cb2daa3c38f7fc2a11a8fa9e07bfe2a750/Bakers.zip'), ('Desserts', 'https://assets.datacamp.com/production/repositories/1613/datasets/9c481970428acefdcfd4263ab4745ac98ab23406/Desserts.zip'), ('Ratings', 'https://assets.datacamp.com/production/repositories/1613/datasets/21f9c5e1787e59c3475946db2a36befe7e75763b/Ratings.zip')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/working-with-data-in-the-tidyverse,Data Manipulation,R
287,Working with Dates and Times in Python,4,14,48,"1,283","4,100",Working Dates and Times,"Working with Dates and Times in Python
You'll probably never have a time machine, but how about a machine for analyzing time? As soon as time enters any analysis, things can get weird. It's easy to get tripped up on day and month boundaries, time zones, daylight saving time, and all sorts of other things that can confuse the unprepared. If you're going to do any kind of analysis involving time, you’ll want to use Python to sort it out. Working with data sets on hurricanes and bike trips, we’ll cover counting events, figuring out how much time has elapsed between events and plotting data over time. You'll work in both standard Python and in Pandas, and we'll touch on the dateutil library, the only timezone library endorsed by the official Python documentation. After this course, you'll confidently handle date and time data in any format like a champion.
Hurricanes (also known as cyclones or typhoons) hit the U.S. state of Florida several times per year. To start off this course, you'll learn how to work with date objects in Python, starting with the dates of every hurricane to hit Florida since 1950. You'll learn how Python handles dates, common date operations, and the right way to format dates to avoid confusion.
Bike sharing programs have swept through cities around the world -- and luckily for us, every trip gets recorded! Working with all of the comings and goings of one bike in Washington, D.C., you'll practice working with dates and times together. You'll parse dates and times from text, analyze peak trip times, calculate ride durations, and more.
In this chapter, you'll learn to confidently tackle the time-related topic that causes people the most trouble: time zones and daylight saving. Continuing with our bike data, you'll learn how to compare clocks around the world, how to gracefully handle ""spring forward"" and ""fall back,"" and how to get up-to-date timezone data from the dateutil library.
To conclude this course, you'll apply everything you've learned about working with dates and times in standard Python to working with dates and times in Pandas. With additional information about each bike ride, such as what station it started and stopped at and whether or not the rider had a yearly membership, you'll be able to dig much more deeply into the bike trip data. In this chapter, you'll cover powerful Pandas operations, such as grouping and plotting results by time.",[],"['Max Shron', 'Chester Ismay', 'Sumedh Panchadhar']","[('Florida Hurricanes', 'https://assets.datacamp.com/production/repositories/3551/datasets/60b0dadeb28c0433e772835bd142809f8da3e8aa/florida_hurricane_dates.pkl'), ('W20529 Bike Data (Capital Bikeshare)', 'https://assets.datacamp.com/production/repositories/3551/datasets/181c142c56d3b83112dfc16fbd933fd995e80f94/capital-onebike.csv')]","['Intermediate Python for Data Science', 'pandas Foundations']",https://www.datacamp.com/courses/working-with-dates-and-times-in-python,Programming,Python
288,Working with Dates and Times in R,4,14,48,"9,997","4,000",Working Dates and Times in R,"Working with Dates and Times in R
Dates and times are abundant in data and essential for answering questions that  start with when, how long, or how often.  However, they can be tricky, as they come in a variety of formats and can behave in unintuitive ways. This course teaches you the essentials of parsing, manipulating, and computing with  dates and times in R. By the end, you'll have mastered the lubridate package, a member of the tidyverse, specifically designed to handle dates and times.  You'll also have applied your new skills to explore how often R versions are released,  when the weather is good in Auckland (the birthplace of R), and how long monarchs ruled in Britain.
R doesn't know something is a date or time unless you tell it.  In this  chapter you'll learn about some of the ways R stores dates and times by exploring how often R versions are released, and how quickly people  download them.  You'll also get a sneak peek at what you'll learn in the  following chapters.
Dates and times come in a huge assortment of formats, so your first hurdle is often to parse the format you have into an R datetime. This chapter  teaches you to import dates and times with the lubridate package. You'll also learn how to extract parts of a datetime. You'll practice by exploring the weather in R's birthplace, Auckland NZ.
Getting datetimes into R is just the first step.  Now that you know how to parse datetimes, you need to learn how to do calculations with them.  In this chapter, you'll learn the different ways of representing spans of time with lubridate and how to leverage them to do arithmetic on datetimes. By the end of the chapter, you'll have calculated how long it's been since the first man stepped on the moon, generated sequences of dates to help schedule reminders, calculated when an eclipse occurs, and explored the reigns of monarch's of England (and which ones might have seen Halley's comet!).
You now know most of what you need to tackle data that includes dates and  times, but there are a few other problems you might encounter in practice. In this final chapter you'll learn a little more about these problems by returning to some of the earlier data examples and learning how to  handle time zones, deal with times when you don't care about dates, parse dates quickly, and output dates and times.",[],"['Charlotte Wickham', 'Yashas Roy', 'Richie Cotton']","[('Auckland daily weather', 'https://assets.datacamp.com/production/repositories/1435/datasets/f6590278193112325a874cb69cb94d7fbca5732f/akl_weather_daily.csv'), ('Auckland hourly weather', 'https://assets.datacamp.com/production/repositories/1435/datasets/0b0ef636e2a69936ef8236108bfd2261647f5e9e/akl_weather_hourly_2016.csv'), ('R releases', 'https://assets.datacamp.com/production/repositories/1435/datasets/603b5835e87ce55c22221491406854cecf213898/rversions.csv'), ('Cran logs', 'https://assets.datacamp.com/production/repositories/1435/datasets/120bd627a2cd712a240a340970ac92783d4c6c6d/cran-logs_2015-04-17.csv')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/working-with-dates-and-times-in-r,Programming,R
289,Working with Geospatial Data in Python,4,16,53,"1,300","4,500",Working Geospatial Data,"Working with Geospatial Data in Python
A good proportion of the data out there in the real world is inherently spatial. From the population recorded in the national census, to every shop in your neighborhood, the majority of datasets have a location aspect that you can exploit to make the most of what they have to offer. This course will show you how to integrate spatial data into your Python Data Science workflow. You will learn how to interact with, manipulate and augment real-world data using their geographic dimension. You will learn to read tabular spatial data in the most common formats (e.g. GeoJSON, shapefile, geopackage) and visualize them in maps. You will then combine different sources using their location as the bridge that puts them in relation to each other. And, by the end of the course, you will be able to understand what makes geographic data unique, allowing you to transform and repurpose them in different contexts.
In this chapter, you will be introduced to the concepts of geospatial data, and more specifically of vector data. You will then learn how to represent such data in Python using the GeoPandas library, and the basics to read, explore and visualize such data. And you will exercise all this with some datasets about the city of Paris.
One of the key aspects of geospatial data is how they relate to each other in space. In this chapter, you will learn the different spatial relationships, and how to use them in Python to query the data or to perform spatial joins. Finally, you will also learn in more detail about choropleth visualizations.
In this chapter, we will take a deeper look into how the coordinates of the geometries are expressed based on their Coordinate Reference System (CRS). You will learn the importance of those reference systems and how to handle it in practice with GeoPandas. Further, you will also learn how to create new geometries based on the spatial relationships, which will allow you to overlay spatial datasets. And you will further practice this all with Paris datasets!
In this final chapter, we leave the Paris data behind us, and apply everything we have learnt up to now on a brand new dataset about artisanal mining sites in Eastern Congo. Further, you will still learn some new spatial operations, how to apply custom spatial operations, and you will get a sneak preview into raster data.",[],"['Joris Van den Bossche', 'Dani Arribas-Bel', 'Mari Nazary', 'Sara Billen']","[('Paris', 'https://assets.datacamp.com/production/repositories/2561/datasets/a7b32a9713836c1704826fcd07118b1973acac71/Paris.zip'), ('Mining', 'https://assets.datacamp.com/production/repositories/2561/datasets/7f2354029361bbd61cfad7ce58bb6625ca3abe7a/Mines.zip')]",['Manipulating DataFrames with pandas'],https://www.datacamp.com/courses/working-with-geospatial-data-in-python,Data Manipulation,Python
290,Working with Geospatial Data in R,4,15,58,"13,016","5,000",Working Geospatial Data in R,"Working with Geospatial Data in R

We'll dive in by displaying some spatial data -- property sales in a small US town -- using ggplot2 and we'll introduce you to the ggmap package as a quick way to add spatial context to your plots.  We'll talk about what makes spatial data special and introduce you to the common types of spatial data we'll be working with throughout the course.
You can get a long way with spatial data stored in data frames, but it makes life easier if they are stored in special spatial objects.  In this chapter we'll introduce you to the spatial object classes provided by the sp package, particularly for point and polygon data. You'll learn how to explore and subset these objects by exploring a world map. The reward for learning about these object classes:  we'll show you the package tmap which requires spatial objects as input, but makes creating maps really easy! You'll finish up by making a map of the world's population.
While the sp package provides some classes for raster data, the raster package provides more useful classes.  You'll be introduced to these classes and their advantages and then learn to display them. The examples continue with the theme of population from Chapter 2, but you'll look at some much finer detail datasets, both spatially and demographically. In the second half of the chapter you'll learn about color -- an essential part of any visual display, but especially important for maps.
In this chapter you'll follow the creation of a visualization from raw spatial data files to adding a credit to a map.  Along the way, you'll learn how to read spatial data into R, more about projections and coordinate reference systems, how to add additional data to a spatial object, and some tips for polishing your maps.",['Spatial Data with R'],"['Charlotte Wickham', 'Nick Carchedi', 'Tom Jeon', 'Sumedh Panchadhar']","[('House sales in Corvallis, 2015', 'https://assets.datacamp.com/production/repositories/577/datasets/37c804888938af0c7fb75b5d3e6b79644408c82f/01_corv_sales.rds'), ('Ward sales in Corvallis, 2015', 'https://assets.datacamp.com/production/repositories/577/datasets/15d7dd47e2351816d09cd58cb9c4f2cb96cd9020/01_corv_wards.rds'), ('Predicted house prices in Corvallis', 'https://assets.datacamp.com/production/repositories/577/datasets/9e82e5a56b0f1bff5c2284457f63f602f6761776/01_corv_predicted_grid.rds'), ('Countries (sp object)', 'https://assets.datacamp.com/production/repositories/577/datasets/6ccbe8c9b77d9f1df16f5af465f307f3fbcb895c/02_countries_sp.rds'), ('Countries (spdf object)', 'https://assets.datacamp.com/production/repositories/577/datasets/7235b249722c927c3a072a379999e4e7c613945f/02_countries_spdf.rds'), ('Population around the Boston and NYC areas', 'https://assets.datacamp.com/production/repositories/577/datasets/b79401780627b36256295f55ca2a9df77ade463b/03-population.rds'), ('Population around the Boston and NYC areas (Broken into different age groups)', 'https://assets.datacamp.com/production/repositories/577/datasets/ea0a6d830890bf2ad479ab4c96dd4e67cb471aed/03-population-by-age.rds'), ('Population around the Boston and NYC areas (Proportion by age)', 'https://assets.datacamp.com/production/repositories/577/datasets/0f0b6424ce8f258c063ea1e917f5c5fa39790b85/03-proportion-by-age.rds'), ('Migration', 'https://assets.datacamp.com/production/repositories/577/datasets/b2c22cac2adcb32ff008dce32aaba0226f98fa2b/03_migration.rds'), ('Neighborhood Tabulation Areas', 'https://assets.datacamp.com/production/repositories/577/datasets/14ecbe220af38b4f7f822839fc201ed611783d25/04_nynta_16c.rds'), ('Median Income data', 'https://assets.datacamp.com/production/repositories/577/datasets/dee5bd33444a738a3a6b99ba6fd8e859d2eb05df/04_income_grid.rds'), ('NYC tracts data', 'https://assets.datacamp.com/production/repositories/577/datasets/a7695322f13dd30a86f67b224e9686b368d4c108/04_nyc_tracts.rds'), ('Water bodies in NYC', 'https://assets.datacamp.com/production/repositories/577/datasets/6ac97b9137e47385a4c826440848b58784aa14ff/04_water_big.rds'), ('NYC Income data', 'https://assets.datacamp.com/production/repositories/577/datasets/9cfb8fde849888c5948de6c84bb024c913afe52e/04_nyc_income.rds')]","['Introduction to R', 'Data Visualization with ggplot2 (Part 1)']",https://www.datacamp.com/courses/working-with-geospatial-data-in-r,Data Visualization,R
291,Working with Web Data in R,4,16,56,"9,952","4,500",Working Web Data in R,"Working with Web Data in R
Most of the useful data in the world, from economic data to news content to geographic information, lives somewhere on the internet - and this course will teach you how to access it.

You'll explore how to work with APIs (computer-readable interfaces to websites), access data from Wikipedia and other sources, and build your own simple API client. For those occasions where APIs are not available, you'll find out how to use R to scrape information out of web pages. In the process you'll learn how to get data out of even the most stubborn website, and how to turn it into a format ready for further analysis. The packages you'll use and learn your way around are rvest, httr, xml2 and jsonlite, along with particular API client packages like WikipediR and pageviews.
Sometimes getting data off the internet is very, very simple - it's stored in a format that R can handle and just lives on a server somewhere, or it's in a more complex format and perhaps part of an API but there's an R package designed to make using it a piece of cake. This chapter will explore how to download and read in static files, and how to use APIs when pre-existing clients are available.
If an API client doesn't exist, it's up to you to communicate directly with the API. But don't worry, the package `httr` makes this really straightforward. In this chapter you'll learn how to make web requests from R, how to examine the responses you get back and some best practices for doing this in a responsible way.
Sometimes data is a TSV or nice plaintext output. Sometimes it's XML and/or JSON. This chapter walks you through what JSON and XML are, how to convert them into R-like objects, and how to extract data from them. You'll practice by examining the revision history for a Wikipedia article retrieved from the Wikipedia API using httr, xml2 and jsonlite.
Now that we've covered the low-hanging fruit (""it has an API, and a client"", ""it has an API"") it's time to talk about what to do when a website doesn't have any access mechanisms at all - when you have to rely on web scraping. This chapter will introduce you to the rvest web-scraping package, and  build on your previous knowledge of XML manipulation and XPATHs.
CSS path-based web scraping is a far-more-pleasant alternative to using XPATHs.  You'll start this chapter by learning about CSS, and how to leverage it for web scraping.   Then, you'll work through a final case study that combines everything you've learnt so  far to write a function that queries an API, parses the response and returns data in a nice form.",[],"['Charlotte Wickham', 'Oliver Keyes', 'Richie Cotton']",[],"['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/working-with-web-data-in-r,Importing & Cleaning Data,R
292,Writing Efficient Python Code,4,15,53,"4,528","4,050",Writing Efficient Python Code,"Writing Efficient Python Code
As a Data Scientist, the majority of your time should be spent gleaning actionable insights from data -- not waiting for your code to finish running. Writing efficient Python code can help reduce runtime and save computational resources, ultimately freeing you up to do the things you love as a Data Scientist. In this course, you'll learn how to use Python's built-in data structures, functions, and modules to write cleaner, faster, and more efficient code. We'll explore how to time and profile code in order to find bottlenecks. Then, you'll practice eliminating these bottlenecks, and other bad design patterns, using Python's Standard Library, NumPy, and pandas. After completing this course, you'll have the necessary tools to start writing efficient Python code!
In this chapter, you'll learn what it means to write efficient Python code. You'll explore Python's Standard Library, learn about NumPy arrays, and practice using some of Python's built-in tools.  This chapter builds a foundation for the concepts covered ahead.
In this chapter, you will learn how to gather and compare runtimes between different coding approaches.  You'll practice using the line_profiler and memroy_profiler packages to profile your code base and spot bottlenecks. Then, you'll put your learnings to practice by replacing these bottlenecks with efficient Python code.
This chapter covers more complex efficiency tips and tricks. You'll learn a few useful built-in modules for writing efficient code and practice using set theory.  You'll then learn about looping patterns in Python and how to make them more efficient.
This chapter offers a brief introduction on how to efficiently work with pandas DataFrames. You'll learn the various options you have for iterating over a DataFrame. Then, you'll learn how to efficiently apply functions to data stored in a DataFrame.",[],"['Logan Thomas', 'Chester Ismay', 'Becca Robins']","[('Baseball statistics', 'https://assets.datacamp.com/production/repositories/3581/datasets/779033fb8fb5021aee9ff46253980abcbc5851f3/baseball_stats.csv')]","['Data Types for Data Science', 'Python Data Science Toolbox (Part 1)', 'Python Data Science Toolbox (Part 2)']",https://www.datacamp.com/courses/writing-efficient-python-code,Programming,Python
293,Writing Efficient R Code,4,14,43,"12,335","3,100",Writing Efficient R Code,"Writing Efficient R Code
The beauty of R is that it is built for performing data analysis. The downside is that sometimes R can be slow, thereby obstructing our analysis. For this reason, it is essential to become familiar with the main techniques for speeding up your analysis, so you can reduce computational time and get insights as quickly as possible.
In order to make your code go faster, you need to know how long it takes to run. This chapter introduces the idea of benchmarking your code.
R is flexible because you can often solve a single problem in many different ways. Some ways can be several orders of magnitude faster than the others. This chapter teaches you how to write fast base R code.
Profiling helps you locate the bottlenecks in your code. This chapter teaches you how to visualize the bottlenecks using the `profvis` package.
Some problems can be solved faster using multiple cores on your machine. This chapter shows you how to write R code that runs in parallel.","['Big Data with R', 'R Programmer', 'R Programming']","['Colin Gillespie', 'Tom Jeon', 'Richie Cotton']","[('Information on 45,000 movies', 'https://assets.datacamp.com/production/repositories/749/datasets/1aeccb767f937190a2b5120b10a161c56ede34f7/movies.rds')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/writing-efficient-r-code,Programming,R
294,Writing Functions and Stored Procedures in SQL Server,4,16,57,"3,653","4,700",Writing Functions and Stored Procedures in SQL Server,"Writing Functions and Stored Procedures in SQL Server
Take your SQL Server programming to the next level. First, we demystify how to manipulate datetime data by performing temporal exploratory data analysis with the Washington DC BikeShare transactional dataset. Then, you’ll master how to create, update, and execute user-defined functions and stored procedures. You will learn the proper context for each modular programming tool and best practices. In the final chapter, you will apply all of your new skills to solve a real-world business case identifying the New York City yellow taxi utilization for each borough, and which pickup locations should be scheduled for each driver shift.
Learn how to do effective exploratory data analysis on temporal data, create scalar and table variables to store data, and learn how to execute date manipulation. This chapter will also cover the following SQL functions: DATEDIFF( ), DATENAME( ), DATEPART( ), CAST( ), CONVERT( ), GETDATE( ), and DATEADD( ).
This chapter will explain how to create, update, and execute user-defined functions (UDFs). You will learn about the various types of UDFs: scalar, inline, and multi-statement table-valued. You’ll also learn best practices.
Learn how to create, update, and execute stored procedures. Investigate the differences between stored procedures and user defined functions, including appropriate scenarios for each.
Apply your new skills in temporal EDA, user-defined functions, and stored procedures to solve a business case problem. Analyze the New York City taxi ride dataset to identify average fare per distance, ride count, and total ride time for each borough on each day of the week. And which pickup locations within the borough should be scheduled for each driver shift?",[],"['Meghan Kwartler', 'Chester Ismay', 'Amy Peterson']",[],['Intermediate SQL Server'],https://www.datacamp.com/courses/writing-functions-and-stored-procedures-in-sql-server,Programming,SQL
295,Writing Functions in Python,4,15,46,"1,082","3,650",Writing Functions,"Writing Functions in Python
You've done your analysis, built your report, and trained a model. What's next? Well, if you want to deploy your model into production, your code will need to be more reliable than exploratory scripts in a Jupyter notebook. Writing Functions in Python will give you a strong foundation in writing complex and beautiful functions so that you can contribute research and engineering skills to your team. You'll learn useful tricks, like how to write context managers and decorators. You'll also learn best practices around how to write maintainable reusable functions with good documentation. They say that people who can do good research and write high-quality code are unicorns. Take this course and discover the magic!
The goal of this course is to transform you into a Python expert, and so the first chapter starts off with best practices when writing functions. We'll cover docstrings and why they matter and how to know when you need to turn a chunk of code into a function. You will also learn the details of how Python passes arguments to functions, as well as some common gotchas that can cause debugging headaches when calling functions.
If you've ever seen the ""with"" keyword in Python and wondered what its deal was, then this is the chapter for you! Context managers are a convenient way to provide connections in Python and guarantee that those connections get cleaned up when you are done using them. This chapter will show you how to use context managers, as well as how to write your own.
Decorators are an extremely powerful concept in Python. They allow you to modify the behavior of a function without changing the code of the function itself. This chapter will lay the foundational concepts needed to thoroughly understand decorators (functions as objects, scope, and closures), and give you a good introduction into how decorators are used and defined. This deep dive into Python internals will set you up to be a superstar Pythonista.
Now that you understand how decorators work under the hood, this chapter gives you a bunch of real-world examples of when and how you would write decorators in your own code. You will also learn advanced decorator concepts like how to preserve the metadata of your decorated functions and how to write decorators that take arguments.",[],"['Shayne Miel', 'Hillary Green-Lerman', 'Becca Robins']",[],"['Python Data Science Toolbox (Part 1)', 'Python Data Science Toolbox (Part 2)']",https://www.datacamp.com/courses/writing-functions-in-python,Programming,Python
296,pandas Foundations,4,15,62,"116,620","5,150",pandas Foundations,"pandas Foundations
pandas DataFrames are the most widely used in-memory representation of complex data collections within Python. Whether in finance, a scientific field, or data science, familiarity with pandas is essential. This course teaches you to work with real-world datasets containing both string and numeric data, often structured around time series. You will learn powerful analysis, selection, and visualization techniques in this course.
In this chapter, you will be introduced to pandas DataFrames. You will use pandas to import and inspect a variety of datasets, ranging from population data obtained from the World Bank to monthly stock data obtained via Yahoo Finance. You will also practice building DataFrames from scratch and become familiar with the intrinsic data visualization capabilities of pandas.
Now that you’ve learned how to ingest and inspect your data, you will next learn how to explore it visually and quantitatively. This process, known as exploratory data analysis (EDA), is a crucial component of any data science project. pandas has powerful methods that help with statistical and visual EDA. In this chapter, you will learn how and when to apply these techniques.
In this chapter, you will learn how to manipulate and visualize time series data using pandas. You will become familiar with concepts such as upsampling, downsampling, and interpolation. You will practice using method chaining to efficiently filter your data and perform time series analyses. From stock prices to flight timings, time series data can be found in a wide variety of domains, and being able to effectively work with it is an invaluable skill.
Working with real-world weather and climate data, this chapter will allow you to apply all of the skills you have acquired in this course. You will use pandas to manipulate the data into a usable form for analysis and systematically explore it using the techniques you’ve learned.","['Data Analyst with Python', 'Data Manipulation with Python', 'Data Scientist with Python', 'Python Programmer']","['Team Anaconda', 'Yashas Roy', 'Hugo Bowne-Anderson']","[('1981-2010 NOAA Austin Climate Normals', 'https://assets.datacamp.com/production/repositories/497/datasets/5cac0469c3898e93392343c535d5c96c37ebd3c6/NOAA_QCLCD_2011_hourly_13904.txt'), ('July 2015 Austin airport departures (Southwest Airlines)', 'https://assets.datacamp.com/production/repositories/497/datasets/5b808399816c8dcb8eef08336595ef9b4eb22902/austin_airport_departure_data_2015_july.csv'), ('Automobile miles per gallon', 'https://assets.datacamp.com/production/repositories/497/datasets/3ed265bae13db503890f98663c33ac16a041e7a3/auto-mpg.csv'), ('Life expectancy at birth (Gapminder)', 'https://assets.datacamp.com/production/repositories/497/datasets/162a52b5c1991182d67391cf650bfffb33a47f54/life_expectancy_at_birth.csv'), ('Stock data (messy)', 'https://assets.datacamp.com/production/repositories/497/datasets/4e8cdfbf9e125bb723981f9218bee16194c7d869/messy_stock_data.tsv'), (""Percentage of bachelor's degrees awarded to women in the USA"", 'https://assets.datacamp.com/production/repositories/497/datasets/5f4f1a9bab95fba4d7fea1ad3c30dcab8f5b9c96/percent-bachelors-degrees-women-usa.csv'), ('Tips', 'https://assets.datacamp.com/production/repositories/497/datasets/5d401fe7e4c270da0c3b2f651c73c4a5a8771490/tips.csv'), ('Titanic', 'https://assets.datacamp.com/production/repositories/497/datasets/e280ed94bf4539afb57d8b1cbcc14bcf660d3c63/titanic.csv'), ('2010 Austin weather', 'https://assets.datacamp.com/production/repositories/497/datasets/4d7b2bc6b10b527dc297707fb92fa46b10ac1be5/weather_data_austin_2010.csv'), ('World Bank World Development Indicators', 'https://assets.datacamp.com/production/repositories/497/datasets/2175fef4b3691db03449bbc7ddffb740319c1131/world_ind_pop_data.csv'), ('World population', 'https://assets.datacamp.com/production/repositories/497/datasets/cc1de7b583ec7eb6196df30845d32241fe15d643/world_population.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/pandas-foundations,Data Manipulation,Python