-
Notifications
You must be signed in to change notification settings - Fork 1
/
datacamp_courses.csv
We can't make this file beautiful and searchable because it's too large.
1874 lines (1850 loc) · 723 KB
/
datacamp_courses.csv
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
,title,hours,videos,exercises,participants,xp,clean,descriptions,tracks,instructors,datasets,prereqs,link,topic,tech
0,,4,0,62,"1,317,429","6,200",,,[],[],[],[],https://www.datacamp.com/courses/free-introduction-to-r,,
1,,4,11,57,"1,656,629","4,700",,,[],[],[],[],https://www.datacamp.com/courses/intro-to-python-for-data-science,,
2,,4,0,35,"31,090","3,500",,,[],[],[],[],https://www.datacamp.com/courses/importing-cleaning-data-in-r-case-studies,,
3,,4,0,41,"361,803","3,450",,,[],[],[],[],https://www.datacamp.com/courses/intro-to-sql-for-data-science,,
4,A/B Testing in R,4,16,60,"3,203","4,700",A/B Testing in R,"A/B Testing in R
In this course, you will learn the foundations of A/B testing, including hypothesis testing, experimental design, and confounding variables. You will also be exposed to a couple more advanced topics, sequential analysis and multivariate testing. The first dataset will be a generated example of a cat adoption website. You will investigate if changing the homepage image affects conversion rates (the percentage of people who click a specific button). For the remainder of the course you will use another generated dataset of a hypothetical data visualization website.
Short case study on building and analyzing an A/B experiment.
In this chapter we'll continue with our case study, now moving to our statistical analysis. We'll also discuss how to do follow-up experiment planning.
In this chapter we'll dive deeper into the core concepts of A/B testing. This will include discussing A/B testing research questions, assumptions and types of A/B testing, as well as what confounding variables and side effects are.
In the final chapter we'll go over more types of statistical tests and power analyses for different A/B testing designs. We'll also introduce the concepts of stopping rules, sequential analysis, and multivariate analysis.",[],"['Page Piccinini', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Click dataset', 'https://assets.datacamp.com/production/repositories/2292/datasets/4407050e9b8216249a6d5ff22fd67fd4c44e7301/click_data.csv'), ('Experiment dataset', 'https://assets.datacamp.com/production/repositories/2292/datasets/52b52cb1ca28ce10f9a09689325c4d94d889a6da/experiment_data.csv'), ('Data Visualization Website - April 2018', 'https://assets.datacamp.com/production/repositories/2292/datasets/b502094e5de478105cccea959d4f915a7c0afe35/data_viz_website_2018_04.csv')]","['Intermediate R', 'Foundations of Inference', 'Experimental Design in R']",https://www.datacamp.com/courses/ab-testing-in-r,Probability & Statistics,R
5,ARIMA Modeling with R,4,13,45,"16,735","3,600",ARIMA Modeling R,"ARIMA Modeling with R
In this course, you will become an expert in fitting ARIMA models to time series data using R. First, you will explore the nature of time series data using the tools in the R stats package. Next, you learn how to fit various ARMA models to simulated data (where you will know the correct model) using the R package astsa. Once you have mastered the basics, you will learn how to fit integrated ARMA models, or ARIMA models to various real data sets. You will learn how to check the validity of an ARIMA model and you will learn how to forecast time series data. Finally, you will learn how to fit ARIMA models to seasonal data, including forecasting using the astsa package.
You will investigate the nature of time series data and learn the basics of ARMA models that can explain the behavior of such data. You will learn the basic R commands needed to help set up raw time series data to a form that can be analyzed using ARMA models.
You will discover the wonderful world of ARMA models and how to fit these models to time series data. You will learn how to identify a model, how to choose the correct model, and how to verify a model once you fit it to data. You will learn how to use R time series commands from the stats and astsa packages.
Now that you know how to fit ARMA models to stationary time series, you will learn about integrated ARMA (ARIMA) models for nonstationary time series. You will fit the models to real data using R time series commands from the stats and astsa packages.
You will learn how to fit and forecast seasonal time series data using seasonal ARIMA models. This is accomplished using what you learned in the previous chapters and by learning how to extend the R time series commands available in the stats and astsa packages.","['Quantitative Analyst with R', 'Time Series with R']","['David Stoffer', 'Lore Dirick', 'Matt Isaacs']",[],"['Introduction to R', 'Intermediate R', 'Introduction to Time Series Analysis']",https://www.datacamp.com/courses/arima-modeling-with-r,Probability & Statistics,R
6,Advanced Deep Learning with Keras in Python,4,13,46,"6,620","3,950",Advanced Deep Learning Keras,"Advanced Deep Learning with Keras in Python
This course shows you how to solve a variety of problems using the versatile Keras functional API. You will start with simple, multi-layer dense networks (also known as multi-layer perceptrons), and continue on to more complicated architectures. The course will cover how to build models with multiple inputs and a single output, as well as how to share weights between layers in a model. We will also cover advanced topics such as category embeddings and multiple-output networks. If you've ever wanted to train a network that does both classification and regression, then this course is for you!
In this chapter, you'll become familiar with the basics of the Keras functional API. You'll build a simple functional network using functional building blocks, fit it to data, and make predictions.
In this chapter, you will build two-input networks that use categorical embeddings to represent high-cardinality data, shared layers to specify re-usable building blocks, and merge layers to join multiple inputs to a single output. By the end of this chapter, you will have the foundational building blocks for designing neural networks with complex data flows.
In this chapter, you will extend your 2-input model to 3 inputs, and learn how to use Keras' summary and plot functions to understand the parameters and topology of your neural networks. By the end of the chapter, you will understand how to extend a 2-input model to 3 inputs and beyond.
In this chapter, you will build neural networks with multiple outputs, which can be used to solve regression problems with multiple targets. You will also build a model that solves a regression problem and a classification problem simultaneously.",[],"['Zachary Deane-Mayer', 'Sumedh Panchadhar']","[('Basketball data', 'https://assets.datacamp.com/production/repositories/2189/datasets/78cfc4f848041e10a64e72a9cd4f0a8e6f80ab21/basketball_data.zip'), ('Basketball models', 'https://assets.datacamp.com/production/repositories/2189/datasets/87408a711961f0d640f7c31faa9cfbf8248e6a23/basketball_models.zip')]",['Deep Learning in Python'],https://www.datacamp.com/courses/advanced-deep-learning-with-keras-in-python,Machine Learning,Python
7,Advanced Dimensionality Reduction in R,4,16,51,846,"4,300",Advanced Dimensionality Reduction in R,"Advanced Dimensionality Reduction in R
Dimensionality reduction techniques are based on unsupervised machine learning algorithms and their application offers several advantages. In this course you will learn how to apply dimensionality reduction techniques to exploit these advantages, using interesting datasets like the MNIST database of handwritten digits, the fashion version of MNIST released by Zalando, and a credit card fraud detection dataset. Firstly, you will have a look at t-SNE, an algorithm that performs non-linear dimensionality reduction. Then, you will also explore some useful characteristics of dimensionality reduction to apply in predictive models. Finally, you will see the application of GLRM to compress big data (with numerical and categorical values) and impute missing values. Are you ready to start compressing high dimensional data?
Are you ready to become a master of dimensionality reduction?
In this chapter, you'll start by understanding how to represent handwritten digits using the MNIST dataset. You will learn what a distance metric is and which ones are the most common, along with the problems that arise with the curse of dimensionality.
Finally, you will compare the application of PCA and t-SNE .
Now, you will learn how to apply the t-Distributed Stochastic Neighbour Embedding (t-SNE) algorithm. After finishing this chapter, you will understand the different hyperparameters that have an impact on your results and how to optimize them. Finally, you will do something really cool: compute centroids prototypes of each digit to classify other digits.
In this chapter, you'll apply t-SNE to train predictive models faster. This is one of the many advantages of dimensionality reduction. You will learn how to train a random forest with the original features and with the embedded features and compare them. You will also apply t-SNE to understand the patterns learned by a neural network. And all of this using a real credit card fraud dataset!
In the final chapter, you will practice another useful dimensionality reduction algorithm: GLRM. Here you will make use of the Fashion MNIST data to classify clothes, impute missing data and also train random forests using the low dimensional embedding.",['Unsupervised Machine Learning with R'],"['Federico Castanedo', 'Chester Ismay', 'Sara Billen']","[('MNIST sample', 'https://assets.datacamp.com/production/repositories/1680/datasets/68b37d6c5f7f6768d5e11796687993b6f3da1f72/mnist-sample-200.RData'), ('Credit card fraud', 'https://assets.datacamp.com/production/repositories/1680/datasets/5b6c593225dc1f417f82822cb5fce83887890f4a/creditcard.RData'), ('Fashion MNIST sample', 'https://assets.datacamp.com/production/repositories/1680/datasets/8d19bc657cc5b03e9b368eb6d3ff0527c50d184d/fashion_mnist_500.RData')]",['Dimensionality Reduction in R'],https://www.datacamp.com/courses/advanced-dimensionality-reduction-in-r,Machine Learning,R
8,Advanced NLP with spaCy,5,15,55,"4,517","4,450",Advanced NLP spaCy,"Advanced NLP with spaCy
If you're working with a lot of text, you'll eventually want to know more about it. For example, what's it about? What do the words mean in context? Who is doing what to whom? What companies and products are mentioned? Which texts are similar to each other? In this course, you'll learn how to use spaCy, a fast-growing industry standard library for NLP in Python, to build advanced natural language understanding systems, using both rule-based and machine learning approaches.
This chapter will introduce you to the basics of text processing with spaCy. You'll learn about the data structures, how to work with statistical models, and how to use them to predict linguistic features in your text.
In this chapter, you'll use your new skills to extract specific information from large volumes of text. You'll learn how to make the most of spaCy's data structures, and how to effectively combine statistical and rule-based approaches for text analysis.
This chapter will show you to everything you need to know about spaCy's processing pipeline. You'll learn what goes on under the hood when you process a text, how to write your own components and add them to the pipeline, and how to use custom attributes to add your own meta data to the documents, spans and tokens.
In this chapter, you'll learn how to update spaCy's statistical models to customize them for your use case – for example, to predict a new entity type in online comments. You'll write your own training loop from scratch, and understand the basics of how training works, along with tips and tricks that can make your custom NLP projects more successful.",[],"['Ines Montani', 'Mari Nazary', 'Adrián Soto']",[],['Natural Language Processing Fundamentals in Python'],https://www.datacamp.com/courses/advanced-nlp-with-spacy,Data Manipulation,Python
9,Analyzing Business Data in SQL,4,15,46,"4,241","3,700",Analyzing Business Data in SQL,"Analyzing Business Data in SQL
Businesses track data on everything, from operations to marketing to HR. Leveraging this data enables businesses to better understand themselves and their customers, leading to higher profits and better performance. In this course, you’ll learn about the key metrics that businesses use to measure performance. You'll write SQL queries to calculate these metrics and produce report-ready results. Throughout the course, you'll study data from a fictional food delivery startup, modeled on data from real companies.
Profit is one of the first things people use to assess a company's success. In this chapter, you'll learn how to calculate revenue and cost, and then combine the two calculations using Common Table Expressions to calculate profit.
Financial KPIs like profit are important, but they don't speak to user activity and engagement. In this chapter, you'll learn how to calculate the registrations and active users KPIs, and use window functions to calculate the user growth and retention rates.
Since a KPI is a single number, it can't describe how data is distributed. In this chapter, you'll learn about unit economics, histograms, bucketing, and percentiles, which you can use to spot the variance in user behaviors.
Executives often use the KPIs you've calculated in the three previous chapters to guide business decisions. In this chapter, you'll package the KPIs you've created into a readable report you can present to managers and executives.",[],"['Michel Semaan', 'Mona Khalil', 'Sara Billen']",[],['Intermediate SQL'],https://www.datacamp.com/courses/analyzing-business-data-in-sql,Case Studies,SQL
10,Analyzing Election and Polling Data in R,4,15,55,"3,189","4,650",Analyzing Election and Polling Data in R,"Analyzing Election and Polling Data in R
This is an introductory course to the R programming language as applied in the context of political data analysis. In this course students learn how to wrangle, visualize, and model data with R by applying data science techniques to real-world political data such as public opinion polling and election results. The tools that you'll use in this course, from the dplyr, ggplot2, and choroplethr packages, among others, are staples of data science and can be used to analyze almost any dataset you get your hands on. Students will learn how to mutate columns and filter datasets, graph points and lines on charts, make maps, and create models to understand relationships between variables and predict the future. This course is suitable for anyone who already has downloaded R and knows the basics, like how to install packages.
Chapter one uses a dataset of job approval polling for US presidents since Harry Truman to introduce you to data wrangling and visualization in the tidyverse.
In this chapter, you will embark on a historical analysis of ""generic ballot"" US House polling and use data visualization and modeling to answer two big questions: Has the country changed over time? Can polls predict elections?
This chapter teaches you how to make maps and understand linear regression in R. With election results from the United States and the United Kingdom, you'll also learn how to use regression models to analyze the relationship between two (or more!) variables.
In this ensemble of applied statistics and data analysis, you will wrangle, visualize, and model polling and prediction data for two sets of very important US elections: the 2018 House midterms and 2020 presidential election.",[],"['G Elliott Morris', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Brexit Polls', 'https://assets.datacamp.com/production/course_6778/datasets/brexit_polls.csv'), ('Brexit Results', 'https://assets.datacamp.com/production/course_6778/datasets/brexit_results.csv'), ('Gallup Approval Polls', 'https://assets.datacamp.com/production/course_6778/datasets/gallup_approval_polls.csv'), ('Generic Ballot', 'https://assets.datacamp.com/production/course_6778/datasets/generic_ballot.csv'), ('US Pres 2016 by County', 'https://assets.datacamp.com/production/course_6778/datasets/us_pres_2016_by_county.csv')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/analyzing-election-and-polling-data-in-r,Case Studies,R
11,Analyzing IoT Data in Python,4,16,53,843,"4,250",Analyzing IoT Data,"Analyzing IoT Data in Python
Have you ever heard about Internet of Things devices? Of course, you have. Maybe you also have a Raspberry PI in your house monitoring the temperature and humidity.
IoT devices are everywhere around us, collecting data about our environment.
You will be analyzing Environmental data, Traffic data as well as energy counter data.
Following the course, you will learn how to collect and store data from a data stream.
You will prepare IoT data for analysis, analyze and visualize IoT data, before implementing
a simple machine learning model to take action when certain events occur and deploy this model
to a real-time data stream.
In this chapter, you will first understand what IoT data is.
Then, you learn how to aquire IoT data through a REST API and using an MQTT data stream to collect data in real time.
In the second chapter, you will look at the data you gathered during the first chapter. You will visualize the data and learn the importance of timestamps when dealing with data streams. You will also implement caching to an MQTT data stream.
In this chapter, you will combine multiple datasoures with different time intervals.
You will then analyze the data to detect correlations, outliers and trends.
In this final chapter, you will use the data you analyzed during the previous chapters to build a machine learning pipeline. You will then learn how to implement this pipeline into a data stream to make realtime predictions.",[],"['Matthias Voppichler', 'Hadrien Lacroix', 'Hillary Green-Lerman']","[('Environment', 'https://assets.datacamp.com/production/repositories/4296/datasets/b7afe8572732cb5c8e6687c7e0c51216983a2b12/environ_MS83200MS_nowind_3m-10min.json'), ('Traffic (heavy vehicles)', 'https://assets.datacamp.com/production/repositories/4296/datasets/2ab899af51dac636796c5b3fce2f27f69010a6d6/traffic_raw_siemens_heavy-veh.json'), ('Traffic (light vehicles)', 'https://assets.datacamp.com/production/repositories/4296/datasets/d4eae90c370fa6912c035323737c9fc276792514/traffic_raw_siemens_light-veh.json')]",['pandas Foundations'],https://www.datacamp.com/courses/analyzing-iot-data-in-python,Data Manipulation,Python
12,Analyzing Marketing Campaigns with pandas,4,14,53,"1,878","4,500",Analyzing Marketing Campaigns pandas,"Analyzing Marketing Campaigns with pandas
One of the biggest challenges when studying data science technical skills is understanding how those skills and concepts translate into real jobs. Whether you're looking to level up in your marketing job by incorporating Python and pandas or you're trying to get a handle on what kinds of work a data scientist in a marketing organization might do, this course is a great match for you. We'll practice translating common business questions into measurable outcomes, including ""How did this campaign perform?"", ""Which channel is referring the most subscribers?"", ""Why is a particular channel underperforming?"" and more using a fake marketing dataset based on the data of an online subscription business. This course will build on Python and pandas fundamentals, such as merging/slicing datasets, groupby(), correcting data types and visualizing results using matplotlib.
In this chapter, you will review pandas basics including importing datasets, exploratory analysis, and basic plotting.
In this chapter, you will learn about common marketing metrics and how to calculate them using pandas. You will also visualize your results and practice user segmentation.
In this chapter, you will build functions to automate common marketing analysis and determine why certain marketing channels saw lower than usual conversion rates during late January.
In this chapter, you will analyze an A/B test and learn about the importance of segmentation when interpreting the results of a test.",[],"['Jill Rosok', 'Mona Khalil', 'Sumedh Panchadhar']","[('Marketing dataset 1', 'https://assets.datacamp.com/production/repositories/3879/datasets/bdbbd97f839ef5cafebcc15363201d0e7b08881a/marketing.csv'), ('Marketing dataset 2', 'https://assets.datacamp.com/production/repositories/3879/datasets/6d86195bbc39785128d437e26a14ffb7cf68f9dc/marketing_new.csv')]","['Intermediate Python for Data Science', 'pandas Foundations']",https://www.datacamp.com/courses/analyzing-marketing-campaigns-with-pandas,Case Studies,Python
13,Analyzing Police Activity with pandas,4,16,50,"11,246","4,100",Analyzing Police Activity pandas,"Analyzing Police Activity with pandas
Now that you have learned the foundations of pandas, this course will give you the chance to apply that knowledge by answering interesting questions about a real dataset! You will explore the Stanford Open Policing Project dataset and analyze the impact of gender on police behavior. During the course, you will gain more practice cleaning messy data, creating visualizations, combining and reshaping datasets, and manipulating time series data. Analyzing Police Activity with pandas will give you valuable experience analyzing a dataset from start to finish, preparing you for your data science career!
Before beginning your analysis, it is critical that you first examine and clean the dataset, to make working with it a more efficient process. In this chapter, you will practice fixing data types, handling missing values, and dropping columns and rows while learning about the Stanford Open Policing Project dataset.
Does the gender of a driver have an impact on police behavior during a traffic stop? In this chapter, you will explore that question while practicing filtering, grouping, method chaining, Boolean math, string methods, and more!
Are you more likely to get arrested at a certain time of day? Are drug-related stops on the rise? In this chapter, you will answer these and other questions by analyzing the dataset visually, since plots can help you to understand trends in a way that examining the raw data cannot.
In this chapter, you will use a second dataset to explore the impact of weather conditions on police behavior during traffic stops. You will practice merging and reshaping datasets, assessing whether a data source is trustworthy, working with categorical data, and other advanced skills.","['Data Analyst with Python', 'Data Manipulation with Python', 'Data Scientist with Python']","['Kevin Markham', 'Becca Robins', 'Sara Snell']","[('Traffic stops in Rhode Island', 'https://assets.datacamp.com/production/repositories/1497/datasets/62bd9feef451860db02d26553613a299721882e8/police.csv'), ('Weather in Providence, Rhode Island', 'https://assets.datacamp.com/production/repositories/1497/datasets/02f3fb2d4416d3f6626e1117688e0386784e8e55/weather.csv')]","['pandas Foundations', 'Manipulating DataFrames with pandas', 'Merging DataFrames with pandas']",https://www.datacamp.com/courses/analyzing-police-activity-with-pandas,Data Manipulation,Python
14,Analyzing Social Media Data in Python,4,14,51,"5,121","4,000",Analyzing Social Media Data,"Analyzing Social Media Data in Python
Twitter produces hundreds of million messages per day, with people around the world discussing sports, politics, business, and entertainment. You can access thousands of messages flowing in this stream in a matter of minutes. In this course, you will learn how to collect Twitter data and analyze tweet text, Twitter networks, and the geographical origin of the tweet. We'll be doing this with datasets on tech companies, data science hashtags, and the 2018 State of the Union address. Using these methods, you will be able to inform business and political decision-making by discovering the prevalence of important topics, the diversity of discussion networks, and a topic's geographical reach.
Why analyze Twitter, how to access Twitter APIs, and understanding Twitter JSON.
How to process Twitter text.
Network analysis with Twitter data.
How to map Twitter data.",[],"['Alex Hanna', 'Greg Wilson', 'Kara Woo', 'David Campos', 'Shon Inouye', 'Eunkyung Park']","[('Data Science Hashtag dataset', 'https://assets.datacamp.com/production/repositories/2161/datasets/43d85d27d293323c1a4effec25717d0c2eb43169/data-science-hashtags.csv'), ('State of the Union Reply Network dataset', 'https://assets.datacamp.com/production/repositories/2161/datasets/55860c218310485e9400997ae33aecd0e97f8b51/sotu2018-reply.csv'), ('State of the Union Retweet Networking dataset', 'https://assets.datacamp.com/production/repositories/2161/datasets/51e79668580cdb86969c2c625172eaed2ded684a/sotu2018-rt.csv')]",['pandas Foundations'],https://www.datacamp.com/courses/analyzing-social-media-data-in-python,Data Manipulation,Python
15,Analyzing Survey Data in R,4,14,49,"3,211","3,950",Analyzing Survey Data in R,"Analyzing Survey Data in R
You've taken a survey (or 1000) before, right? Have you ever wondered what goes into designing a survey and how survey responses are turned into actionable insights? Of course you have! In Analyzing Survey Data in R, you will work with surveys from A to Z, starting with common survey design structures, such as clustering and stratification, and will continue through to visualizing and analyzing survey results. You will model survey data from the National Health and Nutrition Examination Survey using R's survey and tidyverse packages. Following the course, you will be able to successfully interpret survey results and finally find the answers to life's burning questions!
Our exploration of survey data will begin with survey weights. In this chapter, we will learn what survey weights are and why they are so important in survey data analysis. Another unique feature of survey data are how they were collected via clustering and stratification. We'll practice specifying and exploring these sampling features for several survey datasets.
Now that we have a handle of survey weights, we will practice incorporating those weights into our analysis of categorical data in this chapter. We'll conduct descriptive inference by calculating summary statistics, building summary tables, and constructing bar graphs. For analytic inference, we will learn to run chi-squared tests.
Of course not all survey data are categorical and so in this chapter, we will explore analyzing quantitative survey data. We will learn to compute survey-weighted statistics, such as the mean and quantiles. For data visualization, we'll construct bar-graphs, histograms and density plots. We will close out the chapter by conducting analytic inference with survey-weighted t-tests.
To model survey data also requires careful consideration of how the data were collected. We will start our modeling chapter by learning how to incorporate survey weights into scatter plots through aesthetics such as size, color, and transparency. We'll model the survey data with linear regression and will explore how to incorporate categorical predictors and polynomial terms into our models.",[],"['Kelly McConville', 'Chester Ismay', 'Becca Robins', 'Eunkyung Park']","[('Quarter 4 of the 2016 BLS Consumer Expenditure Survey', 'https://assets.datacamp.com/production/repositories/1932/datasets/54e81635756ae4b5a0207b661586c108e6dc5566/ce.csv')]","['Introduction to the Tidyverse', 'Foundations of Inference']",https://www.datacamp.com/courses/analyzing-survey-data-in-r,Probability & Statistics,R
16,Analyzing US Census Data in Python,5,16,57,"1,179","4,850",Analyzing US Census Data,"Analyzing US Census Data in Python
Data scientists in diverse fields, from marketing to public health to civic hacking, need to work with demographic and socioeconomic data. Government census agencies offer richly detailed, high-quality datasets, but the number of variables and intricacies of administrative geographies (what is a Census tract anyway?) can make approaching this goldmine a daunting process. This course will introduce you to the Decennial Census and the annual American Community Survey, and show you where to find data on household income, commuting, race, family structure, and other topics that may interest you. You will use Python to request this data using the Census API for large and small geographies. You will manipulate the data using pandas, and create derived data such as a measure of segregation. You will also get a taste of the mapping capabilities of geopandas.
Start exploring Census data products with the Decennial Census. Use the Census API and the requests package to retrieve data, load into pandas data frames, and conduct exploratory visualization in seaborn. Learn about important Census geographies, including states, counties, and tracts.
Explore topics such as health insurance coverage and gentrification using the American Community Survey. Calculate Margins of Error and explore change over time. Create choropleth maps using geopandas.
Explore racial segregation in America. Calculate the Index of Dissimilarity, and important measure of segregation. Learn about and use Metropolitan Statistical Areas, and important geography for urban research. Study segregation changes over time in Chicago.
In this chapter, you will apply what you have learned to four topical studies. Explore unemployment by race and ethnicity; commuting patterns and worker density; immigration and state-to-state population flows; and rent burden in San Francisco.",[],"['Lee Hachadoorian', 'Mari Nazary', 'Adrián Soto']","[('Hispanic Origin & Race by State, 2010', 'https://assets.datacamp.com/production/repositories/2155/datasets/68d8a7bcd8383e4e631d561d6ddc9cf61aa74d6b/states.csv'), ('Household Internet Access by State, 2017', 'https://assets.datacamp.com/production/repositories/2155/datasets/3be5b05dd02bff25b4f5efdb22d1aa1777fe799e/states_internet.gpkg'), ('Brooklyn Tract Demographics, 2000', 'https://assets.datacamp.com/production/repositories/2155/datasets/75a53f4cafd31c368d147e0b64755d74d18cff66/tracts_brooklyn_2000.pickle'), ('Brooklyn Tract Geometries, 2000', 'https://assets.datacamp.com/production/repositories/2155/datasets/5246046c1acde7183c46fe07925437d1d6c43382/brooklyn_tract_2000.gpkg'), ('Brooklyn Tract Demographics, 2010', 'https://assets.datacamp.com/production/repositories/2155/datasets/75c9d53f047b5e8e69307d22a2b3f0069760c7da/tracts_brooklyn_2010.pickle'), ('Brooklyn Tract Geometries, 2010', 'https://assets.datacamp.com/production/repositories/2155/datasets/cafe61e927146e7c0e655bdacc1647243b62dc84/brooklyn_tract_2010.gpkg')]","['Intermediate Python for Data Science', 'pandas Foundations']",https://www.datacamp.com/courses/analyzing-us-census-data-in-python,Case Studies,Python
17,Analyzing US Census Data in R,4,17,59,"1,722","5,050",Analyzing US Census Data in R,"Analyzing US Census Data in R
Analysts across industries rely on data from the United States Census Bureau in their work. In this course, students will learn how to work with Census tabular and spatial data in the R environment. The course focuses on the tidycensus package for acquiring data from the decennial US Census and American Community survey in a tidyverse-friendly format, and the tigris package for accessing Census geographic data within R. By the end of this course, students will be able to rapidly visualize and explore demographic data from the Census Bureau using ggplot2 and other tidyverse tools, and make maps of Census demographic data with only a few lines of R code.
In this chapter, students will learn the basics of working with Census data in R with tidycensus. They will acquire data using tidycensus functions, search for data, and make a basic plot.
In this chapter, students learn how to use tidyverse tools to wrangle data from the US Census and American Community Survey. They also learn about handling margins of error in the ACS.
In this chapter, students will learn how to work with US Census Bureau geographic data in R using the tigris R package.
In this chapter, you will learn how to obtain feature geometry with the tidycensus package, and use ggplot2 and mapview to make customized static and interactive maps of US Census data.",[],"['Kyle Walker', 'Chester Ismay', 'Becca Robins']",[],"['Introduction to the Tidyverse', 'Spatial Analysis in R with sf and raster']",https://www.datacamp.com/courses/analyzing-us-census-data-in-r,Other,R
18,Anomaly Detection in R,4,13,47,"2,926","3,900",Anomaly Detection in R,"Anomaly Detection in R
Are you concerned about inaccurate or suspicious records in your data, but not sure where to start? An anomaly detection algorithm could help! Anomaly detection is a collection of techniques designed to identify unusual data points, and are crucial for detecting fraud and for protecting computer networks from malicious activity. In this course, you'll explore statistical tests for identifying outliers, and learn to use sophisticated anomaly scoring algorithms like the local outlier factor and isolation forest. You'll apply anomaly detection algorithms to identify unusual wines in the UCI Wine quality dataset and also to detect cases of thyroid disease from abnormal hormone measurements.
In this chapter, you'll learn how numerical and graphical summaries can be used to informally assess whether data contain unusual points. You'll use a statistical procedure called Grubbs' test to check whether a point is an outlier, and learn about the Seasonal-Hybrid ESD algorithm, which can help identify outliers when the data are a time series.
In this chapter, you'll learn how to calculate the k-nearest neighbors distance and the local outlier factor, which are used to construct continuous anomaly scores for each data point when the data have multiple features. You'll learn the difference between local and global anomalies and how the two algorithms can help in each case.
k-nearest neighbors distance and local outlier factor use the distance or relative density of the nearest neighbors to score each point. In this chapter, you'll explore an alternative tree-based approach called an isolation forest, which is a fast and robust method of detecting anomalies that measures how easily points can be separated by randomly splitting the data into smaller and smaller regions.
You've now been introduced to a few different algorithms for anomaly scoring. In this final chapter, you'll learn to compare the detection performance of the algorithms in instances where labeled anomalies are available. You'll learn to calculate and interpret the precision and recall statistics for an anomaly score, and how to adapt the algorithms so they can accommodate data with categorical features.",[],"['Alastair Rushworth', 'Chester Ismay', 'Amy Peterson']","[('Furniture', 'https://assets.datacamp.com/production/repositories/2385/datasets/8977d3e5d10f1ac243696e86a64e6470f578cf57/furniture.csv'), ('Wine', 'https://assets.datacamp.com/production/repositories/2385/datasets/ee4b58d16708ae7647f7f6278c041623de6e3ad4/big_wine.csv'), ('Thyroid', 'https://assets.datacamp.com/production/repositories/2385/datasets/735c85adc275d9265b6b1bdef11a78020a31e9e3/thyroid.csv')]",['Intermediate R'],https://www.datacamp.com/courses/anomaly-detection-in-r,Probability & Statistics,R
19,Applying SQL to Real-World Problems,4,13,47,102,"3,550",Applying SQL Real-World Problems,"Applying SQL to Real-World Problems
Now that you’ve learned the basic tools of SQL you are ready to synthesize them into practical, real-world skills. In this course, you will work with a database of a fictional movie rental company. The size and complexity of this database will allow you to experience the challenges of working with databases firsthand. Throughout this course, you will use SQL to answer business-driven questions. You will learn new skills that will empower you to find the tables you need. You will then learn how to store and manage this data in tables and views that you create. Best of all you will also learn how to write code that not only clearly conveys your intent but is also legible.
You will review some of the most commonly used SQL commands to ensure you are prepared to tackle both real-world problems as well as every exercise covered in this course.
How do you find the data you need in your database in order to answer real-world business questions? Here you will learn how to use system tables to explore your database. You will use these tables to create a new tool that contains a list of all tables and columns in your database. Finally, you will create an Entity Relationship Diagram (ERD) which will help you connect multiple tables.
Working with SQL to solve real-world problems will oftentimes require you to do more than retrieve the data you need, oftentimes you will need to manage the data in your database. This includes creating data, updating it and, when necessary, deleting it.
How do you ensure that the SQL scripts you write will be easy to understand for anyone who needs to read them? This chapter will cover approaches you can leverage to ensure that your code clearly conveys your intent, is readable by others and follows best practices.",[],"['Dmitriy Gorenshteyn', 'Chester Ismay', 'Adrián Soto']","[('DVD Rental Database', 'https://assets.datacamp.com/production/repositories/3868/datasets/3509e6592ac9ccc8ed3084649dd1be809d9c55a9/pagilla_fixed_v3.sql')]","['Intro to SQL for Data Science', 'Joining Data in SQL']",https://www.datacamp.com/courses/applying-sql-to-real-world-problems,Data Manipulation,SQL
20,Bayesian Modeling with RJAGS,4,15,58,"1,936","4,650",Bayesian Modeling RJAGS,"Bayesian Modeling with RJAGS
The Bayesian approach to statistics and machine learning is logical, flexible, and intuitive. In this course, you will engineer and analyze a family of foundational, generalizable Bayesian models. These range in scope from fundamental one-parameter models to intermediate multivariate & generalized linear regression models. The popularity of such Bayesian models has grown along with the availability of computing resources required for their implementation. You will utilize one of these resources - the rjags package in R. Combining the power of R with the JAGS (Just Another Gibbs Sampler) engine, rjags provides a framework for Bayesian modeling, inference, and prediction.
Bayesian models combine prior insights with insights from observed data to form updated, posterior insights about a parameter. In this chapter, you will review these Bayesian concepts in the context of the foundational Beta-Binomial model for a proportion parameter. You will also learn how to use the rjags package to define, compile, and simulate this model in R.
The two-parameter Normal-Normal Bayesian model provides a simple foundation for Normal regression models. In this chapter, you will engineer the Normal-Normal and define, compile, and simulate this model using rjags. You will also explore the magic of the Markov chain mechanics behind rjags simulation.
In this chapter, you will extend the Normal-Normal model to a simple Bayesian regression model. Within this context, you will explore how to use rjags simulation output to conduct posterior inference. Specifically, you will construct posterior estimates of regression parameters using posterior means & credible intervals, you will test hypotheses using posterior probabilities, and you will construct posterior predictive distributions for new observations.
In this final chapter, you will generalize the simple Normal regression model for application in broader contexts. You will incorporate categorical predictors, engineer a multivariate regression model with two predictors, and finally extend this methodology to Poisson multivariate regression models for count variables.",[],"['Alicia Johnson', 'Chester Ismay', 'Nick Solomon', 'Eunkyung Park']","[('Sleep study data', 'https://assets.datacamp.com/production/repositories/2096/datasets/62737a3d23519405d7bfe3eceb85be0f97a07862/sleep_study.csv')]","['Fundamentals of Bayesian Data Analysis in R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/bayesian-modeling-with-rjags,Probability & Statistics,R
21,Bayesian Regression Modeling with rstanarm,4,15,45,"1,713","3,400",Bayesian Regression Modeling rstanarm,"Bayesian Regression Modeling with rstanarm
Bayesian estimation offers a flexible alternative to modeling techniques where the inferences depend on p-values. In this course, you’ll learn how to estimate linear regression models using Bayesian methods and the rstanarm package. You’ll be introduced to prior distributions, posterior predictive model checking, and model comparisons within the Bayesian framework. You’ll also learn how to use your estimated model to make predictions for new data.
A review of frequentist regression using lm(), an introduction to Bayesian regression using stan_glm(), and a comparison of the respective outputs.
Learn how to modify your Bayesian model including changing the number and length of chains, changing prior distributions, and adding predictors.
In this chapter, we'll learn how to determine if our estimated model fits our data and how to compare competing models.
In this chapter, we'll learn how to use the estimated model to create visualizations of your model and make predictions for new data.",[],"['Jake Thompson', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Spotify dataset', 'https://assets.datacamp.com/production/repositories/2199/datasets/3c921f85674c92085b3428c303b9364573a8bd4f/datacamp-spotify-data.csv')]","['Data Visualization with ggplot2 (Part 1)', 'Multiple and Logistic Regression', 'Bayesian Modeling with RJAGS']",https://www.datacamp.com/courses/bayesian-regression-modeling-with-rstanarm,Probability & Statistics,R
22,Big Data Fundamentals via PySpark,4,16,55,"7,142","4,600",Big Data Fundamentals via PySpark,"Big Data Fundamentals via PySpark
There's been a lot of buzz about Big Data over the past few years, and it's finally become mainstream for many companies. But what is this Big Data? This course covers the fundamentals of Big Data via PySpark. Spark is “lightning fast cluster computing"" framework for Big Data. It provides a general data processing platform engine and lets you run programs up to 100x faster in memory, or 10x faster on disk, than Hadoop. You’ll use PySpark, a Python package for spark programming and its powerful, higher-level libraries such as SparkSQL, MLlib (for machine learning), etc., to interact with works of William Shakespeare, analyze Fifa football 2018 data and perform clustering of genomic datasets. At the end of this course, you will gain an in-depth understanding of PySpark and it’s application to general Big Data analysis.
This chapter introduces the exciting world of Big Data, as well as the various concepts and different frameworks for processing Big Data. You will understand why Apache Spark is considered the best framework for BigData.
The main abstraction Spark provides is a resilient distributed dataset (RDD), which is the fundamental and backbone data type of this engine. This chapter introduces RDDs and shows how RDDs can be created and executed using RDD Transformations and Actions.
In this chapter, you'll learn about Spark SQL which is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as a distributed SQL query engine. This chapter shows how Spark SQL allows you to use DataFrames in Python.
PySpark MLlib is the Apache Spark scalable machine learning library in Python consisting of common learning algorithms and utilities. Throughout this last chapter, you'll learn important Machine Learning algorithms. You will build a movie recommendation engine and a spam filter, and use k-means clustering.",[],"['Upendra Kumar Devisetty', 'Hadrien Lacroix', 'Chester Ismay']","[('Complete Shakespeare', 'https://assets.datacamp.com/production/repositories/3514/datasets/d9e4e9c9a26e932e3164ad7585bc30fc06596a50/Complete_Shakespeare.txt'), ('Movie ratings', 'https://assets.datacamp.com/production/repositories/3514/datasets/cab267d2a4c482f3323aec8dd9278875d2048a01/ratings.csv'), ('5000 points', 'https://assets.datacamp.com/production/repositories/3514/datasets/84f3b6bab25357840cfc90c4276edc9604553fd7/5000_points.txt'), ('FIFA 2018', 'https://assets.datacamp.com/production/repositories/3514/datasets/1ad7ffa377b5ba5d42e95efc9944293be97efd62/Fifa2018_dataset.csv'), ('People', 'https://assets.datacamp.com/production/repositories/3514/datasets/db8a991f6a506fb50fff7f7baf32d2ae02e7c480/people.csv'), ('Spam', 'https://assets.datacamp.com/production/repositories/3514/datasets/2d331f8b1b3c80e205850e38c53c1284b54c46cc/spam.txt'), ('Ham', 'https://assets.datacamp.com/production/repositories/3514/datasets/26b670b5ae766aecf7ebf7bf364fe9a590c2788b/ham.txt')]",['Introduction to Python'],https://www.datacamp.com/courses/big-data-fundamentals-via-pyspark,Machine Learning,Python
23,Biomedical Image Analysis in Python,4,15,54,"4,884","4,400",Biomedical Image Analysis,"Biomedical Image Analysis in Python
The field of biomedical imaging has exploded in recent years - but for the uninitiated, even loading data can be a challenge! In this introductory course, you'll learn the fundamentals of image analysis using NumPy, SciPy, and Matplotlib. You'll navigate through a whole-body CT scan, segment a cardiac MRI time series, and determine whether Alzheimer’s disease changes brain structure. Even if you have never worked with images before, you will finish the course with a solid toolkit for entering this dynamic field.
Prepare to conquer the Nth dimension! To begin the course, you'll learn how to load, build and navigate N-dimensional images using a CT image of the human chest. You'll also leverage the useful ImageIO package and hone your NumPy and matplotlib skills.
Cut image processing to the bone by transforming x-ray images. You'll learn how to exploit intensity patterns to select sub-regions of an array, and you'll use convolutional filters to detect interesting features. You'll also use SciPy's ndimage module, which contains a treasure trove of image processing tools.
In this chapter, you'll get to the heart of image analysis: object measurement. Using a 4D cardiac time series, you'll determine if a patient is likely to have heart disease. Along the way, you'll learn the fundamentals of image segmentation, object labeling, and morphological measurement.
For the final chapter, you'll need to use your brain... and hundreds of others! Drawing data from more than 400 open-access MR images, you'll learn the basics of registration, resampling, and image comparison. Then, you'll use the extracted measurements to evaluate the effect of Alzheimer's Disease on brain structure.",[],"['Stephen Bailey', 'Lore Dirick', 'Becca Robins', 'Sara Snell']","[('RSNA Hand Radiograph', 'https://assets.datacamp.com/production/repositories/2085/datasets/61bc2353b17eb6929d6169109bff447b6d00b6bc/hand.png'), ('OASIS Brain Measurements', 'https://assets.datacamp.com/production/repositories/2085/datasets/bbf1f4e91437f8d830880d30b31ab930578a7b4b/oasis_all_volumes.csv'), ('Sunnybrook Cardiac MRI', 'https://assets.datacamp.com/production/repositories/2085/datasets/fabaa1f1675549d624eb8f5d1bc94e0b11e30a8e/sunnybrook-cardiac-mr.zip'), ('TCIA Chest CT (Sample)', 'https://assets.datacamp.com/production/repositories/2085/datasets/f44726fefae841afd24ddf83c58f34722212e67a/tcia-chest-ct-sample.zip')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/biomedical-image-analysis-in-python,Data Manipulation,Python
24,Bond Valuation and Analysis in R,4,13,43,"7,032","3,350",Bond Valuation and Analysis in R,"Bond Valuation and Analysis in R
The fixed income market is large and filled with complex instruments. In this course, we focus on plain vanilla bonds to build solid fundamentals you will need to tackle more complex fixed income instruments. In this chapter, we demonstrate the mechanics of valuing bonds by focusing on an annual coupon, fixed rate, fixed maturity, and option-free bond.
Estimating Yield To Maturity - The YTM measures the expected return to bond investors if they hold the bond until maturity. This number summarizes the compensation investors demand for the risk they are bearing by investing in a particular bond. We will discuss how one can estimate YTM of a bond.
Interest rate risk is the biggest risk that bond investors face. When interest rates rise, bond prices fall. Because of this, much attention is paid to how sensitive a particular bond's price is to changes in interest rates. In this chapter, we start the discussion with a simple measure of bond price volatility - the Price Value of a Basis Point. Then, we discuss duration and convexity, which are two common measures that are used to manage interest rate risk.
We will put all of the techniques that the student has learned from Chapters One through Three into one comprehensive example. The student will be asked to value a bond by using the yield on a comparable bond and estimate the bond's duration and convexity.","['Applied Finance with R', 'Quantitative Analyst with R']","['Clifford Ang', 'Lore Dirick']",[],"['Introduction to R for Finance', 'Intermediate R for Finance', 'Importing and Managing Financial Data in R']",https://www.datacamp.com/courses/bond-valuation-and-analysis-in-r,Applied Finance,R
25,Building Chatbots in Python,4,15,49,"38,461","4,100",Building Chatbots,"Building Chatbots in Python
Messaging and voice-controlled devices are the next big platforms, and conversational computing has a big role to play in creating engaging augmented and virtual reality experiences. This course will get you started on the path toward building such applications. There are a number of unique challenges to building these kinds of programs, like how do I turn human language into instructions for machines? In this course, you'll tackle this first with rule-based systems and then with machine learning. Some chat systems are designed to be useful, while others are just good fun. You will build one of each and put everything together to make a helpful, friendly chatbot. Once you complete the course, you’ll also learn how to connect your chatbot to Facebook Messenger!
In this chapter, you'll learn how to build your first chatbot. After gaining a bit of historical context, you'll set up a basic structure for receiving text and responding to users, and then learn how to add the basic elements of personality. You'll then build rule-based systems for parsing text.
Here, you'll use machine learning to turn natural language into structured data using spaCy, scikit-learn, and rasa NLU. You'll start with a refresher on the theoretical foundations and then move onto building models using the ATIS dataset, which contains thousands of sentences from real people interacting with a flight booking system.
In this chapter, you'll build a personal assistant to help you plan a trip. It will be able to respond to questions like ""are there any cheap hotels in the north of town?"" by looking inside a hotel’s database for matching results.
Everything you've built so far has statelessly mapped intents to actions and responses. It's amazing how far you can get with that! But to build more sophisticated bots you will always want to add some statefulness. That's what you'll do here, as you build a chatbot that helps users order coffee.",[],"['Alan Nichol', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('ATIS (Airline Travel Information System)', 'https://assets.datacamp.com/production/repositories/925/datasets/bc9aa8fd897dfee464fb48b21ea0182f7d57edaa/atis.zip'), ('Hotels database', 'https://assets.datacamp.com/production/repositories/925/datasets/309bcfc999bb88b593066a0acd7d8d83bd5e175e/hotels.db')]",['Natural Language Processing Fundamentals in Python'],https://www.datacamp.com/courses/building-chatbots-in-python,Machine Learning,Python
26,Building Dashboards with flexdashboard,4,14,50,"3,153","4,150",Building Dashboards flexdashboard,"Building Dashboards with flexdashboard
Communication is a key part of the data science process. Dashboards are a popular way to present data in a cohesive visual display. In this course you'll learn how to assemble your results into a polished dashboard using the flexdashboard package. This can be as simple as adding a few lines of R Markdown to your existing code, or as rich as a fully interactive Shiny-powered experience. You will learn about the spectrum of dashboard creation tools available in R and complete this course with the ability to produce a professional quality dashboard.
In this chapter you will learn how R Markdown and the flexdashboard package are used to create a dashboard, and how to customize the layout of components on your dashboard.
This chapter will introduce the many options for including data visualizations in your dashboard. You'll learn about how to optimize your plots for display on the web.
In this chapter you will learn about other components that will allow you to create a complete dashboard. This includes ways to present everything from a single value to a complete dataset.
This chapter will demonstrate how you can use Shiny to make your dashboard interactive. You'll keep working with the San Francisco bike sharing data and build a dashboard for exploring this data set.",['Shiny Fundamentals with R'],"['Elaine McVey', 'Chester Ismay', 'Nick Solomon']","[('San Francisco bike share data', 'https://assets.datacamp.com/production/repositories/1448/datasets/1f12031000b09ad096880bceb61f6ca2fd95e2eb/sanfran_bikeshare_joined_oneday.csv'), ('San Francisco bike share station data', 'https://assets.datacamp.com/production/repositories/1448/datasets/38f4fbe05ad1b7b13a6a8f5c680eeeed67cd7cf0/stations_data.csv')]","['Building Web Applications in R with Shiny', 'Reporting with R Markdown']",https://www.datacamp.com/courses/building-dashboards-with-flexdashboard,Reporting,R
27,Building Dashboards with shinydashboard,4,13,45,"11,250","3,750",Building Dashboards shinydashboard,"Building Dashboards with shinydashboard
Once you've started learning tools for building interactive web applications with shiny, this course will translate this knowledge into building dashboards. Dashboards, a common data science deliverable, are pages that collate information, often tracking metrics from a live-updating data source. You'll gain more expertise using shiny while learning to build and design these dynamic dashboards. In the process, you'll pick up tips to optimize performance as well as best practices to create a visually appealing product.
In this chapter you will learn the basic structure of a Shiny Dashboard and how to fill it with static content.
In this chapter you will learn how to add dynamic content to your Shiny Dashboard.
In this chapter you will focus on customizing the style of your Shiny Dashboard.
In this chapter you will participate in a case study, practicing the skills you have acquired in the previous chapters.",['Shiny Fundamentals with R'],"['Lucy D’Agostino McGowan', 'Chester Ismay', 'Nick Solomon']","[('NASA fireball dataset', 'https://assets.datacamp.com/production/repositories/1661/datasets/6a69952e67540acd76ffa28386e534297c1db32b/nasa_fireball.rda'), ('Starwars dataset', 'https://assets.datacamp.com/production/repositories/1661/datasets/2d751e7a11001e8d4d4ac263ac9878361cad959d/starwars.csv')]",['Building Web Applications in R with Shiny'],https://www.datacamp.com/courses/building-dashboards-with-shinydashboard,Reporting,R
28,Building Recommendation Engines with PySpark,4,15,56,"3,531","4,550",Building Recommendation Engines PySpark,"Building Recommendation Engines with PySpark
This course will show you how to build recommendation engines using Alternating Least Squares in PySpark. Using the popular MovieLens dataset and the Million Songs dataset, this course will take you step by step through the intuition of the Alternating Least Squares algorithm as well as the code to train, test and implement ALS models on various types of customer data.
This chapter will show you how powerful recommendations engines can be, and provide important distinctions between collaborative-filtering engines and content-based engines as well as the different types of implicit and explicit data that recommendation engines can use. You will also learn a very powerful way to uncover hidden features (latent features) that you may not even know exist in customer datasets.
In this chapter you will review basic concepts of matrix multiplication and matrix factorization, and dive into how the Alternating Least Squares algorithm works and what arguments and hyperparameters it uses to return the best recommendations possible. You will also learn important techniques for properly preparing your data for ALS in Spark.
In this chapter you will be introduced to the MovieLens dataset. You will walk through how to assess it's use for ALS, build out a full cross-validated ALS model on it, and learn how to evaluate it's performance. This will be the foundation for all subsequent ALS models you build using Pyspark.
In most real-life situations, you won't not have ""perfect"" customer data available to build an ALS model. This chapter will teach you how to use your customer behavior data to ""infer"" customer ratings and use those inferred ratings to build an ALS recommendation engine. Using the Million Songs Dataset as well as another version of the MovieLens dataset, this chapter will show you how to use the data available to you to build a recommendation engine using ALS and evaluate it's performance.",[],"['Jamen Long', 'Lore Dirick', 'Nick Solomon', 'Adrián Soto']",[],"['Introduction to PySpark', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/recommendation-engines-in-pyspark,Machine Learning,Python
29,Building Response Models in R,4,13,53,811,"4,600",Building Response Models in R,"Building Response Models in R
Almost every company collects digital information as part of their marketing campaigns and uses it to improve their marketing tactics. Data scientists are often tasked with using this information to develop statistical models that enable marketing professionals to see if their actions are paying off. In this course, you will learn how to uncover patterns of marketing actions and customer reactions by building simple models of market response. In particular, you will learn how to quantify the impact of marketing variables, such as price and different promotional tactics, using aggregate sales and individual-level choice data.
The first chapter introduces you to the basic principles and concepts of market response models. Here, you will learn how to build simple response models for product sales. In addition, you will learn about the theoretical and practical differences between linear and non-linear models for sales responses.
An effective marketing strategy combines all the tools available to communicate the benefits of a product. The key is crafting the right mix of these tools to achieve sales increases and market share goals. In the second chapter, you will learn how to incorporate the effects of advertising and promotion in your sales-response model and how to identify the marketing strategy that is most likely to succeed.
A company can only be successful in the market if its products have a competitive advantage over those of its rivals. To develop an effective marketing strategy in a competitive environment, it is essential to understand the interrelationship between marketing activity and customer behavior. In this chapter, you will learn how to explain the effects of temporary price changes on customer brand choice by employing logistic and probit response models.
The main goal of response modeling is to enable marketers to not only see a payoff for their actions today, but also tomorrow. In order to view this future payoff, a simple but reliable statistical model is required. In this last chapter, you will learn how to evaluate the predictive performance of logistic response models.",['Marketing Analytics with R'],"['Kathrin Gruber', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Beer sales dataset', 'https://assets.datacamp.com/production/repositories/2198/datasets/00ac05c43d83841590cc74bbbc5d83d956c41131/sales.data.RData'), ('Beer choice dataset', 'https://assets.datacamp.com/production/repositories/2198/datasets/f79717c6e2300cd40ddc7320f6681cc747100685/choice.data.RData')]",['Correlation and Regression'],https://www.datacamp.com/courses/building-response-models-in-r,Probability & Statistics,R
30,Building Web Applications in R with Shiny: Case Studies,4,16,59,"7,359","4,850",Building Web Applications in R Shiny: Case Studies,"Building Web Applications in R with Shiny: Case Studies
After learning the basics of using Shiny to build web applications, this course takes you to the next level by putting your newly acquired skills into practice. You'll get experience developing fun and realistic Shiny apps for different common use cases, such as using Shiny to explore a dataset, generate a customized plot, and even create a word cloud. With all this practice and new knowledge, you will be well-equipped to develop Shiny apps for your own use.
In the first chapter, you'll review the essentials of Shiny development. You'll get reintroduced to the basic structure of a Shiny application, as well as some core Shiny concepts such as inputs, outputs, and reactivity. Completing this chapter will help refresh your Shiny knowledge and ensure you have the required skills to develop Shiny apps for real-life scenarios.
Imagine you're preparing a figure for a manuscript using R. You spend a lot of time recreating the same plot over and over again by rerunning the same code but changing small parameters each time. The size of the points, the color of the points, the plot title, the data shown on the plot—these criteria all have to be just right before publishing the figure. To save you from the hassle of rerunning the code many times, you will learn how to create a Shiny app to make a customizable plot.
Let’s say your supervisor is impressed by the plot you created with Shiny and now wants to get familiar with the dataset you used in the plot. They don't want to simply have a raw data file, they want an interactive environment where they can view the data, filter it, and download it. This chapter will guide you in creating such an application—a Shiny app for exploring the Gapminder dataset.
Your friend really likes word clouds and has written an R function to generate them. They want to share this function with all their friends, but not all of them know how to use R. You offer to help by building a Shiny app that uses their function to let people create their own word clouds. This will allow all their friends—even the ones who are unfamiliar with R—to generate word clouds using a point-and-click interface. This chapter will guide you through the steps required to build this app.",['Shiny Fundamentals with R'],"['Dean Attali', 'Sascha Mayr']",[],['Building Web Applications in R with Shiny'],https://www.datacamp.com/courses/building-web-applications-in-r-with-shiny-case-studies,Reporting,R
31,Building and Optimizing Triggers in SQL Server,4,15,49,158,"3,800",Building and Optimizing Triggers in SQL Server,"Building and Optimizing Triggers in SQL Server
Auditing your SQL Server database and keeping data integrity can be a challenging task for DBAs and database developers. SQL Server triggers are special types of stored procedures designed to help you achieve consistency and integrity of your database. This course will teach you how to work with triggers and use them in real-life examples. Specifically, you will learn about the use cases and limitations of triggers and get practice designing and implementing them. You will also learn to optimize triggers to fit your specific needs.
An introduction to the basic concepts of SQL Server triggers. Create your first trigger using T-SQL code. Learn how triggers are used and what alternatives exist.
Learn about the different types of SQL Server triggers: AFTER triggers (DML), INSTEAD OF triggers (DML), DDL triggers, and logon triggers.
Find out known limitations of triggers, as well as common use cases for AFTER triggers (DML), INSTEAD OF triggers (DML) and DDL triggers.
Learn to delete and modify triggers. Acquaint yourself with the way trigger management is done. Learn how to investigate problematic triggers in practice.",[],"['Florin Angelescu', 'Mona Khalil', 'Becca Robins', 'Marianna Lamnina']","[('Discounts table', 'https://assets.datacamp.com/production/repositories/4414/datasets/198a4c88eaee60e0af88038abc73d84f2f968ba2/discounts.csv'), ('Orders table', 'https://assets.datacamp.com/production/repositories/4414/datasets/f3e3862ffc39d47aa7b260dc6dc3efbe4c7daead/orders.csv'), ('Products table', 'https://assets.datacamp.com/production/repositories/4414/datasets/72f2c1197f5baa4b5dee40b79fddf5cfff67c633/products.csv')]","['Introduction to Relational Databases in SQL', 'Intermediate SQL Server']",https://www.datacamp.com/courses/building-and-optimizing-triggers-in-sql-server,Data Manipulation,SQL
32,Business Process Analytics in R,4,16,58,"1,875","4,550",Business Process Analytics in R,"Business Process Analytics in R
Although you might not have realized, processes take up an indispensable role in our daily lives. Your actions and those of others generate an extensive amount of data. Whether you are ordering a book, a train crosses a red light, or your thermostat heats your bathroom, every second millions of events are taking place which are stored in data centers around the world. These enormous sets of event data can be used to gain insight into processes in a virtually unlimited range of fields. However, the analysis of this data requires its own set of specific formats and techniques. This course will introduce you to process mining with R and demonstrate the different steps needed to analyze business processes.
The amount of event data has grown enormously during the last decades. A considerable amount of this data is recorded within the context of various business process. In this chapter, you will discover a methodology for analyzing process data, consisting of three stages: extraction, processing and analysis. You will have our first encounter with the specific elements of process data which are required for analysis, and have a first deep dive into the world of activities and traces, which will allow you to reveal of first glimpse of the process.
A process can be looked at from different angles: the control-flow, the performance and the organizational background. In this chapter, you will make a deep dive into each of these perspectives. The control-flow refers to the different ways in which the process can be executed, and thus, how it is structured. Considering performance, we are both interested in discovering how long things take, as well as when they take place. Finally, the organizational perspective looks at the actors in the process.
Event data rarely comes in a form which is ready to analyze. Therefore, you often require a set of tools to get the data in the right shape, before we can answer our research question. At the end of this chapter, you will be familiar with three common preprocessing tasks: filtering data, aggregating events and enriching data.
In this final chapter we will use everything we have learned so far to do and end-to-end analysis of an order-to-cash process. Firstly, we will transform data from various sources to an event log. Secondly, we will have a helicopter view of the process, exploring the dimensions of the data and the different activities, stages and flows in the process. Finally, we will combine preprocessing and analysis tools to formulate an answer to several research questions.",[],"['Gert Janssenswillen', 'Yashas Roy', 'Sascha Mayr']","[('Eating patterns', 'https://assets.datacamp.com/production/repositories/1747/datasets/368f1d44a01d0b76b14e8a0a358132f66d8908d7/log_eat_patterns.RDS'), ('Order-to-cash process', 'https://assets.datacamp.com/production/repositories/1747/datasets/d46101a7b94b9b13701a66c7677731676d0bf40e/otc.zip')]",['Working with Data in the Tidyverse'],https://www.datacamp.com/courses/business-process-analytics-in-r,Probability & Statistics,R
33,Case Studies in Statistical Thinking,4,16,61,"6,065","4,850",Case Studies in Statistical Thinking,"Case Studies in Statistical Thinking
Mastery requires practice. Having completed Statistical Thinking I and II, you developed your probabilistic mindset and the hacker stats skills to extract actionable insights from your data. Your foundation is in place, and now it is time practice your craft.
In this course, you will apply your statistical thinking skills, exploratory data analysis, parameter estimation, and hypothesis testing, to two new real-world data sets. First, you will explore data from the 2013 and 2015 FINA World Aquatics Championships, where you will quantify the relative speeds and variability among swimmers. You will then perform a statistical analysis to assess the ""current controversy"" of the 2013 Worlds in which swimmers claimed that a slight current in the pool was affecting result. Second, you will study the frequency and magnitudes of earthquakes around the world. Finally, you will analyze the changes in seismicity in the US state of Oklahoma after the practice of high pressure waste water injection at oil extraction sites became commonplace in the last decade. As you work with these data sets, you will take vital steps toward mastery as you cement your existing knowledge and broaden your abilities to use statistics and Python to make sense of your data.
To begin, you'll use two data sets from Caltech researchers to rehash the key points of Statistical Thinking I and II to prepare you for the following case studies!
In this chapter, you will practice your EDA, parameter estimation, and hypothesis testing skills on the results of the 2015 FINA World Swimming Championships.
Some swimmers said that they felt it was easier to swim in one direction versus another in the 2013 World Championships. Some analysts have posited that there was a swirling current in the pool. In this chapter, you'll investigate this claim!
References - Quartz Media, Washington Post, SwimSwam (and also here), and Cornett, et al.
Herein, you'll use your statistical thinking skills to study the frequency and magnitudes of earthquakes. Along the way, you'll learn some basic statistical seismology, including the Gutenberg-Richter law. This exercise exposes two key ideas about data science: 1) As a data scientist, you wander into all sorts of domain specific analyses, which is very exciting. You constantly get to learn. 2) You are sometimes faced with limited data, which is also the case for many of these earthquake studies. You can still make good progress!
Of course, earthquakes have a big impact on society, and recently are connected to human activity. In this final chapter, you'll investigate the effect that increased injection of saline wastewater due to oil mining in Oklahoma has had on the seismicity of the region.",['Statistics Fundamentals with Python'],"['Justin Bois', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Swimming results, 2013 World Aquatics Championships', 'https://assets.datacamp.com/production/repositories/1067/datasets/ed0ba2dca1d7d515d925c62aa0badf02ef00fad8/2013_FINA.csv'), ('Swimming results, 2015 World Aquatics Championships', 'https://assets.datacamp.com/production/repositories/1067/datasets/80dc54c31868c00a584bfa3a195525fa243d839e/2015_FINA.csv'), ('Zebrafish active bout lengths', 'https://assets.datacamp.com/production/repositories/1067/datasets/8885c23f1c156149b736ca2ea0d9b01bbc727ecd/gandhi_et_al_bouts.csv'), ('Oklahoma earthquakes (1950 to mid-2017)', 'https://assets.datacamp.com/production/repositories/1067/datasets/c12865c9df2b6e63a40a53eaeee7caffb6cf87ac/oklahoma_earthquakes_1950-2017.csv'), ('Bacterial growth', 'https://assets.datacamp.com/production/repositories/1067/datasets/8c69b496a875ae9597a4962269baae2ceab341f0/park_bacterial_growth.csv'), ('Parkfield earthquakes (1950 to mid-2017)', 'https://assets.datacamp.com/production/repositories/1067/datasets/dfefd6ab5cf704d0723ec08723c9e7c9978c1700/parkfield_earthquakes_1950-2017.csv')]","['Statistical Thinking in Python (Part 1)', 'Statistical Thinking in Python (Part 2)']",https://www.datacamp.com/courses/case-studies-in-statistical-thinking,Probability & Statistics,Python
34,Categorical Data in the Tidyverse,4,13,44,"3,207","3,600",Categorical Data in Tidyverse,"Categorical Data in the Tidyverse
As a data scientist, you will often find yourself working with non-numerical data, such as job titles, survey responses, or demographic information. R has a special way of representing them, called factors, and this course will help you master working with them using the tidyverse package forcats. We’ll also work with other tidyverse packages, including ggplot2, dplyr, stringr, and tidyr and use real world datasets, such as the fivethirtyeight flight dataset and Kaggle’s State of Data Science and ML Survey. Following this course, you’ll be able to identify and manipulate factor variables, quickly and efficiently visualize your data, and effectively communicate your results. Get ready to categorize!
In this chapter, you’ll learn all about factors. You’ll discover the difference between categorical and ordinal variables, how R represents them, and how to inspect them to find the number and names of the levels. Finally, you’ll find how forcats, a tidyverse package, can improve your plots by letting you quickly reorder variables by their frequency.
You’ll continue to dive into the forcats package, learning how to change the order and names of levels and even collapse them into one another.
Having gotten a good grasp of forcats, you’ll expand out to the rest of the tidyverse, learning and reviewing functions from dplyr, tidyr, and stringr. You’ll refine graphs with ggplot2 by changing axes to percentage scales, editing the layout of the text, and more.
In this final chapter, you’ll take all that you’ve learned and apply it in a case study. You’ll learn more about working with strings and summarizing data, then replicate a publication quality 538 plot.","['Data Analyst with R', 'Tidyverse Fundamentals with R']","['Emily Robinson', 'Chester Ismay', 'Becca Robins']","[('538 Flying Etiquette survey', 'https://assets.datacamp.com/production/repositories/1834/datasets/bef2c6e1ef42a2f230383e080fa7379912860017/flying-etiquette.csv'), ('Kaggle multiple choice responses', 'https://assets.datacamp.com/production/repositories/1834/datasets/584ec6ab685e3795f79963486ea9c751b90a4bf0/smc_with_js.csv')]","['Introduction to the Tidyverse', 'Working with Data in the Tidyverse']",https://www.datacamp.com/courses/categorical-data-in-the-tidyverse,Data Manipulation,R
35,ChIP-seq Workflows in R,4,13,46,969,"3,650",ChIP-seq Workflows in R,"ChIP-seq Workflows in R
ChIP-seq analysis is an important branch of bioinformatics. It provides a window into the machinery that makes the cells in our bodies tick. Whether it is a brain cell helping you to read this web page or an immune cell patrolling your body for microorganisms that would make you sick, they all carry the same genome. What differentiates them are the genes that are active at any given time. Which genes these are is determined by a complex system of proteins that can activate and deactivate genes. When this regulatory machinery gets out of control, it can lead to cancer and other debilitating diseases. ChIP-seq analysis allows us to understand the function of regulatory proteins, how they can contribute to disease and can provide insights into how we may be able to intervene to prevent cells from spinning out of control. In this course, you will explore a real dataset while learning how to process and analyze ChIP-seq data in R.
Introduction to ChIP-seq experiments. Why are they interesting? What sort of phenomena can be studied with ChIP-seq and what can we learn from these experiments.
Now the ChIP-seq analysis begins in earnest. This chapter introduces Bioconductor tools to import and clean the data.
This chapter introduces techniques to identify and visualise differences between ChIP-seq samples.
Being able to identify differential binding between groups of samples is great, but what does it mean? This chapter discusses strategies to interpret differential binding results to go from peak calls to biologically meaningful insights.",[],"['Peter Humburg', 'Sascha Mayr', 'David Campos', 'Shon Inouye']","[('Androgen Receptor ChIP-seq Peaks dataset', 'https://assets.datacamp.com/production/repositories/1556/datasets/c8196863474828ad64357c8327eeab64a5f3a06d/androgen_receptor_binding_peaks.zip'), ('Chromosome 20 dataset', 'https://assets.datacamp.com/production/repositories/1556/datasets/c817df755a33469a50f455735cda02d34e452050/chr20_29729372-29929372.bam.txt')]","['Intermediate R', 'Introduction to Bioconductor']",https://www.datacamp.com/courses/chip-seq-workflows-in-r,Other,R
36,Cleaning Data in Python,4,17,58,"76,388","4,800",Cleaning Data,"Cleaning Data in Python
A vital component of data science involves acquiring raw data and getting it into a form ready for analysis. It is commonly said that data scientists spend 80% of their time cleaning and manipulating data, and only 20% of their time actually analyzing it. This course will equip you with all the skills you need to clean your data in Python, from learning how to diagnose problems in your data, to dealing with missing values and outliers. At the end of the course, you'll apply all of the techniques you've learned to a case study to clean a real-world Gapminder dataset.
Say you've just gotten your hands on a brand new dataset and are itching to start exploring it. But where do you begin, and how can you be sure your dataset is clean? This chapter will introduce you to data cleaning in Python. You'll learn how to explore your data with an eye for diagnosing issues such as outliers, missing values, and duplicate rows.
Learn about the principles of tidy data, and more importantly, why you should care about them and how they make data analysis more efficient. You'll gain first-hand experience with reshaping and tidying data using techniques such as pivoting and melting.
The ability to transform and combine your data is a crucial skill in data science, because your data may not always come in one monolithic file or table for you to load. A large dataset may be broken into separate datasets to facilitate easier storage and sharing. But it’s important to be able to run your analysis on a single dataset. You’ll need to learn how to combine datasets or clean each dataset separately so you can combine them later for analysis.
Dive into some of the grittier aspects of data cleaning. Learn about string manipulation and pattern matching to deal with unstructured data, and then explore techniques to deal with missing or duplicate data. You'll also learn the valuable skill of programmatically checking your data for consistency, which will give you confidence that your code is running correctly and that the results of your analysis are reliable.
In this final chapter, you'll apply all of the data cleaning techniques you've learned in this course toward tidying a real-world, messy dataset obtained from the Gapminder Foundation. Once you're done, not only will you have a clean and tidy dataset, you'll also be ready to start working on your own data science projects using Python.","['Data Analyst with Python', 'Data Scientist with Python', 'Importing & Cleaning Data with Python', 'Python Programmer']","['Daniel Chen', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Air quality', 'https://assets.datacamp.com/production/repositories/666/datasets/c16448e3f4219f900f540c455fdf87b0f3da70e0/airquality.csv'), ('DOB job application filings', 'https://assets.datacamp.com/production/repositories/666/datasets/b54f64ca50c859e38fd68bcc7c932d09976709b8/dob_job_application_filings_subset.csv'), ('Ebola', 'https://assets.datacamp.com/production/repositories/666/datasets/6da83b3d2017245217d35989960184234a6c4e7f/ebola.csv'), ('Gapminder', 'https://assets.datacamp.com/production/repositories/666/datasets/8e869c545c913547d94b61534b2f8d336a2c8c87/gapminder.csv'), ('Tuberculosis', 'https://assets.datacamp.com/production/repositories/666/datasets/cf05b5e01009dd5d61d7db5ac5fb790042e7fd09/tb.csv'), ('Tips', 'https://assets.datacamp.com/production/repositories/666/datasets/b064fa9e0684a38ac15b0a19845367c29fde978d/tips.csv'), ('NYC Uber data', 'https://assets.datacamp.com/production/repositories/666/datasets/c202eb5e7ae1ebf87036a30dcea577096f02c861/nyc_uber_2014.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/cleaning-data-in-python,Importing & Cleaning Data,Python
37,Cleaning Data in R,4,15,58,"95,292","4,700",Cleaning Data in R,"Cleaning Data in R
It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it. For this reason, it is critical to become familiar with the data cleaning process and all of the tools available to you along the way. This course provides a very basic introduction to cleaning data in R using the tidyr, dplyr, and stringr packages. After taking the course you'll be able to go from raw data to awesome insights as quickly and painlessly as possible!
This chapter will give you an overview of the process of data cleaning with R, then walk you through the basics of exploring raw data.
This chapter will give you an overview of the principles of tidy data, how to identify messy data, and what to do about it.
This chapter will teach you how to prepare your data for analysis. We will look at type conversion, string manipulation, missing and special values, and outliers and obvious errors.
In this chapter, you will practice everything you've learned from the first three chapters in order to clean a messy dataset using R.","['Data Analyst with R', 'Data Scientist with R', 'Importing & Cleaning Data with R']","['Nick Carchedi', 'Jeff Paadre']","[('Messy weather data', 'https://assets.datacamp.com/production/repositories/34/datasets/b3c1036d9a60a9dfe0f99051d2474a54f76055ea/weather.rds'), ('BMI data', 'https://assets.datacamp.com/production/repositories/34/datasets/a0a569ebbb34500d11979eba95360125127e6434/bmi_clean.csv'), ('Census data', 'https://assets.datacamp.com/production/repositories/34/datasets/f82ab0a3ccb95fe40e18c6eac5644d288cd126ea/census-retail.csv'), ('Student data (with dates)', 'https://assets.datacamp.com/production/repositories/34/datasets/f75a87dbbdf2cf79e2286f97b2af22146cb717b1/students_with_dates.csv')]",['Introduction to R'],https://www.datacamp.com/courses/cleaning-data-in-r,Importing & Cleaning Data,R
38,Cleaning Data with Apache Spark in Python,4,16,53,799,"4,150",Cleaning Data Apache Spark,"Cleaning Data with Apache Spark in Python
Working with data is tricky - working with millions or even billions of rows is worse.
Did you receive some data processing code written on a laptop with fairly pristine data?
Chances are you’ve probably been put in charge of moving a basic data process from prototype to production.
You may have worked with real world datasets, with missing fields, bizarre formatting, and orders of magnitude more data. Even if this is all new to you, this course helps you learn what’s needed to prepare data processes using Python with Apache Spark.
You’ll learn terminology, methods, and some best practices to create a performant, maintainable, and understandable data processing platform.
A review of DataFrame fundamentals and the importance of data cleaning.
A look at various techniques to modify the contents of DataFrames in Spark.
Improve data cleaning tasks by increasing performance or reducing resource requirements.
Learn how to process complex real-world data using Spark and the basics of pipelines.",[],"['Mike Metzger', 'Hadrien Lacroix', 'Hillary Green-Lerman']","[('Dallas Council Votes', 'https://assets.datacamp.com/production/repositories/4336/datasets/ea700976560a6f1760782c6e7310a662120c63b5/DallasCouncilVotes.csv.gz'), ('Dallas Council Voters', 'https://assets.datacamp.com/production/repositories/4336/datasets/c0aa672020bce21eef0d875a484a4fd44da042cf/DallasCouncilVoters.csv.gz'), ('Flights - 2014', 'https://assets.datacamp.com/production/repositories/4336/datasets/f412f5acbef38630147c2956d8703bafbcd0f74c/AA_DFW_2014_Departures_Short.csv.gz'), ('Flights - 2015', 'https://assets.datacamp.com/production/repositories/4336/datasets/475d2803541ba8facb2c39024dd0d9497859dc6c/AA_DFW_2015_Departures_Short.csv.gz'), ('Flights - 2016', 'https://assets.datacamp.com/production/repositories/4336/datasets/c1abacafea802998597d6c68b27c7b8650a18ab8/AA_DFW_2016_Departures_Short.csv.gz'), ('Flights - 2017', 'https://assets.datacamp.com/production/repositories/4336/datasets/04db01ffbd39f7bf2f88ffd5b7924b2de0419168/AA_DFW_2017_Departures_Short.csv.gz')]","['Intermediate Python for Data Science', 'Introduction to PySpark']",https://www.datacamp.com/courses/cleaning-data-with-apache-spark-in-python,Importing & Cleaning Data,Python
39,Cluster Analysis in R,4,16,52,"15,639","3,800",Cluster Analysis in R,"Cluster Analysis in R
Cluster analysis is a powerful toolkit in the data science workbench. It is used to find groups of observations (clusters) that share similar characteristics. These similarities can inform all kinds of business decisions; for example, in marketing, it is used to identify distinct groups of customers for which advertisements can be tailored. In this course, you will learn about two commonly used clustering methods - hierarchical clustering and k-means clustering. You won't just learn how to use these methods, you'll build a strong intuition for how they work and how to interpret their results. You'll develop this intuition by exploring three different datasets: soccer player positions, wholesale customer spending data, and longitudinal occupational wage data.
Cluster analysis seeks to find groups of observations that are similar to one another, but the identified groups are different from each other. This similarity/difference is captured by the metric called distance. In this chapter, you will learn how to calculate the distance between observations for both continuous and categorical features. You will also develop an intuition for how the scales of your features can affect distance.
This chapter will help you answer the last question from chapter 1 - how do you find groups of similar observations (clusters) in your data using the distances that you have calculated? You will learn about the fundamental principles of hierarchical clustering - the linkage criteria and the dendrogram plot - and how both are used to build clusters. You will also explore data from a wholesale distributor in order to perform market segmentation of clients using their spending habits.
In this chapter, you will build an understanding of the principles behind the k-means algorithm, learn how to select the right k when it isn't previously known, and revisit the wholesale data from a different perspective.
In this chapter, you will apply the skills you have learned to explore how the average salary amongst professions have changed over time.","['Data Scientist with R', 'Unsupervised Machine Learning with R']","['Dmitriy Gorenshteyn', 'Yashas Roy', 'Richie Cotton']","[('Soccer player positions', 'https://assets.datacamp.com/production/repositories/1219/datasets/94af7037c5834527cc8799a9723ebf3b5af73015/lineup.rds'), ('Occupational Employment Statistics (OES)', 'https://assets.datacamp.com/production/repositories/1219/datasets/1e1ec9f146a25d7c71a6f6f0f46c3de7bcefd36c/oes.rds'), ('Wholesale customer spending', 'https://assets.datacamp.com/production/repositories/1219/datasets/3558d2b5564714d85120cb77a904a2859bb3d03e/ws_customers.rds')]",['Intermediate R'],https://www.datacamp.com/courses/cluster-analysis-in-r,Machine Learning,R
40,Clustering Methods with SciPy,4,14,46,"3,032","3,650",Clustering Methods SciPy,"Clustering Methods with SciPy
You have probably come across Google News, which automatically groups similar news articles under a topic. Have you ever wondered what process runs in the background to arrive at these groups? In this course, you will be introduced to unsupervised learning through clustering using the SciPy library in Python. This course covers pre-processing of data and application of hierarchical and k-means clustering. Through the course, you will explore player statistics from a popular football video game, FIFA 18. After completing the course, you will be able to quickly apply various clustering algorithms on data, visualize the clusters formed and analyze results.
Before you are ready to classify news articles, you need to be introduced to the basics of clustering. This chapter familiarizes you with a class of machine learning algorithms called unsupervised learning and then introduces you to clustering, one of the popular unsupervised learning algorithms. You will know about two popular clustering techniques - hierarchical clustering and k-means clustering. The chapter concludes with basic pre-processing steps before you start clustering data.
This chapter focuses on a popular clustering algorithm - hierarchical clustering - and its implementation in SciPy. In addition to the procedure to perform hierarchical clustering, it attempts to help you answer an important question - how many clusters are present in your data? The chapter concludes with a discussion on the limitations of hierarchical clustering and discusses considerations while using hierarchical clustering.
This chapter introduces a different clustering algorithm - k-means clustering - and its implementation in SciPy. K-means clustering overcomes the biggest drawback of hierarchical clustering that was discussed in the last chapter. As dendrograms are specific to hierarchical clustering, this chapter discusses one method to find the number of clusters before running k-means clustering. The chapter concludes with a discussion on the limitations of k-means clustering and discusses considerations while using this algorithm.
Now that you are familiar with two of the most popular clustering techniques, this chapter helps you apply this knowledge to real-world problems. The chapter first discusses the process of finding dominant colors in an image, before moving on to the problem discussed in the introduction - clustering of news articles. The chapter concludes with a discussion on clustering with multiple variables, which makes it difficult to visualize all the data.",[],"['Shaumik Daityari', 'Hillary Green-Lerman', 'Sara Billen']","[('FIFA sample', 'https://assets.datacamp.com/production/repositories/3842/datasets/10b1fd2d470d12f2486be7ffb05ab96a1b745631/fifa_18_sample_data.csv'), ('FIFA', 'https://assets.datacamp.com/production/repositories/3842/datasets/2f0473692782600a2b7c0f7d4a0dc38295c87015/fifa_18_dataset.csv'), ('Movies', 'https://assets.datacamp.com/production/repositories/3842/datasets/8bae4cc436725404038a278f6439b096bebbfd34/movies_plot.csv')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/clustering-methods-with-scipy,Machine Learning,Python
41,Command Line Automation in Python,4,16,51,453,"3,950",Command Line Automation,"Command Line Automation in Python
There are certain skills that will stay with you your entire life. One of those skills is learning to automate things. There is a motto for automation that gets straight to the point, ""If it isn't automated...it's broken"". In this course, you learn to adopt this mindset. In one of the many examples, you will create automation code that will traverse a filesystem, find files that match a pattern, and then detect which files are duplicates. Following the course, you will be able to automate many common file system tasks and be able to manage and communicate with Unix processes.
Learn to use powerful IPython shell commands that will enhance your day to day coding. These commands include SList objects that can sort and filter shell output all from the comfort of the IPython terminal.
Learn to harness Unix processes with the subprocess module. By combining the output and input of scripts, processes, and applications, you'll create pipelines to automate complex tasks.
Use the pathlib module to perform file system operations in Python. You'll learn to write tools to walk the filesystem, write files and archive directories all with a few lines of code.
Learn how to use functions to automate complex workflows. You'll use the click command line tool module to create sophisticated command line tools in a few lines of code.",[],"['Noah Gift', 'Hillary Green-Lerman', 'Adrián Soto']",[],"['Intermediate Python for Data Science', 'Introduction to Shell for Data Science']",https://www.datacamp.com/courses/command-line-automation-in-python,Programming,Python
42,Communicating with Data in the Tidyverse,4,15,53,"8,497","4,350",Communicating Data in Tidyverse,"Communicating with Data in the Tidyverse
They say that a picture is worth a thousand words. Indeed, successfully promoting your data analysis is not only a matter of accurate and effective graphics, but also of aesthetics and uniqueness. This course teaches you how to leverage the power of ggplot2 themes for producing publication-quality graphics that stick out from the mass of boilerplate plots out there. It shows you how to tweak and get the most out of ggplot2 in order to produce unconventional plots that draw attention on social media. In the end, you will combine that knowledge to produce a slick and custom-styled report with RMarkdown and CSS – all of that within the powerful tidyverse.
In this chapter, you will have a first look at the data you're going to work with throughout this course: the relationship between weekly working hours and monetary compensation in European countries, according to the International Labour Organization (ILO). After that, you'll dive right in and discover a stunning correlation by employing an exploratory visualization. You will then apply a custom look to that graphic – you'll turn an ordinary plot into an aesthetically pleasing and unique data visualization.
Barcharts, scatter plots, and histograms are probably the most common and effective data visualizations. Yet, sometimes, there are even better ways to visually highlight the finding you want to communicate to your audience. So-called ""dot plots"" make us better grasp and understand changes in data: development over time, for example. In this chapter, you'll build a custom and unique visualization that emphasizes and explains exactly one aspect of the story you want to tell.
Back in the old days, researchers and data analysts used to generate plots in R and then tediously copy them into their LaTeX or Word documents. Nowadays, whole reports can be produced and reproduced from within R and RStudio, using the RMarkdown language – combining R chunks, formatted prose, tables and plots. In this chapter, you'll take your previous findings, results, and graphics and integrate them into such a report to tell the story that needs to be told.
Your boss, your client, or your professor usually expects your results to be accurate and presented in a clear and concise structure. However, coming up with a nicely formatted and unique report on top of that is certainly a plus and RMarkdown can be customized to accomplish this. In this last chapter, you'll take your report from the last chapter and brand it with your own custom and unique style.","['Data Analyst with R', 'Data Scientist with R', 'Tidyverse Fundamentals with R']","['Timo Grossenbacher', 'Yashas Roy', 'Chester Ismay']","[('Hourly Compensation (ILO)', 'https://assets.datacamp.com/production/repositories/1464/datasets/a252b4b4a25229cb654fc4e4864cb1ea78e68c03/ilo_hourly_compensation.RData'), ('Weekly Working Hours (ILO)', 'https://assets.datacamp.com/production/repositories/1464/datasets/49e22cc7d46a440348c920c621e75b0681120edb/ilo_working_hours.RData')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/communicating-with-data-in-the-tidyverse,Data Visualization,R
43,Conda Essentials,3,0,28,"6,663","2,100",Conda Essentials,"Conda Essentials
Software is constantly evolving, so data scientists need a way to update the software they are using without breaking things that already work. Conda is an open source, cross-platform tool for managing packages and working environments for many different programming languages. This course explains how to use its core features to manage your software so that you and your colleagues can reproduce your working environments reliably with minimum effort.
This chapter shows you how to install, update and remove packages using conda.
In this chapter you will learn how to search and install packages from various channels with Conda.
This chapter shows you how work with Conda environments.
This chapter shows you how to easily manage projects using environments.","['Data Scientist with Python', 'Python Programmer']","['David Mertz', 'Albert DeFusco', 'Dhavide Aruliah', 'Sumedh Panchadhar']",[],['Introduction to Shell for Data Science'],https://www.datacamp.com/courses/conda-essentials,Programming,Shell
44,Conda for Building & Distributing Packages,3,0,28,"1,119","2,150",Conda Building & Distributing Packages,"Conda for Building & Distributing Packages
Now that you're proficient in many areas of data science with Python it's time to share your code and data with others. In this course you'll learn the fundamentals of sharing your data science assets. You'll learn how to leverage Anaconda Projects to package data, code, and conda environments into a single archive for other data scientists to run. You'll learn the basics of creating Python packages that provide importable modules. Finally, you'll learn how to write Conda recipes for your packages, build them, and share them on Anaconda Cloud.
Anaconda Projects allow you to package code, data, and Conda environments for others to use easily. Starting with with simple data science applications you'll create Anaconda Project archives that enable reproducible data science.
In this chapter you'll learn how to transform your Python scipts into modules and packages. You'll learn how to use setuptools to specify important metadata like version numbers and licenses.
Now that you have prepared your Python package using setuptools in this chapter you'll learn how to write a Conda recipe. Conda recipes describe the required Conda packages to build and run your package. You'll then build cross-platform packages and upload them to Anaconda Cloud.",[],"['Albert DeFusco', 'David Mertz', 'Dhavide Aruliah']",[],['Conda Essentials'],https://www.datacamp.com/courses/conda-for-building-distributing-packages,Programming,Shell
45,Conditional Formatting in Spreadsheets,4,14,51,"1,162","4,400",Conditional Formatting in Spreadsheets,"Conditional Formatting in Spreadsheets
Spreadsheets often suffer from having too much data. If you want to tell the underlying story that is in the data without creating additional reports, conditional formatting can help! Whether it's showing the age of your inventory by highlighting the items using a color scale, or accentuating the largest variances in year over year financial data, conditional formatting has built-in options that can be used without any complex code. It can be used instead of sorting or filtering since it works with the data that is already there! By the end, you will be creating your own report using conditional formatting to analyze a company's payroll.
Learn what conditional formatting is and how it can be used to emphasize the important data in a spreadsheet. We will discuss a variety of the built-in options you can use to apply conditional formatting rules to your data.
In this chapter, you will learn how to apply conditional formatting in more flexible ways. We'll discuss a variety of functions you can use to create conditional formatting rules with custom formulas.
Learn tricks to use conditional formatting in unique ways! In this chapter, you will learn more functions, build your own searches, and make interactive task lists with checkboxes.
In this chapter, you will use everything you have learned about conditional formatting to analyze a company's payroll. You will be working with dates, looking for duplicates, and checking for errors to create your report.",[],"['Adam Steinfurth', 'Chester Ismay', 'Amy Peterson']",[],[],https://www.datacamp.com/courses/conditional-formatting-in-spreadsheets,Data Manipulation,Spreadsheets
46,Convolutional Neural Networks for Image Processing,4,13,45,"8,809","3,650",Convolutional Neural Networks Image Processing,"Convolutional Neural Networks for Image Processing
Deep learning methods use data to train neural network algorithms to do a variety of machine learning tasks, such as classification of different classes of objects. Convolutional neural networks are deep learning algorithms that are particularly powerful for analysis of images. This course will teach you how to construct, train and evaluate convolutional neural networks. You will also learn how to improve their ability to learn from data, and how to interpret the results of the training.
Convolutional neural networks use the data that is represented in images to learn. In this chapter, we will probe data in images, and we will learn how to use Keras to train a neural network to classify objects that appear in images.
Convolutions are the fundamental building blocks of convolutional neural networks. In this chapter, you will be introducted to convolutions and learn how they operate on image data. You will also see how you incorporate convolutions into Keras neural networks.
Convolutional neural networks gain a lot of power when they are constructed with multiple layers (deep networks). In this chapter, you will learn how to stack multiple convolutional layers into a deep network. You will also learn how to keep track of the number of parameters, as the network grows, and how to control this number.
There are many ways to improve training by neural networks. In this chapter, we will focus on our ability to track how well a network is doing, and explore approaches towards improving convolutional neural networks.",[],"['Ariel Rokem', 'Lore Dirick', 'Eunkyung Park', 'Sumedh Panchadhar']","[('Shutterstock straight', 'https://assets.datacamp.com/production/repositories/1820/datasets/7ae58c178550ca7d108bcec7a9af0957b7a6a571/shutterstock_straight.jpg')]",['Deep Learning in Python'],https://www.datacamp.com/courses/convolutional-neural-networks-for-image-processing,Machine Learning,Python
47,Correlation and Regression,4,18,58,"43,009","4,200",Correlation and Regression,"Correlation and Regression
Ultimately, data analysis is about understanding relationships among variables. Exploring data with multiple variables requires new, more complex tools, but enables a richer set of comparisons. In this course, you will learn how to describe relationships between two numerical quantities. You will characterize these relationships graphically, in the form of summary statistics, and through simple linear regression models.
In this chapter, you will learn techniques for exploring bivariate relationships.
This chapter introduces correlation as a means of quantifying bivariate relationships.
With the notion of correlation under your belt, we'll now turn our attention to simple linear models in this chapter.
This chapter looks at how to interpret the coefficients in a regression model.
In this final chapter, you'll learn how to assess the ""fit"" of a simple linear regression model.","['Data Analyst with R', 'Data Scientist with R', 'Statistics Fundamentals with R']","['Ben Baumer', 'Nick Carchedi', 'Tom Jeon']",[],"['Introduction to R', 'Introduction to Data', 'Exploratory Data Analysis']",https://www.datacamp.com/courses/correlation-and-regression,Probability & Statistics,R
48,Course Creation at DataCamp,3,20,69,540,"4,050",Course Creation at DataCamp,"Course Creation at DataCamp
Welcome to the DataCamp family! You are about to begin creating a course that, in just a few months, will be available to over 3 million students worldwide! If you're new to eLearning, you'll soon learn that teaching an online course is very different from teaching in a classroom. But we're here to help! This course will provide a guide to the DataCamp Course Creation process; an introduction to the tools we use, including GitHub, Asana, and our very own course editor; and the different types of exercises and slides you can use, and how to make sure you're reaching students at the other end of the screen. While creating your course, you will find you have other questions, such as, ""How will my course be marketed?"", ""How do I recommend other instructors to DataCamp?"", or ""When do I get paid?"". This course will also provide you with direction on where to find answers to all your questions. Following this course, you should be familiar with the DataCamp Course Creation process and be ready to start your very own DataCamp course. Have fun and see you in the course!
Are you interested in creating a DataCamp course, but not sure what exactly to expect? This introductory chapter will you give an overview of the different phases of course creation and the people you'll work with during each phase. You'll focus on the first two phases: course design and course development and meet Curriculum Leads and Content Developers, who will be your guides.
Before diving deep into pedagogy and the nitty-gritty details of DataCamp exercises, it's important to learn the values we hold ourselves and our instructors to when building a course, namely accountability, predictability, and transparency. Furthermore, it is vital to understand the tools we use, how they work, and how they support our values.
At DataCamp, we strive for quality in our content, our product, and our instructors. We do this by building our courses with a specific structure around learning objectives. We've built this structure so that our students get the best eLearning experience. We also know that this can be a challenge, so in this chapter, we provide a few tips and tricks on how to teach effectively on an eLearning platform.
Now that you know our tools and the tricks to making a great course, dive into the nitty-gritty of DataCamp courses. In this chapter, you'll learn about how to create videos and the different types of interactive exercises we support on our platform. You'll learn about the different parts of interactive exercises and the guidelines we follow to ensure our courses keep our students engaged. Lastly, you'll return to GitHub and learn about how it's used to get all the content from your videos and exercises reviewed to ensure it's top quality!
Many things happen after a course has been designed and developed; namely, it must be launched! In this chapter, you will learn about the different aspects of course launch, the work that goes into a course following its launch, and importantly, how you will get paid for your course. If you have enjoyed creating a course, and want to make more DataCamp content, you will find out all we have to offer!",[],"['Content Team', 'Chester Ismay', 'Yashas Roy', 'Adrián Soto', 'Nick Carchedi', 'Becca Robins', 'Mari Nazary', 'Hadrien Lacroix', 'Martijn Theuwissen', 'Amy Peterson', 'Sara Billen', 'Hillary Green-Lerman', 'Mona Khalil', 'Jen Bricker', 'David Venturi', 'David Campos', 'kaelen medeiros', 'Sascha Mayr', 'Jeroen Hermans', 'Shon Inouye', 'Sumedh Panchadhar']",[],[],https://www.datacamp.com/courses/course-creation-at-datacamp,Other,R
49,Creating Robust Python Workflows,4,16,47,748,"3,900",Creating Robust Python Workflows,"Creating Robust Python Workflows
The decisions we make in life are guided by our principles. No one is born with a life philosophy, instead everyone creates their own over time. In this course, you will develop a set of principles for your data science and software development projects. These principles will save time, prevent frustration, and build your confidence as a data scientist and software developer. In addition to best practices in the Python programming language, You will learn to leverage hidden gems in the Python standard library and well-known tools from Python's excellent ecosystem, such as pandas and scikit-learn. The time you invest in this course will yield dividends for you and others throughout your career. Your colleagues, community members, and future self will thank you.
In this chapter, we will discuss three principles that guide decisions made by Python programmers. You will put these principles into practice in the coding exercises and throughout the rest of the course!
Documentation and tests are often overlooked, despite being essential to the success of all projects. In this chapter, you will learn how to include documentation in our code and practice Test-Driven Development (TDD), a process that puts tests first!
Shell scripting is an essential part of any Python workflow. In this chapter, you will learn how to build command-line interfaces (CLIs) for Python programs and to automate common tasks related to version control, virtual environments, and Python packaging.
In the final chapter of this course, you will learn how to facilitate and standardize project setup using project templates. You will also consider the benefits of zipped executable projects, Jupyter notebooks parameterization, and parallel computing.",[],"['Martin Skarzynski', 'Chester Ismay', 'Sara Billen']",[],"['Python Data Science Toolbox (Part 2)', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/creating-robust-python-workflows,Programming,Python
50,Credit Risk Modeling in Python,4,15,57,164,"4,850",Credit Risk Modeling,"Credit Risk Modeling in Python
If you've ever applied for a credit card or loan, you know that financial firms process your information before making a decision. This is because giving you a loan can have a serious financial impact on their business. But how do they make a decision? In this course, you will learn how to prepare credit application data. After that, you will apply machine learning and business rules to reduce risk and ensure profitability. You will use two data sets that emulate real credit applications while focusing on business value. Join me and learn the expected value of credit risk modeling!
In this first chapter, we will discuss the concept of credit risk and define how it is calculated. Using cross tables and plots, we will explore a real-world data set. Before applying machine learning, we will process this data by finding and resolving problems.
With the loan data fully prepared, we will discuss the logistic regression model which is a standard in risk modeling. We will understand the components of this model as well as how to score its performance. Once we've created predictions, we can explore the financial impact of utilizing this model.
Decision trees are another standard credit risk model. We will go beyond decision trees by using the trendy XGBoost package in Python to create gradient boosted trees. After developing sophisticated models, we will stress test their performance and discuss column selection in unbalanced data.
After developing and testing two powerful machine learning models, we use key performance metrics to compare them. Using advanced model selection techniques specifically for financial modeling, we will select one model. With that model, we will: develop a business strategy, estimate portfolio value, and minimize expected loss.",[],"['Michael Crabtree', 'Mona Khalil', 'Ruanne Van Der Walt']","[('Raw credit data', 'https://assets.datacamp.com/production/repositories/4876/datasets/a2d8510b4aec8d0ac14ab9bee61ba3c085805967/cr_loan2.csv'), ('Clean credit data (outliers and missing data removed)', 'https://assets.datacamp.com/production/repositories/4876/datasets/33e400c8f73329d290c6c25eef33de458b4db1bf/cr_loan_nout_nmiss.csv'), ('Credit data (ready for modeling)', 'https://assets.datacamp.com/production/repositories/4876/datasets/2f6c17f10d5156a29670d1926fdf7125c002e038/cr_loan_w2.csv')]","['Intro to Python for Finance', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/credit-risk-modeling-in-python,Applied Finance,Python
51,Credit Risk Modeling in R,4,16,52,"32,551","4,000",Credit Risk Modeling in R,"Credit Risk Modeling in R
This chapter begins with a general introduction to credit risk models. We'll explore a real-life data set, then preprocess the data set such that it's in the appropriate format before applying the credit risk models.
Logistic regression is still a widely used method in credit risk modeling. In this chapter, you will learn how to apply logistic regression models on credit data in R.
Classification trees are another popular method in the world of credit risk modeling. In this chapter, you will learn how to build classification trees using credit data in R.
In this chapter, you'll learn how you can evaluate and compare the results obtained through several credit risk models.",['Quantitative Analyst with R'],['Lore Dirick'],"[('Loan Data Chapter 1', 'https://assets.datacamp.com/production/repositories/162/datasets/8f48a2cbb6150e7ae32435e55f271cad5b4b8ecf/loan_data_ch1.rds'), ('Loan Data Chapter 2, 3 and 4', 'https://assets.datacamp.com/production/repositories/162/datasets/89fa0b5120b58ae561ac53073163bd133240ac06/loan_data_ch2.rds')]","['Introduction to R for Finance', 'Intermediate R for Finance']",https://www.datacamp.com/courses/introduction-to-credit-risk-modeling-in-r,Applied Finance,R
52,Customer Analytics & A/B Testing in Python,4,16,49,"4,389","3,750",Customer Analytics & A/B Testing,"Customer Analytics & A/B Testing in Python
The most successful companies today are the ones that know their customers so well that they can anticipate their needs.
Customer analytics and in particular A/B Testing are crucial parts of
leveraging quantitative know-how to help make business decisions that generate value. This course
covers the ins and outs of how to use Python to
analyze customer behavior and business trends as well as how to create, run,
and analyze A/B tests to make proactive, data-driven business decisions.
This chapter provides a brief introduction to the content that will be covered throughout the course before transitioning into a discussion of Key Performance Indicators or KPIs. You'll learn how to identify and define meaningful KPIs through a combination of critical thinking and leveraging Python tools. These techniques are all presented in a highly practical and generalizable way. Ultimately these topics serve as the core foundation for the A/B testing discussion that follows.
This chapter teaches you how to visualize, manipulate, and explore KPIs as they change over time. Through a variety of examples, you'll learn how to work with datetime objects to calculate metrics per unit time. Then we move to the techniques for how to graph different segments of data, and apply various smoothing functions to reveal hidden trends. Finally we walk through a complete example of how to pinpoint issues through exploratory data analysis of customer data. Throughout this chapter various functions are introduced and explained in a highly generalizable way.
In this chapter you will dive fully into A/B testing. You will learn the mathematics and knowledge needed to design and successfully plan an A/B test from determining an experimental unit to finding how large a sample size is needed. Accompanying this will be an introduction to the functions and code needed to calculate the various quantities associated with a statistical test of this type.
After running an A/B test, you must analyze the data and then effectively communicate the results. This chapter begins by interleaving the theory of statistical significance and confidence intervals with the tools you need to calculate them yourself from the data. Next we discuss how to effectively visualize and communicate these results. This chapter is the culmination of all the knowledge built over the entire course.",[],"['Ryan Grossman', 'Lore Dirick', 'Yashas Roy', 'Eunkyung Park']","[('Customer dataset', 'https://assets.datacamp.com/production/repositories/1646/datasets/c3a701a4729471ae0b92d8c300b470fd2ec0a73a/user_demographics_v1.csv'), ('In-App Purchases dataset', 'https://assets.datacamp.com/production/repositories/1646/datasets/5decd183ef3710475958bbc903160fd6354379d5/purchase_data_v1.csv'), ('Daily Revenue dataset', 'https://assets.datacamp.com/production/repositories/1646/datasets/3afb49cad9fb91c02b71b52a2ddc0071ea13764c/daily_revenue.csv'), ('User Demographics Paywall dataset', 'https://assets.datacamp.com/production/repositories/1646/datasets/01054025eb094ac1086edf8d206b313b84d911c5/user_demographics_paywall.csv'), ('AB Testing Results', 'https://assets.datacamp.com/production/repositories/1646/datasets/2751adce60684a03d8b4132adeadab8a0b95ee56/AB_testing_exercise.csv')]","['Python Data Science Toolbox (Part 1)', 'Data Types for Data Science', 'pandas Foundations', 'Manipulating DataFrames with pandas']",https://www.datacamp.com/courses/customer-analytics-ab-testing-in-python,Probability & Statistics,Python
53,Customer Segmentation in Python,4,17,55,"5,047","4,400",Customer Segmentation,"Customer Segmentation in Python
The most successful companies today are the ones that know their customers so well that they can anticipate their needs. Data analysts play a key role in unlocking these in-depth insights, and segmenting the customers to better serve them. In this course, you will learn real-world techniques on customer segmentation and behavioral analytics, using a real dataset containing anonymized customer transactions from an online retailer. You will first run cohort analysis to understand customer trends. You will then learn how to build easy to interpret customer segments. On top of that, you will prepare the segments you created, making them ready for machine learning. Finally, you will make your segments more powerful with k-means clustering, in just few lines of code! By the end of this course, you will be able to apply practical customer behavioral analytics and segmentation techniques.
In this first chapter, you will learn about cohorts and how to analyze them. You will create your own customer cohorts, get some metrics and visualize your results.
In this second chapter, you will learn about customer segments. Specifically, you will get exposure to recency, frequency and monetary value, create customer segments based on these concepts, and analyze your results.
Once you created some segments, you want to make predictions. However, you first need to master practical data preparation methods to ensure your k-means clustering algorithm will uncover well-separated, sensible segments.
In this final chapter, you will use the data you pre-processed in Chapter 3 to identify customer clusters based on their recency, frequency, and monetary value.",[],"['Karolis Urbonas', 'Hadrien Lacroix', 'Mari Nazary']","[('Chapter 1 datasets', 'https://assets.datacamp.com/production/repositories/3202/datasets/40378e0b8f88bffddc938f335bc68baa8fdf0b18/chapter_1.zip'), ('Chapter 2 datasets', 'https://assets.datacamp.com/production/repositories/3202/datasets/9c670a495912949de0166c3ce690bad536ccf621/chapter_2.zip'), ('Chapter 3 datasets', 'https://assets.datacamp.com/production/repositories/3202/datasets/cc496bdfda1d59a462bf7ff3e4117bcd34c76b35/chapter_3.zip'), ('Chapter 4 datasets', 'https://assets.datacamp.com/production/repositories/3202/datasets/eb6a32ed7e5faa4c4b237ab8afb94df55bb4b3a5/chapter_4.zip')]","['Manipulating DataFrames with pandas', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/customer-segmentation-in-python,Machine Learning,Python
54,Data Analysis with Spreadsheets,3,0,27,"19,730","2,700",Data Analysis Spreadsheets,"Data Analysis with Spreadsheets
This course will dig deeper into some of the core functionality of Google Sheets. There's a whole bunch of predefined functions we'll cover, like `SUM()` and `AVERAGE()`, and `VLOOKUP()`. We'll apply these techniques to do some analysis on your grades in school, look at performance statistics within a company, track monthly sales, and look at some real geographical information about the countries of the world.
This chapter introduces a very useful feature in Google Sheets: predefined functions. You'll use these functions to solve complex problems without having to worry about specific calculations. We’ll cover a lot of predefined functions, including functions for numbers, functions for strings, and functions for dates.
In the last chapter of the course, you'll master more advanced functions like IF() and VLOOKUP(). Conditional and lookup functions won’t seem so scary after you completed this chapter.",[],"['Vincent Vankrunkelsven', 'Sascha Mayr']",[],[],https://www.datacamp.com/courses/data-analysis-with-spreadsheets,Programming,Spreadsheets
55,Data Manipulation in R with data.table,4,15,59,"3,387","5,050",Data Manipulation in R data.table,"Data Manipulation in R with data.table
The data.table package provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed. This course shows you how to create, subset, and manipulate data.tables. You'll also learn about the database-inspired features of data.tables, including built-in groupwise operations. The course concludes with fast methods of importing and exporting tabular text data such as CSV files. Upon completion of the course, you will be able to use data.table in R for a more efficient manipulation and analysis process. Throughout the course you'll explore the San Francisco Bay Area bike share trip dataset from 2014.
This chapter introduces data.tables as a drop-in replacement for data.frames and shows how to use data.table's i argument to filter rows.
Just as the i argument lets you filter rows, the j argument of data.table lets you select columns and also perform computations. The syntax is far more convenient and flexible when compared to data.frames.
This chapter introduces data.table's by argument that lets you perform computations by groups. By the end of this chapter, you will master the concise DT[i, j, by] syntax of data.table.
You will learn about a unique feature of data.table in this chapter: modifying existing data.tables in place. Modifying data.tables in place makes your operations incredibly fast and is easy to learn.
Not only does the data.table package help you perform incredibly fast computations, it can also help you read and write data to disk with amazing speeds. This chapter focuses on data.table's fread() and fwrite() functions which let you import and export flat files quickly and easily!","['Data Analyst with R', 'Data Manipulation with R']","['Matt Dowle', 'Arun Srinivasan', 'Sascha Mayr', 'Benjamin Feder', 'Eunkyung Park', 'Sumedh Panchadhar']",[],"['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/data-manipulation-in-r-with-datatable,Programming,R
56,Data Manipulation with dplyr in R,4,13,46,869,"3,850",Data Manipulation dplyr in R,"Data Manipulation with dplyr in R
Say you've found a great dataset and would like to learn more about it. How can you start to answer the questions you have about the data? You can use dplyr to answer those questions—it can also help with basic transformations of your data. You'll also learn to aggregate your data and add, remove, or change the variables. Along the way, you'll explore a dataset containing information about counties in the United States. You'll finish the course by applying these tools to the babynames dataset to explore trends of baby names in the United States.
Learn verbs you can use to transform your data, including select, filter, arrange, and mutate. You'll use these functions to modify the counties dataset to view particular observations and answer questions about the data.
Now that you know how to transform your data, you'll want to know more about how to aggregate your data to make it more interpretable. You'll learn a number of functions you can use to take many observations in your data and summarize them, including count, group_by, summarize, ungroup, and top_n.
Learn advanced methods to select and transform columns. Also learn about select helpers, which are functions that specify criteria for columns you want to choose, as well as the rename and transmute verbs.
Work with a new dataset that represents the names of babies born in the United States each year. Learn how to use grouped mutates and window functions to ask and answer more complex questions about your data. And use a combination of dplyr and ggplot2 to make interesting graphs to further explore your data.",[],"['Chris Cardillo', 'Amy Peterson']","[('counties', 'https://assets.datacamp.com/production/repositories/4984/datasets/a924bf7063f02a5445e1f49cc1c75c78e018ac4c/counties.rds'), ('babynames', 'https://assets.datacamp.com/production/repositories/4984/datasets/a924ac5d86adba2e934d489cb9db446236f62b2c/babynames.rds')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/data-manipulation-with-dplyr-in-r,Data Manipulation,R
57,Data Privacy and Anonymization in R,4,13,45,"1,998","3,650",Data Privacy and Anonymization in R,"Data Privacy and Anonymization in R
With social media and big data everywhere, data privacy has been a growing, public concern. Recognizing this issue, entities such as Google, Apple, and the US Census Bureau are promoting better privacy techniques; specifically differential privacy, a mathematical condition that quantifies privacy risk. In this course, you will learn to code basic data privacy methods and a differentially private algorithm based on various differentially private properties. With these tools in hand, you will learn how to generate a basic synthetic (fake) data set with the differential privacy guarantee for public data release.
This chapter covers some basic data privacy techniques that statisticians use to anonymize data. You'll first learn how to remove identifiers and then generate synthetic data from probability distributions.
After covering the basic data privacy techniques, you'll learn conceptually about differential privacy as well as how to implement the most popular and common differentially private algorithm called the Laplace mechanism.
In this chapter, you will learn the various properties of differential privacy, such as the combination rules and post-processing, to properly implement the Laplace mechanism for various kinds data questions.
In this chapter, you will learn how to release simple data sets publicly using differentially private data synthesis techniques.",['R Programmer'],"['Claire Bowen', 'Chester Ismay', 'Sumedh Panchadhar']","[('Data sets', 'https://assets.datacamp.com/production/repositories/1939/datasets/5c7ae991cdefeb4897bc38c6102b11dec40889fd/data.RData')]","['Intermediate R', 'Foundations of Probability in R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/data-privacy-and-anonymization-in-r,Other,R
58,Data Processing in Shell,4,13,46,102,"3,550",Data Processing in Shell,"Data Processing in Shell
We live in a busy world with tight deadlines. As a result, we fall back on what is familiar and easy, favoring GUI interfaces like Anaconda and RStudio. However, taking the time to learn data analysis on the command line is a great long-term investment because it makes us stronger and more productive data people.
In this course, we will take a practical approach to learn simple, powerful, and data-specific command-line skills. Using publicly available Spotify datasets, we will learn how to download, process, clean, and transform data, all via the command line. We will also learn advanced techniques such as command-line based SQL database operations. Finally, we will combine the powers of command line and Python to build a data pipeline for automating a predictive model.
In this chapter, we learn how to download data files from web servers via the command line. In the process, we also learn about documentation manuals, option flags, and multi-file processing.
We continue our data journey from data downloading to data processing. In this chapter, we utilize the command line library csvkit to convert, preview, filter and manipulate files to prepare our data for further analyses.
In this chapter, we dig deeper into all that csvkit library has to offer. In particular, we focus on database operations we can do on the command line, including table creation, data pull, and various ETL transformation.
In the last chapter, we bridge the connection between command line and other data science languages and learn how they can work together. Using Python as a case study, we learn to execute Python on the command line, to install dependencies using the package manager pip, and to build an entire model pipeline using the command line.",[],"['Susan Sun', 'Hillary Green-Lerman', 'Adrián Soto']","[('Spotify Songs Popularity Ranking', 'https://assets.datacamp.com/production/repositories/4180/datasets/82c41048fc89f03f3b0a4122642bc4fd39306071/Spotify_Popularity.csv'), ('Spotify Song Attributes', 'https://assets.datacamp.com/production/repositories/4180/datasets/513986f5ea7ed9a8565bba20d088d21c10e099dc/Spotify_MusicAttributes.csv')]","['Introduction to Shell for Data Science', 'Intermediate Python for Data Science', 'Intro to SQL for Data Science']",https://www.datacamp.com/courses/data-processing-in-shell,Data Manipulation,Shell
59,Data Science for Managers,4,14,51,"11,000","3,350",Data Science Managers,"Data Science for Managers
What is data science and how can you use it to strengthen your organization? This course will teach you about the skills you need on your data team, and how you can structure that team to meet your organization's needs. Data is everywhere! This course will provide you with an understanding of data sources your company can use and how to store that data. You'll also discover ways to analyze and visualize your data through dashboards and A/B tests. To wrap up the course, we'll discuss exciting topics in machine learning, including clustering, time series prediction, natural language processing (NLP), deep learning, and explainable AI! Along the way, you'll learn about a variety of real-world applications of data science and gain a better understanding of these concepts through practical exercises.
We'll start the course by defining what data science is. We'll cover the data science workflow, and how data science is applied to real-world business problems. We'll finish the chapter by learning about ways to structure your data team to meet your organization's needs.
Now that we understand the data science workflow, we'll dive deeper into the first step: data collection. We'll learn about the different data sources your company can draw from, and how to store that data once it's collected.
In this chapter, we'll discuss ways to explore and visualize data through dashboards. We'll discuss the elements of a dashboard and how to make a directed request for a dashboard. This chapter will also cover making ad hoc data requests and A/B tests, which are a powerful analytics tool that de-risk decision-making.
In this final chapter, we'll discuss the buzziest topic in data science: machine learning! We'll cover supervised and unsupervised machine learning, and clustering. Then, we'll move on to special topics in machine learning, including time series prediction, natural language processing, deep learning, and explainable AI!",[],"['Mari Nazary', 'Michael Chow', 'kaelen medeiros', 'Ramnath Vaidyanathan', 'Amy Peterson', 'Hillary Green-Lerman']",[],[],https://www.datacamp.com/courses/data-science-for-managers,Management,Theory
60,Data Types for Data Science,4,18,58,"12,577","4,850",Data Types Data Science,"Data Types for Data Science
Have you got your basic Python programming chops down for Data Science but are yearning for more? Then this is the course for you. Herein, you'll consolidate and practice your knowledge of lists, dictionaries, tuples, sets, and date times. You'll see their relevance in working with lots of real data and how to leverage several of them in concert to solve multistep problems, including an extended case study using Chicago metropolitan area transit data. You'll also learn how to use many of the objects in the Python Collections module, which will allow you to store and manipulate your data for a variety of Data Scientific purposes. After taking this course, you'll be ready to tackle many Data Science challenges Pythonically.
This chapter will introduce you to the fundamental Python data types - lists, sets, and tuples. These data containers are critical as they provide the basis for storing and looping over ordered data. To make things interesting, you'll apply what you learn about these types to answer questions about the New York Baby Names dataset!
At the root of all things Python is a dictionary. Herein, you'll learn how to use them to safely handle data that can viewed in a variety of ways to answer even more questions about the New York Baby Names dataset. You'll explore how to loop through data in a dictionary, access nested data, add new data, and come to appreciate all of the wonderful capabilities of Python dictionaries.
The collections module is part of Python's standard library and holds some more advanced data containers. You'll learn how to use the Counter, defaultdict, OrderedDict and namedtuple in the context of answering questions about the Chicago transit dataset.
Handling times can seem daunting at time, but here, you'll dig in and learn how to create datetime objects, print them, look to the past and to the future. Additionally, you'll learn about some third party modules that can make all of this easier. You'll continue to use the Chicago Transit dataset to answer questions about transit times.
Time for a case study to reinforce all of your learning so far! You'll use all the containers and data types you've learned about to answer several real world questions about a dataset containing information about crime in Chicago. Have fun!",[],"['Jason Myers', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Baby names', 'https://assets.datacamp.com/production/repositories/906/datasets/8043b235dab7ca9b3667df9195459bc6bf754c2a/baby_names.csv'), ('Chicago crime', 'https://assets.datacamp.com/production/repositories/906/datasets/7fe0304955dbf05e3a0d57c8959578dcef479e81/crime_sampler.csv'), ('CTA daily station totals', 'https://assets.datacamp.com/production/repositories/906/datasets/b7806a5db41c23931fd1adf02af54ac10c15e61c/cta_daily_station_totals.csv'), ('CTA daily summary totals', 'https://assets.datacamp.com/production/repositories/906/datasets/0c8af86b914fd9edfd3d907b6006fefaadaf827b/cta_daily_summary_totals.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/data-types-for-data-science,Programming,Python
61,Data Visualization in R,4,15,60,"33,694","5,250",Data Visualization in R,"Data Visualization in R
This chapter gives a brief overview of some of the things you can do with base graphics in R. This graphics system is one of four available in R and it forms the basis for this course because it is both the easiest to learn and extremely useful both in preparing exploratory data visualizations to help you see what's in a dataset and in preparing explanatory data visualizations to help others see what we have found.
This chapter introduces several Base R supported plot types that are particularly useful for visualizing important features in a dataset. We start with simple tools like histograms and density plots for characterizing one variable at a time, move on to scatter plots and other useful tools for showing how two variables relate, and finally introduce some tools for visualizing more complex relationships in our dataset.
Most base R graphics functions support many optional arguments and parameters that allow us to customize our plots to get exactly what we want. In this chapter, we will learn how to modify point shapes and sizes, line types and widths, add points and lines to plots, add explanatory text and generate multiple plot arrays.
As we have seen, base R graphics provides tremendous flexibility in creating plots with multiple lines, points of different shapes and sizes, and added text, along with arrays of multiple plots. If we attempt to add too many details to a plot or too many plots to an array, however, the result can become too complicated to be useful. This chapter focuses on how to manage this visual complexity so the results remain useful to ourselves and to others.
This final chapter introduces a number of important topics, including the use of numerical plot details returned invisibly by functions like barplot() to enhance our plots, and saving plots to external files so they don't vanish when we end our current R session. This chapter also offers some guidelines for using color effectively in data visualizations, and it concludes with a brief introduction to the other three graphics systems in R.",['Data Visualization with R'],"['Ronald Pearson', 'Nick Carchedi', 'Tom Jeon']",[],['Introduction to R'],https://www.datacamp.com/courses/data-visualization-in-r,Data Visualization,R
62,Data Visualization in R with lattice,4,17,60,"3,276","4,950",Data Visualization in R lattice,"Data Visualization in R with lattice
Visualization is an essential component of interactive data analysis in R. Traditional (base) graphics is powerful, but limited in its ability to deal with multivariate data. Trellis graphics is the natural successor to traditional graphics, extending its simple philosophy to gracefully handle common multivariable data visualization tasks. This course introduces the lattice package, which implements Trellis graphics for R, and illustrates its basic use.
Introduction to some basic plotting functions in lattice. Draw histograms, scatter plots, density plots, and box and whisker plots.
These exercises will teach you to create ""conditioned"" plots consisting of multiple panels using the formula interface.
Learn how to control and customize axis limits and visual appearance.
Learn to use panel and prepanel functions to enhance existing displays or create new ones.
The lattice package is not just meant to be used as a standalone collection of plotting functions. Rather, it is a framework that is used as a base by many other packages. Some of these are very specialized and beyond the scope of this course. Here we give a brief survey of extensions that are generally useful to enhance displays or create new ones.",['Data Visualization with R'],"['Deepayan Sarkar', 'Tom Jeon', 'Sascha Mayr']",[],"['Introduction to R', 'Data Visualization in R']",https://www.datacamp.com/courses/data-visualization-in-r-with-lattice,Data Visualization,R
63,Data Visualization in Spreadsheets,4,16,55,"3,159","4,700",Data Visualization in Spreadsheets,"Data Visualization in Spreadsheets
A picture can tell a thousand words - but only if you use the right picture! This course teaches you the fundamentals of data visualization with Google Sheets. You'll learn how to create common chart types like bar charts, histograms, and scatter charts, as well as more advanced types, such as sparkline and candlestick charts. You will look at how to prepare your data and use Data Validation and VLookup formulas to target specific data to chart. You'll learn how to use Conditional Formatting to apply a format to a cell or a range of cells based on certain criteria, and finally, how to create a dashboard showing plots and data together. Along the way, you'll use data from the Olympics, sharks attacks, and Marine Technology from the ASX.
Learn about business intelligence and dashboards for analyzing information in todays data-driven world. Create a basic dashboard and master setting up your data to get the most out of it.
Create and format a column chart to showcase data and learn a few smart tricks along the way. Look at using named ranges to refer to cells in your worksheet, making them user-friendly and easy to work with.
A dashboard is like a control panel. Look at ways to allow a user to use this control panel to get different results from your dashboard.
A picture paints a thousand words. Look at what types of charts to use in what situation to showcase your data.
Learn how to use rules based on criteria you set to format certain cells on your dashboard. See the formatting change as the values in the cells change.",[],"['Raina Hawley', 'Sascha Mayr', 'Amy Peterson']",[],['Intermediate Spreadsheets for Data Science'],https://www.datacamp.com/courses/data-visualization-in-spreadsheets,Data Visualization,Spreadsheets
64,Data Visualization with Seaborn,4,13,50,"7,286","4,200",Data Visualization Seaborn,"Data Visualization with Seaborn
Do you want to make beautiful, informative visualizations with ease? If so, then you must learn seaborn! Seaborn is a visualization library that is an essential part of the python data science toolkit. In this course, you will learn how to use seaborn's sophisticated visualization tools to analyze multiple real world datasets including the American Housing Survey, college tuition data, and guests from the popular television series, The Daily Show. Following this course, you will be able to use seaborn functions to visualize your data in several different formats and customize seaborn plots for your unique needs.
Introduction to the Seaborn library and where it fits in the Python visualization landscape.
Overview of functions for customizing the display of Seaborn plots.
Overview of more complex plot types included in Seaborn.
Using Seaborn to draw multiple plots in a single figure.",[],"['Chris Moffitt', 'Kara Woo', 'Becca Robins', 'Sara Snell']","[('US Housing and Urban Development FY 2018 Fair Market Rent', 'https://assets.datacamp.com/production/repositories/2210/datasets/a1fb97d60bfbcf0661e320a35a4615f4e8661a68/FY18_4050_FMRs.csv'), ('Washington DC Bike Share', 'https://assets.datacamp.com/production/repositories/2210/datasets/fb4f2c1039e3df2c2e2624a8c95de5a1980861c6/bike_share.csv'), ('2018 College Scorecard Tuition', 'https://assets.datacamp.com/production/repositories/2210/datasets/794e0759b73a2d80baa5d8fb88636a47965139d3/college_datav3.csv'), ('Daily Show Guests', 'https://assets.datacamp.com/production/repositories/2210/datasets/4eead0f82a80136cdc0068cfb54b97fe47c23c15/daily_show_guests_cleaned.csv'), ('Automobile Insurance Premiums', 'https://assets.datacamp.com/production/repositories/2210/datasets/1a8176dc594fc0a13a9f1a7b207d30ed312f2e4a/insurance_premiums.csv'), ('2010 US School Improvement Grants', 'https://assets.datacamp.com/production/repositories/2210/datasets/205443d734f177d36dad2f0bdf821a57b2c4cc13/schoolimprovement2010grants.csv')]","['pandas Foundations', 'Introduction to Python']",https://www.datacamp.com/courses/data-visualization-with-seaborn,Data Visualization,Python
65,Data Visualization with ggplot2 (Part 1),5,14,62,"122,997","5,250",Data Visualization ggplot2,"Data Visualization with ggplot2 (Part 1)
The ability to produce meaningful and beautiful data visualizations is an essential part of your skill set as a data scientist. This course, the first R data visualization tutorial in the series, introduces you to the principles of good visualizations and the grammar of graphics plotting concepts implemented in the ggplot2 package. ggplot2 has become the go-to tool for flexible and professional plots in R. Here, we’ll examine the first three essential layers for making a plot - Data, Aesthetics and Geometries. By the end of the course you will be able to make complex exploratory plots.
In this chapter we’ll get you into the right frame of mind for developing meaningful visualizations with R. You’ll understand that as a communications tool, visualizations require you to think about your audience first. You’ll also be introduced to the basics of ggplot2 - the 7 different grammatical elements (layers) and aesthetic mappings.
The structure of your data will dictate how you construct plots in ggplot2. In this chapter, you’ll explore the iris dataset from several different perspectives to showcase this concept. You’ll see that making your data conform to a structure that matches the plot in mind will make the task of visualization much easier through several R data visualization examples.
Aesthetic mappings are the cornerstone of the grammar of graphics plotting concept. This is where the magic happens - converting continuous and categorical data into visual scales that provide access to a large amount of information in a very short time. In this chapter you’ll understand how to choose the best aesthetic mappings for your data.
A plot’s geometry dictates what visual elements will be used. In this chapter, we’ll familiarize you with the geometries used in the three most common plot types you’ll encounter - scatter plots, bar charts and line plots. We’ll look at a variety of different ways to construct these plots.
In this chapter you'll learn about qplot; it is a quick and dirty form of ggplot2. It’s not as intuitive as the full-fledged ggplot() function but may be useful in specific instances. This chapter also features a wrap-up video and corresponding data visualization exercises.","['Data Analyst with R', 'Data Scientist with R', 'Data Visualization with R']","['Rick Scavetta', 'Vincent Vankrunkelsven', 'Filip Schouwenaars']","[('Subset of 1,000 diamonds', 'https://assets.datacamp.com/production/repositories/236/datasets/20c77eaab1d045693bdc3e6b3c9e72ad2db53746/diamonds.RData'), ('Fish datasets', 'https://assets.datacamp.com/production/repositories/236/datasets/eb4457a6db78d48de3720bb10b47e5c740a21234/fish.RData'), ('Iris datasets', 'https://assets.datacamp.com/production/repositories/236/datasets/7f714f993f1ad4c3d26412ae1e537ce6355b1b54/iris.RData'), ('Recession', 'https://assets.datacamp.com/production/repositories/236/datasets/9f738e79062e6a207c3981533c3cab060f348ebd/recess.RData')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/data-visualization-with-ggplot2-1,Data Visualization,R
66,Data Visualization with ggplot2 (Part 2),5,11,55,"45,129","4,750",Data Visualization ggplot2,"Data Visualization with ggplot2 (Part 2)
This ggplot2 tutorial builds on your knowledge from the first course to produce meaningful explanatory plots. We'll explore the last four optional layers. Statistics will be calculated on the fly and we’ll see how Coordinates and Facets aid in communication. Publication quality plots will be produced directly in R using the Themes layer. We’ll also discuss details on data visualization best practices with ggplot2 to help make sure you have a sound understanding of what works and why. By the end of the course, you’ll have all the tools needed to make a custom plotting function to explore a large data set, combining statistics and excellent visuals.
In this chapter, we’ll delve into how to use R ggplot2 as a tool for graphical data analysis, progressing from just plotting data to applying a variety of statistical methods. This includes a variety of linear models, descriptive and inferential statistics (mean, standard deviation and confidence intervals) and custom functions.
The Coordinates and Facets layers offer specific and very useful tools for efficiently and accurately communicating data. In this chapter we’ll look at the various ways of effectively using these two layers.
Now that you’ve built high-quality plots, it’s time to make them pretty. This is the last step in the data viz process. The Themes layer will enable you to make publication quality plots directly in R.
Once you have the technical skill to make great visualizations, it’s important that you make them as meaningful as possible. In this chapter we’ll go over three plot types that are mostly discouraged in the data viz community - heat maps, pie charts and dynamite plots. We’ll understand what the problems are with these plots and what the alternatives are.
In this case study, we’ll explore the large, publicly available California Health Interview Survey dataset from 2009. We’ll go step-by-step through the development of a new plotting method - a mosaic plot - combining statistics and flexible visuals. At the end, we’ll generalize our new plotting method to use on a variety of datasets we’ve seen throughout the first two courses.",['Data Visualization with R'],"['Rick Scavetta', 'Vincent Vankrunkelsven', 'Filip Schouwenaars']","[('CHIS adult-response dataset, 2009', 'https://assets.datacamp.com/production/repositories/235/datasets/3b6fc2923b599058584b57d8c605c6bef454d273/CHIS2009_reduced_2.Rdata')]","['Introduction to R', 'Intermediate R', 'Data Visualization with ggplot2 (Part 1)']",https://www.datacamp.com/courses/data-visualization-with-ggplot2-2,Data Visualization,R
67,Data Visualization with ggplot2 (Part 3),6,19,86,"12,529","7,550",Data Visualization ggplot2 (Part 3),"Data Visualization with ggplot2 (Part 3)
In this third ggplot2 course, we'll dive into some advanced topics including geoms commonly used in maths and sciences, strategies for handling large data sets, a variety of specialty plots, and some useful features of ggplot2 internals.
Actually, all the plots you've explored in the first two ggplot2 courses can be considered 'statistical plots'. Here, however, you'll consider those that are intended for a specialist audience that is familiar with the data: box plots and density plots.
In this chapter, you'll explore useful specialty plots for specific data types such as ternary plots, networks and maps. You'll also look at how to use ggplot2 to convert typical base package plots that are used to evaluate the results of statistical methods. Finally, you'll take a look at a couple ways in which you can make and appropriately use animations.
In this chapter, we'll continue our discussion of plots for specific data types by diving into the world of maps. You'll also have a look at animations to make your data come to life!
In this chapter, you'll delve into ggplot2 internals, exploring the grid package and ggproto. You'll learn how to use these tools to create unique plots.
In this chapter, you'll draw on some of the many tools for effective data visualization that we've covered over the three ggplot2 courses and combine them with some data munging techniques.",[],"['Rick Scavetta', 'Filip Schouwenaars']","[('Movies (subset of 10000 observations)', 'https://assets.datacamp.com/production/repositories/414/datasets/a8e67e7190bc3a7ddc7a34a76bdef0fe136adcfb/ch1_movies_small.RDS'), ('Test datasets', 'https://assets.datacamp.com/production/repositories/414/datasets/9f0326fb6c2c53d97b49e8977c1d7126ca3d9586/test_datasets.RData'), ('Mammals', 'https://assets.datacamp.com/production/repositories/414/datasets/26c594b09095fc5e29b28b74b1faf48fa63cdc62/mammals.RDS'), ('Africa', 'https://assets.datacamp.com/production/repositories/414/datasets/8eaf914265a420d7e240bde1ba9e949a4498e5bb/africa.RData'), ('US Cities', 'https://assets.datacamp.com/production/repositories/414/datasets/24739149e0dbdbdc84dcbf275b68616cb2481005/US_Cities.txt'), ('US States', 'https://assets.datacamp.com/production/repositories/414/datasets/7eef36579d107fefbcb38d0c314c963e608c9609/US_States.txt'), ('Germany unemployment data', 'https://assets.datacamp.com/production/repositories/414/datasets/bdedafb52d7060a90f9bf320cf11a274ce02bcfd/germany_unemployment.txt'), ('Population of Japan', 'https://assets.datacamp.com/production/repositories/414/datasets/f2efc9d1f2f07a22843aabef510094c6e5474616/japanPOP.txt'), ('Shape files', 'https://assets.datacamp.com/production/repositories/414/datasets/1e3d8c75d1c8085a0ed893a4a5b4f49e3311fde2/shape_files.zip'), ('Paris weather data', 'https://assets.datacamp.com/production/repositories/414/datasets/df61e885cc58b88db51968a13ca7827897b098e8/FRPARIS.txt'), ('Reykavik weather data', 'https://assets.datacamp.com/production/repositories/414/datasets/45d984ccc4d2afa2023b7139824116040aac3a54/ILREYKJV.txt'), ('New York weather data', 'https://assets.datacamp.com/production/repositories/414/datasets/c37fafe15bfa05a338f8c835e79ee5e242400438/NYNEWYOR.txt'), ('London weather data', 'https://assets.datacamp.com/production/repositories/414/datasets/89250a654c2f83331a90e6538f89d501aa966181/UKLONDON.txt')]","['Introduction to R', 'Intermediate R', 'Data Visualization with ggplot2 (Part 1)', 'Data Visualization with ggplot2 (Part 2)']",https://www.datacamp.com/courses/data-visualization-with-ggplot2-part-3,Data Visualization,R
68,Data-Driven Decision Making in SQL,4,15,54,"3,801","4,550",Data-Driven Decision Making in SQL,"Data-Driven Decision Making in SQL
In this course, you will learn how to use SQL to support decision making. It is based on a case study about an online movie rental company with a database about customer information, movie ratings, background information on actors and more. You will learn to apply SQL queries to study for example customer preferences, customer engagement, and sales development. This course also covers SQL extensions for online analytical processing (OLAP), which makes it easier to obtain key insights from multidimensional aggregated data.
The first chapter is an introduction to the use case of an online movie rental company, called MovieNow and focuses on using simple SQL queries to extract and aggregated data from its database.
More complex queries with GROUP BY, LEFT JOIN and sub-queries are used to gain insight into customer preferences.
The concept of nested queries and correlated nested queries is introduced and the functions EXISTS and UNION are used to categorize customers, movies, actors, and more.
The OLAP extensions in SQL are introduced and applied to aggregated data on multiple levels. These extensions are the CUBE, ROLLUP and GROUPING SETS operators.",[],"['Irene Ortner', 'Tim Verdonck', 'Bart Baesens', 'Hadrien Lacroix', 'Mona Khalil']","[('MovieNow', 'https://assets.datacamp.com/production/repositories/4068/datasets/6abeae4810d472a18df091e19ed36373ebed410e/MovieNow.sql')]",['Intermediate SQL'],https://www.datacamp.com/courses/data-driven-decision-making-with-sql,Reporting,SQL
69,Dealing With Missing Data in R,4,14,52,"2,958","4,350",Dealing With Missing Data in R,"Dealing With Missing Data in R
Missing data is part of any real world data analysis. It can crop up in unexpected places, making analyses challenging to understand. In this course, you will learn how to use tidyverse tools and the naniar R package to visualize missing values. You'll tidy missing values so they can be used in analysis and explore missing values to find bias in the data. Lastly, you'll reveal other underlying patterns of missingness. You will also learn how to ""fill in the blanks"" of missing values with imputation models, and how to visualize, assess, and make decisions based on these imputed datasets.
Chapter 1 introduces you to missing data, explaining what missing values are, their behavior in R, how to detect them, and how to count them. We then introduce missing data summaries and how to summarise missingness across cases, variables, and how to explore across groups within the data. Finally, we discuss missing data visualizations, how to produce overview visualizations for the entire dataset and over variables, cases, and other summaries, and how to explore these across groups.
In chapter two, you will learn how to uncover hidden missing values like ""missing"" or ""N/A"" and replace them with `NA`. You will learn how to efficiently handle implicit missing values - those values implied to be missing, but not explicitly listed. We also cover how to explore missing data dependence, discussing Missing Completely at Random (MCAR), Missing At Random (MAR), Missing Not At Random (MNAR), and what they mean for your data analysis.
In this chapter, you will learn about workflows for working with missing data. We introduce special data structures, the shadow matrix, and nabular data, and demonstrate how to use them in workflows for exploring missing data so that you can link summaries of missingness back to values in the data. You will learn how to use ggplot to explore and visualize how values changes as other variables go missing. Finally, you learn how to visualize missingness across two variables, and how and why to visualize missings in a scatterplot.
In this chapter, you will learn about filling in the missing values in your data, which is called imputation. You will learn how to impute and track missing values, and what the good and bad features of imputations are so that you can explore, visualise, and evaluate the imputed data against the original values. You will learn how to use, evaluate, and compare different imputation models, and explore how different imputation models affect the inferences you can draw from the models.",['Intermediate Tidyverse Toolbox'],"['Nicholas Tierney', 'David Campos', 'Shon Inouye', 'Chester Ismay', 'Sascha Mayr']",[],"['Introduction to R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/dealing-with-missing-data-in-r,Importing & Cleaning Data,R
70,Dealing with Missing Data in Python,4,14,46,103,"3,800",Dealing Missing Data,"Dealing with Missing Data in Python
Tired of working with messy data? Did you know that most of a data scientist's time is spent in finding, cleaning and reorganizing data?! Well turns out you can clean your data in a smart way! In this course Dealing with Missing Data in Python, you'll do just that! You'll learn to address missing values for numerical, and categorical data as well as time-series data. You'll learn to see the patterns the missing data exhibits! While working with air quality and diabetes data, you'll also learn to analyze, impute and evaluate the effects of imputing the data.
Get familiar with missing data and how it impacts your analysis! Learn about different null value operations in your dataset, how to find missing data and summarizing missingness in your data.
Analyzing the type of missingness in your dataset is a very important step towards treating missing values. In this chapter, you'll learn in detail how to establish patterns in your missing and non-missing data, and how to appropriately treat the missingness using simple techniques such as listwise deletion.
Embark on the world of data imputation! In this chapter, you will apply basic imputation techniques to fill in missing data and visualize your imputations to be able to evaluate your imputations' performance.
Finally, go beyond simple imputation techniques and make the most of your dataset by using advanced imputation techniques that rely on machine learning models, to be able to accurately impute and evaluate your missing data. You will be using methods such as KNN and MICE in order to get the most out of your missing data!",[],"['Suraj Donthi', 'Adel Nehme']","[('Diabetes', 'https://assets.datacamp.com/production/repositories/4584/datasets/459359643874ba6411189d4a5251204b6142dd7d/pima-indians-diabetes data.csv'), ('Air Quality', 'https://assets.datacamp.com/production/repositories/4584/datasets/2f7155541f430d0e0e94d30a0becabcdc7dabded/air-quality.csv')]","['pandas Foundations', 'Supervised Learning with scikit-learn', 'Introduction to Data Visualization with Python']",https://www.datacamp.com/courses/dealing-with-missing-data-in-python,Data Manipulation,Python
71,Deep Learning in Python,4,17,50,"133,108","3,500",Deep Learning,"Deep Learning in Python
Deep learning is the machine learning technique behind the most exciting capabilities in diverse areas like robotics, natural language processing, image recognition, and artificial intelligence, including the famous AlphaGo. In this course, you'll gain hands-on, practical knowledge of how to use deep learning with Keras 2.0, the latest version of a cutting-edge library for deep learning in Python.
In this chapter, you'll become familiar with the fundamental concepts and terminology used in deep learning, and understand why deep learning techniques are so powerful today. You'll build simple neural networks and generate predictions with them.
Learn how to optimize the predictions generated by your neural networks. You'll use a method called backward propagation, which is one of the most important techniques in deep learning. Understanding how it works will give you a strong foundation to build on in the second half of the course.
In this chapter, you'll use the Keras library to build deep learning models for both regression and classification. You'll learn about the Specify-Compile-Fit workflow that you can use to make predictions, and by the end of the chapter, you'll have all the tools necessary to build deep neural networks.
Learn how to optimize your deep learning models in Keras. Start by learning how to validate your models, then understand the concept of model capacity, and finally, experiment with wider and deeper networks.","['Data Scientist with Python', 'Machine Learning with Python']","['Dan Becker', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Hourly wages', 'https://assets.datacamp.com/production/repositories/654/datasets/8a57adcdb5bfb3e603dad7d3c61682dfe63082b8/hourly_wages.csv'), ('MNIST', 'https://assets.datacamp.com/production/repositories/654/datasets/24769dae9dc51a77b9baa785d42ea42e3f8f7538/mnist.csv'), ('Titanic', 'https://assets.datacamp.com/production/repositories/654/datasets/92b75b9bc0c0a8a30999d76f4a1ee786ef072a9c/titanic_all_numeric.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/deep-learning-in-python,Machine Learning,Python
72,Deep Learning with Keras in Python,4,15,59,"1,425","4,950",Deep Learning Keras,"Deep Learning with Keras in Python
Deep learning is here to stay! It's the go-to technique to solve complex problems that arise with unstructured data and an incredible tool for innovation. Keras is one of the frameworks that make it easier to start developing deep learning models, and it's versatile enough to build industry-ready models in no time. In this course, you will learn regression and save the earth by predicting asteroid trajectories, apply binary classification to distinguish between real and fake dollar bills, use multiclass classification to decide who threw which dart at a dart board, learn to use neural networks to reconstruct noisy images and much more. Additionally, you will learn how to better control your models during training and how to tune them to boost their performance.
In this first chapter, you will get introduced to neural networks, understand what kind of problems they can solve, and when to use them. You will also build several networks and save the earth by training a regression model that approximates the orbit of a meteor that is approaching us!
By the end of this chapter, you will know how to solve binary, multi-class, and multi-label problems with neural networks. All of this by solving problems like detecting fake dollar bills, deciding who threw which dart at a board, and building an intelligent system to water your farm. You will also be able to plot model training metrics and to stop training and save your models when they no longer improve.
In the previous chapters, you've trained a lot of models! You will now learn how to interpret learning curves to understand your models as they train. You will also visualize the effects of activation functions, batch-sizes, and batch-normalization. Finally, you will learn how to perform automatic hyperparameter optimization to your Keras models using sklearn.
It's time to get introduced to more advanced architectures! You will create an autoencoder to reconstruct noisy images, visualize convolutional neural network activations, use deep pre-trained models to classify images and learn more about recurrent neural networks and working with text as you build a network that predicts the next word in a sentence.",[],"['Miguel Esteban', 'Hillary Green-Lerman', 'Sara Billen']","[('Darts', 'https://assets.datacamp.com/production/repositories/4335/datasets/a6f91a00c922a4fa7204787a583461831437d647/darts.csv'), ('Banknotes', 'https://assets.datacamp.com/production/repositories/4335/datasets/40eb98aaa7c03af87689d363a3e08ab59e38077c/banknotes.csv'), ('MNIST', 'https://assets.datacamp.com/production/repositories/4335/datasets/1c42fb4e5245742f7c3ed188682e2f7e2275f459/MNIST.zip'), ('Irrigation Machine', 'https://assets.datacamp.com/production/repositories/4335/datasets/e8e07e4d8969b5fb8f1d2eae9615feaa2ff5f319/irrigation_machine.csv'), ('Digits', 'https://assets.datacamp.com/production/repositories/4335/datasets/01772a23927623e41fbaaab2ab456e00ba4fcb92/Digits.zip')]",['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/deep-learning-with-keras-in-python,Machine Learning,Python
73,Deep Learning with PyTorch,4,17,53,"1,522","4,300",Deep Learning PyTorch,"Deep Learning with PyTorch
Neural networks have been at the forefront of Artificial Intelligence research during the last few years, and have provided solutions to many difficult problems like image classification, language translation or Alpha Go. PyTorch is one of the leading deep learning frameworks, being at the same time both powerful and easy to use. In this course you will use PyTorch to first learn about the basic concepts of neural networks, before building your first neural network to predict digits from MNIST dataset. You will then learn about convolutional neural networks, and use them to build much more powerful models which give more accurate results. You will evaluate the results and use different techniques to improve them. Following the course, you will be able to delve deeper into neural networks and start your career in this fascinating field.
In this first chapter, we introduce basic concepts of neural networks and deep learning using PyTorch library.
In this second chapter, we delve deeper into Artificial Neural Networks, learning how to train them with real datasets.
In this third chapter, we introduce convolutional neural networks, learning how to train them and how to use them to make predictions.
In this last chapter, we learn how to make neural networks work well in practice, using concepts like regularization, batch-normalization and transfer learning.",[],"['Ismail Elezi', 'Hadrien Lacroix', 'Hillary Green-Lerman']",[],"['Supervised Learning with scikit-learn', 'Object-Oriented Programming in Python']",https://www.datacamp.com/courses/deep-learning-with-pytorch,Machine Learning,Python
74,Defensive R Programming,4,16,51,861,"3,400",Defensive R Programming,"Defensive R Programming
Writing R scripts is easy. Writing good R code is hard. In this course, we'll discuss defensive programming - a set of standard techniques that will help reduce bugs and aid working in teams. We examine techniques for avoiding common errors and also how to handle the inevitable error that arises in our code. The course will conclude looking at when to make the transition from script to project to package.
In this first chapter, you'll learn what defensive programming is, and how to use existing packages for increased efficiency. You will then learn to manage the packages loaded in your environment and the potential conflicts that may arise.
Programming is simpler when you get feedback on your code execution. In R, we use messages, warnings and errors to signal to keep the user informed. This chapter will discuss when and where you should use these communication tools.
We can avoid making mistakes using a consistent programming approach. In this chapter, we will introduce you to R best practices.
Creating a script is nice, but working on a project with several scripts and assets requires structure. This final chapter will teach you good organization practices, so you can go from script to package with an optimal workflow.",[],"['Colin Gillespie', 'Hadrien Lacroix', 'Sascha Mayr']",[],['Intermediate R'],https://www.datacamp.com/courses/defensive-r-programming,Programming,R
75,Designing Machine Learning Workflows in Python,4,16,51,"1,873","4,200",Designing Machine Learning Workflows,"Designing Machine Learning Workflows in Python
Deploying machine learning models in production seems easy with modern tools, but often ends in disappointment as the model performs worse in production than in development. This course will give you four superpowers that will make you stand out from the data science crowd and build pipelines that stand the test of time: how to exhaustively tune every aspect of your model in development; how to make the best possible use of available domain expertise; how to monitor your model in performance and deal with any performance deterioration; and finally how to deal with poorly or scarcely labelled data. Digging deep into the cutting edge of sklearn, and dealing with real-life datasets from hot areas like personalized healthcare and cybersecurity, this course reveals a view of machine learning from the frontline.
In this chapter, you will be reminded of the basics of a supervised learning workflow, complete with model fitting, tuning and selection, feature engineering and selection, and data splitting techniques. You will understand how these steps in a workflow depend on each other, and recognize how they can all contribute to, or fight against overfitting: the data scientist's worst enemy. By the end of the chapter, you will already be fluent in supervised learning, and ready to take the dive towards more advanced material in later chapters.
In the previous chapter, you perfected your knowledge of the standard supervised learning workflows. In this chapter, you will critically examine the ways in which expert knowledge is incorporated in supervised learning. This is done through the identification of the appropriate unit of analysis which might require feature engineering across multiple data sources, through the sometimes imperfect process of labeling examples, and through the specification of a loss function that captures the true business value of errors made by your machine learning model.
In the previous chapter, you employed different ways of incorporating feedback from experts in your workflow, and evaluating it in ways that are aligned with business value. Now it is time for you to practice the skills needed to productize your model and ensure it continues to perform well thereafter by iteratively improving it. You will also learn to diagnose dataset shift and mitigate the effect that a changing environment can have on your model's accuracy.
In the previous chapters you established a solid foundation in supervised learning, complete with knowledge of deploying models in production but always assumed you a labeled dataset would be available for your analysis. In this chapter, you take on the challenge of modeling data without any, or with very few, labels. This takes you into a journey into anomaly detection, a kind of unsupervised modeling, as well as distance-based learning, where beliefs about what constitutes similarity between two examples can be used in place of labels to help you achieve levels of accuracy comparable to a supervised workflow. Upon completing this chapter, you will clearly stand out from the crowd of data scientists in confidently knowing what tools to use to modify your workflow in order to overcome common real-world challenges.",[],"['Christoforos Anagnostopoulos', 'Chester Ismay', 'Sara Billen']","[('Credit', 'https://assets.datacamp.com/production/repositories/3554/datasets/e02f7e59fc8b6cbd9fc7032fe595038f4171ef16/credit.csv'), ('Flows', 'https://assets.datacamp.com/production/repositories/3554/datasets/18a574bfeef99241c2fe45db6314fdaeb4b288fe/lanl_flows.csv'), ('Attacks', 'https://assets.datacamp.com/production/repositories/3554/datasets/e52ee3093aee49f8599dd30dc0acada35cbc2873/redteam.csv'), ('Hepatitis', 'https://assets.datacamp.com/production/repositories/3554/datasets/7a8662884e2157642c3eb287bee39346040c8bef/hep.csv'), ('Proteins', 'https://assets.datacamp.com/production/repositories/3554/datasets/76399b36f4b8a83a3a441f39cf1cc1171171db5c/proteins_exercises.csv'), ('Arrhythmia', 'https://assets.datacamp.com/production/repositories/3554/datasets/eb59119dbc87d95d89b446b825cb38854a59411e/arrh.csv')]","['Python Data Science Toolbox (Part 2)', 'Supervised Learning with scikit-learn', 'Unsupervised Learning in Python']",https://www.datacamp.com/courses/designing-machine-learning-workflows-in-python,Machine Learning,Python
76,Designing and Analyzing Clinical Trials in R,4,15,48,"1,482","4,000",Designing and Analyzing Clinical Trials in R,"Designing and Analyzing Clinical Trials in R
Clinical trials are scientific experiments that are conducted to assess whether treatments are effective and safe. They are used by a variety of organizations, including pharmaceutical companies for drug development. Biostatisticians play a key role in ensuring the success of a clinical trial. In this course you will gain an overview of the important principles and a practical introduction to commonly used statistical analyses. This course would be valuable for data analysts, medical students, clinicians, medical researchers and others interested in learning about the design and analysis of clinical trials.
In this chapter you will be introduced to the important principles of clinical trials.
In this chapter you will be introduced to randomization methods and different types of trial designs.
By the end of this chapter you will be able to calculate the numbers of patients needed for a clinical trial under a range of scenarios.
In this chapter we will explore additional statistical techniques that are commonly used to analyze data from clinical trials.",[],"['Tamuno Alfred', 'Sascha Mayr', 'David Campos', 'Shon Inouye']","[('Acupuncture dataset', 'https://assets.datacamp.com/production/repositories/1956/datasets/4e5e58dcff952229111ee184bb8a1823f6fa3c7a/Ex1_1_1.Rds'), ('Fact dataset', 'https://assets.datacamp.com/production/repositories/1956/datasets/b23c39b7e793e05d89bc33becd78f0d858287b2b/fact.data.Rds'), ('PK dataset', 'https://assets.datacamp.com/production/repositories/1956/datasets/5d31c5d48a9384dd5dd401e018af6fb452476aaf/PKData.Rds')]","['Introduction to R', 'Introduction to Data', 'Exploratory Data Analysis']",https://www.datacamp.com/courses/designing-and-analyzing-clinical-trials-in-r,Case Studies,R
77,Developing R Packages,4,16,56,"1,946","4,200",Developing R Packages,"Developing R Packages
In this course, you will learn the end-to-end process for creating an R package from scratch. You will start off by creating the basic structure for your package, and adding in important details like functions and metadata. Once the basic components of your package are in place, you will learn about how to document your package, and why this is important for creating quality packages that other people - as well as your future self - can use with ease. Once you have created the components of your package, you will learn how to test they work properly, by creating tests, running checks, and building your package. By the end of this course you can expect to have all the necessary skills to create and share your own R packages.
In this chapter, you will learn the basics of creating an R package. You will learn about the structure of R packages, set up a package, and write a function and include it in your package. You will also learn about the metadata stored in the DESCRIPTION and NAMESPACE files.
In this chapter, you will learn how to document your package. You will learn why documentation is important, and how to provide documentation for your package, its functions, and other components. You will also learn about what it means to export a function and how to implement this in your package.
In this chapter, you will learn about how to run checks to ensure that your R package is correctly structured and can be installed. You will learn how to correct common problems, and get your package ready to be built so it can be shared with others.
In the final chapter, you will learn how to add tests to your package to ensure your code runs as expected if the package is updated or changes. You will look at how to test functions to ensure they produce expected values, and also how to test for other aspects of functionality such as expected errors. Once you've written tests for your functions, you'll finally learn how to run your tests and what to do in the case of a failing test.",[],"['Aimee Gott', 'Nic Crane', 'Richie Cotton', 'Sumedh Panchadhar', 'Eunkyung Park']",[],['Writing Functions in R'],https://www.datacamp.com/courses/developing-r-packages,Programming,R
78,Differential Expression Analysis in R with limma,4,15,47,"1,477","3,900",Differential Expression Analysis in R limma,"Differential Expression Analysis in R with limma
Functional genomic technologies like microarrays, sequencing, and mass spectrometry enable scientists to gather unbiased measurements of gene expression levels on a genome-wide scale. Whether you are generating your own data or want to explore the large number of publicly available data sets, you will first need to learn how to analyze these types of experiments. In this course, you will be taught how to use the versatile R/Bioconductor package limma to perform a differential expression analysis on the most common experimental designs. Furthermore, you will learn how to pre-process the data, identify and correct for batch effects, visually assess the results, and perform enrichment testing. After completing this course, you will have general analysis strategies for gaining insight from any functional genomics study.
To begin, you'll review the goals of differential expression analysis, manage gene expression data using R and Bioconductor, and run your first differential expression analysis with limma.
In this chapter, you'll learn how to construct linear models to test for differential expression for common experimental designs.
Now that you've learned how to perform differential expression tests, next you'll learn how to normalize and filter the feature data, check for technical batch effects, and assess the results.
In this final chapter, you'll use your new skills to perform an end-to-end differential expression analysis of a study that uses a factorial design to assess the impact of the cancer drug doxorubicin on the hearts of mice with different genetic backgrounds.",[],"['John Blischak', 'Richie Cotton', 'David Campos', 'Shon Inouye']","[('Doxorubicin dataset', 'https://assets.datacamp.com/production/repositories/1626/datasets/bb773b0ece1e325dc23933f8e492ef4d1a17cddd/dox.rds'), ('Leukemia dataset', 'https://assets.datacamp.com/production/repositories/1626/datasets/0decef2850200efcf87b107b080959b31ec681ba/cll-eset.rds'), ('Hypoxia dataset', 'https://assets.datacamp.com/production/repositories/1626/datasets/db8dbd1c9889333384a3a78a30c745b4251e6c06/stem-eset.rds')]","['Introduction to R', 'Introduction to Data']",https://www.datacamp.com/courses/differential-expression-analysis-in-r-with-limma,Other,R
79,Dimensionality Reduction in Python,4,16,58,"2,283","4,700",Dimensionality Reduction,"Dimensionality Reduction in Python
High-dimensional datasets can be overwhelming and leave you not knowing where to start. Typically, you’d visually explore a new dataset first, but when you have too many dimensions the classical approaches will seem insufficient. Fortunately, there are visualization techniques designed specifically for high dimensional data and you’ll be introduced to these in this course. After exploring the data, you’ll often find that many features hold little information because they don’t show any variance or because they are duplicates of other features. You’ll learn how to detect these features and drop them from the dataset so that you can focus on the informative ones. In a next step, you might want to build a model on these features, and it may turn out that some don’t have any effect on the thing you’re trying to predict. You’ll learn how to detect and drop these irrelevant features too, in order to reduce dimensionality and thus complexity. Finally, you’ll learn how feature extraction techniques can reduce dimensionality for you through the calculation of uncorrelated principal components.
You'll be introduced to the concept of dimensionality reduction and will learn when an why this is important. You'll learn the difference between feature selection and feature extraction and will apply both techniques for data exploration. The chapter ends with a lesson on t-SNE, a powerful feature extraction technique that will allow you to visualize a high-dimensional dataset.
In this first out of two chapters on feature selection, you'll learn about the curse of dimensionality and how dimensionality reduction can help you overcome it. You'll be introduced to a number of techniques to detect and remove features that bring little added value to the dataset. Either because they have little variance, too many missing values, or because they are strongly correlated to other features.
In this second chapter on feature selection, you'll learn how to let models help you find the most important features in a dataset for predicting a particular target feature. In the final lesson of this chapter, you'll combine the advice of multiple, different, models to decide on which features are worth keeping.
This chapter is a deep-dive on the most frequently used dimensionality reduction algorithm, Principal Component Analysis (PCA). You'll build intuition on how and why this algorithm is so powerful and will apply it both for data exploration and data pre-processing in a modeling pipeline. You'll end with a cool image compression use case.",[],"['Jeroen Boeye', 'Aleksandra Vercauteren', 'Hadrien Lacroix', 'Hillary Green-Lerman', 'Chester Ismay']","[('ANSUR Female', 'https://assets.datacamp.com/production/repositories/3515/datasets/802fc5cdbe3a29248483e496a966627ea9629e7a/ANSUR_II_FEMALE.csv'), ('ANSUR Male', 'https://assets.datacamp.com/production/repositories/3515/datasets/28edd853c0a6aa7316b0d84a21f8e0d821e5010d/ANSUR_II_MALE.csv'), ('Diabetes', 'https://assets.datacamp.com/production/repositories/3515/datasets/87ced33d5371cdc13f9301ecb99ead36a63c8197/PimaIndians.csv'), ('Grocery store sales', 'https://assets.datacamp.com/production/repositories/3515/datasets/236dfa1d124bf01147dd5b3da595066fcf84a1a4/grocery_sales.csv'), ('Boston Public Schools', 'https://assets.datacamp.com/production/repositories/3515/datasets/8d23ca278dcc6c6b59629a47e1474afd93ad960c/Public_Schools2.csv'), ('Pokemon', 'https://assets.datacamp.com/production/repositories/3515/datasets/9b0682ecacc5a3429f62947794d1adbeecbd5a11/pokemon.csv')]",['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/dimensionality-reduction-in-python,Machine Learning,Python
80,Dimensionality Reduction in R,4,14,46,"2,141","3,450",Dimensionality Reduction in R,"Dimensionality Reduction in R
Real-world datasets often include values for dozens, hundreds, or even thousands of variables. Our minds cannot efficiently process such high-dimensional datasets to come up with useful, actionable insights. How do you deal with these multi-dimensional swarms of data points? How do you uncover and visualize hidden patterns in the data? In this course, you'll learn how to answer these questions by mastering three fundamental dimensionality reduction techniques - Principal component analysis (PCA), non-negative matrix factorisation (NNMF), and exploratory factor analysis (EFA).
As a data scientist, you'll frequently have to deal with messy and high-dimensional datasets. In this chapter, you'll learn how to use Principal Component Analysis (PCA) to effectively reduce the dimensionality of such datasets so that it becomes easier to extract actionable insights from them.
Here, you'll build on your knowledge of PCA by tackling more advanced applications, such as dealing with missing data. You'll also become familiar with another essential dimensionality reduction technique called Non-negative matrix factorization (NNMF) and how to use it in R.
Become familiar with exploratory factor analysis (EFA), another dimensionality reduction technique that is a natural extension to PCA.
Round out your mastery of dimensionality reduction in R by extending your knowledge of EFA to cover more advanced applications.",[],"['Alexandros Tantos', 'Yashas Roy', 'Richie Cotton', 'Benjamin Feder']","[('BBC dataset', 'https://assets.datacamp.com/production/course_4249/datasets/bbc_res.rds'), ('Humor Styles Questionnaire dataset', 'https://assets.datacamp.com/production/course_4249/datasets/humor_dataset.csv'), ('Short Dark Triad dataset', 'https://assets.datacamp.com/production/course_4249/datasets/SD3.RDS')]",['Unsupervised Learning in R'],https://www.datacamp.com/courses/dimensionality-reduction-in-r,Machine Learning,R
81,Ensemble Methods in Python,4,15,52,850,"4,050",Ensemble Methods,"Ensemble Methods in Python
Continue your machine learning journey by diving into the wonderful world of ensemble learning methods! These are an exciting class of machine learning techniques that combine multiple individual algorithms to boost performance and solve complex problems at scale across different industries. Ensemble techniques regularly win online machine learning
competitions as well! In this course, you’ll learn all about these advanced ensemble techniques, such as bagging, boosting, and stacking. You’ll apply them to real-world datasets using cutting edge Python machine learning libraries such as scikit-learn, XGBoost, CatBoost, and mlxtend.
Do you struggle to determine which of the models you built is the best for your problem? You should give up on that, and use them all instead! In this chapter, you'll learn how to combine multiple models into one using ""Voting"" and ""Averaging"". You'll use these to predict the ratings of apps on the Google Play Store, whether or not a Pokémon is legendary, and which characters are going to die in Game of Thrones!
Bagging is the ensemble method behind powerful machine learning algorithms such as random forests. In this chapter you'll learn the theory behind this technique and build your own bagging models using scikit-learn.
Boosting is class of ensemble learning algorithms that includes award-winning models such as AdaBoost. In this chapter, you'll learn about this award-winning model, and use it to predict the revenue of award-winning movies! You'll also learn about gradient boosting algorithms such as CatBoost and XGBoost.
Get ready to see how things stack up! In this final chapter you'll learn about the stacking ensemble method. You'll learn how to implement it from scratch as well as using the mlxtend library! You'll apply stacking to predict the edibility of North American mushrooms, and revisit the ratings of Google apps with this more advanced approach.",[],"['Román de las Heras', 'Hillary Green-Lerman', 'Yashas Roy']","[('App ratings', 'https://assets.datacamp.com/production/repositories/4024/datasets/f29456ea573c318fa53362fdf91871d0c7849bb2/googleplaystore.csv'), ('App reviews', 'https://assets.datacamp.com/production/repositories/4024/datasets/be1aeb4c05850973c671d689575b6613fd8c8553/googleplaystore_user_reviews.csv'), ('Game of Thrones', 'https://assets.datacamp.com/production/repositories/4024/datasets/02627e1959ac37b28bde9ec9d28400d776dbc123/character-predictions.csv'), ('Pokémon', 'https://assets.datacamp.com/production/repositories/4024/datasets/2dd4cab3c792e2755e7dafe355a14bdb06973c5d/Pokemon.csv'), ('SECOM (Semiconductor Manufacturing)', 'https://assets.datacamp.com/production/repositories/4024/datasets/68204a108133375b21076bdd7cb560d4bb7ce4b8/uci-secom.csv'), ('TMDb (The Movie Database)', 'https://assets.datacamp.com/production/repositories/4024/datasets/f3b1b3b8ee260b447b146f156b9fbc72e51f2131/tmdb_5000_movies.csv')]","['Linear Classifiers in Python', 'Machine Learning with Tree-Based Models in Python']",https://www.datacamp.com/courses/ensemble-methods-in-python,Machine Learning,Python
82,Equity Valuation in R,4,16,58,"3,821","4,750",Equity Valuation in R,"Equity Valuation in R
How do we know when a stock is cheap or expensive? To do this, we need to compare the stock's price with its value. The price of the stock can be obtained by looking at various public sources, such as Yahoo Finance or Google Finance. The value of the stock though is much harder to identify. Every investor has to form his or her valuation of the stock. In this course, you will learn the fundamentals of valuing stocks using present value approaches, such as free cash flow to equity and dividend discount models, and valuation multiples. By the end of this course, you will be able to build your own valuation models.
Many individuals and institutions invest in equities. To do so effectively, the investor must have a solid understanding of how the value of the equity compares to the stock price. In this course, we focus on fundamental concepts of equity valuation. We begin with a discussion of time value of money and then move on to the first of two discounted cash flow methods we will discuss - the free cash flow to equity valuation model.
One of the critical components of free cash flow to equity valuation is using reliable projections. In the first part of this chapter, we will discuss ways to analyze the projections to help us identify the right questions to ask. In the second part of this chapter, we will go through the second of our discounted cash flow models - the dividend discount model. In this approach, we discount expected dividends instead of free cash flows.
To be able to discount cash flows, we need a discount rate. For the free cash flow to equity and dividend discount model, the cost of equity is the appropriate discount rate. In this chapter, we will discuss how each of the components of the cost of equity are calculated.
Relative valuation allows us to use the valuation of comparable companies to infer the value of our subject firm. In this chapter, we discuss how to identify comparable companies and how to calculate valuation multiples. We also show how to analyze the determinants of multiples.
This chapter combines the lessons from Chapters 1 to 4 in a series of exercises. You will be asked to inspect the data and to value the firm using discounted cash flow and relative valuation approaches. At the end, you will combine the results in a summary table.",[],"['Clifford Ang', 'Lore Dirick', 'Sumedh Panchadhar']","[('Historical returns', 'https://assets.datacamp.com/production/repositories/941/datasets/47503ad99e5539567fc9211b5df956a2260f9305/damodaran_histret.rda'), ('US Treasury data', 'https://assets.datacamp.com/production/repositories/941/datasets/338824497e458d5bd1bdd68ca301565eba3c9de7/fred_10yr.rda'), ('S&P 400 Midcap Index', 'https://assets.datacamp.com/production/repositories/941/datasets/552725670438351d1e704e69cb3e566e38fa330e/midcap400.rda'), ('Mylan prices', 'https://assets.datacamp.com/production/repositories/941/datasets/949f5196c681dd0abe9fa8a752c40a4711fa1a75/myl_spy_prices.rda')]","['Introduction to R for Finance', 'Intermediate R for Finance', 'Importing and Managing Financial Data in R']",https://www.datacamp.com/courses/equity-valuation-in-r,Applied Finance,R
83,Error and Uncertainty in Spreadsheets,4,15,61,38,"4,950",Error and Uncertainty in Spreadsheets,"Error and Uncertainty in Spreadsheets
You rely on predictions every day: you might check the weather app before choosing your outfit or peek at the traffic before starting your commute. Perhaps you are responsible for setting your organization’s strategy in the future. Do you find yourself wondering how accurate predictions are, how you can see into the future, and why the weatherman always seems to be wrong? In our Error and Uncertainty course, you’ll make some predictions yourself, learn to distinguish real differences from random noise, and explore psychological crutches we use that interfere with our rational decision making. You will uncover patterns in Seattle crime data, predict students’ final grades, prevent Nashville traffic accidents, and determine whether a bakery’s menu needs to change. Join us! We’re certain you’ll enjoy learning about error and uncertainty.
The first chapter presents common terminology, introduces methods for determining significant differences between groups, and outlines the kinds of error and uncertainty involved. We will specifically look at Seattle crime data and evaluate crime rate differences between precincts and neighborhoods. This chapter will equip learners to identify threats to the validity and accuracy of their conclusions.
The second chapter outlines both rudimentary (e.g., moving average, seasonal average, yearly average) and more complicated methods (e.g., linear regression) for making predictions and outlines the kinds of error and uncertainty involved. We will specifically look at anonymized student grades data and evaluate the accuracy of our predictions for given students. Throughout the chapter, we will identify threats to the validity and accuracy of our predictions.
Chapter 3 encourages learners to test the assumptions of their predictions using data on car crashes. Specifically, they will determine how to allocate resources to reduce injuries and fatalities from auto accidents. Learners will discuss the impact of outliers in prediction accuracy, evaluate the importance of normally distributed data in making predictions, employ consequence-likelihood matrices in risk management, and adapt psychological heuristics to discussions of numerical uncertainty and risk.
The final chapter integrates all the previous lessons into a constructed-world scenario. Learners are tasked with updating the menu at their small business: the Risky Business Bakery. They need to figure out whether to add or drop menu items based on whether there are significant differences in sales by baked good; whether their predicted sales figures from their accountant are accurate.",[],"['Evan Kramer', 'Chester Ismay', 'Becca Robins', 'Ruanne Van Der Walt']","[('Seattle Crime Data', 'https://assets.datacamp.com/production/repositories/4311/datasets/dea1de7f70b77c0dc0bdad4de5154ef4f6d5ceaa/1_seattle_crime.csv'), ('Student Math Scores', 'https://assets.datacamp.com/production/repositories/4311/datasets/036f0e2199b2670d6da2fbe8fa799ce70787ee96/2_math_scores.csv'), ('Risky Business Bakery', 'https://assets.datacamp.com/production/repositories/4311/datasets/a983a410dc7065058970013c8ebdf1963735dd7f/4_bakery_sales.csv')]",['Data Analysis with Spreadsheets'],https://www.datacamp.com/courses/error-and-uncertainty-in-spreadsheets,Probability & Statistics,Spreadsheets
84,Experimental Design in Python,4,16,53,293,"4,400",Experimental Design,"Experimental Design in Python
Data is all around us and can help us to understand many things. Making a pretty graph is great, but how can we tell the difference between a few outliers on a graph and a real, reliable effect? Is a trend that we see on a graph a reliable result or just random chance playing tricks? In this course, you will learn how to interrogate datasets in a rigorous way, giving clear answers to your questions. You will learn a range of statistical tests, how to apply them, how to understand their results, and how to deal with their shortcomings. Along the way, you will explore Olympic athlete data and the differences between populations of continents.
In this chapter, you will learn how to explore your data and ask meaningful questions. Then, you will discover how to answer these question by using your first statistical hypothesis tests: the t-test, the Chi-Square test, the Fisher exact test, and the Pearson correlation test.
In this chapter, you will learn how to examine and multiple factors at once, controlling for the effect of confounding variables and examining interactions between variables. You will learn how to use randomization and blocking to build robust tests and how to use the powerful ANOVA method.
In this chapter, you will focus on ways to avoid drawing false conclusions, whether false positives (type I errors) or false negatives (type II errors). Central to avoiding false negatives is understanding the interplay between sample size, power analysis, and effect size.
In this final chapter, you will examine the assumptions underlying statistical tests and learn about how that influences your experimental design. This will include learning whether a variable follows a normal distribution and when you should use non-parametric statistical tests like the Wilcoxon rank-sum test and the Spearman correlation test.",[],"['Luke Hayden', 'Chester Ismay', 'Amy Peterson']","[('Olympic dataset', 'https://assets.datacamp.com/production/repositories/4371/datasets/8fd0a14bfbc5f13719d92334eaf77b23f2e914d6/olyathswim.csv'), ('UN dataset', 'https://assets.datacamp.com/production/repositories/4371/datasets/f5c1016b818f97ec200236fb161ae711944fb2cb/undata_country_profile_variables.csv')]",['Introduction to Python'],https://www.datacamp.com/courses/experimental-design-in-python,Probability & Statistics,Python
85,Experimental Design in R,4,12,52,"3,890","4,400",Experimental Design in R,"Experimental Design in R
Experimental design is a crucial part of data analysis in any field, whether you work in business, health or tech. If you want to use data to answer a question, you need to design an experiment! In this course you will learn about basic experimental design, including block and factorial designs, and commonly used statistical tests, such as the t-tests and ANOVAs. You will use built-in R data and real world datasets including the CDC NHANES survey, SAT Scores from NY Public Schools, and Lending Club Loan Data. Following the course, you will be able to design and analyze your own experiments!
An introduction to key parts of experimental design plus some power and sample size calculations.
Explore the Lending Club dataset plus build and validate basic experiments, including an A/B test.
Use the NHANES data to build a RCBD and BIBD experiment, including model validation and design tips to make sure the BIBD is valid.
Evaluate the NYC SAT scores data and deal with its missing values, then evaluate Latin Square, Graeco-Latin Square, and Factorial experiments.",['Statistics Fundamentals with R'],"['kaelen medeiros', 'Sascha Mayr', 'Becca Robins']","[('sample of Lending Club data', 'https://assets.datacamp.com/production/repositories/1793/datasets/e14dbe91a0840393e86e4fb9a7ec1b958842ae39/lendclub.csv'), ('NHANES Body Measures', 'https://assets.datacamp.com/production/repositories/1793/datasets/ee832ef6c2fa7036704c53e90dc1e710a3b50dbc/nhanes_bodymeasures.csv'), ('NHANES Demographics', 'https://assets.datacamp.com/production/repositories/1793/datasets/2be5ca94453a63e825bc30ccefd1429b7683c19c/nhanes_demo.csv'), ('NHANES final combined dataset', 'https://assets.datacamp.com/production/repositories/1793/datasets/c74d60f37456fd0bbf0323d6ef88ff6ca91366a3/nhanes_final.csv'), ('NHANES Medical Conditions', 'https://assets.datacamp.com/production/repositories/1793/datasets/d34921a9255422617cdc42f6a3fbcd189f51c19d/nhanes_medicalconditions.csv'), ('NYC SAT Scores', 'https://assets.datacamp.com/production/repositories/1793/datasets/6eee2fcc47c8c8dbb2e9d4670cf2eabeda52b705/nyc_scores.csv')]","['Introduction to Data', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/experimental-design-in-r,Probability & Statistics,R
86,Exploratory Data Analysis,4,15,54,"36,879","3,950",Exploratory Data Analysis,"Exploratory Data Analysis
When your dataset is represented as a table or a database, it's difficult to observe much about it beyond its size and the types of variables it contains. In this course, you'll learn how to use graphical and numerical techniques to begin uncovering the structure of your data. Which variables suggest interesting relationships? Which observations are unusual? By the end of the course, you'll be able to answer these questions and more, while generating graphics that are both insightful and beautiful.
In this chapter, you will learn how to create graphical and numerical summaries of two categorical variables.
In this chapter, you will learn how to graphically summarize numerical data.
Now that we've looked at exploring categorical and numerical data, you'll learn some useful statistics for describing distributions of data.
Apply what you've learned to explore and summarize a real world dataset in this case study of email spam.","['Data Analyst with R', 'Data Scientist with R', 'Statistics Fundamentals with R']","['Andrew Bray', 'Nick Carchedi', 'Tom Jeon']","[('Cars data', 'https://assets.datacamp.com/production/repositories/537/datasets/c0366d5da5ee8dce49919a5443685cf2e50c6a96/cars04.csv'), ('Comics data', 'https://assets.datacamp.com/production/repositories/537/datasets/8860af2c0ef67fc77a8c704a73bbb93a395debcf/comics.csv'), ('Immigration data', 'https://assets.datacamp.com/production/repositories/537/datasets/d6b811836c453d2afaaf76c6d62b592e673e93ae/immigration.csv'), ('Raw life expectancy data', 'https://assets.datacamp.com/production/repositories/537/datasets/e079a96a639aa10afc478359da45f2f75f7efd2e/life_exp_raw.csv'), ('Names data', 'https://assets.datacamp.com/production/repositories/537/datasets/7dc95cdac26db11e7dd46542741435dbb09fb613/names.txt'), ('Raw U.S. income data', 'https://assets.datacamp.com/production/repositories/537/datasets/813eb74f670b7dd1c7806375bc9607472fe976db/us_income_raw.csv')]","['Introduction to R', 'Introduction to Data']",https://www.datacamp.com/courses/exploratory-data-analysis,Probability & Statistics,R
87,Exploratory Data Analysis in Python,4,16,52,"2,438","4,150",Exploratory Data Analysis,"Exploratory Data Analysis in Python
How do we get from data to answers? Exploratory data analysis is a process for exploring datasets, answering questions, and visualizing results. This course presents the tools you need to clean and validate data, to visualize distributions and relationships between variables, and to use regression models to predict and explain. You'll explore data related to demographics and health, including the National Survey of Family Growth and the General Social Survey. But the methods you learn apply to all areas of science, engineering, and business. You'll use Pandas, a powerful library for working with data, and other core Python libraries including NumPy and SciPy, StatsModels for regression, and Matplotlib for visualization. With these tools and skills, you will be prepared to work with real data, make discoveries, and present compelling results.
The first step of almost any data project is to read the data, check for errors and special cases, and prepare data for analysis. This is exactly what you'll do in this chapter, while working with a dataset obtained from the National Survey of Family Growth.
In the first chapter, having cleaned and validated your data, you began exploring it by using histograms to visualize distributions. In this chapter, you'll learn how to represent distributions using Probability Mass Functions (PMFs) and Cumulative Distribution Functions (CDFs). You'll learn when to use each of them, and why, while working with a new dataset obtained from the General Social Survey.
Up until this point, you've only looked at one variable at a time. In this chapter, you'll explore relationships between variables two at a time, using scatter plots and other visualizations to extract insights from a new dataset obtained from the Behavioral Risk Factor Surveillance Survey (BRFSS). You'll also learn how to quantify those relationships using correlation and simple regression.
Explore multivariate relationships using multiple regression to describe non-linear relationships and logistic regression to explain and predict binary variables.",[],"['Allen Downey', 'Chester Ismay', 'Yashas Roy']","[('National Survey of Family Growth (NSFG)', 'https://assets.datacamp.com/production/repositories/4025/datasets/513eca1637050a1fa75874dc5ceabfe89e9d2668/nsfg.hdf5'), ('General Social Survey (GSS)', 'https://assets.datacamp.com/production/repositories/4025/datasets/01de76fde7ef43c629a7dbfb11ce91cde0210417/gss.hdf5'), ('Behavioral Risk Factor Surveillance System (BRFSS)', 'https://assets.datacamp.com/production/repositories/4025/datasets/0bfd1b5298cbaf58f3b4dc2c035120a8b6156d73/brfss.hdf5')]",['Python Data Science Toolbox (Part 2)'],https://www.datacamp.com/courses/exploratory-data-analysis-in-python,Case Studies,Python
88,Exploratory Data Analysis in R: Case Study,4,15,58,"24,605","4,800",Exploratory Data Analysis in R: Case Study,"Exploratory Data Analysis in R: Case Study
Once you've started learning tools for data manipulation and visualization like dplyr and ggplot2, this course gives you a chance to use them in action on a real dataset. You'll explore the historical voting of the United Nations General Assembly, including analyzing differences in voting between countries, across time, and among international issues. In the process you'll gain more practice with the dplyr and ggplot2 packages, learn about the broom package for tidying model output, and experience the kind of start-to-finish exploratory analysis common in data science.
The best way to learn data wrangling skills is to apply them to a specific case study. Here you'll learn how to clean and filter the United Nations voting dataset using the dplyr package, and how to summarize it into smaller, interpretable units.
Once you've cleaned and summarized data, you'll want to visualize them to understand trends and extract insights. Here you'll use the ggplot2 package to explore trends in United Nations voting within each country over time.
While visualization helps you understand one country at a time, statistical modeling lets you quantify trends across many countries and interpret them together. Here you'll learn to use the tidyr, purrr, and broom packages to fit linear models to each country, and understand and compare their outputs.
In this chapter, you'll learn to combine multiple related datasets, such as incorporating information about each resolution's topic into your vote analysis. You'll also learn how to turn untidy data into tidy data, and see how tidy data can guide your exploration of topics and countries over time.","['Data Analyst with R', 'Data Manipulation with R', 'Data Scientist with R']","['David Robinson', 'Nick Carchedi', 'Tom Jeon']","[('United Nations voting dataset', 'https://assets.datacamp.com/production/repositories/420/datasets/ddfa750d993c73026f621376f3c187f276bf0e2a/votes.rds'), ('Topic information for each country (Descriptions)', 'https://assets.datacamp.com/production/repositories/420/datasets/a438432333a31a6f4aba2d5507df9a44e513b518/descriptions.rds')]","['Introduction to R', 'Data Visualization with ggplot2 (Part 1)']",https://www.datacamp.com/courses/exploratory-data-analysis-in-r-case-study,Case Studies,R
89,Exploring Pitch Data with R,4,14,69,"7,427","5,750",Exploring Pitch Data R,"Exploring Pitch Data with R
Velocity is a key component in the arsenal of many pitchers. In this chapter, you will examine whether there was an uptick in Zack Greinke's velocity during his impressive July in 2015. The chapter will introduce how to deal with dates, plotting distributions with histograms, and using the very handy tapply() function.
Pitchers throw various types of pitches with different velocities and trajectories in order to make it more difficult for the batter to hit the ball. This chapter will introduce pitch types and make heavy use of tables to examine changes to pitch type choices by Greinke in July, as well as in other important situations.
As with velocity and pitch type, pitch location can play a key role in pitching success. This chapter leverages the rich information about location provided in the MLB Statcast data to visualize changes in Greinke's pitch location choice in July and in different ball-strike counts. You will also make use of the very important for loop in the context of plotting data.
In this chapter, you'll bring it all together. Minimizing damage on each pitch is the key to run prevention by the pitcher. Therefore, you will look closely at outcomes from pitches thrown by Greinke in different months. We'll also introduce the ggplot2 package to create high quality visualizations of hitter exit speed when Greinke throws to different locations.",[],"['Brian M. Mills', 'Nick Carchedi', 'Tom Jeon', 'Jeff Paadre']","[('greinke2015', 'https://assets.datacamp.com/production/course_943/datasets/greinke2015.csv')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/exploring-pitch-data-with-r,Case Studies,R
90,Extreme Gradient Boosting with XGBoost,4,16,49,"14,448","3,750",Extreme Gradient Boosting XGBoost,"Extreme Gradient Boosting with XGBoost
Do you know the basics of supervised learning and want to use state-of-the-art models on real-world datasets? Gradient boosting is currently one of the most popular techniques for efficient modeling of tabular datasets of all sizes. XGboost is a very fast, scalable implementation of gradient boosting, with models using XGBoost regularly winning online data science competitions and being used at scale across different industries. In this course, you'll learn how to use this powerful library alongside pandas and scikit-learn to build and tune supervised learning models. You'll work with real-world datasets to solve classification and regression problems.
This chapter will introduce you to the fundamental idea behind XGBoost—boosted learners. Once you understand how XGBoost works, you'll apply it to solve a common classification problem found in industry: predicting whether a customer will stop being a customer at some point in the future.
After a brief review of supervised regression, you'll apply XGBoost to the regression task of predicting house prices in Ames, Iowa. You'll learn about the two kinds of base learners that XGboost can use as its weak learners, and review how to evaluate the quality of your regression models.
This chapter will teach you how to make your XGBoost models as performant as possible. You'll learn about the variety of parameters that can be adjusted to alter the behavior of XGBoost and how to tune them efficiently so that you can supercharge the performance of your models.
Take your XGBoost skills to the next level by incorporating your models into two end-to-end machine learning pipelines. You'll learn how to tune the most important XGBoost hyperparameters efficiently within a pipeline, and get an introduction to some more advanced preprocessing techniques.",[],"['Sergey Fogelson', 'Hugo Bowne-Anderson', 'Yashas Roy']","[('Ames housing prices (preprocessed)', 'https://assets.datacamp.com/production/repositories/943/datasets/4dbcaee889ef06fb0763e4a8652a4c1f268359b2/ames_housing_trimmed_processed.csv'), ('Ames housing prices (original)', 'https://assets.datacamp.com/production/repositories/943/datasets/17a7c5c0acd7bfa253827ea53646cf0db7d39649/ames_unprocessed_data.csv'), ('Chronic kidney disease', 'https://assets.datacamp.com/production/repositories/943/datasets/82c231cd41f92325cf33b78aaa360824e6b599b9/chronic_kidney_disease.csv')]","['Supervised Learning with scikit-learn', 'Machine Learning with the Experts: School Budgets']",https://www.datacamp.com/courses/extreme-gradient-boosting-with-xgboost,Machine Learning,Python
91,Factor Analysis in R,4,13,45,"3,208","3,600",Factor Analysis in R,"Factor Analysis in R
The world is full of unobservable variables that can't be directly measured. You might be interested in a construct such as math ability, personality traits, or workplace climate. When investigating constructs like these, it's critically important to have a model that matches your theories and data. This course will help you understand dimensionality and show you how to conduct exploratory and confirmatory factor analyses. With these statistical techniques in your toolkit, you'll be able to develop, refine, and share your measures. These analyses are foundational for diverse fields including psychology, education, political science, economics, and linguistics.
In Chapter 1, you will learn how to conduct an EFA to examine the statistical properties of a measure designed around one construct.
This chapter will show you how to extend the single-factor EFA you learned in Chapter 1 to multidimensional data.
This chapter will cover conducting CFAs with the sem package. Both theory-driven and EFA-driven CFA structures will be covered.
This chapter will reinforce the difference between EFAs and CFAs and offer suggestions for improving your model and/or measure.",['Unsupervised Machine Learning with R'],"['Jennifer Brussow', 'Chester Ismay', 'Becca Robins']","[('Generic Conspiracist Beliefs Scale (GCBS) dataset', 'https://assets.datacamp.com/production/repositories/2136/datasets/869615371e66021e97829feb7e19e38037ed0c14/GCBS_data.rds')]","['Intermediate R', 'Foundations of Inference']",https://www.datacamp.com/courses/factor-analysis-in-r,Probability & Statistics,R
92,Feature Engineering for Machine Learning in Python,4,16,53,"1,946","4,350",Feature Engineering Machine Learning,"Feature Engineering for Machine Learning in Python
Every day you read about the amazing breakthroughs in how the newest applications of machine learning are changing the world. Often this reporting glosses over the fact that a huge amount of data munging and feature engineering must be done before any of these fancy models can be used. In this course, you will learn how to do just that. You will work with Stack Overflow Developers survey, and historic US presidential inauguration addresses, to understand how best to preprocess and engineer features from categorical, continuous, and unstructured data. This course will give you hands-on experience on how to prepare any data for your own machine learning models.
In this chapter, you will explore what feature engineering is and how to get started with applying it to real-world data. You will load, explore and visualize a survey response dataset, and in doing so you will learn about its underlying data types and why they have an influence on how you should engineer your features. Using the pandas package you will create new features from both categorical and continuous columns.
This chapter introduces you to the reality of messy and incomplete data. You will learn how to find where your data has missing values and explore multiple approaches on how to deal with them. You will also use string manipulation techniques to deal with unwanted characters in your dataset.
In this chapter, you will focus on analyzing the underlying distribution of your data and whether it will impact your machine learning pipeline. You will learn how to deal with skewed data and situations where outliers may be negatively impacting your analysis.
Finally, in this chapter, you will work with unstructured text data, understanding ways in which you can engineer columnar features out of a text corpus. You will compare how different approaches may impact how much context is being extracted from a text, and how to balance the need for context, without too many features being created.",[],"[""Robert O'Callaghan"", 'Sumedh Panchadhar', 'Hillary Green-Lerman']","[('Stack Overflow Survey Responses (Modified)', 'https://assets.datacamp.com/production/repositories/3752/datasets/19699a2441073ad6459bf5e3e17690e2cae86cf1/Combined_DS_v10.csv'), ('US Presidential Inauguration Addresses', 'https://assets.datacamp.com/production/repositories/3752/datasets/cdc15798dd6698003ee33c6af185242faf896187/inaugural_speeches.csv')]","['pandas Foundations', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/feature-engineering-for-machine-learning-in-python,Machine Learning,Python
93,Feature Engineering for NLP in Python,4,15,52,"2,379","4,200",Feature Engineering NLP,"Feature Engineering for NLP in Python
In this course, you will learn techniques that will allow you to extract useful information from text and process them into a format suitable for applying ML models. More specifically, you will learn about POS tagging, named entity recognition, readability scores, the n-gram and tf-idf models, and how to implement them using scikit-learn and spaCy. You will also learn to compute how similar two documents are to each other. In the process, you will predict the sentiment of movie reviews and build movie and Ted Talk recommenders. Following the course, you will be able to engineer critical features out of any text and solve some of the most challenging problems in data science!
Learn to compute basic features such as number of words, number of characters, average word length and number of special characters (such as Twitter hashtags and mentions). You will also learn to compute readability scores and determine the amount of education required to comprehend a piece of text.
In this chapter, you will learn about tokenization and lemmatization. You will then learn how to perform text cleaning, part-of-speech tagging, and named entity recognition using the spaCy library. Upon mastering these concepts, you will proceed to make the Gettysburg address machine-friendly, analyze noun usage in fake news, and identify people mentioned in a TechCrunch article.
Learn about n-gram modeling and use it to perform sentiment analysis on movie reviews.
Learn how to compute tf-idf weights and the cosine similarity score between two vectors. You will use these concepts to build a movie and a TED Talk recommender. Finally, you will also learn about word embeddings and using word vector representations, you will compute similarities between various Pink Floyd songs.",[],"['Rounak Banik', 'Hillary Green-Lerman', 'Adrián Soto']","[('Russian Troll Tweets', 'https://assets.datacamp.com/production/repositories/4375/datasets/f67c0cd351c8431bde5ac9724f9031102e38edb3/russian_tweets.csv'), ('Movie Overviews and Taglines', 'https://assets.datacamp.com/production/repositories/4375/datasets/83f27c4ad045c098d3db5596154316e4ee0a28a8/movie_overviews.csv'), ('Preprocessed Movie Reviews', 'https://assets.datacamp.com/production/repositories/4375/datasets/4281f3352173b69c17965c8f5261603cc18c7d0b/movie_reviews_clean.csv'), ('TED Talk Transcripts', 'https://assets.datacamp.com/production/repositories/4375/datasets/923cfcdab7e4297c2e3c4c859a5add798ae51d3b/ted.csv'), ('Real and Fake News Headlines', 'https://assets.datacamp.com/production/repositories/4375/datasets/dd0cbaa4d6df483b6cb8fb8365152f5e3d743990/fakenews.csv')]","['pandas Foundations', 'Natural Language Processing Fundamentals in Python', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/feature-engineering-for-nlp-in-python,Machine Learning,Python
94,Feature Engineering in R,4,13,44,"1,671","3,500",Feature Engineering in R,"Feature Engineering in R
Feature engineering helps you uncover useful insights from your machine learning models. The model building process is iterative and requires creating new features using existing variables that make your model more efficient. In this course, you will explore different data sets and apply a variety of feature engineering techniques to both continuous and discrete variables.
In this chapter, you will learn how to change categorical features into numerical representations that models can interpret. You'll learn about one-hot encoding and using binning for categorical features.
In this chapter, you will learn how to manipulate numerical features to create meaningful features that can give better insights into your model. You will also learn how to work with dates in the context of feature engineering.
In this chapter, you will learn about using transformation techniques, like Box-Cox and Yeo-Johnson, to address issues with non-normally distributed features. You'll also learn about methods to scale features, including mean centering and z-score standardization.
In the final chapter, we will use feature crossing to create features from two or more variables. We will also discuss principal component analysis, and methods to explore and visualize those results.",[],"['Jose Hernandez', 'Chester Ismay', 'Amy Peterson']",[],[],https://www.datacamp.com/courses/feature-engineering-in-r,Machine Learning,R
95,Feature Engineering with PySpark,4,16,60,"3,549","5,000",Feature Engineering PySpark,"Feature Engineering with PySpark
The real world is messy and your job is to make sense of it. Toy datasets like MTCars and Iris are the result of careful curation and cleaning, even so the data needs to be transformed for it to be useful for powerful machine learning algorithms to extract meaning, forecast, classify or cluster. This course will cover the gritty details that data scientists are spending 70-80% of their time on; data wrangling and feature engineering. With size of datasets now becoming ever larger, let's use PySpark to cut this Big Data problem down to size!
Get to know a bit about your problem before you dive in! Then learn how to statistically and visually inspect your dataset!
Real data is rarely clean and ready for analysis. In this chapter learn to remove unneeded information, handle missing values and add additional data to your analysis.
In this chapter learn how to create new features for your machine learning model to learn from. We'll look at generating them by combining fields, extracting values from messy columns or encoding them for better results.
In this chapter we'll learn how to choose which type of model we want. Then we will learn how to apply our data to the model and evaluate it. Lastly, we'll learn how to interpret the results and save the model for later!",[],"['John Hogue', 'Adrián Soto', 'Nick Solomon']","[('2017 St Paul MN Real Estate Dataset', 'https://assets.datacamp.com/production/repositories/1704/datasets/d26c25f46746882d0a0f474cc6709c629f69872c/2017_StPaul_MN_Real_Estate.csv')]","['Supervised Learning with scikit-learn', 'Introduction to PySpark']",https://www.datacamp.com/courses/feature-engineering-with-pyspark,Data Manipulation,Python
96,Financial Analytics in R,4,17,59,"3,369","4,750",Financial Analytics in R,"Financial Analytics in R
This course is an introduction to the world of finance where cash is king and time is money. In this course, you will learn how to use R to quantify the value of projects, opportunities, and actions and drive decision-making. Students will use the R language to explore cashflow statements, compute profitability metrics, apply decision rules, and compare alternatives. You will end this case-motivated course with an understanding of key financial concepts and the skills needed to conceptualize an communicate the value of you or your teams' projects in a corporate setting.
Introducing the motivation for and basic concepts of discounted cashflow valuations analysis.
An overview of time-value of money and related concepts.
Understanding different ways to summarize cashflow output.
Piecing it altogether with sensitivty and scenario analysis.",[],"['Emily Riederer', 'Sascha Mayr', 'David Campos', 'Shon Inouye']",[],"['Introduction to R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/financial-analytics-in-r,Applied Finance,R
97,Financial Analytics in Spreadsheets,4,15,56,"5,718","4,650",Financial Analytics in Spreadsheets,"Financial Analytics in Spreadsheets
Monitoring the evolution of traded assets is key in finance. In this course, you will learn how to build a graphical dashboard with spreadsheets to track the performance of financial securities. You will focus on historical prices and dividends of the hypothetical stock ABC. You will learn how to visualize its prices, how to measure essential reward and risk indicators, and see if your investment in ABC outperformed a benchmark index. At the end of the course, you should be able to use spreadsheets to build great monitoring tools used by traders and financial analysts in their day-to-day business life!
In the first chapter, you’ll be introduced to the problem: you have a time series of monthly (historical) prices for the hypothetical stock ABC from which you have to extract some meaningful information. You’ll be given some definitions (what is a stock? what are dividends?), and at the end of the chapter, you’ll be able to graphically represent the evolution of a stock price over a specific period.
In this chapter, the core of the analysis will switch from historical prices to historical returns. You’ll learn (and compute) the main performance indicators of past returns, both in terms of reward and risk. Finally, you’ll be introduced to risk-adjusted performance measures: indicators that take into account both reward and risk.
In this chapter, you'll look at the full distribution of historical returns. First, you’ll learn how to build a histogram to describe the distribution of historical returns. Second, you’ll be introduced to the Gaussian distribution, a commonly used model for stock returns. You'll visually inspect if the Gaussian model is reasonable for the ABC stock returns. Finally, you'll understand potential flaws with the Gaussian model.
In this final chapter, you’ll benchmark ABC stock against a market index and verify whether ABC outperformed the benchmark or not. The comparison process will be done through several steps/metrics. First, you’ll analyze the cumulative wealth. Next, you’ll extend the comparison using different indicators such as Sharpe Ratio and Drawdown. Finally, you’ll examine the linear relation between ABC stock and the benchmark through the correlation coefficient. At the end of the chapter, you’ll be introduced to more powerful and advanced spreadsheet features that introduce interactivity in your analysis.",[],"['David Ardia', 'Riccardo Mancini', 'Chester Ismay', 'Sara Billen']","[('Stock ABC', 'https://assets.datacamp.com/production/repositories/3915/datasets/51f1898dae27f03a058601c2a7585f4775a1afe9/Dataset.csv')]",['Intermediate Spreadsheets for Data Science'],https://www.datacamp.com/courses/financial-analytics-in-spreadsheets,Applied Finance,Spreadsheets
98,Financial Forecasting in Python,4,12,49,"2,492","4,050",Financial Forecasting,"Financial Forecasting in Python
In Financial Forecasting in Python, you will step into the role of CFO and learn how to advise a board of directors on key metrics while building a financial forecast, the basics of income statements and balance sheets, and cleaning messy financial data. During the course, you will examine real-life datasets from Netflix, Tesla, and Ford, using the pandas package. Following the course, you will be able to calculate financial metrics, work with assumptions and variances, and build your own forecast in Python!
In this chapter, we will learn the basics of financial statements, with a specific focus on the income statement, which provides details on our sales, costs, and profits. We will learn how to calculate profitability metrics and finish off what we have learned by building our profit forecast for Tesla!
In this chapter, we will learn a bit more about the balance sheet, covering assets and liabilities and specific ratios to help evaluate the financial health and efficiency of a company, as well as how these ratios can assist us in building a great forecast.
We have gotten a basic understanding of income statements and balance sheets. However, consolidating data for forecasting is complex, so in this chapter, we will look at some basic tools to help solve some of the complexities specifically relating to finance - working with dates and different financial periods, and formatting our raw data into the correct format for financial forecasting.
In this chapter, we will be exploring two more aspects to creating a good forecast. First, we will look at assumptions, what drives them and what happens when an assumption changes? Next, we will look at variances, as a forecast is built at one point in time, but what happens when the actual results do not correspond to our forecast? We need to build a sensitive forecast that can be sensitive to changes in both assumptions and take into account variances, and this is what we will explore in this chapter.",[],"['Victoria Clark', 'Becca Robins', 'Sara Snell']","[('Ford Balance Sheet', 'https://assets.datacamp.com/production/repositories/1882/datasets/9f3f116318e2471b55d9f0a6c5c709d4cbfb94b7/F-Balance-Sheet.csv'), ('Netflix Forecast', 'https://assets.datacamp.com/production/repositories/1882/datasets/21336aacbe41c511358594c5baead41b7673f89b/Netflix.csv'), ('Tesla Income Statement', 'https://assets.datacamp.com/production/repositories/1882/datasets/c87f9f462d0a8e04b1595ac86b2fa2fbfde75737/TSLA-Income-Statement.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/financial-forecasting-in-python,Applied Finance,Python
99,Financial Modeling in Spreadsheets,4,13,52,862,"4,550",Financial Modeling in Spreadsheets,"Financial Modeling in Spreadsheets
Have you ever wanted to plan for retirement, understand the stock market, or create a cash flow for your business? In this course, you will learn how to build business and financial models in Sheets. Google Sheets is an excellent technology for business models! You can create a framework for your goal, like understanding the growth of investments, and then update that framework based on current data. You will learn the basics of business modeling focusing on cash flows, investments, annuities, loan amortization, and saving for retirement. By the end of the course, you will have gained referencing and function skills in Sheets that you can apply to all sorts of models.
An introduction to modeling financial statements in Sheets focusing on balance and income statements, which help create cash flow models.
Learn Sheet's financial model functions by creating investment models with the fv, pv, pmt, and nper functions. You will also learn how to pay off debts in a loan amortization table.
Saving for retirement is tricky, but in this chapter, you will learn how to create models that help you plan to save and use your money after retirement.
Stock prices go up and down but can we model them? Learn about volatility and simulating stock prices in this final chapter.",[],"['Erin Buchanan', 'Chester Ismay', 'Amy Peterson']",[],"['Spreadsheet Basics', 'Data Analysis with Spreadsheets', 'Intermediate Spreadsheets for Data Science']",https://www.datacamp.com/courses/financial-modeling-in-spreadsheets,Applied Finance,Spreadsheets
100,Financial Trading in R,5,20,65,"14,094","5,050",Financial Trading in R,"Financial Trading in R
This course will cover the basics on financial trading and will give you an overview of how to use quantstrat to build signal-based trading strategies in R. It will teach you how to set up a quantstrat strategy, apply transformations of market data called indicators, create signals based on the interactions of those indicators, and even simulate orders. Lastly, it will explain how to analyze your results both from statistical and visual perspectives.
In this chapter, you will learn the definition of trading, the philosophies of trading, and the pitfalls that exist in trading. This chapter covers both momentum and oscillation trading, along with some phrases to identify these types of philosophies. You will learn about overfitting and how to avoid it, obtaining and plotting financial data, and using a well-known indicator in trading.
Before building a strategy, the quantstrat package requires you to initialize some settings. In this chapter you will learn how this is done. You will cover a series of functions that deal with initializing a time zone, currency, the instruments you'll be working with, along with quantstrat's various frameworks that will allow it to perform analytics. Once this is done, you will have the knowledge to set up a quantstrat initialization file, and know how to change it.
Indicators are crucial for your trading strategy. They are transformations of market data that allow a clearer understanding of its overall behavior, usually in exchange for lagging the market behavior. Here, you will be working with both trend types of indicators as well as oscillation indicators. You will also learn how to use pre-programmed indicators available in other libraries as well as implement one of your own.
When constructing a quantstrat strategy, you want to see how the market interacts with indicators and how indicators interact with each other. In this chapter you'll learn how indicators can generate signals in quantstrat. Signals are interactions of market data with indicators, or indicators with other indicators. There are four types of signals in quantstrat: sigComparison, sigCrossover, sigThreshold, and sigFormula. By the end of this chapter, you'll know all about these signals, what they do, and how to use them.
In this chapter, you'll learn how to shape your trading transaction once you decide to execute on a signal. This chapter will cover a basic primer on rules, and how to enter and exit positions. You'll also learn how to send inputs to order-sizing functions. By the end of this chapter, you'll learn the gist of how rules function, and where you can continue learning about them.
After a quantstrat strategy has been constructed, it's vital to know how to actually analyze the strategy's performance. This chapter details just that. You will learn how to read vital trade statistics, and view the performance of your trading strategy over time. You will also learn how to get a reward to risk ratio called the Sharpe ratio in two different ways. This is the last chapter.","['Applied Finance with R', 'Quantitative Analyst with R']","['Ilya Kipnis', 'Lore Dirick']","[('SPY data from 2000 through 2016', 'https://assets.datacamp.com/production/repositories/378/datasets/add0628410cb0ca07efffaf6756517c455186eb5/spy_000101_160630.RData')]","['Introduction to R for Finance', 'Intermediate R for Finance']",https://www.datacamp.com/courses/financial-trading-in-r,Applied Finance,R
101,Forecasting Product Demand in R,4,13,50,"4,083","4,200",Forecasting Product Demand in R,"Forecasting Product Demand in R
Accurately predicting demand for products allows a company to stay ahead of the market. By knowing what things shape demand, you can drive behaviors around your products better. This course unlocks the process of predicting product demand through the use of R. You will learn how to identify important drivers of demand, look at seasonal effects, and predict demand for a hierarchy of products from a real world example. By the end of the course you will be able to predict demand for multiple products across a region of a state in the US. Then you will roll up these predictions across many different regions of the same state to form a complete hierarchical forecasting system.
When it comes to forecasting, time series modeling is a great place to start! You need to forecast out the future values of sales demand and a good baseline approach would be ARIMA models. In this chapter you'll learn how to quickly implement ARIMA models and get good initial forecasts for future product demand.
Economic theory has a lot to say about predicting values of demand. Obviously, external factors like price, seasonality, and timing of promotions will drive some aspects of product demand. In this chapter you'll learn about the basics around price elasticity models and how to incorporate seasonality and promotion timing factors into our product demand forecasts.
Time series models and pricing regressions don't have to be thought of as separate approaches to product demand forecasting. They can be combined! In this chapter you'll learn about two ways of ""combining"" the information gained in both modeling approaches - transfer functions and forecast ensembling.
Everything up until this point deals with making individual models for forecasting product demand. However, we haven't taken advantage of the fact that all of these products form a product hierarchy of sales. Products make up regions and regions make up states. How can we ensure that our forecasts reconcile correctly up and down the hierarchy? In this chapter you'll learn about hierarchical forecasting and how to use it to your advantage in forecasting product demand.",[],"['Aric LaBarr', 'Yashas Roy', 'Richie Cotton']","[('Beverage producer sales', 'https://assets.datacamp.com/production/course_6021/datasets/Bev.csv')]",['Intermediate R'],https://www.datacamp.com/courses/forecasting-product-demand-in-r,Probability & Statistics,R
102,Forecasting Using ARIMA Models in Python,4,15,57,"1,716","4,850",Forecasting Using ARIMA Models,"Forecasting Using ARIMA Models in Python
Have you ever tried to predict the future? What lies ahead is a mystery which is usually only solved by waiting. In this course, you will stop waiting and learn to use the powerful ARIMA class models to forecast the future. You will learn how to use the statsmodels package to analyze time series, to build tailored models, and to forecast under uncertainty. How will the stock market move in the next 24 hours? How will the levels of CO2 change in the next decade? How many earthquakes will there be next year? You will learn to solve all these problems and more.
Dive straight in and learn about the most important properties of time series. You'll learn about stationarity and how this is important for ARMA models. You'll learn how to test for stationarity by eye and with a standard statistical test. Finally, you'll learn the basic structure of ARMA models and use this to generate some ARMA data and fit an ARMA model.
What lies ahead in this chapter is you predicting what lies ahead in your data. You'll learn how to use the elegant statsmodels package to fit ARMA, ARIMA and ARMAX models. Then you'll use your models to predict the uncertain future of stock prices!
In this chapter, you will become a modeler of discerning taste. You'll learn how to identify promising model orders from the data itself, then, once the most promising models have been trained, you'll learn how to choose the best model from this fitted selection. You'll also learn a great framework for structuring your time series projects.
In this final chapter, you'll learn how to use seasonal ARIMA models to fit more complex data. You'll learn how to decompose this data into seasonal and non-seasonal parts and then you'll get the chance to utilize all your ARIMA tools on one last global forecast challenge.",[],"['James Fulton', 'Chester Ismay', 'Adel Nehme']","[('US Monthly Candy Production', 'https://assets.datacamp.com/production/repositories/4567/datasets/0707fe926ef5f110ed889fcd2a09c9417e2ffbb6/candy_production.csv'), ('Monthly Record of CO2', 'https://assets.datacamp.com/production/repositories/4567/datasets/d358460aae958f23ba20968aba924cd3eea2e969/co2.csv'), ('Amazon Daily Closing Stock Price', 'https://assets.datacamp.com/production/repositories/4567/datasets/4543d63de229cec637e58f90973b64417e5dc24c/amazon_close.csv'), ('Monthly Milk Production', 'https://assets.datacamp.com/production/repositories/4567/datasets/1213fc15035051ef7fe5a0dac44176df7223a93a/milk_production.csv'), ('Yearly Earthquakes', 'https://assets.datacamp.com/production/repositories/4567/datasets/96dadbe9fcb8985ff2f89c5c9f5ada3d4180e65a/earthquakes.csv')]",['Supervised Learning with scikit-learn'],https://www.datacamp.com/courses/forecasting-using-arima-models-in-python,Machine Learning,Python
103,Forecasting Using R,5,18,55,"24,397","4,450",Forecasting Using R,"Forecasting Using R
Forecasting involves making predictions about the future. It is required in many situations: deciding whether to build another power generation plant in the next ten years requires forecasts of future demand; scheduling staff in a call centre next week requires forecasts of call volumes; stocking an inventory requires forecasts of stock requirements. Forecasts can be required several years in advance (for the case of capital investments), or only a few minutes beforehand (for telecommunication routing). Whatever the circumstances or time horizons involved, forecasting is an important aid to effective and efficient planning. This course provides an introduction to time series forecasting using R.
The first thing to do in any data analysis task is to plot the data. Graphs enable many features of the data to be visualized, including patterns, unusual observations, and changes over time. The features that are seen in plots of the data must then be incorporated, as far as possible, into the forecasting methods to be used.
In this chapter, you will learn general tools that are useful for many different forecasting situations. It will describe some methods for benchmark forecasting, methods for checking whether a forecasting method has adequately utilized the available information, and methods for measuring forecast accuracy. Each of the tools discussed in this chapter will be used repeatedly in subsequent chapters as you develop and explore a range of forecasting methods.
Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation, the higher the associated weight. This framework generates reliable forecasts quickly and for a wide range of time series, which is a great advantage and of major importance to applications in business.
ARIMA models provide another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely-used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.
The time series models in the previous chapters work well for many time series, but they are often not good for weekly or hourly data, and they do not allow for the inclusion of other information such as the effects of holidays, competitor activity, changes in the law, etc. In this chapter, you will look at some methods that handle more complicated seasonality, and you consider how to extend ARIMA models in order to allow other information to be included in the them.","['Quantitative Analyst with R', 'Time Series with R']","['Rob J. Hyndman', 'Lore Dirick', 'Davis Vaughan']","[('Excelfile in the first exercise', 'https://assets.datacamp.com/production/repositories/684/datasets/d46ad7146f174e01407d01b7a8ef906f0bb7cdd6/exercise1.xlsx')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/forecasting-using-r,Probability & Statistics,R
104,Foundations of Functional Programming with purrr,4,13,44,"3,081","3,750",Functional Programming purrr,"Foundations of Functional Programming with purrr
Lists can be difficult to both understand and manipulate, but they can pack a ton of information and are very powerful. In this course, you will learn to easily extract, summarize, and manipulate lists and how to export the data to your desired object, be it another list, a vector, or even something else! Throughout the course, you will work with the purrr package and a variety of datasets from the repurrrsive package, including data from Star Wars and Wes Anderson films and data collected about GitHub users and GitHub repos. Following this course, your list skills will be purrrfect!
Iteration is a powerful way to make the computer do the work for you. It can also be an area of coding where it is easy to make lots of typos and simple mistakes. The purrr package helps simplify iteration so you can focus on the next step, instead of finding typos.
purrr is much more than a for loop; it works well with pipes, we can use it to run models and simulate data, and make nested loops!
Like anything in R, understanding how to troubleshoot issues is an important skill set. This can be particularly important with lists, where finding the problem can be tricky.
Now that you have the building blocks, we will start tackling some more complex data problems with purrr.",['Intermediate Tidyverse Toolbox'],"['DataCamp Content Creator', 'Chester Ismay', 'Becca Robins']","[('Simulated data 1990-2005', 'https://assets.datacamp.com/production/repositories/1858/datasets/24e986c962c2acc48ee76ec01363e23ab73a4319/simulated_data_from_1990_to_2005.zip')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/foundations-of-functional-programming-with-purrr,Programming,R
105,Foundations of Inference,4,17,58,"15,644","4,350",Inference,"Foundations of Inference
One of the foundational aspects of statistical analysis is inference, or the process of drawing conclusions about a larger population from a sample of data. Although counter intuitive, the standard practice is to attempt to disprove a research claim that is not of interest. For example, to show that one medical treatment is better than another, we can assume that the two treatments lead to equal survival rates only to then be disproved by the data. Additionally, we introduce the idea of a p-value, or the degree of disagreement between the data and the hypothesis. We also dive into confidence intervals, which measure the magnitude of the effect of interest (e.g. how much better one treatment is than another).
In this chapter, you will investigate how repeated samples taken from a population can vary. It is the variability in samples that allow you to make claims about the population of interest. It is important to remember that the research claims of interest focus on the population while the information available comes only from the sample data.
In this chapter, you will gain the tools and knowledge to complete a full hypothesis test. That is, given a dataset, you will know whether or not is appropriate to reject the null hypothesis in favor of the research claim of interest.
You will continue learning about hypothesis testing with a new example and the same structure of randomization tests. In this chapter, however, the focus will be on different errors (type I and type II), how they are made, when one is worse than another, and how things like sample size and effect size impact the error rates.
As a complement to hypothesis testing, confidence intervals allow you to estimate a population parameter. Recall that your interest is always in some characteristic of the population, but you only have incomplete information to estimate the parameter using sample data. Here, the parameter is the true proportion of successes in a population. Bootstrapping is used to estimate the variability needed to form the confidence interval.",['Statistical Inference with R'],"['Jo Hardin', 'Nick Carchedi', 'Tom Jeon']","[('All polls', 'https://assets.datacamp.com/production/repositories/538/datasets/9737cf05b3899a5057110feb8dd27aa5dfe107b8/all_polls.rds'), ('Polling data', 'https://assets.datacamp.com/production/repositories/538/datasets/b1071ca5cb72143820e33fd7c6605dc4b3f11b7a/all_polls.RData'), ('Big discrimination dataset', 'https://assets.datacamp.com/production/repositories/538/datasets/f03da8fc4a2ae50a3ddf775324f4df90c96f7f26/disc_big.rds'), ('New discrimination dataset', 'https://assets.datacamp.com/production/repositories/538/datasets/60566129b391ef827ea9c8a9846608dee24ce34a/disc_new.rds'), ('Small discrimination dataset', 'https://assets.datacamp.com/production/repositories/538/datasets/543fa990550c61f6a2cc175b0a0414528f8094c0/disc_small.rds')]","['Introduction to R', 'Introduction to Data', 'Exploratory Data Analysis', 'Correlation and Regression']",https://www.datacamp.com/courses/foundations-of-inference,Probability & Statistics,R
106,Foundations of Predictive Analytics in Python (Part 1),4,14,52,"4,732","4,100",Predictive Analytics,"Foundations of Predictive Analytics in Python (Part 1)
In this course, you will learn how to build a logistic regression model with meaningful variables. You will also learn how to use this model to make predictions and how to present it and its performance to business stakeholders.
In this Chapter, you'll learn the basics of logistic regression: how can you predict a binary target with continuous variables and, how should you interpret this model and use it to make predictions for new examples?
In this chapter you'll learn why variable selection is crucial for building a useful model. You'll also learn how to implement forward stepwise variable selection for logistic regression and how to decide on the number of variables to include in your final model.
Now that you know how to build a good model, you should convince stakeholders to use it by creating appropriate graphs. You will learn how to construct and interpret the cumulative gains curve and lift graph.
In a business context, it is often important to explain the intuition behind the model you built. Indeed, if the model and its variables do not make sense, the model might not be used. In this chapter you'll learn how to explain the relationship between the variables in the model and the target by means of predictor insight graphs.",[],"['Nele Verbiest', 'Lore Dirick', 'Nick Solomon', 'Hadrien Lacroix']","[('Example basetable', 'https://assets.datacamp.com/production/repositories/1441/datasets/7abb677ec52631679b467c90f3b649eb4f8c00b2/basetable_ex2_4.csv')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/foundations-of-predictive-analytics-in-python-part-1,Machine Learning,Python
107,Foundations of Predictive Analytics in Python (Part 2),4,15,56,"1,313","4,350",Predictive Analytics,"Foundations of Predictive Analytics in Python (Part 2)
Building good models only succeeds if you have a decent base table to start with. In this course you will learn how to construct a good base table, create variables and prepare your data for modeling. We finish with advanced topics on the matter. If you have not already, you should take Foundations of Predictive Analytics in Python (Part 1) first.
In this chapter you will learn how to construct the foundations of your base table, namely the population and the target.
You will learn how to add variables to the base table that you can use to predict the target.
Once you derived variables from the raw data, it is time to clean the data and prepare it for modeling. In this Chapter we discuss the steps that need to be taken to make your data modeling-ready.
In some cases, the target or variables change heavily with the seasons. You will learn how you can deal with seasonality by adding different snapshots to the base table",[],"['Nele Verbiest', 'Hadrien Lacroix', 'Nick Solomon', 'Lore Dirick']","[('Donor IDs', 'https://assets.datacamp.com/production/repositories/1602/datasets/a83c1416e5a3ee2a7b8286aa72de97d2dd8eab45/basetable.csv'), ('Basetable with countries and age', 'https://assets.datacamp.com/production/repositories/1602/datasets/8d94f1d90fcc065416296e29cd1b3fef13cdbd16/basetable_interactions.csv'), ('Basetable used in Ex 2.13', 'https://assets.datacamp.com/production/repositories/1602/datasets/1066b658e4359261e54bb2f303812f4f6e3b6cf9/basetable_ex_2_13.csv'), ('Living place of donors', 'https://assets.datacamp.com/production/repositories/1602/datasets/cc4f3b53a8818e584bed85b75173b217e645216a/living_places.csv'), ('Donations', 'https://assets.datacamp.com/production/repositories/1602/datasets/e828af7f273445328bbe8648f0fc318a6d7741a5/gifts.csv')]","['Intermediate Python for Data Science', 'Foundations of Predictive Analytics in Python (Part 1)']",https://www.datacamp.com/courses/foundations-of-predictive-analytics-in-python-part-2,Machine Learning,Python
108,Foundations of Probability in Python,5,16,61,"1,053","5,050",Probability,"Foundations of Probability in Python
Probability is the study of regularities that emerge in the outcomes of random experiments. In this course, you'll learn about fundamental probability concepts like random variables (starting with the classic coin flip example) and how to calculate mean and variance, probability distributions, and conditional probability. We'll also explore two very important results in probability: the law of large numbers and the central limit theorem. Since probability is at the core of data science and machine learning, these concepts will help you understand and apply models more robustly. Chances are everywhere, and the study of probability will change the way you see the world. Let’s get random!
A coin flip is the classic example of a random experiment. The possible outcomes are heads or tails. This type of experiment, known as a Bernoulli or binomial trial, allows us to study problems with two possible outcomes, like “yes” or “no” and “vote” or “no vote.” This chapter introduces Bernoulli experiments, binomial distributions to model multiple Bernoulli trials, and probability simulations with the scipy library.
In this chapter you'll learn to calculate various kinds of probabilities, such as the probability of the intersection of two events and the sum of probabilities of two events, and to simulate those situations. You'll also learn about conditional probability and how to apply Bayes' rule.
Until now we've been working with binomial distributions, but there are many probability distributions a random variable can take. In this chapter we'll introduce three more that are related to the binomial distribution: the normal, Poisson, and geometric distributions.
No that you know how to calculate probabilities and important properties of probability distributions, we'll introduce two important results: the law of large numbers and the central limit theorem. This will expand your understanding on how the sample mean converges to the population mean as more data is available and how the sum of random variables behaves under certain conditions.
We will also explore connections between linear and logistic regressions as applications of probability and statistics in data science.",[],"['Alexander A. Ramírez M.', 'Hillary Green-Lerman', 'Adrián Soto']",[],"['Intermediate Python for Data Science', 'Statistical Thinking in Python (Part 1)']",https://www.datacamp.com/courses/foundations-of-probability-in-python,Probability & Statistics,Python
109,Foundations of Probability in R,4,13,54,"11,746","4,350",Probability in R,"Foundations of Probability in R
Probability is the study of making predictions about random phenomena. In this course, you'll learn about the concepts of random variables, distributions, and conditioning, using the example of coin flips. You'll also gain intuition for how to solve probability problems through random simulation. These principles will help you understand statistical inference and can be applied to draw conclusions from data.
One of the simplest and most common examples of a random phenomenon is a coin flip: an event that is either ""yes"" or ""no"" with some probability. Here you'll learn about the binomial distribution, which describes the behavior of a combination of yes/no trials and how to predict and simulate its behavior.
In this chapter you'll learn to combine multiple probabilities, such as the probability two events both happen or that at least one happens, and confirm each with random simulations. You'll also learn some of the properties of adding and multiplying random variables.
Bayesian statistics is a mathematically rigorous method for updating your beliefs based on evidence. In this chapter, you'll learn to apply Bayes' theorem to draw conclusions about whether a coin is fair or biased, and back it up with simulations.
So far we've been talking about the binomial distribution, but this is one of many probability distributions a random variable can take. In this chapter we'll introduce three more that are related to the binomial: the normal, the Poisson, and the geometric.",['Probability and Distributions with R'],"['David Robinson', 'Nick Carchedi', 'Tom Jeon', 'Nick Solomon']",[],['Introduction to R'],https://www.datacamp.com/courses/foundations-of-probability-in-r,Probability & Statistics,R
110,Fraud Detection in Python,4,16,57,"5,263","4,800",Fraud Detection,"Fraud Detection in Python
A typical organization loses an estimated 5% of its yearly revenue to fraud. In this course, you will learn how to fight fraud by using data. For example, you'll learn how to apply supervised learning algorithms to detect fraudulent behavior similar to past ones, as well as unsupervised learning methods to discover new types of fraud activities. Moreover, in fraud analytics you often deal with highly imbalanced datasets when classifying fraud versus non-fraud, and during this course you will pick up some techniques on how to deal with that. The course provides a mix of technical and theoretical insights and shows you hands-on how to practically implement fraud detection models. In addition, you will get tips and advice from real-life experience to help you prevent making common mistakes in fraud analytics.
In this chapter, you''ll learn about the typical challenges associated with fraud detection, and will learn how to resample your data in a smart way, to tackle problems with imbalanced data.
Now that you're familiar with the main challenges of fraud detection, you're about to learn how to flag fraudulent transactions with supervised learning. You will use classifiers, adjust them and compare them to find the most efficient fraud detection model.
This chapter focuses on using unsupervised learning techniques to detect fraud. You will segment customers, use K-means clustering and other clustering algorithms to find suspicious occurrences in your data.
In this final chapter, you will use text data, text mining and topic modeling to detect fraudulent behavior.",[],"['Charlotte Werger', 'Hadrien Lacroix', 'Mari Nazary']","[('Chapter 1 datasets', 'https://assets.datacamp.com/production/repositories/2162/datasets/cc3a36b722c0806e4a7df2634e345975a0724958/chapter_1.zip'), ('Chapter 2 datasets', 'https://assets.datacamp.com/production/repositories/2162/datasets/4fb6199be9b89626dcd6b36c235cbf60cf4c1631/chapter_2.zip'), ('Chapter 3 datasets', 'https://assets.datacamp.com/production/repositories/2162/datasets/08cfcd4158b3a758e72e9bd077a9e44fec9f773b/chapter_3.zip'), ('Chapter 4 datasets', 'https://assets.datacamp.com/production/repositories/2162/datasets/94f2356652dc9ea8f0654b5e9c29645115b6e77f/chapter_4.zip')]","['Supervised Learning with scikit-learn', 'Unsupervised Learning in Python']",https://www.datacamp.com/courses/fraud-detection-in-python,Machine Learning,Python
111,Fraud Detection in R,4,16,49,"2,973","3,900",Fraud Detection in R,"Fraud Detection in R
The Association of Certified Fraud Examiners estimates that fraud costs organizations worldwide $3.7 trillion a year and that a typical company loses five percent of annual revenue due to fraud. Fraud attempts are expected to even increase further in future, making fraud detection highly necessary in most industries. This course will show how learning fraud patterns from historical data can be used to fight fraud. Some techniques from robust statistics and digit analysis are presented to detect unusual observations that are likely associated with fraud. Two main challenges when building a supervised tool for fraud detection are the imbalance or skewness of the data and the various costs for different types of misclassification. We present techniques to solve these issues and focus on artificial and real datasets from a wide variety of fraud applications.
This chapter will first give a formal definition of fraud. You will then learn how to detect anomalies in the type of payment methods used or the time these payments are made to flag suspicious transactions.
In the second chapter, you will learn how to use networks to fight fraud. You will visualize networks and use a sociology concept called homophily to detect fraudulent transactions and catch fraudsters.
Fortunately, fraud occurrences are rare. However, this means that you're working with imbalanced data, which if left as is will bias your detection models. In this chapter, you will tackle imbalance using over and under-sampling methods.
In this final chapter, you will learn about a surprising mathematical law used to detect suspicious occurrences. You will then use robust statistics to make your models even more bulletproof.",[],"['Bart Baesens', 'Sebastiaan Höppner', 'Tim Verdonck', 'Hadrien Lacroix', 'Sara Billen', 'Chester Ismay']","[('Chapter 1 datasets', 'https://assets.datacamp.com/production/repositories/2913/datasets/df95c1b620b0496b485557220a39222788491cb1/chapter_1.zip'), ('Chapter 2 datasets', 'https://assets.datacamp.com/production/repositories/2913/datasets/cd4fb1a9ddaf3c2c6ef3a1e8f3542fa1f10cdf5a/chapter_2.zip'), ('Chapter 3 datasets', 'https://assets.datacamp.com/production/repositories/2913/datasets/1885873fd937a3fa2c94c3581dd8309b81b1e091/chapter_3.zip'), ('Chapter 4 datasets', 'https://assets.datacamp.com/production/repositories/2913/datasets/70e2b476999f68e1b74b4ee321aa30830727817c/chapter_4.zip')]","['Introduction to the Tidyverse', 'Multiple and Logistic Regression', 'Unsupervised Learning in R']",https://www.datacamp.com/courses/fraud-detection-in-r,Machine Learning,R
112,Fundamentals of AI,4,14,49,120,"3,350",Fundamentals of AI,"Fundamentals of AI
So what is all this AI fuss about? Machine Learning, Deep Learning, Predictive Analytics -- what is the reality behind the hype? How do machines actually learn and what are their limits? How can we use Machine Learning to recognize written digits, predict customer churn and find structure in Elon Musk's tweets? All this -- and much more -- is the topic of this course, which will introduce you to the world of AI in a gentle, but firm and very practical manner.
Understand the definition of AI ( “general” and “narrow”), the relationship between AI and Machine Learning, and will the robots take over the world - soon?
Learn about supervised learning, work with labeled data and train regression models.
Learn about unsupervised learning, divide data into clusters, detect anomalies and select the right model for the job.
Learn about deep learning, create your first neural networks, and train a model to recognize digits.",[],"['Nemanja Radojković', 'Hadrien Lacroix', 'Hillary Green-Lerman']","[('Customer Churn', 'https://assets.datacamp.com/production/repositories/3866/datasets/252c7d50740da7988d71174d15184247463d975c/WA_Fn-UseC_-Telco-Customer-Churn.csv'), ('MNIST', 'https://assets.datacamp.com/production/repositories/3866/datasets/28eb967447024b20ba4071bebc1bf2e855ac3ceb/MNIST_5k.csv')]",[],https://www.datacamp.com/courses/fundamentals-of-ai,Machine Learning,Python
113,Fundamentals of Bayesian Data Analysis in R,4,23,58,"8,311","4,450",Fundamentals of Bayesian Data Analysis in R,"Fundamentals of Bayesian Data Analysis in R
Bayesian data analysis is an approach to statistical modeling and machine learning that is becoming more and more popular. It provides a uniform framework to build problem specific models that can be used for both statistical inference and for prediction. This course will introduce you to Bayesian data analysis: What it is, how it works, and why it is a useful tool to have in your data science toolbox.
This chapter will introduce you to Bayesian data analysis and give you a feel for how it works.
In this chapter we will take a detailed look at the foundations of Bayesian inference.
This chapter will show you four reasons why Bayesian data analysis is a useful tool to have in your data science tool belt.
Learn what Bayes theorem is all about and how to use it for statistical inference.
Learn about using the Normal distribution to analyze continuous data and try out a tool for practical Bayesian analysis in R.",[],"['Rasmus Bååth', 'Chester Ismay', 'Nick Solomon']",[],['Introduction to R'],https://www.datacamp.com/courses/fundamentals-of-bayesian-data-analysis-in-r,Probability & Statistics,R
114,GARCH Models in R,4,16,60,"2,093","4,550",GARCH Models in R,"GARCH Models in R
Are you curious about the rhythm of the financial market's heartbeat? Do you want to know when a stable market becomes turbulent? In this course on GARCH models you will learn the forward looking approach to balancing risk and reward in financial decision making. The course gradually moves from the standard normal GARCH(1,1) model to more advanced volatility models with a leverage effect, GARCH-in-mean specification and the use of the skewed student t distribution for modelling asset returns. Applications on stock and exchange rate returns include portfolio optimization, rolling sample forecast evaluation, value-at-risk forecasting and studying dynamic covariances.
We start off by making our hands dirty. A rolling window analysis of daily stock returns shows that its standard deviation changes massively through time. Looking back at the past, we thus have clear evidence of time-varying volatility. Looking forward, we need to estimate the volatility of future returns. This is essentially what a GARCH model does! In this chapter, you will learn the basics of using the rugarch package for specifying and estimating the workhorse GARCH(1,1) model in R. We end by showing its usefulness in tactical asset allocation.
Markets take the stairs up and the elevator down. This Wallstreet wisdom has important consequences for specifying a realistic volatility model. It requires to give up the assumption of normality, as well as the symmetric response of volatility to shocks. In this chapter, you will learn about GARCH models with a leverage effect and skewed student t innovations. At the end, you will be able to use GARCH models for estimating over ten thousand different GARCH model specifications.
GARCH models yield volatility forecasts which serve as input for financial decision making. Their use in practice requires to first evaluate the goodness of the volatility forecast. In this chapter, you will learn about the analysis of statistical significance of the estimated GARCH parameters, the properties of standardized returns, the interpretation of information criteria and the use of rolling GARCH estimation and mean squared prediction errors to analyze the accuracy of the volatility forecast.
At this stage, you master the standard specification, estimation and validation of GARCH models in the rugarch package. This chapter introduces specific rugarch functionality for making value-at-risk estimates, for using the GARCH model in production and for simulating GARCH returns. You will also discover that the presence of GARCH dynamics in the variance has implications for simulating log-returns, the estimation of the beta of a stock and finding the minimum variance portfolio.",[],"['Kris Boudt', 'Hadrien Lacroix', 'Sara Billen', 'Chester Ismay']","[('Daily EUR/USD returns', 'https://assets.datacamp.com/production/repositories/3066/datasets/661d985976cb697abc44abcc2a34170086813dd6/EURUSDret.Rdata'), ('Daily Microsoft returns', 'https://assets.datacamp.com/production/repositories/3066/datasets/5a9a26d972a80e17d6ca316632d36781e6119fc0/msftret.Rdata'), ('S&P 500 prices', 'https://assets.datacamp.com/production/repositories/3066/datasets/39bd6105e3d2f79bb679f9f95426807335f0fd19/sp500prices.Rdata'), ('S&P 500 returns', 'https://assets.datacamp.com/production/repositories/3066/datasets/c3d1811c6fd860f6a9eb3fa97a553d8db855a457/sp500ret.Rdata'), ('Simulated return data', 'https://assets.datacamp.com/production/repositories/3066/datasets/a3261cd3c152d9124c9c0542aabb0c4bd729165d/ret.Rdata')]","['Introduction to Time Series Analysis', 'Manipulating Time Series Data in R with xts & zoo']",https://www.datacamp.com/courses/garch-models-in-r,Applied Finance,R
115,Generalized Linear Models in Python,5,16,59,"1,056","4,950",Generalized Linear Models,"Generalized Linear Models in Python
Imagine being able to handle data where the response variable is either binary, count, or approximately normal, all under one single framework. Well, you don't have to imagine. Enter the Generalized Linear Models in Python course! In this course you will extend your regression toolbox with the logistic and Poisson models, by learning how to fit, understand, assess model performance and finally use the model to make predictions on new data. You will practice using data from real world studies such the largest population poisoning in world's history, nesting of horseshoe crabs and counting the bike crossings on the bridges in New York City.
Review linear models and learn how GLMs are an extension of the linear model given different types of response variables. You will also learn the building blocks of GLMs and the technical process of fitting a GLM in Python.
This chapter focuses on logistic regression. You'll learn about the structure of binary data, the logit link function, model fitting, as well as how to interpret model coefficients, model inference, and how to assess model performance.
Here you'll learn about Poisson regression, including the discussion on count data, Poisson distribution and the interpretation of the model fit. You'll also learn how to overcome problems with overdispersion. Finally, you'll get hands-on experience with the process of model visualization.
In this final chapter you'll learn how to increase the complexity of your model by adding more than one explanatory variable. You'll practice with the problem of multicollinearity, and with treating categorical and interaction terms in your model.",[],"['Ita Cirovic Donev', 'Chester Ismay', 'Adrián Soto']","[('Well switch due to arsenic poisoning', 'https://assets.datacamp.com/production/repositories/4047/datasets/8d608ed3e4e960e9e5d4f1730cb354154faa374f/wells.csv'), ('Nesting of the female horseshoe crab', 'https://assets.datacamp.com/production/repositories/4047/datasets/3dabb99855f48ca92bd8bf123a2cfacfea3ef273/crab.csv'), ('Credit default', 'https://assets.datacamp.com/production/repositories/4047/datasets/a0614614e3917196f66b29ce26d4d5244b85188b/default.csv'), ('Level of salary and years of work experience', 'https://assets.datacamp.com/production/repositories/4047/datasets/7b1962e80528b839bf82e9ed9f1a65968f9aa087/salary.csv'), ('Medical costs per person given age and BMI', 'https://assets.datacamp.com/production/repositories/4047/datasets/9ce36c042c7db33260ef69a27b1918dfc08e7cab/insurance.csv'), ('Bike crossings in New York City', 'https://assets.datacamp.com/production/repositories/4047/datasets/8d069534eea69e7a946dc2f8f9d4f8b594f62d37/bike.csv')]","['Statistical Thinking in Python (Part 2)', 'Introduction to Linear Modeling in Python']",https://www.datacamp.com/courses/generalized-linear-models-in-python,Machine Learning,Python
116,Generalized Linear Models in R,4,14,51,"3,810","4,050",Generalized Linear Models in R,"Generalized Linear Models in R
Linear regression serves as a workhorse of statistics, but cannot handle some types of complex data. A generalized linear model (GLM) expands upon linear regression to include non-normal distributions including binomial and count data. Throughout this course, you will expand your data science toolkit to include GLMs in R. As part of learning about GLMs, you will learn how to fit model binomial data with logistic regression and count data with Poisson regression. You will also learn how to understand these results and plot them with ggplot2.
This chapter teaches you how generalized linear models are an extension of other models in your data science toolbox. The chapter also uses Poisson regression to introduce generalize linear models.
This chapter covers running a logistic regression and examining the model outputs.
This chapter teaches you about interpreting GLM coefficients and plotting GLMs using ggplot2.
In this chapter, you will learn how to do multiple regression with GLMs in R.",[],"['Richard Erickson', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Bus Commuter dataset', 'https://assets.datacamp.com/production/repositories/2698/datasets/e368234a66bbabc19b8da1fb42d3e1027508d710/busData.csv')]",['Multiple and Logistic Regression'],https://www.datacamp.com/courses/generalized-linear-models-in-r,Probability & Statistics,R
117,HR Analytics in Python: Predicting Employee Churn,4,14,44,"3,489","3,500",HR Analytics : Predicting Employee Churn,"HR Analytics in Python: Predicting Employee Churn
Among all of the business domains, HR is still the least disrupted. However, the latest developments in data collection and analysis tools and technologies allow for data driven decision-making in all dimensions, including HR. This course will provide a solid basis for dealing with employee data and developing a predictive model to analyze employee turnover.
In this chapter you will learn about the problems addressed by HR analytics, as well as will explore a sample HR dataset that will further be analyzed. You will describe and visualize some of the key variables, transform and manipulate the dataset to make it ready for analytics.
This chapter introduces one of the most popular classification techniques: the Decision Tree. You will use it to develop an algorithm that predicts employee turnover.
Here, you will learn how to evaluate a model and understand how ""good"" it is. You will compare different trees to choose the best among them.
In this final chapter, you will learn how to use cross-validation to avoid overfitting the training data. You will also learn how to know which features are impactful, and which are negligible. Finally, you will use these newly acquired skills to build a better performing Decision Tree!",[],"['Hrant Davtyan', 'Lore Dirick', 'Nick Solomon']","[('Employee turnover data', 'https://assets.datacamp.com/production/repositories/1765/datasets/ae888d00f9b36dd7d50a4afbc112761e2db766d2/turnover.csv')]",['Intermediate Python for Data Science'],https://www.datacamp.com/courses/hr-analytics-in-python-predicting-employee-churn,Machine Learning,Python
118,Hierarchical and Mixed Effects Models,4,14,54,"5,422","4,600",Hierarchical and Mixed Effects Models,"Hierarchical and Mixed Effects Models
This course begins by reviewing slopes and intercepts in linear regressions before moving on to random-effects. You'll learn what a random effect is and how to use one to model your data. Next, the course covers linear mixed-effect regressions. These powerful models will allow you to explore data with a more complicated structure than a standard linear regression. The course then teaches generalized linear mixed-effect regressions. Generalized linear mixed-effects models allow you to model more kinds of data, including binary responses and count data. Lastly, the course goes over repeated-measures analysis as a special case of mixed-effect modeling. This kind of data appears when subjects are followed over time and measurements are collected at intervals. Throughout the course you'll work with real data to answer interesting questions using mixed-effects models.
The first chapter provides an example of when to use a mixed-effect and also describes the parts of a regression. The chapter also examines a a student test-score dataset with a nested structure to demonstrate mixed-effects.
This chapter providers an introduction to linear mixed-effects models. It covers different types of random-effects, describes how to understand the results for linear mixed-effects models, and goes over different methods for statistical inference with mixed-effects models using crime data from Maryland.
This chapter extends linear mixed-effects models to include non-normal error terms using generalized linear mixed-effects models. By altering the model to include a non-normal error term, you are able to model more kinds of data with non-linear responses. After reviewing generalized linear models, the chapter examines binomial data and count data in the context of mixed-effects models.
This chapter shows how repeated-measures analysis is a special case of mixed-effect modeling. The chapter begins by reviewing paired t-tests and repeated measures ANOVA. Next, the chapter uses a linear mixed-effect model to examine sleep study data. Lastly, the chapter uses a generalized linear mixed-effect model to examine hate crime data from New York state through time.",[],"['Richard Erickson', 'Chester Ismay', 'Nick Solomon']","[('Illinois chlamydia data', 'https://assets.datacamp.com/production/repositories/1803/datasets/612bd6490500636efa74132bfbc37817f250cb5a/ILdata.csv'), ('Maryland crime data', 'https://assets.datacamp.com/production/repositories/1803/datasets/e5e076efd3c3b7665a3180da9f95aaaf671f6a61/MDcrime.csv'), ('Classroom data', 'https://assets.datacamp.com/production/repositories/1803/datasets/975fe2b0190804d854a5da90083364629fb6af2e/classroom.csv'), ('Birth rate data', 'https://assets.datacamp.com/production/repositories/1803/datasets/eb95cb6973afa56c38ba53cfd8058c72f768322f/countyBirthsDataUse.csv'), ('New York hate crime data', 'https://assets.datacamp.com/production/repositories/1803/datasets/45e88fe1bc8d1d76d140e69cb873da9eddb7008e/hateNY.csv')]",['Generalized Linear Models in R'],https://www.datacamp.com/courses/hierarchical-and-mixed-effects-models,Probability & Statistics,R
119,Hierarchical and Recursive Queries in SQL Server,4,13,47,415,"3,800",Hierarchical and Recursive Queries in SQL Server,"Hierarchical and Recursive Queries in SQL Server
Do you want to query complex data structures in an iterative way? Do you have access to hierarchical data structures that need to be queried? This course will teach you the tools required to solve these questions. You will learn how to write recursive queries and query hierarchical data structures. To do this, you will use Common Table Expressions (CTE) and the recursion principle on a wide variety of datasets. You will, for example, dig into a flight plan dataset and learn how to find the best and cheapest connection between two airports. After completing this course, you will understand the principle of recursion, and be able to identify and create hierarchical data models.
In this chapter, you will learn about recursion and why it is beneficial to apply this technique. You will also refresh your knowledge about Common Expression Tables (CTE).
In this chapter, you will learn about recursive CTEs, how to query hierarchical datasets, and finally, how to apply recursive CTEs on hierarchical data.
In this chapter, you will learn how to create and modify database tables. You will learn about relational and hierarchical data models, how they differ, and when each model should be used.
In this chapter, you will practice your learnings about hierarchical and recursive querying on real-world problems, such as finding possible flight routes, assembling a car, and modeling a power grid.",[],"['Dominik Egarter', 'Mona Khalil', 'Sara Billen']",[],['Intermediate SQL Server'],https://www.datacamp.com/courses/hierarchical-and-recursive-queries-in-sql-server,Reporting,SQL
120,Human Resources Analytics in R: Exploring Employee Data,5,16,60,"5,614","4,750",Human Resources Analytics in R: Exploring Employee Data,"Human Resources Analytics in R: Exploring Employee Data
HR analytics, people analytics, workforce analytics -- whatever you call it, businesses are increasingly counting on their human resources departments to answer questions, provide insights, and make recommendations using data about their employees. In this course, you'll learn how to manipulate, visualize, and perform statistical tests on HR data through a series of HR analytics case studies.
In this chapter, you will get an introduction to how data science is used in a human resources context. Then you will dive into a case study where you'll analyze and visualize recruiting data to determine which source of new candidates ultimately produces the best new hires. The dataset you'll use in this and the other chapters in this course is synthetic, to maintain the privacy of actual employees.
Gallup defines engaged employees as those who are involved in, enthusiastic about and committed to their work and workplace. There is disagreement about the strength of the connection between employee engagement and business outcomes, but the idea is that employees that are more engaged will be more productive and stay with the organization longer. In this chapter, you'll look into potential reasons that one department's engagement scores are lower than the rest.
When employers make a new hire, they must determine what the new employee will be paid. If the employer is not careful, the new hires can come in with a higher salary than the employees that currently work at the same job, which can cause employee turnover and dissatisfaction. In this chapter, you will check whether new hires are really getting paid more than current employees, and how to double-check your initial observations.
Performance management helps an organization keep track of which employees are providing extra value, or below-average value, and compensating them accordingly. Whether performance is a rating or the result of a questionnaire, whether employees are rated each year or more often than that, the process is somewhat subjective. An organization should check that ratings are being given with regard to performance, and not individual managers' preferences, or even biases (conscious or subconscious).
In many industries, workplace safety is a critical consideration. Maintaining a safe workplace provides employees with confidence and reduces costs for workers' compensation and legal liabilities. In this chapter, you'll look for explanations for an increase in workplace accidents.",[],"['Ben Teusch', 'Richie Cotton', 'Sumedh Panchadhar']","[('Recruitment data', 'https://assets.datacamp.com/production/course_5977/datasets/recruitment_data.csv'), ('Survey data', 'https://assets.datacamp.com/production/course_5977/datasets/survey_data.csv'), ('Fair pay data', 'https://assets.datacamp.com/production/course_5977/datasets/fair_pay_data.csv'), ('Performance data', 'https://assets.datacamp.com/production/course_5977/datasets/performance_data.csv'), ('HR data', 'https://assets.datacamp.com/production/course_5977/datasets/hr_data.csv'), ('Accident data', 'https://assets.datacamp.com/production/course_5977/datasets/accident_data.csv'), ('HR data (2)', 'https://assets.datacamp.com/production/course_5977/datasets/hr_data_2.csv'), ('Survey data (2)', 'https://assets.datacamp.com/production/course_5977/datasets/survey_data_2.csv')]","['Introduction to the Tidyverse', 'Correlation and Regression']",https://www.datacamp.com/courses/human-resources-analytics-in-r-exploring-employee-data,Case Studies,R
121,Human Resources Analytics in R: Predicting Employee Churn,4,14,50,"1,917","4,000",Human Resources Analytics in R: Predicting Employee Churn,"Human Resources Analytics in R: Predicting Employee Churn
Organizational growth largely depends on staff retention. Losing employees frequently impacts the morale of the organization and hiring new employees is more expensive than retaining existing ones. Good news is that organizations can increase employee retention using data-driven intervention strategies. This course focuses on data acquisition from multiple HR sources, exploring and deriving new features, building and validating a logistic regression model, and finally, show how to calculate ROI for a potential retention strategy.
This chapter begins with a general introduction to employee churn/turnover and reasons for turnover as shared by employees. You will learn how to calculate turnover rate and explore turnover rate across different dimensions. You will also identify talent segments for your analysis and bring together relevant data from multiple HR data sources to derive more useful insights.
In this chapter, you will create new variables from existing data to explain employee turnover. You will analyze compensation data and create compa-ratio to measure pay equity of all employees. To identify the most important variables influencing turnover, you will use the concept of Information Value (IV).
In this chapter, you will build a logistic regression model to predict turnover by taking into account multicollinearity among variables.
In this chapter, you will calculate the accuracy of your model and categorize employees into specific risk buckets. You will then formulate an intervention strategy and calculate the ROI for this strategy.",[],"['Abhishek Trehan', 'Anurag Gupta', 'Richie Cotton', 'Sumedh Panchadhar']","[('Employee data', 'https://assets.datacamp.com/production/repositories/1746/datasets/ed764d8978ecdf6d91d2d3f0b5f1efcffe5cb7ec/employee_data.zip')]",['Human Resources Analytics in R: Exploring Employee Data'],https://www.datacamp.com/courses/human-resources-analytics-in-r-predicting-employee-churn,,
122,Hyperparameter Tuning in Python,4,13,44,653,"3,400",Hyperparameter Tuning,"Hyperparameter Tuning in Python
Building powerful machine learning models depends heavily on the set of hyperparameters used. But with increasingly complex models with lots of options, how do you efficiently find the best settings for your particular problem? In this course you will get practical experience in using some common methodologies for automated hyperparameter tuning in Python using Scikit Learn. These include Grid Search, Random Search & advanced optimization methodologies including Bayesian & Genetic algorithms . You will use a dataset predicting credit card defaults as you build skills to dramatically increase the efficiency and effectiveness of your machine learning model building.
In this introductory chapter you will learn the difference between hyperparameters and parameters. You will practice extracting and analyzing parameters, setting hyperparameter values for several popular machine learning algorithms. Along the way you will learn some best practice tips & tricks for choosing which hyperparameters to tune and what values to set & build learning curves to analyze your hyperparameter choices.
This chapter introduces you to a popular automated hyperparameter tuning methodology called Grid Search. You will learn what it is, how it works and practice undertaking a Grid Search using Scikit Learn. You will then learn how to analyze the output of a Grid Search & gain practical experience doing this.
In this chapter you will be introduced to another popular automated hyperparameter tuning methodology called Random Search. You will learn what it is, how it works and importantly how it differs from grid search. You will learn some advantages and disadvantages of this method and when to choose this method compared to Grid Search. You will practice undertaking a Random Search with Scikit Learn as well as visualizing & interpreting the output.
In this final chapter you will be given a taste of more advanced hyperparameter tuning methodologies known as ''informed search''. This includes a methodology known as Coarse To Fine as well as Bayesian & Genetic hyperparameter tuning algorithms. You will learn how informed search differs from uninformed search and gain practical skills with each of the mentioned methodologies, comparing and contrasting them as you go.",[],"['Alex Scriven', 'Hadrien Lacroix', 'Chester Ismay']","[('Credit Card Defaults', 'https://assets.datacamp.com/production/repositories/3983/datasets/bb158f1c76682286f938e02d71de21a3e5389cbf/credit-card-full.csv')]","['Intermediate Python for Data Science', 'Supervised Learning with scikit-learn']",https://www.datacamp.com/courses/hyperparameter-tuning-in-python,Machine Learning,Python
123,Hyperparameter Tuning in R,4,14,47,"2,055","3,500",Hyperparameter Tuning in R,"Hyperparameter Tuning in R
For many machine learning problems, simply running a model out-of-the-box and getting a prediction is not enough; you want the best model with the most accurate prediction. One way to perfect your model is with hyperparameter tuning, which means optimizing the settings for that specific model. In this course, you will work with the caret, mlr and h2o packages to find the optimal combination of hyperparameters in an efficient manner using grid search, random search, adaptive resampling and automatic machine learning (AutoML). Furthermore, you will work with different datasets and tune different supervised learning models, such as random forests, gradient boosting machines, support vector machines, and even neural nets. Get ready to tune!
Why do we use the strange word ""hyperparameter""? What makes it hyper? Here, you will understand what model parameters are, and why they are different from hyperparameters in machine learning. You will then see why we would want to tune them and how the default setting of caret automatically includes hyperparameter tuning.
In this chapter, you will learn how to tune hyperparameters with a Cartesian grid. Then, you will implement faster and more efficient approaches. You will use Random Search and adaptive resampling to tune the parameter grid, in a way that concentrates on values in the neighborhood of the optimal settings.
Here, you will use another package for machine learning that has very convenient hyperparameter tuning functions. You will define a Cartesian grid or perform Random Search, as well as advanced techniques. You will also learn different ways to plot and evaluate models with different hyperparameters.
In this final chapter, you will use h2o, another package for machine learning with very convenient hyperparameter tuning functions. You will use it to train different models and define a Cartesian grid. Then, You will implement a Random Search use stopping criteria. Finally, you will learn AutoML, an h2o interface which allows for very fast and convenient model and hyperparameter tuning with just one function.",[],"['Shirin Elsinghorst (formerly Glander)', 'Chester Ismay', 'Hadrien Lacroix']","[('Bc train data', 'https://assets.datacamp.com/production/course_6650/datasets/bc_train_data.csv'), ('Breast cancer data', 'https://assets.datacamp.com/production/course_6650/datasets/breast_cancer_data.csv')]","['Introduction to the Tidyverse', 'Supervised Learning in R: Classification', 'Machine Learning Toolbox']",https://www.datacamp.com/courses/hyperparameter-tuning-in-r,Machine Learning,R
124,Image Processing in Python,4,16,54,562,"4,450",Image Processing,"Image Processing in Python
Images are everywhere! We live in a time where images contain lots of information, which is sometimes difficult to obtain. This is why image pre-processing has become a highly valuable skill, applicable in many use cases. In this course, you will learn to process, transform, and manipulate images at your will, even when they come in thousands. You will also learn to restore damaged images, perform noise reduction, smart-resize images, count the number of dots on a dice, apply facial detection, and much more, using scikit-image. After completing this course, you will be able to apply your knowledge to different domains such as machine learning and artificial intelligence, machine and robotic vision, space and medical image analysis, retailing, and many more. Take the step and dive into the wonderful world that is computer vision!
Jump into digital image structures and learn to process them! Extract data, transform and analyze images using NumPy and Scikit-image.
With just a few lines of code, you will convert RGB images to grayscale, get data from them, obtain histograms containing very useful information, and separate objects from the background!
You will learn to detect object shapes using edge detection filters, improve medical images with contrast enhancement and even enlarge pictures to five times its original size!
You will also apply morphology to make thresholding more accurate when segmenting images and go to the next level of processing images with Python.
So far, you have done some very cool things with your image processing skills!
In this chapter, you will apply image restoration to remove objects, logos, text, or damaged areas in pictures!
You will also learn how to apply noise, use segmentation to speed up processing, and find elements in images by their contours.
After completing this chapter, you will have a deeper knowledge of image processing as you will be able to detect edges, corners, and even faces! You will learn how to detect not just front faces but also face profiles, cat, or dogs. You will apply your skills to more complex real-world applications.
Learn to master several widely used image processing techniques with very few lines of code!",[],"['Rebeca Saraí González Guerra', 'Hillary Green-Lerman', 'Sara Billen']","[('Images', 'https://assets.datacamp.com/production/repositories/4470/datasets/44adb5b3c76caece2225b30f7660c5e50508d2ee/Image Processing with Python course exercise dataset.zip')]",['Python Data Science Toolbox (Part 2)'],https://www.datacamp.com/courses/image-processing-in-python,Data Visualization,Python
125,Importing & Managing Financial Data in Python,5,16,53,"19,853","4,350",Importing & Managing Financial Data,"Importing & Managing Financial Data in Python
If you want to apply your new 'Python for Data Science' skills to real-world financial data, then this course will give you some very valuable tools.
First, you will learn how to get data out of Excel into pandas and back. Then, you will learn how to pull stock prices from various online APIs like
Google or Yahoo! Finance, macro data from the Federal Reserve, and exchange rates from OANDA. Finally, you will learn how to calculate returns for various time horizons,
analyze stock performance by sector for IPOs, and calculate and summarize correlations.
In this chapter, you will learn how to import, clean and combine data from Excel workbook sheets into a pandas DataFrame. You will also practice grouping data, summarizing information for categories, and visualizing the result using subplots and heatmaps.
You will use data on companies listed on the stock exchanges NASDAQ, NYSE, and AMEX with information on company name, stock symbol, last market capitalization and price, sector or industry group, and IPO year. In Chapter 2, you will build on this data to download and analyze stock price history for some of these companies.
This chapter introduces online data access to Google Finance and the Federal Reserve Data Service through the `pandas` `DataReader`. You will pull data, perform basic manipulations, combine data series, and visualize the results.
In this chapter, you will learn how to capture key characteristics of individual variables in simple metrics. As a result, it will be easier to understand the distribution of the variables in your data set: Which values are central to, or typical of your data? Is your data widely dispersed, or rather narrowly distributed around some mid point? Are there outliers? What does the overall distribution look like?
This chapter introduces the ability to group data by one or more categorical variables, and to calculate and visualize summary statistics for each caategory. In the process, you will learn to compare company statistics for different sectors and IPO vintages, analyze the global income distribution over time, and learn how to create various statistical charts from the seaborn library.",[],"['Stefan Jansen', 'Lore Dirick']","[('Amex listings .csv file', 'https://assets.datacamp.com/production/repositories/993/datasets/2bd8d6c19608fc6f3facbac31021d26a2fcac42f/amex-listings.csv'), ('Income growth .csv file', 'https://assets.datacamp.com/production/repositories/993/datasets/5c79f58382e658b649ed45070976f3c815b69307/income_growth.csv'), ('Listings .xlsx file', 'https://assets.datacamp.com/production/repositories/993/datasets/2dad49608ef2966bcd0fb209bafb3b365271c7c2/listings.xlsx'), ('Nasdaq listings .csv file', 'https://assets.datacamp.com/production/repositories/993/datasets/3e432c89aa85b5782fb16bdc8f16de01699d9885/nasdaq-listings.csv'), ('Per capita income .csv file', 'https://assets.datacamp.com/production/repositories/993/datasets/488a0764add121948fcdc683d29659425f39bfa4/per_capita_income.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/importing-managing-financial-data-in-python,Applied Finance,Python
126,Importing Data in Python (Part 1),3,15,54,"117,438","4,150",Importing Data,"Importing Data in Python (Part 1)
As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models, and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In this course, you'll learn the many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL.
In this chapter, you'll learn how to import data into Python from all types of flat files, which are a simple and prevalent form of data storage. You've previously learned how to use NumPy and pandas—you will learn how to use these packages to import flat files and customize your imports.
You've learned how to import flat files, but there are many other file types you will potentially have to work with as a data scientist. In this chapter, you'll learn how to import data into Python from a wide array of important file types. These include pickled files, Excel spreadsheets, SAS and Stata files, HDF5 files, a file type for storing large quantities of numerical data, and MATLAB files.
In this chapter, you'll learn how to extract meaningful data from relational databases, an essential skill for any data scientist. You will learn about relational models, how to create SQL queries, how to filter and order your SQL records, and how to perform advanced queries by joining database tables.","['Data Analyst with Python', 'Data Scientist with Python', 'Importing & Cleaning Data with Python', 'Python Programmer']","['Hugo Bowne-Anderson', 'Francisco Castro']","[('Chinook (SQLite)', 'https://assets.datacamp.com/production/repositories/487/datasets/ec8aa8bc9ffea6b4e2729e1a0a2d4aea2f300b3a/Chinook.sqlite'), ('LIGO (HDF5)', 'https://assets.datacamp.com/production/repositories/487/datasets/ab9107b749b832daada36bfaa718d9a591a0d69c/L-L1_LOSC_4_V1-1126259446-32.hdf5'), ('Battledeath (XLSX)', 'https://assets.datacamp.com/production/repositories/487/datasets/5e8897e4624f8577ed0d33aeafbe7bd88bfc424b/battledeath.xlsx'), ('Extent of infectious diseases (DTA)', 'https://assets.datacamp.com/production/repositories/487/datasets/c4129edae533cf2683d8995f6dcdbcf5f41520ba/disarea.dta'), ('Gene expressions (MATLAB)', 'https://assets.datacamp.com/production/repositories/487/datasets/2fc0beea2d8cc7c93d79e79344a6e9e66f65d1fe/ja_data2.mat'), ('MNIST', 'https://assets.datacamp.com/production/repositories/487/datasets/d6d1b84ef06151ff913b4173e2eca8e6d5fa959b/mnist_kaggle_some_rows.csv'), ('Sales (SAS7BDAT)', 'https://assets.datacamp.com/production/repositories/487/datasets/0300d44b3ac77accc4b9706af86e33037bda6861/sales.sas7bdat'), ('Seaslugs', 'https://assets.datacamp.com/production/repositories/487/datasets/07cd090cb965782011a76af72c16b400a5ca5cc0/seaslug.txt'), ('Titanic', 'https://assets.datacamp.com/production/repositories/487/datasets/be79810c4288801167cfb31dbedd396559816ade/titanic_sub.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/importing-data-in-python-part-1,Importing & Cleaning Data,Python
127,Importing Data in Python (Part 2),2,7,29,"68,072","2,400",Importing Data,"Importing Data in Python (Part 2)
As a data scientist, you will need to clean data, wrangle and munge it, visualize it, build predictive models and interpret these models. Before you can do so, however, you will need to know how to get data into Python. In the prequel to this course, you learned many ways to import data into Python: from flat files such as .txt and .csv; from files native to other software such as Excel spreadsheets, Stata, SAS, and MATLAB files; and from relational databases such as SQLite and PostgreSQL. In this course, you'll extend this knowledge base by learning to import data from the web and by pulling data from Application Programming Interfaces— APIs—such as the Twitter streaming API, which allows us to stream real-time tweets.
The web is a rich source of data from which you can extract various types of insights and findings. In this chapter, you will learn how to get data from the web, whether it is stored in files or in HTML. You'll also learn the basics of scraping and parsing web data.
In this chapter, you will gain a deeper understanding of how to import data from the web. You will learn the basics of extracting data from APIs, gain insight on the importance of APIs, and practice extracting data by diving into the OMDB and Library of Congress APIs.
In this chapter, you will consolidate your knowledge of interacting with APIs in a deep dive into the Twitter streaming API. You'll learn how to stream real-time Twitter data, and how to analyze and visualize it.","['Data Analyst with Python', 'Data Scientist with Python', 'Importing & Cleaning Data with Python', 'Python Programmer']","['Hugo Bowne-Anderson', 'Francisco Castro']","[('Latitudes (XLS)', 'https://assets.datacamp.com/production/repositories/488/datasets/b422ace2fceada7b569e0ba3e8d833fddc684c4d/latitude.xls'), ('Tweets', 'https://assets.datacamp.com/production/repositories/488/datasets/3ef452f83a91556ea4284624b969392c0506fb33/tweets3.txt'), ('Red wine quality', 'https://assets.datacamp.com/production/repositories/488/datasets/013936d2700e2d00207ec42100d448c23692eb6f/winequality-red.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Importing Data in Python (Part 1)']",https://www.datacamp.com/courses/importing-data-in-python-part-2,Importing & Cleaning Data,Python
128,Importing Data in R (Part 1),3,11,42,"92,765","3,550",Importing Data in R,"Importing Data in R (Part 1)
Importing data into R should be the easiest step in your analysis. Unfortunately, that is almost never the case. Data can come in many formats, ranging from .csv and text files, to statistical software files, to databases and HTML data. Knowing which approach to use is key to getting started with the actual analysis.
In this course, you’ll start by learning how to read .csv and text files in R. You will then cover the readr and data.table packages to easily and efficiently import flat file data. After that, you will learn how to read .xls files in R using readxl and gdata.
A lot of data comes in the form of flat files: simple tabular text files. Learn how to import the common formats of flat file data with base R functions.
In addition to base R, there are dedicated packages to easily and efficiently import flat file data. We'll talk about two such packages: readr and data.table.
Excel is a widely used data analysis tool. If you prefer to do your analyses in R, though, you'll need an understanding of how to import .csv data into R. This chapter will show you how to use readxl and gdata to do so.
Beyond importing data from Excel, you can take things one step further with XLConnect. Learn all about it and bridge the gap between R and Excel.","['Data Analyst with R', 'Data Scientist with R', 'Importing & Cleaning Data with R']",['Filip Schouwenaars'],"[('Hotdogs', 'https://assets.datacamp.com/production/repositories/453/datasets/3e5a732b4467c1cbed6a8e8e7a1c9eec3fc86c58/hotdogs.txt'), ('Potatoes (CSV)', 'https://assets.datacamp.com/production/repositories/453/datasets/b47d250de5379914100e28075556fb24e55ca2cd/potatoes.csv'), ('Potatoes (TSV)', 'https://assets.datacamp.com/production/repositories/453/datasets/d78f476c64cf9bc91d4467ff64769afd64d4b450/potatoes.txt'), ('Swimming pools', 'https://assets.datacamp.com/production/repositories/453/datasets/0badb39b50c7daf000698efbca476716db7c1a6f/swimming_pools.csv'), ('Urban population (XLS)', 'https://assets.datacamp.com/production/repositories/453/datasets/ae595b67772d71e79ea9c25897192ba49dcb2b81/urbanpop.xls'), ('Urban population (XLSX)', 'https://assets.datacamp.com/production/repositories/453/datasets/775623dcd2ee9b07bff5b034edba3137bb24b748/urbanpop.xlsx')]",['Introduction to R'],https://www.datacamp.com/courses/importing-data-in-r-part-1,Importing & Cleaning Data,R
129,Importing Data in R (Part 2),3,10,48,"43,872","3,950",Importing Data in R,"Importing Data in R (Part 2)
Many companies store their information in relational databases. The R community has also developed R packages to get data from these architectures. You'll learn how to connect to a database and how to retrieve data from it.
Importing an entire table from a database while you might only need a tiny bit of information seems like a lot of unncessary work. In this chapter, you'll learn about SQL queries, which will help you make things more efficient by performing some computations on the database side.
More and more of the information that data scientists are using resides on the web. Importing this data into R requires an understanding of the protocols used on the web. In this chapter, you'll get a crash course in HTTP and learn to perform your own HTTP requests from inside R.
Importing data from the web is one thing; actually being able to extract useful information is another. Learn more about the JSON format to get one step closer to web domination.
Next to R, there are also other commonly used statistical software packages: SAS, STATA and SPSS. Each of them has their own file format. Learn how to use the haven and foreign packages to get them into R with remarkable ease!","['Data Analyst with R', 'Data Scientist with R', 'Importing & Cleaning Data with R']",['Filip Schouwenaars'],"[('Education equality data', 'https://assets.datacamp.com/production/repositories/454/datasets/c326824a049fa32779b2e06a0b3cab25c0055716/edequality.dta'), ('Employee data', 'https://assets.datacamp.com/production/repositories/454/datasets/7d9358f29b6b1a50c641ca11192d7ca383f7a19f/employee.sav'), ('Florida election data', 'https://assets.datacamp.com/production/repositories/454/datasets/3d5db3972c085c8f9bb99239ddd78f60aeff8300/florida.dta'), ('International socio-economic data', 'https://assets.datacamp.com/production/repositories/454/datasets/9a7178d07a670ab0dd88aa1f4d9d806948acdd43/international.sav'), ('Latitude (XLS)', 'https://assets.datacamp.com/production/repositories/454/datasets/b422ace2fceada7b569e0ba3e8d833fddc684c4d/latitude.xls'), ('Latitude (XLSX)', 'https://assets.datacamp.com/production/repositories/454/datasets/257641a69f9f56700a11be661315e285b6e61091/latitude.xlsx'), ('Big Five data', 'https://assets.datacamp.com/production/repositories/454/datasets/8919ed67a6692ad4474df6414a39f9749b24278e/person.sav'), ('Potatoes', 'https://assets.datacamp.com/production/repositories/454/datasets/3c295cdad28103efca12907eddda0acb15d2a2b8/potatoes.txt'), ('Sales data', 'https://assets.datacamp.com/production/repositories/454/datasets/1ce18d1211c51ef3d083d4e2881c9c056eada5ed/sales.sas7bdat'), ('Swimming pools', 'https://assets.datacamp.com/production/repositories/454/datasets/0badb39b50c7daf000698efbca476716db7c1a6f/swimming_pools.csv'), ('Sugar import data', 'https://assets.datacamp.com/production/repositories/454/datasets/fe0bdbfa768a4dc8ee6414fc40139bf47b60a7fb/trade.dta'), ('Water data', 'https://assets.datacamp.com/production/repositories/454/datasets/c189c407928639e85031c42483743f7edd2d6111/water.csv'), ('Wine data', 'https://assets.datacamp.com/production/repositories/454/datasets/f62786f0dab58bedeefe6af6ee9250a8cd8daa35/wine.RData')]",['Importing Data in R (Part 1)'],https://www.datacamp.com/courses/importing-data-in-r-part-2,Importing & Cleaning Data,R
130,Importing and Managing Financial Data in R,5,15,57,"9,985","4,850",Importing and Managing Financial Data in R,"Importing and Managing Financial Data in R
If you've ever done anything with financial or economic time series, you know the data come in various shapes, sizes, and periodicities. Getting the data into R can be stressful and time-consuming, especially when you need to merge data from several different sources into one data set. This course will cover importing data from local files as well as from internet sources.
A wealth of financial and economic data are available online. Learn how getSymbols() and Quandl() make it easy to access data from a variety of sources.
You've learned how to import data from online sources, now it's time to see how to extract columns from the imported data. After you've learned how to extract columns from a single object, you will explore how to import, transform, and extract data from multiple instruments.
Learn how to simplify and streamline your workflow by taking advantage of the ability to customize default arguments to `getSymbols()`. You will see how to customize defaults by data source, and then how to customize defaults by symbol. You will also learn how to handle problematic instrument symbols.
You've learned how to import, extract, and transform data from multiple data sources. You often have to manipulate data from different sources in order to combine them into a single data set. First, you will learn how to convert sparse, irregular data into a regular series. Then you will review how to aggregate dense data to a lower frequency. Finally, you will learn how to handle issues with intra-day data.
You've learned the core workflow of importing and manipulating financial data. Now you will see how to import data from text files of various formats. Then you will learn how to check data for weirdness and handle missing values. Finally, you will learn how to adjust stock prices for splits and dividends.","['Finance Basics with R', 'Quantitative Analyst with R']","['Joshua Ulrich', 'Lore Dirick', 'Davis Vaughan']","[('Amazon CSV file', 'https://assets.datacamp.com/production/repositories/389/datasets/ce26cee08d14cb53379495add3045ed98b5e3c66/AMZN.csv'), ('DC data', 'https://assets.datacamp.com/production/repositories/389/datasets/04adfd61735fae4293df91302c83e4fa77ee1a59/DC.RData'), ('UNE CSV file', 'https://assets.datacamp.com/production/repositories/389/datasets/69be01cc2dc9342b822c554dddc07d29c270a41f/UNE.csv'), ('two_symbols CSV file', 'https://assets.datacamp.com/production/repositories/389/datasets/6656e8b791f96452761dc709ceca5c0484994ca9/two_symbols.csv')]","['Introduction to R for Finance', 'Intermediate R for Finance', 'Manipulating Time Series Data in R with xts & zoo']",https://www.datacamp.com/courses/importing-and-managing-financial-data-in-r,Applied Finance,R
131,Improving Query Performance in PostgreSQL,4,15,53,854,"4,300",Improving Query Performance in PostgreSQL,"Improving Query Performance in PostgreSQL
Losing time on slow queries? Hesitant to share your queries with more seasoned coworkers? In this course, you will learn how to structure your PostgreSQL to run in a fraction of the time. Exploring intertwined data relating Olympic participation, country climate, and gross domestic product, you will experience firsthand how changes in filtering method and using subqueries impact query performance. You will learn the properties of a row oriented database while also seeing how Hawaii's volcanos impact air quality. Restructuring your queries with the query planner and the SQL order of operations, you will soon be dazzling your coworkers with your effortless efficiency.
Bundle up as you dive into the Winter Olympics! You will learn how to join, subquery, and create temporary tables while finding which Olympic athletes brave sub-freezing temperatures to train. You will also learn about the query planner and how its functionality can guide your SQL structure to faster queries.
Dig up those past algebra memories while learning the SQL order of operations. Find which countries ""should"" have the most athletes by looking at population and gross domestic product (GDP) while learning the best way to filter. You will also learn when your query aggregates (sums, counts, etc.) and how you can structure your query to optimize this process.
Zero in on the properties that improve database performance. Discover when your table is not a table but a view. Learn how your database's storage structure (row or column oriented) impacts your query structure. You will explore volcanic smog while using partitions and indexes to speed your queries.
Learn the lingo of the Query Lifecycle and dive into the query planner. Explore how the query planner creates and optimizes the query plan. Find your next vacation locale by looking for countries with recent population growth while also seeing how a join impacts the query steps. Fine tune your optimization techniques by seeing how different filters speed your query times.",[],"['Amy McCarty', 'Mona Khalil', 'Becca Robins']","[('GDP', 'https://assets.datacamp.com/production/repositories/4297/datasets/f7b2dc67088b46263792d6358b67b2ac6cee1432/population_gdp_transposed.csv'), ('Olympic Athletes', 'https://assets.datacamp.com/production/repositories/4297/datasets/199f66ce5b9d899a2609284547607078f4908990/olympic_athletes_2016_14.csv'), ('Olympic Regions', 'https://assets.datacamp.com/production/repositories/4297/datasets/64e4e1c14554cbb8cd115485d4301f92f1cbbd17/olympic_regions.csv'), ('AQI', 'https://assets.datacamp.com/production/repositories/4297/datasets/ae89ea124b77507cefe318f6499318571e63a88f/annual_aqi_by_county_2018.csv')]","['Joining Data in SQL', 'Intermediate SQL']",https://www.datacamp.com/courses/improving-query-performance-in-postgresql,Data Manipulation,SQL
132,Improving Query Performance in SQL Server,4,16,58,"1,388","4,450",Improving Query Performance in SQL Server,"Improving Query Performance in SQL Server
A mission critical assignment is depending on your SQL coding skills. You’ve been given some code to fix. It is giving the results you need but it’s running too slow, and it’s poorly formatted making it hard to read. The deadline is tomorrow. You’ll need to reformat the code and try different methods to improve performance. The pressure is on!!! In this course we’ll be using SQL on real world datasets, from sports and geoscience, to look at good coding practices and different ways how we can can improve the performance of queries to achieve the same outcome.
In this chapter, students will learn how SQL code formatting, commenting, and aliasing is used to make queries easy to read and understand. Students will also be introduced to query processing order in the database versus the order of the SQL syntax in a query.
This chapter introduces filtering with WHERE and HAVING and some best practices for how (and how not) to use these keywords. Next, it explains the methods used to interrogate data and the effects these may have on performance. Finally, the chapter goes over the roles of DISTINCT() and UNION in removing duplicates and their potential effects on performance.
This chapter is an introduction to sub-queries and their potential impacts on query performance. It also examines the different methods used to determine if the data in one table is present, or absent, in a related table.
Students are introduced to how STATISTICS TIME, STATISTICS IO, indexes, and executions plans can be used in SQL Server to help analyze and tune query performance.",[],"['Dean Smith', 'Mona Khalil', 'Becca Robins', 'Marianna Lamnina']","[('Orders dataset', 'https://assets.datacamp.com/production/repositories/4005/datasets/751fbc814728455952b3f12df8d4bd90abf4696b/Orders.csv'), ('NBAPlayers dataset', 'https://assets.datacamp.com/production/repositories/4005/datasets/f7dc8389514bc1d366e380d76bffc6bfc9be179b/NBAPlayers.csv'), ('NBATeams dataset', 'https://assets.datacamp.com/production/repositories/4005/datasets/c70021a4a78c360198ada1231b45a8521aced7f5/NBATeams.csv'), ('NBAPlayersStatistics dataset', 'https://assets.datacamp.com/production/repositories/4005/datasets/de2a75358e326167eec3a0077dafac33b0f204f5/NBAPlayerStatistics.csv')]",['Intermediate SQL Server'],https://www.datacamp.com/courses/improving-query-performance-in-sql-server,Data Manipulation,SQL
133,Improving Your Data Visualizations in Python,4,15,54,"3,100","4,650",Improving Your Data Visualizations,"Improving Your Data Visualizations in Python
Great data visualization is the cornerstone of impactful data science. Visualization helps you to both find insight in your data and share those insights with your audience. Everyone learns how to make a basic scatter plot or bar chart on their journey to becoming a data scientist, but the true potential of data visualization is realized when you take a step back and think about what, why, and how you are visualizing your data. In this course you will learn how to construct compelling and attractive visualizations that help you communicate the results of your analyses efficiently and effectively. We will cover comparing data, the ins and outs of color, showing uncertainty, and how to build the right visualization for your given audience through the investigation of a datasets on air pollution around the US and farmer's markets. We will finish the course by examining open-access farmers market data to build a polished and impactful visual report.
How do you show all of your data while making sure that viewers don't miss an important point or points? Here we discuss how to guide your viewer through the data with color-based highlights and text. We also introduce a dataset on common pollutant values across the United States.
Color is a powerful tool for encoded values in data visualization. However, with this power comes danger. In this chapter, we talk about how to choose an appropriate color palette for your visualization based upon the type of data it is showing.
Uncertainty occurs everywhere in data science, but it's frequently left out of visualizations where it should be included. Here, we review what a confidence interval is and how to visualize them for both single estimates and continuous functions. Additionally, we discuss the bootstrap resampling technique for assessing uncertainty and how to visualize it properly.
Often visualization is taught in isolation, with best practices only discussed in a general way. In reality, you will need to bend the rules for different scenarios. From messy exploratory visualizations to polishing the font sizes of your final product; in this chapter, we dive into how to optimize your visualizations at each step of a data science workflow.",['Data Visualization with Python'],"['Nicholas Strayer', 'Hillary Green-Lerman', 'Becca Robins']","[('State populations dataset', 'https://assets.datacamp.com/production/repositories/3841/datasets/f0dbd061f3851ac130cf2f8ad6b3f28f1d19c1fd/census-state-populations.csv'), (""U.S. farmer's markets dataset"", 'https://assets.datacamp.com/production/repositories/3841/datasets/efdbc5d7c7b734f0b091d924605c4ad2664ef830/markets_cleaned.csv'), ('Pollution dataset', 'https://assets.datacamp.com/production/repositories/3841/datasets/a6b11493e11dd47f3e03e0b96e2a2dbc51f03cb2/pollution_wide.csv')]","['Python Data Science Toolbox (Part 1)', 'Python Data Science Toolbox (Part 2)', 'Introduction to Data Visualization with Python', 'Data Visualization with Seaborn']",https://www.datacamp.com/courses/improving-your-data-visualizations-in-python,Data Visualization,Python
134,Inference for Categorical Data,4,14,53,"1,549","4,000",Inference Categorical Data,"Inference for Categorical Data
Categorical data is all around us. It's in the latest opinion polling numbers, in the data that lead to new breakthroughs in genomics, and in the troves of data that internet companies collect to sell products to you. In this course you'll learn techniques for parsing the signal from the noise; tools for identifying when structure in this data represents interesting phenomena and when it is just random noise.
In this chapter you will learn how to perform statistical inference on a single parameter that describes categorical data. This includes both resampling based methods and approximation based methods for a single proportion.
This chapter dives deeper into performing hypothesis tests and creating confidence intervals for a single parameter. Then, you'll learn how to perform inference on a difference between two proportions. Finally, this chapter wraps up with an exploration of what happens when you know the null hypothesis is true.
This part of the course will teach you how to use both resampling methods and classical methods to test for the indepence of two categorical variables. This chapter covers how to perform a Chi-squared test.
The course wraps up with two case studies using election data. Here, you'll learn how to use a Chi-squared test to check goodness-of-fit. You'll study election results from Iran and Iowa and test if Benford's law applies to these datasets.",['Statistical Inference with R'],"['Andrew Bray', 'Nick Solomon', 'Benjamin Feder', 'Jonathan Ng']","[('GSS data', 'https://assets.datacamp.com/production/repositories/1703/datasets/622fb3f93aa52cac9da874699feb95911eba8abd/gss.RData'), ('Iowa election data', 'https://assets.datacamp.com/production/repositories/1703/datasets/3e73a6c4432671bff5e6f05d340ac1ee41f2ba76/iowa.csv'), ('Iran election data', 'https://assets.datacamp.com/production/repositories/1703/datasets/a777b2366f4e576da5d58fda42f8337332acd3ae/iran.csv')]",['Foundations of Inference'],https://www.datacamp.com/courses/inference-for-categorical-data,Probability & Statistics,R
135,Inference for Linear Regression,4,15,59,"4,145","4,650",Inference Linear Regression,"Inference for Linear Regression
Previously, you learned the fundamentals of both statistical inference and linear models; now, the next step is to put them together. This course gives you a chance to think about how different samples can produce different linear models, where your goal is to understand the underlying population model. From the estimated linear model, you will learn how to create interval estimates for the effect size as well as how to determine if the effect is significant. Prediction intervals for the response variable will be contrasted with estimates of the average response. Throughout the course, you'll gain more practice with the dplyr and ggplot2 packages, and you will learn about the broom package for tidying models; all three packages are invaluable in data science.
In the first chapter, you will understand how and why to perform inferential (instead of descriptive only) analysis on a regression model.
In this chapter you will learn about the ideas of the sampling distribution using simulation methods for regression models.
In this chapter you will learn about how to use the t-distribution to perform inference in linear regression models. You will also learn about how to create prediction intervals for the response variable.
Additionally, you will consider the technical conditions that are important when using linear models to make claims about a larger population.
This chapter covers topics that build on the basic ideas of inference in linear models, including multicollinearity and inference for multiple regression models.",['Statistical Inference with R'],"['Jo Hardin', 'Nick Carchedi', 'Nick Solomon']","[('LA home price data', 'https://assets.datacamp.com/production/repositories/848/datasets/96a4003545f7eb48e1c14b855df9a97ab8c84b1d/LAhomes.csv'), ('NYC restaurant data', 'https://assets.datacamp.com/production/repositories/848/datasets/4ff34a40bd4e636556494f83cf40bdc10c33d49e/restNYC.csv'), ('Twin data', 'https://assets.datacamp.com/production/repositories/848/datasets/84f9e42a9041695d790dfe2b5e1b6e22fc3f0118/twins.csv')]","['Foundations of Inference', 'Multiple and Logistic Regression']",https://www.datacamp.com/courses/inference-for-linear-regression,Probability & Statistics,R
136,Inference for Numerical Data,4,15,49,"3,360","3,650",Inference Numerical Data,"Inference for Numerical Data
In this course, you'll learn how to use statistical techniques to make inferences and estimations using numerical data. This course uses two approaches to these common tasks. The first makes use of bootstrapping and permutation to create resample based tests and confidence intervals. The second uses theoretical results and the t-distribution to achieve the same result. You'll learn how (and when) to perform a t-test, create a confidence interval, and do an ANOVA!
In this chapter you'll use bootstrapping techniques to estimate a single parameter from a numerical distribution.
In this chapter you'll use Central Limit Theorem based techniques to estimate a single parameter from a numerical distribution. You will do this using the t-distribution.
In this chapter you'll extend what you have learned so far to use both simulation and CLT based techniques for inference on the difference between two parameters from two independent numerical distributions.
In this chapter you will use ANOVA (analysis of variance) to test for a difference in means across many groups.",['Statistical Inference with R'],"['Mine Cetinkaya-Rundel', 'Nick Carchedi', 'Nick Solomon']","[('Chp1-vid1-boot-dist-noaxes-parantheses', 'https://assets.datacamp.com/production/repositories/846/datasets/dc24f53d92a90863666f2e47827049caff156ccd/chp1-vid1-boot-dist-noaxes-parantheses.png'), ('Chp1-vid1-bootsamp-bootpop.001', 'https://assets.datacamp.com/production/repositories/846/datasets/641ae10cf7121130f50eb499f8daf2a7412c608d/chp1-vid1-bootsamp-bootpop.001.png'), ('Chp1-vid1-manhattan-rents', 'https://assets.datacamp.com/production/repositories/846/datasets/5b08f701debd264bf33d50ca7771617547516948/chp1-vid1-manhattan-rents.png'), ('Chp1-vid2-boot-dist-withaxes', 'https://assets.datacamp.com/production/repositories/846/datasets/28abc3cfc4421c4c000b460783cbadf28b695fa7/chp1-vid2-boot-dist-withaxes.png'), ('Chp1-vid2-perc-method.001', 'https://assets.datacamp.com/production/repositories/846/datasets/b56d4018323b33273a967acf0e0c6c56ad00a10e/chp1-vid2-perc-method.001.png'), ('Chp1-vid2-perc-method.002', 'https://assets.datacamp.com/production/repositories/846/datasets/36b43cc273e9fe611f2dd1a30973e9eda65fa861/chp1-vid2-perc-method.002.png'), ('Chp1-vid3-boot-test.001', 'https://assets.datacamp.com/production/repositories/846/datasets/4b5aa2f1d1d29d48ceb141df373313d2d26854c0/chp1-vid3-boot-test.001.png'), ('Chp3-vid3-hrly-rate-citizen-smaller', 'https://assets.datacamp.com/production/repositories/846/datasets/53b1c749b8fb60eeb1efc53fbbed5ac92d4a2e23/chp3-vid3-hrly-rate-citizen-smaller.png'), ('Chp3-vid3-hrly-rate-citizen', 'https://assets.datacamp.com/production/repositories/846/datasets/53b1c749b8fb60eeb1efc53fbbed5ac92d4a2e23/chp3-vid3-hrly-rate-citizen.png'), ('Chp4-vid1-class-bar', 'https://assets.datacamp.com/production/repositories/846/datasets/88de2cfbf76339e941b124df6bff1ad656f4bca8/chp4-vid1-class-bar.png'), ('Chp4-vid1-wodrsum-hist', 'https://assets.datacamp.com/production/repositories/846/datasets/5f946e0df3d682b4b92db4044cacd7bb08177409/chp4-vid1-wodrsum-hist.png'), ('Gss moredays', 'https://assets.datacamp.com/production/repositories/846/datasets/408f7effbe5b0ef743439636d9aae9aa27a149aa/gss_moredays.csv'), ('GSS data', 'https://assets.datacamp.com/production/repositories/846/datasets/1c0f04aae31ed37d453234a2b373315609492a7e/gss_wordsum_class.csv'), ('Manhattan rent data', 'https://assets.datacamp.com/production/repositories/846/datasets/bd62fb71666052ffe398d85e628eae9d0339c9c4/manhattan.csv'), ('Runners.001', 'https://assets.datacamp.com/production/repositories/846/datasets/7903500ef451067b1df953ec4f340d21beb55e92/runners.001.png'), ('Tdistcomparetonormaldist', 'https://assets.datacamp.com/production/repositories/846/datasets/9ef15957c776618902282d81ba7b9612d8cbbb72/tDistCompareToNormalDist.png')]",['Foundations of Inference'],https://www.datacamp.com/courses/inference-for-numerical-data,Probability & Statistics,R
137,Interactive Data Visualization with Bokeh,4,17,63,"34,569","5,100",Interactive Data Visualization Bokeh,"Interactive Data Visualization with Bokeh
Bokeh is an interactive data visualization library for Python—and other languages—that targets modern web browsers for presentation. It can create versatile, data-driven graphics and connect the full power of the entire Python data science stack to create rich, interactive visualizations.
This chapter provides an introduction to basic plotting with Bokeh. You will create your first plots, learn about different data formats Bokeh understands, and make visual customizations for selections and mouse hovering.
Learn how to combine multiple Bokeh plots into different kinds of layouts on a page, how to easily link different plots together, and how to add annotations such as legends and hover tooltips.
Bokeh server applications allow you to connect all of the powerful Python libraries for data science and analytics, such as NumPy and pandas to create rich, interactive Bokeh visualizations. Learn about Bokeh's built-in widgets, how to add them to Bokeh documents alongside plots, and how to connect everything to real Python code using the Bokeh server.
In this final chapter, you'll build a more sophisticated Bokeh data exploration application from the ground up based on the famous Gapminder dataset.","['Data Scientist with Python', 'Data Visualization with Python']","['Team Anaconda', 'Yashas Roy', 'Hugo Bowne-Anderson']","[('AAPL stock', 'https://assets.datacamp.com/production/repositories/401/datasets/313eb985cce85923756a128e49d7260a24ce6469/aapl.csv'), ('Automobile miles per gallon', 'https://assets.datacamp.com/production/repositories/401/datasets/2a776ae9ef4afc3f3f3d396560288229e160b830/auto-mpg.csv'), ('Gapminder', 'https://assets.datacamp.com/production/repositories/401/datasets/09378cc53faec573bcb802dce03b01318108a880/gapminder_tidy.csv'), ('Blood glucose levels', 'https://assets.datacamp.com/production/repositories/401/datasets/edcedae3825e0483a15987248f63f05a674244a6/glucose.csv'), ('Female literacy and birth rate', 'https://assets.datacamp.com/production/repositories/401/datasets/5aae6591ddd4819dec17e562f206b7840a272151/literacy_birth_rate.csv'), ('Olympic medals (100m sprint)', 'https://assets.datacamp.com/production/repositories/401/datasets/68b7a450b34d1a331d4ebfba22069ce87bb5625d/sprint.csv')]","['Introduction to Python', 'Intermediate Python for Data Science']",https://www.datacamp.com/courses/interactive-data-visualization-with-bokeh,Data Visualization,Python
138,Interactive Data Visualization with plotly in R,4,15,54,"3,085","4,600",Interactive Data Visualization plotly in R,"Interactive Data Visualization with plotly in R
Interactive graphics allow you to manipulate plotted data to gain further insight. As an example, an interactive graphic would allow you to zoom in on a subset of your data without the need to create a new plot. In this course, you will learn how to create and customize interactive graphics in plotly using the R programming language. Along the way, you will review data visualization best practices and be introduced to new plot types such as scatterplot matrices and binned scatterplots.
In this chapter, you will receive an introduction to basic graphics with plotly. You will create your first interactive graphics, displaying both univariate and bivariate distributions. Additionally, you will discover how to easily convert ggplot2 graphics to interactive plotly graphics.
In this chapter, you will learn how to customize the appearance of your graphics and use opacity, symbol, and color to clarify your message. You will also learn how to transform axes, label your axes, and customize the hover information of your graphs.
In this chapter, you move past basic plotly charts to explore more-complex relationships and larger datasets. You will learn how to layer traces, create faceted charts and scatterplot matrices, and create binned scatterplots.
In the final chapter, you use your plotly toolkit to explore the results of the 2018 United States midterm elections, learning how to create maps in plotly along the way.",['Interactive Data Visualization in R'],"['Adam Loy', 'Chester Ismay', 'David Campos', 'Shon Inouye']","[('Video game sales and ratings dataset', 'https://assets.datacamp.com/production/repositories/1792/datasets/2396f3f587e31ea726911e5d8974c5f98db5eee1/vgsales.csv'), ('Wine datasets', 'https://assets.datacamp.com/production/repositories/1792/datasets/df77160d2b3c71dded411ea6ab0910ca0be93045/wine_data.zip'), ('Midterm election datasets', 'https://assets.datacamp.com/production/repositories/1792/datasets/235e75c27821684690bb0ad9f3461b4d7ba89740/election_data.zip')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/interactive-data-visualization-with-plotly-in-r,Data Visualization,R
139,Interactive Data Visualization with rbokeh,4,12,47,"1,212","4,000",Interactive Data Visualization rbokeh,"Interactive Data Visualization with rbokeh
Data visualization is an integral part of the data analysis process. This course will get you introduced to rbokeh: a visualization library for interactive web-based plots. You will learn how to use rbokeh layers and options to create effective visualizations that carry your message and emphasize your ideas. We will focus on the two main pieces of data visualization: wrangling data in the appropriate format as well as employing the appropriate visualization tools, charts and options from rbokeh.
In this chapter we get introduced to rbokeh layers. You will learn how to specify data and arguments to create the desired plot and how to combine multiple layers in one figure.
In this chapter you will learn how to customize your rbokeh figures using aesthetic attributes and figure options. You will see how aesthetic attributes such as color, transparancy and shape can serve a purpose and add more info to your visualizations. In addition, you will learn how to activate the tooltip and specify the hover info in your figures.
In this chapter, you will learn how to put your data in the right format to fit the desired figure. And how to transform between the wide and long formats. You will also see how to combine normal layers with regression lines. In addition you will learn how to customize the interaction tools that appear with each figure.
In this chapter you will learn how to combine multiple plots in one layout using grid plots. In addition, you will learn how to create interactive maps.",['Interactive Data Visualization in R'],"['Omayma Said', 'David Campos', 'Shon Inouye']","[('Human Development Index dataset', 'https://assets.datacamp.com/production/repositories/2062/datasets/25d3cc40fcd74d60135d47242462188054e7e6a1/hdi_data.csv'), ('Corruption Perception Index dataset', 'https://assets.datacamp.com/production/repositories/2062/datasets/8487d81d372e11c6e62373984f98d7c86360a059/hdi_cpi_2015.csv'), ('Tuberculosis Cases dataset', 'https://assets.datacamp.com/production/repositories/2062/datasets/77f871007000492983828f077b8f2d2566eb31c4/tb_tidy.csv'), ('New York Citi Bike Trips dataset', 'https://assets.datacamp.com/production/repositories/2062/datasets/de34f1073c85cae62cea5c887a3f927671828029/ny_bikedata.csv')]",['Introduction to the Tidyverse'],https://www.datacamp.com/courses/interactive-data-visualization-with-rbokeh,Data Visualization,R
140,Interactive Maps with leaflet in R,4,16,55,"4,859","4,500",Interactive Maps leaflet in R,"Interactive Maps with leaflet in R
Get ready to have some fun with maps! Interactive Maps with leaflet in R will give you the tools to make attractive and interactive web maps using spatial data and the tidyverse. In this course, you will create maps using the IPEDS dataset, which contains data on U.S. colleges and universities. Along the way, you will customize our maps using labels, popups, and custom markers, and add layers to enhance interactivity. Following the course, you will be able to create and customize your own interactive web maps to reveal patterns in your data.
Chapter 1 will introduce students to the htmlwidgets package and the leaflet package. Following this introduction, students will build their first interactive web map using leaflet. Through the process of creating this first map students will be introduced to many of the core features of the leaflet package, including adding different map tiles, setting the center point and zoom level, plotting single points based on latitude and longitude coordinates, and storing leaflet maps as objects. Chapter 1 will conclude with students geocoding DataCamp’s headquarters, and creating a leaflet map that plots the headquarters and displays a popup describing the location.
In chapter 2 students will build on the leaflet map they created in chapter 1 to create an interactive web map of every four year college in California. After plotting hundreds of points on an interactive leaflet map, students will learn to customize the markers on their leaflet map. This chapter will also how to color code markers based on a factor variable.
In chapter 3 students will expand on their map of all four year colleges in California to create a map of all American colleges. First, in section 3.1 students will review and build on the material from Chapter 2 to create a map of all American colleges. Then students will re-plot the colleges on their leaflet map by sector (public, private, or for-profit) using groups to enable users to toggle the colleges that are displayed on the map. In section 3.3 students will learn to add multiple base maps so that users can toggle between multiple map tiles.
In Chapter 4 students will learn to map polygons, which can be used to define geographic regions (e.g., zip codes, states, countries, etc.). Chapter 4 will start by plotting the zip codes in North Carolina that fall in the top quartile of mean family incomes. Students will learn to customize the polygons with color palettes and labels. Chapter 4 will conclude with adding a new layer to the map of every college in America that displays every zip code with a mean income of $200,000 or more during the 2015 tax year. Through the process of mapping zip codes students will learn about spatial data generally, geoJSON data, the @ symbol, and the addPolygons() function. Furthermore, students will have an opportunity to practice applying many of the options that they learned about in the previous chapters, such as popups and labels, as well as new ways to customize their maps, such as the highlight option in addPolygons().","['Interactive Data Visualization in R', 'Spatial Data with R']","['Rich Majerus', 'Chester Ismay', 'Becca Robins']","[('IPEDS All 4-Year Colleges', 'https://assets.datacamp.com/production/repositories/1942/datasets/18a000cf70d2fe999c6a6f2b28a7dc9813730e74/ipeds.csv'), ('NC Zipcode Income data', 'https://assets.datacamp.com/production/repositories/1942/datasets/09d53d484e4979a41a51a427a59a49d2654feb5d/mean_income_by_zip_nc.csv'), ('NC Zipcode Polygons', 'https://assets.datacamp.com/production/repositories/1942/datasets/fe567316eb621bf19798df15b1ed4a84a9aa4832/nc_zips.Rda'), (""America's Wealthiest Zipcodes"", 'https://assets.datacamp.com/production/repositories/1942/datasets/ecce5259c642b7b259bcf030b212e8c7f34786fa/wealthiest_zips.Rda')]","['Introduction to R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/interactive-maps-with-leaflet-in-r,Data Visualization,R
141,Intermediate Functional Programming with purrr,4,17,49,"1,477","3,850",Intermediate Functional Programming purrr,"Intermediate Functional Programming with purrr
Have you ever been wondering what the purrr description (“A functional programming toolkit for R”) refers to? Then, you’ve come to the right place! This course will walk you through the functional programming part of purrr - in other words, you will learn how to take full advantage of the flexibility offered by the .f in map(.x, .f) to iterate other lists, vectors and data.frame with a robust, clean, and easy to maintain code. During this course, you will learn how to write your own mappers (or lambda functions), and how to use predicates and adverbs. Finally, this new knowledge will be applied to a use case, so that you’ll be able to see how you can use this newly acquired knowledge on a concrete example of a simple nested list, how to extract, keep or discard elements, how to compose functions to manipulate and parse results from this list, how to integrate purrr workflow inside other functions, how to avoid copy and pasting with purrr functional tools.
Do lambda functions, mappers, and predicates sound scary to you? Fear no more! After refreshing your purrr memory, we will dive into functional programming 101, discover anonymous functions and predicates, and see how we can use them to clean and explore data.
Ready to go deeper with functional programming and purrr? In this chapter, we'll discover the concept of functional programming, explore error handling using including safely() and possibly(), and introduce the function compact() for cleaning your code.
In this chapter, we'll use purrr to write code that is clearer, cleaner, and easier to maintain. We'll learn how to write clean functions with compose() and negate(). We'll also use partial() to compose functions by ""prefilling"" arguments from existing functions. Lastly, we'll introduce list-columns, which are a convenient data structure that helps us write clean code using the Tidyverse.
We'll wrap up everything we know about purrr in a case study. Here, we'll use purrr to analyze data that has been scraped from Twitter. We'll use clean code to organize the data and then we'll identify Twitter influencers from the 2018 RStudio conference.",['Intermediate Tidyverse Toolbox'],"['Colin FAY', 'Chester Ismay', 'Becca Robins']",[],"['Introduction to the Tidyverse', 'Foundations of Functional Programming with purrr']",https://www.datacamp.com/courses/intermediate-functional-programming-with-purrr,Programming,R
142,Intermediate Interactive Data Visualization with plotly in R,4,15,54,937,"4,400",Intermediate Interactive Data Visualization plotly in R,"Intermediate Interactive Data Visualization with plotly in R
The plotly package enables the construction of interactive and animated graphics entirely within R. This goes beyond basic interactivity such as panning, zooming, and tooltips. In this course, you will extend your understanding of plotly to create animated and linked interactive graphics, which will enable you to communicate multivariate stories quickly and effectively. Along the way, you will review the basics of plotly, learn how to wrangle your data in new ways to facilitate cumulative animations, and learn how to add filters to your graphics without using Shiny.
A review of key plotly commands. You will review how to create multiple plot types in plotly and how to polish your charts. Additionally, you will create static versions of the bubble and line charts that you will animate in the next chapter.
In this chapter, you will learn how to implement keyframe animation in plotly. You will explore how to create animations, such as Hans Rosling's bubble charts, as well as cumulative animations, such as an animation of a stock's valuation over time.
When you are exploring unexpected structure in your graphics, it's useful to have selections made on one chart update the other. For example, if you are exploring clusters observed on a scatterplot, it is useful to have the selected cluster update some chart of group membership, such as a jittered scatterplot or sets of bar charts. In this chapter, you will learn how to link your plotly charts to enable linked brushing. Along the way, you will also learn how to add dropdown menus, checkboxes, and sliders to your plotly charts, without the need for Shiny.
In the final chapter, you will use your expanded plotly toolkit to explore orbital space launches between 1957 and 2018. Along the way, you'll learn how to wrangle data to enable cumulative animations without common starting points, and hone your understanding of the crosstalk package.",['Interactive Data Visualization in R'],"['Adam Loy', 'Chester Ismay', 'David Campos']","[('Economic indicators for the 50 states and Washington, D.C. from 1997 to 2017', 'https://assets.datacamp.com/production/repositories/2166/datasets/1367560ab66f0b7006da2075a5a97a99b5184bf7/state_economic_data.csv'), ('Complete list of all orbital space launches between 1957 and 2018', 'https://assets.datacamp.com/production/repositories/2166/datasets/c09b75e6d503e5253c80bcfcdfb8f95a606d9793/launches.csv')]",['Interactive Data Visualization with plotly in R'],https://www.datacamp.com/courses/intermediate-interactive-data-visualization-with-plotly-in-r,,
143,Intermediate Portfolio Analysis in R,5,12,42,"5,093","3,250",Intermediate Portfolio Analysis in R,"Intermediate Portfolio Analysis in R
This course builds on the fundamental concepts from Introduction to Portfolio Analysis in R and explores advanced concepts in the portfolio optimization process. It is critical for an analyst or portfolio manager to understand all aspects of the portfolio optimization problem to make informed decisions. In this course, you will learn a quantitative approach to apply the principles of modern portfolio theory to specify a portfolio, define constraints and objectives, solve the problem, and analyze the results. This course will use the R package PortfolioAnalytics to solve portfolio optimization problems with complex constraints and objectives that mirror real world problems.
This chapter will give you a brief review of Modern Portfolio Theory and introduce you to the PortfolioAnalytics package by solving a couple portfolio optimization problems.
The focus of this chapter is a detailed overview of the recommended workflow for solving portfolio optimization problems with PortfolioAnalytics. You will learn how to create a portfolio specification, add constraints, objectives, run the optimization, and analyze the results of the optimization output.
In this chapter, you will learn about estimating moments, characteristics of the distribution of asset returns, as well as custom objective functions.
In the final chapter of the course, you will solve a portfolio optimization problem that mimics a real world real world example of constructing a portfolio of hedge fund strategy with different style definitions.","['Applied Finance with R', 'Quantitative Analyst with R']","['Ross Bennett', 'Lore Dirick', 'Davis Vaughan']","[('Portfolio specifications object I', 'https://assets.datacamp.com/production/repositories/484/datasets/b88fab92460b3085545ce05d863b6d3431cad69a/port_spec_fi_lo_ret.rds'), ('Portfolio specifications object II', 'https://assets.datacamp.com/production/repositories/484/datasets/326375e59f26728e82bc83d2b2cda2a9713102ef/port_spec_ws_lo_ret_ri_ribud.rds'), ('Set of random portfolios I', 'https://assets.datacamp.com/production/repositories/484/datasets/e9eca7c9ab5cefdca301b6617b6764e300fcc3c3/rp_fi_lo_ret.rds'), ('Set of random portfolios II', 'https://assets.datacamp.com/production/repositories/484/datasets/9e5f72ca2e7411905ddb94424f21e625f978b3a2/rp_ws_lo_ret_ri_ribud.rds')]","['Introduction to R for Finance', 'Intermediate R for Finance', 'Introduction to Portfolio Analysis in R']",https://www.datacamp.com/courses/intermediate-portfolio-analysis-in-r,Applied Finance,R
144,Intermediate Python for Data Science,4,18,87,"365,486","7,400",Intermediate Python Data Science,"Intermediate Python for Data Science
Intermediate Python for Data Science is crucial for any aspiring data science practitioner learning Python. Learn to visualize real data with Matplotlib's functions and get acquainted with data structures such as the dictionary and the pandas DataFrame. After covering key concepts such as boolean logic, control flow, and loops in Python, you'll be ready to blend together everything you've learned to solve a case study using hacker statistics.
Data visualization is a key skill for aspiring data scientists. Matplotlib makes it easy to create meaningful and insightful plots. In this chapter, you’ll learn how to build various types of plots, and customize them to be more visually appealing and interpretable.
Learn about the dictionary, an alternative to the Python list, and the pandas DataFrame, the de facto standard to work with tabular data in Python. You will get hands-on practice with creating and manipulating datasets, and you’ll learn how to access the information you need from these data structures.
Boolean logic is the foundation of decision-making in Python programs. Learn about different comparison operators, how to combine them with Boolean operators, and how to use the Boolean outcomes in control structures. You'll also learn to filter data in pandas DataFrames using logic.
There are several techniques you can use to repeatedly execute Python code. While loops are like repeated if statements, the for loop iterates over all kinds of data structures. Learn all about them in this chapter.
This chapter will allow you to apply all the concepts you've learned in this course. You will use hacker statistics to calculate your chances of winning a bet. Use random number generators, loops, and Matplotlib to gain a competitive edge!","['Data Analyst with Python', 'Data Scientist with Python', 'Python Programmer', 'Python Programming']","['Filip Schouwenaars', 'Vincent Vankrunkelsven', 'Patrick Varilly', 'Florian Goossens']","[('Gapminder', 'https://assets.datacamp.com/production/repositories/287/datasets/5b1e4356f9fa5b5ce32e9bd2b75c777284819cca/gapminder.csv'), ('Cars', 'https://assets.datacamp.com/production/repositories/287/datasets/79b3c22c47a2f45a800c62cae39035ff2ea4e609/cars.csv'), ('BRICS', 'https://assets.datacamp.com/production/repositories/287/datasets/b60fb5bdbeb4e4ab0545c485d351e6ff5428a155/brics.csv')]",['Introduction to Python'],https://www.datacamp.com/courses/intermediate-python-for-data-science,Programming,Python
145,Intermediate R,6,14,81,"324,271","6,950",Intermediate R,"Intermediate R
Intermediate R is the next stop on your journey in mastering the R programming language. In this R training, you will learn about conditional statements, loops, and functions to power your own R scripts. Next, make your R code more efficient and readable using the apply functions. Finally, the utilities chapter gets you up to speed with regular expressions in R, data structure manipulations, and times and dates. This course will allow you to take the next step in advancing your overall knowledge and capabilities while programming in R.
In this chapter, you'll learn about relational operators for comparing R objects, and logical operators like ""and"" and ""or"" for combining TRUE and FALSE values. Then, you'll use this knowledge to build conditional statements.
Loops can come in handy on numerous occasions. While loops are like repeated if statements, the for loop is designed to iterate over all elements in a sequence. Learn about them in this chapter.
Functions are an extremely important concept in almost every programming language, and R is no different. Learn what functions are and how to use them—then take charge by writing your own functions.
Whenever you're using a for loop, you may want to revise your code to see whether you can use the lapply function instead. Learn all about this intuitive way of applying a function over a list or a vector, and how to use its variants, sapply and vapply.
Mastering R programming is not only about understanding its programming concepts. Having a solid understanding of a wide range of R functions is also important. This chapter introduces you to many useful functions for data structure manipulation, regular expressions, and working with times and dates.","['Data Analyst with R', 'Data Scientist with R', 'R Programmer', 'R Programming']",['Filip Schouwenaars'],[],['Introduction to R'],https://www.datacamp.com/courses/intermediate-r,Programming,R
146,Intermediate R for Finance,5,15,59,"14,629","5,050",Intermediate R Finance,"Intermediate R for Finance
If you enjoyed the Introduction to R for Finance course, then you will love Intermediate R for Finance. Here, you will first learn the basics about how dates work in R, an important skill for the rest of the course. Your next step will be to explore the world of if statements, loops, and functions. These are powerful ideas that are essential to any financial data scientist's toolkit. Finally, we will spend some time working with the family of apply functions as a vectorized alternative to loops. And of course, all examples will be finance related! Enjoy!
Welcome! Before we go deeper into the world of R, it will be nice to have an understanding of how dates and times are created. This chapter will teach you enough to begin working with dates, but only scratches the surface of what you can do with them.
Imagine you own stock in a company. If the stock goes above a certain price, you might want to sell. If the stock drops below a certain price, you might want to buy it while it's cheap! This kind of thinking can be implemented using operators and if statements. In this chapter, you will learn all about them, and create a program that tells you to buy or sell a stock.
Loops can be useful for doing the same operation to each element of your data structure. In this chapter you will learn all about repeat, while, and for loops!
If data structures like data frames and vectors are how you hold your data, functions are how you tell R what to do with your data. In this chapter, you will learn about using built-in functions, creating your own unique functions, and you will finish off with a brief introduction to packages.
A popular alternative to loops in R are the apply functions. These are often more readable than loops, and are incredibly useful for scaling the data science workflow to perform a complicated calculation on any number of observations. Learn about them here!","['Finance Basics with R', 'Quantitative Analyst with R']","['Lore Dirick', 'Davis Vaughan']",[],['Introduction to R for Finance'],https://www.datacamp.com/courses/intermediate-r-for-finance,Applied Finance,R
147,Intermediate R: Practice,4,0,52,"55,678","4,800",Intermediate R: Practice,"Intermediate R: Practice
This follow-up course on Intermediate R does not cover new programming concepts. Instead, you will strengthen your knowledge of the topics in Intermediate R with a bunch of new and fun exercises.
If conditionals are your thing, these exercises will be a walk in the park. Else, let the feedback guide you and add these vital elements of R to your toolkit!
Looping through data structures is something you'll often do. While and for loops help you do this. Get more practice on them by analyzing the log data from a chemical plant.
Functions make R powerful: you can isolate chunks of code, wrap them in a function and use them whenever you want. In this set of exercises, you'll practice more on using functions and writing your own functions.
lapply, sapply and vapply are all members of R's apply family: they provide a fast and intuitive alternative to the while and for loops you've learned about before. Become an apply pro with some more practice!
To finish off these supplementary exercises, you can exercise some more with often-used functions in R, regular expressions and manipulating dates and times.",[],['Filip Schouwenaars'],"[('The 1912 Titanic ship disaster', 'https://assets.datacamp.com/production/repositories/239/datasets/ea08b483790c2a7bc9b95b0f923526f8e60eae44/titanic.csv'), ('Chemical company log files', 'https://assets.datacamp.com/production/course_7747/datasets/logs.rds')]","['Introduction to R', 'Intermediate R']",https://www.datacamp.com/courses/intermediate-r-practice,Other,R
148,Intermediate SQL,4,15,55,"17,588","4,700",Intermediate SQL,"Intermediate SQL
So you've learned how to aggregate and join data from tables in your database—now what? How do you manipulate, transform, and make the most sense of your data? This intermediate-level course will teach you several key functions necessary to wrangle, filter, and categorize information in a relational database, expand your SQL toolkit, and answer complex questions. You will learn the robust use of CASE statements, subqueries, and window functions—all while discovering some interesting facts about soccer using the European Soccer Database.
In this chapter, you will learn how to use the CASE WHEN statement to create categorical variables, aggregate data into a single column with multiple filtering conditions, and calculate counts and percentages.
In this chapter, you will learn about subqueries in the SELECT, FROM, and WHERE clauses. You will gain an understanding of when subqueries are necessary to construct your dataset and where to best include them in your queries.
In this chapter, you will learn how to use nested and correlated subqueries to extract more complex data from a relational database. You will also learn about common table expressions and how to best construct queries using multiple common table expressions.
You will learn about window functions and how to pass aggregate functions along a dataset. You will also learn how to calculate running totals and partitioned averages.",['SQL Fundamentals'],"['Mona Khalil', 'Hillary Green-Lerman', 'Sumedh Panchadhar']",[],['Joining Data in SQL'],https://www.datacamp.com/courses/intermediate-sql,Data Manipulation,SQL
149,Intermediate SQL Server,4,14,47,"12,394","3,850",Intermediate SQL Server,"Intermediate SQL Server
A majority of data is stored in databases and knowing the necessary tools needed to analyze and clean data directly in databases is indispensable. This course focuses on T-SQL, the version of SQL used in Microsoft SQL Server, needed for data analysis. You will learn several concepts in this course such as dealing with missing data, working with dates, and calculating summary statistics using advanced queries. After completing this course, you will have the skills needed to analyze data and provide insights quickly and easily.
One of the first steps in data analysis is examining data through aggregations. This chapter explores how to create aggregations in SQL Server, a common first step in data exploration. You will also clean missing data and categorize data into bins with CASE statements.
This chapter explores essential math operations such as rounding numbers, calculating squares and square roots, and counting records. You will also work with dates in this chapter!
In this chapter, you will create variables and write while loops to process data. You will also write complex queries by using derived tables and common table expressions.
In the final chapter of this course, you will work with partitions of data and window functions to calculate several summary stats and see how easy it is to create running totals and compute the mode of numeric columns.",['SQL Server Fundamentals'],"['Ginger Grant', 'Richie Cotton', 'Sumedh Panchadhar']","[('Incidents', 'https://assets.datacamp.com/production/repositories/1611/datasets/d34780ca1f1bf7578939a2fea4398809e0160d1f/Incidents.csv'), ('Shipments', 'https://assets.datacamp.com/production/repositories/1611/datasets/3222b2ba724c7fc672b77a88d07ec1b51eb5cc22/MixData.csv'), ('Kidney', 'https://assets.datacamp.com/production/repositories/1611/datasets/e974b5ed6baeda8ab34b73bd1c36105f0735be47/ChronicKidneyDisease.csv'), ('Orders', 'https://assets.datacamp.com/production/repositories/1611/datasets/cc02f651accbb545cd5b37bb98236ce7da0f1fb2/Orders.csv')]","['Intro to SQL for Data Science', 'Joining Data in SQL']",https://www.datacamp.com/courses/intermediate-t-sql,Programming,SQL
150,Intermediate Spreadsheets for Data Science,4,12,48,"6,925","4,150",Intermediate Spreadsheets Data Science,"Intermediate Spreadsheets for Data Science
This course will expand your Google Sheets vocabulary. You'll dive deeper into data types, practice manipulating numeric and logical data, explore missing data and error types, and calculate some summary statistics. As you go, you'll explore datasets on 100m sprint world records, asteroid close encounters, benefit claims, and butterflies.
In which you learn to interrogate cells to determine the data type of their contents, and to convert between data types.
In which you learn to apply log and square root transformations to numbers, round them up and down, and generate random numbers.
In which you learn how to work with logical data consisting of TRUE and FALSE values, and how to handle missing values and errors.
In which you learn about cell addresses, advanced matching, sorting and filtering, and simple imputation.",[],['Richie Cotton'],[],[],https://www.datacamp.com/courses/intermediate-spreadsheets-for-data-science,Programming,Spreadsheets
151,Intro to Financial Concepts using Python,4,13,50,"8,192","4,200",Intro Financial Concepts using Python,"Intro to Financial Concepts using Python
Understanding the basic principles of finance is essential for making important financial decisions ranging from taking out a student loan to constructing an investment portfolio. Combining basic financial knowledge with Python will allow you to construct some very powerful tools. You'll come out of this course understanding the time value of money, how to compare potential projects and how to make rational, data-driven financial decisions.
Learn about fundamental financial concepts like the time value of money, growth and rate of return, discount factors, depreciation, and inflation.
In this chapter, you will act as the CEO of a company, making important data-driven financial decisions about projects and financing using measures such as IRR and NPV.
You just got married, and you're looking for a new home in Hoboken, New Jersey. You will build a mortgage payment simulator to estimate your mortgage payments and analyze different possible economic scenarios.
You just got a new job as a data scientist in San Francisco, and you're looking for an apartment. In this chapter, you'll be building your own budgeting application to plan out your financial future.",[],"['Dakota Wixom', 'Lore Dirick', 'Sumedh Panchadhar']",[],[],https://www.datacamp.com/courses/intro-to-financial-concepts-using-python,Applied Finance,Python
152,Intro to Portfolio Risk Management in Python,4,13,51,"4,574","4,250",Intro Portfolio Risk Management,"Intro to Portfolio Risk Management in Python
This course will teach you how to evaluate basic portfolio risk and returns like a quantitative analyst on Wall Street. This is the most critical step towards being able to fully automate your portfolio construction and management processes. Discover what factors are driving your portfolio returns, construct market-cap weighted equity portfolios, and learn how to forecast and hedge market risk via scenario generation.
Learn about the fundamentals of investment risk and financial return distributions.
Level up your understanding of investing by constructing portfolios of assets to enhance your risk-adjusted returns.
Learn about the main factors that influence the returns of your portfolios and how to quantify your portfolio's exposure to these factors.
In this chapter, you will learn two different methods to estimate the probability of sustaining losses and the expected values of those losses for a given asset or portfolio of assets.",[],"['Dakota Wixom', 'Lore Dirick', 'Sumedh Panchadhar', 'Eunkyung Park']","[('All returns (2017)', 'https://assets.datacamp.com/production/repositories/1546/datasets/fb7165b7270a3721f69abf9ff09b85938d9d1068/Big9Returns2017.csv'), ('Efficient Frontier Portfolios', 'https://assets.datacamp.com/production/repositories/1546/datasets/85e2663a50d3445cbc2c2d30ac81abbaae6a7f56/EfficientFrontierPortfoliosSlim.csv'), ('Fama-French factors', 'https://assets.datacamp.com/production/repositories/1546/datasets/3d9b734fea954b629d2477ef48c36525dfecf6e0/FamaFrenchFactors.csv'), ('Microsoft prices', 'https://assets.datacamp.com/production/repositories/1546/datasets/0f1a004a8aa693163fa55f277513309f710b700d/MSFTPrices.csv'), ('ETF of oil prices (UFO)', 'https://assets.datacamp.com/production/repositories/1546/datasets/dfe9da08c986709d59943d1d5c0106537a8c608a/USO.csv')]","['Intro to Financial Concepts using Python', 'Manipulating Time Series Data in Python']",https://www.datacamp.com/courses/intro-to-portfolio-risk-management-in-python,Applied Finance,Python
153,Intro to Python for Finance,4,14,55,"9,409","4,650",Intro Python Finance,"Intro to Python for Finance
The financial industry is increasingly adopting Python for general-purpose programming and quantitative analysis, ranging from understanding trading dynamics to risk management systems. This course focuses specifically on introducing Python for financial analysis. Using practical examples, you will learn the fundamentals of Python data structures such as lists and arrays and learn powerful ways to store and manipulate financial data to identify trends.
This chapter is an introduction to basics in Python, including how to name variables and various data types in Python.
This chapter introduces lists in Python and how they can be used to work with data.
This chapter introduces packages in Python, specifically the NumPy package and how it can be efficiently used to manipulate arrays.
In this chapter, you will be introduced to the Matplotlib package for creating line plots, scatter plots, and histograms.
In this chapter, you will get a chance to apply all the techniques you learned in the course on the S&P 100 data.",[],"['Adina Howe', 'Lore Dirick', 'Eunkyung Park', 'Sumedh Panchadhar']","[('Stocks data (I)', 'https://assets.datacamp.com/production/repositories/1715/datasets/2623c8037df0505d619c87a09131af9105e5883d/stock_data.csv'), ('Stocks data (II)', 'https://assets.datacamp.com/production/repositories/1715/datasets/d96bf818f1f6f52af429edcaaf9dd96d37ab7b0a/stock_data2.csv'), ('S&P 100 data', 'https://assets.datacamp.com/production/repositories/1715/datasets/0ef2a37a04b12d12368f060efd02b93cd110bd29/sector.txt')]",[],https://www.datacamp.com/courses/intro-to-python-for-finance,Applied Finance,Python
154,Introduction to AWS Boto in Python,4,15,54,300,"4,550",Introduction AWS Boto,"Introduction to AWS Boto in Python
What if you were no longer constrained by the capabilities of your laptop? What if you could get an SMS when a city garbage truck camera spots a missing a cat? This is all possible with cloud technology. This course will teach you how to integrate Amazon Web Services (AWS) into your data workflow. You’ll learn how to upload data to S3, AWS cloud storage. You’ll use triggers from your analysis to send text messages with AWS SNS. You will use Rekognition to detect objects in an image. And you will use Comprehend to decide if a piece of feedback is negative. By the time you’re done, you will learn how to build a pipeline, subscribe people to it, and send them text messages when an image contains a cat!
Embark on the world of cloud technology! From learning how AWS works to creating S3 buckets and uploading files to them. You will master the basics of setting up AWS and uploading files to the cloud!
Continue your journey in mastering AWS by learning how to upload and share files securely. You will learn how set files to be public or private, and cap off what you learned by generating web-based reports!
Next, you will learn how to automate sharing your findings with the world by building notification triggers for your analysis! You will learn how to harness AWS to send SMS and email notifications to users and cap off what you learned by making custom notifications depending on a user's needs.
Finally, you will go beyond uploading, sharing and notifying into rekognizing using AWS Rekognition and other AWS machine learning services to recognize cats, translate language and detect sentiment. You will be capping off your learning journey by applying a real-world use case that mixes everything you've learned!",[],"['Maksim Pecherskiy', 'Hillary Green-Lerman', 'Adel Nehme']","[('Get It Done Requests', 'https://assets.datacamp.com/production/repositories/4607/datasets/77f70071e5e5ea42aa31d5384640bee6931a5d50/get_it_done_2019_requests_datasd.csv')]","['Introduction to Python', 'Intermediate Python for Data Science', 'Python Data Science Toolbox (Part 1)', 'Python Data Science Toolbox (Part 2)']",https://www.datacamp.com/courses/introduction-to-aws-boto-in-python,Programming,Python
155,Introduction to Bioconductor,4,14,54,"1,442","4,050",Introduction Bioconductor,"Introduction to Bioconductor
Much of the biological research, from medicine to biotech, is moving toward sequence analysis. We are now generating targeted and whole genome big data, which needs to be analyzed to answer biological questions. To help you get started, you will be introduced to The Bioconductor project. Bioconductor is and builds the infrastructure to share software tools (packages), workflows and datasets for the analysis and comprehension of genomic data. Bioconductor is a great platform accessible to you, and it is a community developed open software resource. By the end of this course, you will be able to use essential Bioconductor packages and get a grasp of its infrastructure and some built-in datasets. Using BSgenome, Biostrings, IRanges, GenomicRanges, TxDB, ShortRead and Rqc with real datasets from different species is going to be an exceptional experience!
In this chapter you will get hands-on with Bioconductor. Bioconductor is the specialized repository for bioinformatics software, developed and maintained by the R community. You will learn how to install and use bioconductor packages. You will be introduced to S4 objects and functions, because most packages within Bioconductor inherit from S4. Additionally, you will use a real genomic dataset of a fungus to explore the BSgenome package.
Biostrings are memory efficient string containers. Biostring has matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences. How efficient you can become by using the right containers for your sequences? You will learn about alphabets, and sequence manipulation by using the tiny genome of a virus.
The IRanges and GenomicRanges packages are also containers for storing and manipulating genomic intervals and variables defined along a genome. These packages provide infrastructure and support to many other Bioconductor packages because of their enriching features. You will learn how to use these containers and their associated metadata, for manipulation of your sequences. The dataset you will be looking at is a special gene of interest in the human genome.
ShortRead is the package for input, manipulation and assessment of fasta and fastq files. You can subset, trim and filter the sequences of interest, and even do a report of quality. An extra bonus towards the last exercises will give you the tools for parallel quality assessment, wink, wink Rqc. Exciting enough, for this you will use plant genome sequences!",[],"['Paula Martinez', 'Sascha Mayr', 'David Campos', 'Shon Inouye']","[('Zika Genomic DNA dataset', 'https://assets.datacamp.com/production/repositories/1641/datasets/790618555a5e420bbda36fd93effe01182896e1f/zika_genomic.fa.txt'), ('A. Thaliana Short Reads with Quality dataset', 'https://assets.datacamp.com/production/repositories/1641/datasets/0b92c84cc116f3c838b709fccd9cbede96f7fe1e/small_SRR1971253.fastq'), ('Human Gene & Transcript ID dataset', 'https://assets.datacamp.com/production/repositories/1641/datasets/7d0e830ab73ed1b85fbc2f9149c244eee1dfe4d7/gene_id_tx_id.txt'), ('Yeast Genome dataset', 'https://assets.datacamp.com/production/repositories/1641/datasets/4870f7b72822ef33b987e40d5f6aed21a54de858/sacCer3.fasta.gz')]","['Introduction to R', 'Introduction to the Tidyverse']",https://www.datacamp.com/courses/introduction-to-bioconductor,Other,R
156,Introduction to Data,4,15,46,"60,570","3,200",Introduction Data,"Introduction to Data
Scientists seek to answer questions using rigorous methods and careful observations. These observations—collected from the likes of field notes, surveys, and experiments—form the backbone of a statistical investigation and are called data. Statistics is the study of how best to collect, analyze, and draw conclusions from data. It is helpful to put statistics in the context of a general process of investigation: 1) identify a question or problem; 2) collect relevant data on the topic; 3) analyze the data; and 4) form a conclusion. In this course, you'll focus on the first two steps of the process.
This chapter introduces terminology of datasets and data frames in R.
In this chapter, you will learn about observational studies and experiments, scope of inference, and Simpson's paradox.
This chapter defines various sampling strategies and their benefits/drawbacks as well as principles of experimental design.
Apply terminology, principles, and R code learned in the first three chapters of this course to a case study looking at how the physical appearance of instructors impacts their students' course evaluations.","['Data Analyst with R', 'Data Scientist with R', 'Statistics Fundamentals with R']","['Mine Cetinkaya-Rundel', 'Nick Carchedi', 'Tom Jeon']","[('Course evaluation', 'https://assets.datacamp.com/production/repositories/539/datasets/e4bb6dc2496e3a50208dccb81dcbcb62faf5b122/evals.RData'), ('UC Berkeley admissions', 'https://assets.datacamp.com/production/repositories/539/datasets/312d8ff0bad2cd9d567adce0181435a99892c5f8/ucb_admit.RData'), ('US state regions', 'https://assets.datacamp.com/production/repositories/539/datasets/5a549cee71a2347201fb145e25312eaa426ec9be/us_regions.RData')]",['Introduction to R'],https://www.datacamp.com/courses/introduction-to-data,Probability & Statistics,R
157,Introduction to Data Engineering,4,15,57,147,"4,100",Introduction Data Engineering,"Introduction to Data Engineering
Have you heard people talk about data engineers and wonder what it is they do? Do you know what data engineers do but you're not sure how to become one yourself? This course is the perfect introduction. It touches upon all things you need to know to streamline your data processing. This introductory course will give you enough context to start exploring the world of data engineering. It's perfect for people who work at a company with several data sources and don't have a clear idea of how to use all those data sources in a scalable way. Be the first one to introduce these techniques to your company and become the company star employee.
In this first chapter, you will be exposed to the world of data engineering! Explore the differences between a data engineer and a data scientist, get an overview of the various tools data engineers use and expand your understanding of how cloud technology plays a role in data engineering.
Now that you know the primary differences between a data engineer and a data scientist, get ready to explore the data engineer's toolbox! Learn in detail about different types of databases data engineers use, how parallel computing is a cornerstone of the data engineer's toolkit, and how to schedule data processing jobs using scheduling frameworks.
Having been exposed to the toolbox of data engineers, it's now time to jump into the bread and butter of a data engineer's workflow! With ETL, you will learn how to extract raw data from various sources, transform this raw data into actionable insights, and load it into relevant databases ready for consumption!
Cap off all that you've learned in the previous three chapters by completing a real-world data engineering use case from DataCamp! You will perform and schedule an ETL process that transforms raw course rating data, into actionable course recommendations for DataCamp students!",[],"['Vincent Vankrunkelsven', 'Adel Nehme']",[],"['Introduction to Python', 'Intermediate Python for Data Science', 'Intro to SQL for Data Science']",https://www.datacamp.com/courses/introduction-to-data-engineering,Programming,Python
158,Introduction to Data Science in Python,4,13,44,"20,105","3,700",Introduction Data Science,"Introduction to Data Science in Python
Begin your journey into Data Science! Even if you've never written a line of code in your life, you'll be able to follow this course and witness the power of Python to perform Data Science. You'll use data to solve the mystery of Bayes, the kidnapped Golden Retriever, and along the way you'll become familiar with basic Python syntax and popular Data Science modules like Matplotlib (for charts and graphs) and Pandas (for tabular data).
Welcome to the wonderful world of Data Analysis in Python! In this chapter, you'll learn the basics of Python syntax, load your first Python modules, and use functions to get a suspect list for the kidnapping of Bayes, DataCamp's prize-winning Golden Retriever.
In this chapter, you'll learn a powerful Python libary: pandas. Pandas lets you read, modify, and search tabular datasets (like spreadsheets and database tables). You'll examine credit card records for the suspects and see if any of them made suspicious purchases.
Get ready to visualize your data! You'll create line plots with another Python module: matplotlib. Using line plots, you'll analyze the letter frequencies from the ransom note and several handwriting samples to determine the kidnapper.
In this final chapter, you'll learn how to create three new plot types: scatter plots, bar plots, and histograms. You'll use these tools to locate where the kidnapper is hiding and rescue Bayes, the Golden Retriever.",['Data Analyst with Python'],"['Hillary Green-Lerman', 'Mona Khalil']",[],[],https://www.datacamp.com/courses/introduction-to-data-science-in-python,Programming,Python
159,Introduction to Data Visualization with Python,4,14,58,"87,388","5,000",Introduction Data Visualization Python,"Introduction to Data Visualization with Python
This course extends Intermediate Python for Data Science to provide a stronger foundation in data visualization in Python. You’ll get a broader coverage of the Matplotlib library and an overview of seaborn, a package for statistical graphics. Topics covered include customizing graphics, plotting two-dimensional arrays (like pseudocolor plots, contour plots, and images), statistical graphics (like visualizing distributions and regressions), and working with time series and image data.