This repository contains scripts that help explore a course review dataset and extract insights such as:
- Top rated courses: Average ratings per course
- Time series analysis: rating by period
- Positive and negative reviews
- The day of the week on which people are happiest

The web app visualization is built with justpy. It shows the day of the week on which people are happiest, identified as the day on which courses receive the highest average rating.
- Accessing and analysing data with libraries such as pandas
- Creating interactive charts in a web app using justpy
- Creating simple visualization plots using matplotlib
Course Name | Timestamp | Rating | Comment |
---|---|---|---|
The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 06:25:52+00:00 | 4.0 | NaN |
The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 05:12:34+00:00 | 4.0 | NaN |
The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 05:11:03+00:00 | 4.0 | NaN |
The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 03:33:24+00:00 | 5.0 | NaN |
The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 03:31:49+00:00 | 4.5 | NaN |
import pandas as pd
from datetime import datetime
from pytz import utc
import matplotlib.pyplot as plt
data= pd.read_csv("reviews.csv", parse_dates=["Timestamp"])
data
| | Course Name | Timestamp | Rating | Comment |
|---|---|---|---|---|
| 0 | The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 06:25:52+00:00 | 4.0 | NaN |
| 1 | The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 05:12:34+00:00 | 4.0 | NaN |
| 2 | The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 05:11:03+00:00 | 4.0 | NaN |
| 3 | The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 03:33:24+00:00 | 5.0 | NaN |
| 4 | The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 03:31:49+00:00 | 4.5 | NaN |
| ... | ... | ... | ... | ... |
| 44995 | Python for Beginners with Examples | 2018-01-01 01:11:26+00:00 | 4.0 | NaN |
| 44996 | The Python Mega Course: Build 10 Real World Ap... | 2018-01-01 01:09:56+00:00 | 5.0 | NaN |
| 44997 | The Python Mega Course: Build 10 Real World Ap... | 2018-01-01 01:08:11+00:00 | 5.0 | NaN |
| 44998 | Python for Beginners with Examples | 2018-01-01 01:05:26+00:00 | 5.0 | NaN |
| 44999 | The Python Mega Course: Build 10 Real World Ap... | 2018-01-01 01:01:16+00:00 | 5.0 | NaN |

45000 rows × 4 columns
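The loading step can be tried without the real file. The sketch below is only an illustration: the three-row CSV text and the course names "Course A" and "Course B" are made up, and only the column names follow reviews.csv. It shows that `parse_dates` turns the `Timestamp` column into a timezone-aware datetime column, and that `head` is a method that has to be called with parentheses to return rows.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for reviews.csv (same column names).
csv_text = """Course Name,Timestamp,Rating,Comment
Course A,2021-04-02 06:25:52+00:00,4.0,
Course A,2021-04-02 05:12:34+00:00,4.0,Great
Course B,2018-01-01 01:01:16+00:00,5.0,
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Timestamp"])

# head is a method: calling it returns the first rows as a DataFrame.
first_two = sample.head(2)
print(first_two)

# The parsed column supports the .dt accessor used throughout the analysis.
print(sample["Timestamp"].dtype)
```

Empty `Comment` fields are read as `NaN`, which matches what the real dataset shows for reviews without text.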
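# Add a Day column holding the calendar date of each review
data["Day"] = data["Timestamp"].dt.date
# Select Rating explicitly so that the text columns are not averaged
day_average = data.groupby("Day")[["Rating"]].mean()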
list(day_average.index)
plt.plot(day_average.index, day_average['Rating'])
# Add figure object to resize the graph
plt.figure(figsize=(25, 3))
plt.plot(day_average.index, day_average['Rating'])
data["Week"] = data["Timestamp"].dt.strftime("%Y-%U")  # year plus week number (%U: weeks start on Sunday)
data.head()
average_week = data.groupby("Week")[["Rating"]].mean()
average_week
Week | Rating |
---|---|
2018-00 | 4.434564 |
2018-01 | 4.424933 |
2018-02 | 4.417702 |
2018-03 | 4.401024 |
173 rows × 1 columns
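The week labels come from the `%U` format code, which numbers weeks from the first Sunday of the year and puts any earlier days into week `00`. That is why the first label is `2018-00`. The short check below uses only the standard library and the first and last review dates from the dataset; the ISO calendar line is added purely for comparison and is not part of the original analysis.

```python
from datetime import datetime

first_review = datetime(2018, 1, 1)  # a Monday, before the first Sunday of 2018
last_review = datetime(2021, 4, 2)   # a Friday

first_label = first_review.strftime("%Y-%U")
last_label = last_review.strftime("%Y-%U")
print(first_label)  # 2018-00
print(last_label)   # 2021-13

# ISO weeks start on Monday and have no week 0, so the same day
# falls in ISO week 1 of 2018.
iso_year, iso_week = first_review.isocalendar()[:2]
print(iso_year, iso_week)  # 2018 1
```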
plt.figure(figsize=(30, 6))
plt.plot(average_week.index, average_week["Rating"])
data["Month"]=data["Timestamp"].dt.strftime("%y-%m")
average_month = data.groupby("Month")[["Rating"]].mean()
average_month
Month | Rating |
---|---|
18-01 | 4.429645 |
18-02 | 4.436248 |
18-03 | 4.421671 |
18-04 | 4.468211 |
18-05 | 4.396420 |
18-06 | 4.375379 |
18-07 | 4.393184 |
18-08 | 4.344753 |
18-09 | 4.347247 |
18-10 | 4.374429 |
18-11 | 4.386817 |
18-12 | 4.342105 |
19-01 | 4.401920 |
19-02 | 4.346964 |
19-03 | 4.333145 |
19-04 | 4.420049 |
19-05 | 4.405569 |
19-06 | 4.398559 |
19-07 | 4.382353 |
19-08 | 4.417059 |
19-09 | 4.451135 |
19-10 | 4.483871 |
19-11 | 4.493260 |
19-12 | 4.471046 |
20-01 | 4.439615 |
20-02 | 4.428642 |
20-03 | 4.480690 |
20-04 | 4.475220 |
20-05 | 4.448082 |
20-06 | 4.482812 |
20-07 | 4.517508 |
20-08 | 4.470987 |
20-09 | 4.485862 |
20-10 | 4.515201 |
20-11 | 4.479306 |
20-12 | 4.528358 |
21-01 | 4.551325 |
21-02 | 4.567901 |
21-03 | 4.589207 |
21-04 | 4.544118 |
plt.figure(figsize=(30,6))
plt.plot(average_month.index, average_month['Rating'])
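The daily, weekly and monthly averages all follow the same pattern: format the timestamp into a period label, group by that label, and average the rating. A possible way to factor this out is sketched below. The helper name `average_rating_by` and the three sample rows are hypothetical, and the sketch uses `%Y-%m` (four-digit year) where the analysis above uses `%y-%m`.

```python
import pandas as pd


def average_rating_by(frame, fmt):
    """Return the mean Rating per period, labelling periods with strftime format fmt."""
    period = frame["Timestamp"].dt.strftime(fmt)
    return frame.groupby(period)["Rating"].mean()


# Hypothetical sample: two reviews in March 2021, one in April 2021.
sample = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2021-03-30 10:00", "2021-03-31 11:00", "2021-04-02 06:25"], utc=True
    ),
    "Rating": [5.0, 4.0, 4.5],
})

monthly = average_rating_by(sample, "%Y-%m")
weekly = average_rating_by(sample, "%Y-%U")
print(monthly)
print(weekly)
```

All three sample dates fall in the same Sunday-based week, so the weekly result has a single row while the monthly result has two.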
data["Month"]=data["Timestamp"].dt.strftime("%y-%m")
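average_month_course = data.groupby(["Month", "Course Name"])[["Rating"]].mean()
average_month_course[:20]
# DataFrame with a two-level index (Month, Course Name)
average_month_course.shape
(262, 1)
# Unstack the Course Name level so that each course becomes its own column
average_month_course = data.groupby(["Month", "Course Name"])[["Rating"]].mean().unstack()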
average_month_course[:20]
average_month_course.columns
average_month_course.plot(figsize=(20,6))
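The effect of `unstack()` is easier to see on a tiny frame. The sketch below uses four made-up rows with hypothetical course names: grouping by two keys gives one row per (Month, Course Name) pair, and unstacking moves the course level into the columns, which is the shape `plot` needs to draw one line per course.

```python
import pandas as pd

# Hypothetical sample: two courses rated in two months.
sample = pd.DataFrame({
    "Month": ["21-03", "21-03", "21-04", "21-04"],
    "Course Name": ["Course A", "Course B", "Course A", "Course B"],
    "Rating": [4.0, 5.0, 4.5, 3.5],
})

# Long form: one row per (Month, Course Name) pair.
long_form = sample.groupby(["Month", "Course Name"])[["Rating"]].mean()

# Wide form: one row per month, one column per course.
wide_form = long_form.unstack()

print(long_form.shape)  # (4, 1)
print(wide_form.shape)  # (2, 2)
print(list(wide_form.columns))
```

The resulting columns are pairs such as `("Rating", "Course A")`, so a single course is selected with `wide_form[("Rating", "Course A")]`.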
data
| | Course Name | Timestamp | Rating | Comment | Day | Week | Month |
|---|---|---|---|---|---|---|---|
| 0 | The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 06:25:52+00:00 | 4.0 | NaN | 2021-04-02 | 2021-13 | 21-04 |
| 1 | The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 05:12:34+00:00 | 4.0 | NaN | 2021-04-02 | 2021-13 | 21-04 |
| 2 | The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 05:11:03+00:00 | 4.0 | NaN | 2021-04-02 | 2021-13 | 21-04 |
| 3 | The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 03:33:24+00:00 | 5.0 | NaN | 2021-04-02 | 2021-13 | 21-04 |
| 4 | The Python Mega Course: Build 10 Real World Ap... | 2021-04-02 03:31:49+00:00 | 4.5 | NaN | 2021-04-02 | 2021-13 | 21-04 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 44995 | Python for Beginners with Examples | 2018-01-01 01:11:26+00:00 | 4.0 | NaN | 2018-01-01 | 2018-00 | 18-01 |
| 44996 | The Python Mega Course: Build 10 Real World Ap... | 2018-01-01 01:09:56+00:00 | 5.0 | NaN | 2018-01-01 | 2018-00 | 18-01 |
| 44997 | The Python Mega Course: Build 10 Real World Ap... | 2018-01-01 01:08:11+00:00 | 5.0 | NaN | 2018-01-01 | 2018-00 | 18-01 |
| 44998 | Python for Beginners with Examples | 2018-01-01 01:05:26+00:00 | 5.0 | NaN | 2018-01-01 | 2018-00 | 18-01 |
| 44999 | The Python Mega Course: Build 10 Real World Ap... | 2018-01-01 01:01:16+00:00 | 5.0 | NaN | 2018-01-01 | 2018-00 | 18-01 |

45000 rows × 7 columns
data["Weekday"] = data["Timestamp"].dt.strftime("%A")    # full weekday name
data["daynumber"] = data["Timestamp"].dt.strftime("%w")  # 0 = Sunday ... 6 = Saturday
data
average_weekday = data.groupby(["Weekday", "daynumber"])[["Rating"]].mean()
average_weekday = average_weekday.sort_values("daynumber")
average_weekday
Weekday | daynumber | Rating |
---|---|---|
Sunday | 0 | 4.439097 |
Monday | 1 | 4.449335 |
Tuesday | 2 | 4.446240 |
Wednesday | 3 | 4.427452 |
Thursday | 4 | 4.437880 |
Friday | 5 | 4.455207 |
Saturday | 6 | 4.440274 |
plt.figure(figsize=(15,6))
plt.plot(average_weekday.index.get_level_values(0), average_weekday["Rating"])
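Reading the happiest day off the weekday averages can also be done in code. The sketch below copies the seven averages from the table above into a Series and picks the extremes with `idxmax` and `idxmin`; the variable names are illustrative only.

```python
import pandas as pd

# Average rating per weekday, copied from the table above.
weekday_rating = pd.Series({
    "Sunday": 4.439097,
    "Monday": 4.449335,
    "Tuesday": 4.446240,
    "Wednesday": 4.427452,
    "Thursday": 4.437880,
    "Friday": 4.455207,
    "Saturday": 4.440274,
})

happiest = weekday_rating.idxmax()
least_happy = weekday_rating.idxmin()
spread = weekday_rating.max() - weekday_rating.min()

print(happiest)     # Friday
print(least_happy)  # Wednesday
print(round(spread, 6))
```

The gap between the best and worst day is under 0.03 rating points, so the weekday effect in this dataset is small.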
nb_comment=data.groupby("Course Name")["Comment"].count()
list(nb_comment)
nb_comment.index
plt.pie(nb_comment, labels=nb_comment.index)
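The pie chart relies on `count()` skipping missing values, so it measures reviews that contain a written comment rather than all reviews. The sketch below contrasts `count()` with `size()` on four made-up rows; the course names and the comment text are hypothetical.

```python
import pandas as pd

# Hypothetical sample: only one of four reviews has a written comment.
sample = pd.DataFrame({
    "Course Name": ["Course A", "Course A", "Course B", "Course B"],
    "Comment": [None, "Clear explanations", None, None],
})

grouped = sample.groupby("Course Name")["Comment"]
with_comment = grouped.count()  # non-missing comments per course
all_reviews = grouped.size()    # every review per course, commented or not

print(with_comment)
print(all_reviews)
```

A course whose reviews never include text gets a count of zero and therefore no visible wedge in the pie.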
sudo apt install python3-pandas
pip install matplotlib
pip install justpy
https://www.highcharts.com/docs/chart-and-series-types/pie-chart
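The Highcharts pie chart linked above takes its data as a list of points, each with a `name` and a `y` value. A minimal sketch of turning per-course comment counts into that shape follows. The helper `pie_chart_options`, the chart title and the counts are all hypothetical, and how the resulting options are handed to justpy is not shown here.

```python
import json


def pie_chart_options(title, counts):
    """Build a Highcharts-style options dict for a pie chart from {label: count}."""
    return {
        "chart": {"type": "pie"},
        "title": {"text": title},
        "series": [{
            "name": "Comments",
            "data": [{"name": label, "y": value} for label, value in counts.items()],
        }],
    }


# Hypothetical counts; real values would come from nb_comment.
counts = {"Python for Beginners with Examples": 120, "Course B": 80}
options = pie_chart_options("Comments per course", counts)
print(json.dumps(options, indent=2))
```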