
Commit 50922b0

Lecture 6 Exercise Solutions
1 parent c88b037 commit 50922b0

2 files changed (+222 -0 lines)


Diff for: .DS_Store

0 Bytes
Binary file not shown.
+222
@@ -0,0 +1,222 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Subsetting `DataFrames` -- Exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Goal\n",
    "\n",
    "Practice `pandas` subsetting operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 0. Import `pandas` and load the penguins data set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "penguins = pd.read_csv(\"https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Get the data from the flipper length column. What is the type of the output when you select a single column from a `DataFrame`?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# When subsetting to specific columns, it's useful to print out the column names for reference\n",
    "list(penguins.columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Just index the DataFrame with the column name\n",
    "flipper_len = penguins[\"flipper_length_mm\"]\n",
    "flipper_len.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use the built-in type function to get the type of the output\n",
    "# Here, the type is a pandas Series object since we only selected a single column\n",
    "type(flipper_len)"
   ]
  },
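  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A supplementary sketch, assuming `penguins` was loaded as in exercise 0: passing a *list* containing one column name, `penguins[[\"flipper_length_mm\"]]`, returns a one-column `DataFrame` instead of a `Series`. Comparing the two shapes makes the difference visible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Single brackets (one name) -> Series; double brackets (a list of names) -> DataFrame\n",
    "as_series = penguins[\"flipper_length_mm\"]\n",
    "as_frame = penguins[[\"flipper_length_mm\"]]\n",
    "print(type(as_series).__name__, as_series.shape)  # Series: 1-D shape (n_rows,)\n",
    "print(type(as_frame).__name__, as_frame.shape)  # DataFrame: 2-D shape (n_rows, 1)"
   ]
  },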
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Subset the penguins data to just the columns containing length/depth measurements. What is the type of the output when you select multiple columns from a `DataFrame`?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Subsetting to multiple columns requires wrapping the column names in a Python list, hence the double brackets [[...]]\n",
    "length_data = penguins[[\"bill_length_mm\", \"bill_depth_mm\", \"flipper_length_mm\"]]\n",
    "length_data.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Since multiple columns were selected, the output is a DataFrame\n",
    "type(length_data)"
   ]
  },
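  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An alternative sketch for exercise 2, assuming every length/depth column name ends in `_mm` (as the three names used above do): `DataFrame.filter` with a regular expression selects columns by name pattern instead of listing them one by one. Check the resulting column list, since any other column with the same suffix would also be picked up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Keep only the columns whose names end in \"_mm\"\n",
    "mm_data = penguins.filter(regex=\"_mm$\")\n",
    "list(mm_data.columns)"
   ]
  },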
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. What are the names of the different islands represented in the data set?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select the island column, then get its unique values (list() turns the returned array into a plain list)\n",
    "list(penguins[\"island\"].unique())"
   ]
  },
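  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A related sketch: `Series.nunique()` returns the number of distinct islands, and `Series.value_counts()` shows how many rows come from each island. Both ignore missing values by default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Number of distinct islands\n",
    "print(penguins[\"island\"].nunique())\n",
    "\n",
    "# Rows per island, sorted from most to least frequent\n",
    "penguins[\"island\"].value_counts()"
   ]
  },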
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. How many rows have missing body mass values?\n",
    "\n",
    "Hint: You'll need to find (or guess) the name of a helper function very similar to one we used in the lesson."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The .isna() method returns True for each missing value; using it as a boolean mask keeps only those rows\n",
    "penguins[penguins[\"body_mass_g\"].isna()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# It's easy to count the rows from the output above, but if there were many more,\n",
    "# .shape[0] (the number of rows) gives the count programmatically\n",
    "penguins[penguins[\"body_mass_g\"].isna()].shape[0]"
   ]
  },
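  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A shorter way to get the same count, shown as a sketch: `True` counts as 1 when summed, so summing the boolean mask from `.isna()` gives the number of missing values without building the filtered `DataFrame`. Calling `.isna().sum()` on the whole `DataFrame` reports the count for every column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Summing a boolean mask counts its True values\n",
    "print(penguins[\"body_mass_g\"].isna().sum())\n",
    "\n",
    "# Missing-value count for each column\n",
    "penguins.isna().sum()"
   ]
  },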
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Get the subset of data that matches ALL of the following criteria:\n",
    "\n",
    "* Penguins of the Gentoo and Chinstrap species\n",
    "* Flipper length less than 200 mm\n",
    "* Females only"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is mainly an exercise in getting the syntax right: wrap each condition\n",
    "# in parentheses and combine them with & (inside the brackets, no backslashes are needed)\n",
    "penguins[(penguins[\"species\"].isin([\"Gentoo\", \"Chinstrap\"])) &\n",
    "         (penguins[\"flipper_length_mm\"] < 200) &\n",
    "         (penguins[\"sex\"] == \"female\")]"
   ]
  },
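  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same filter can also be written with `DataFrame.query`, which takes the conditions as a single string. This is a sketch of an alternative syntax rather than the approach from the lesson: column names are used directly, `and` replaces `&`, and `in` tests membership in a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Same three conditions as above, written as a query string\n",
    "penguins.query(\n",
    "    \"species in ['Gentoo', 'Chinstrap'] and flipper_length_mm < 200 and sex == 'female'\"\n",
    ")"
   ]
  },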
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. If we only wanted to select the `species`, `flipper_length_mm`, and `sex` columns from the above exercise, how would we need to modify the code?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# To filter rows AND select specific columns, use the .loc indexer:\n",
    "# .loc[row_mask, column_list]\n",
    "penguins.loc[(penguins[\"species\"].isin([\"Gentoo\", \"Chinstrap\"])) &\n",
    "             (penguins[\"flipper_length_mm\"] < 200) &\n",
    "             (penguins[\"sex\"] == \"female\"),\n",
    "             [\"species\", \"flipper_length_mm\", \"sex\"]]"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.5 - rstudio",
   "language": "python",
   "name": "rstudio-user-3.9.5"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
