socalrug
diff --git a/.DS_Store b/.DS_Store
0 Bytes
diff --git a/Exercise_Solutions/06_Solution-Subsetting_DataFrames.ipynb b/Exercise_Solutions/06_Solution-Subsetting_DataFrames.ipynb
+222
@@ -0,0 +1,222 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Subsetting `DataFrames` -- Exercises"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Goal\n",
+    "\n",
+    "Practice `pandas` subsetting operations"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercises"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 0. Import `pandas` and load the penguins data set"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "penguins = pd.read_csv(\"https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Get the data from the column for the flipper length. What is the type of the output when you select a single column from a `DataFrame`?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# When subsetting to specific columns, it's useful to print out the column names for reference\n",
+    "list(penguins.columns)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Just index the DataFrame with the column name\n",
+    "flipper_len = penguins[\"flipper_length_mm\"]\n",
+    "flipper_len.head(10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Use the type function to get the type of the output\n",
+    "# Here, the type is a pandas Series object since we only selected a single column\n",
+    "type(flipper_len)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2. Subset the penguins data to just the columns containing length/depth measurements.  What is the type of the output when you select multiple columns from a `DataFrame`?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Subsetting to multiple columns requires wrapping the column names in a Python list with []\n",
+    "length_data = penguins[[\"bill_length_mm\", \"bill_depth_mm\", \"flipper_length_mm\"]]\n",
+    "length_data.head(10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Since a list of columns was passed, the output is a DataFrame\n",
+    "type(length_data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3. What are the names of the different islands represented in the data set?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Select the island column, then get the unique values\n",
+    "list(penguins[\"island\"].unique())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 4. How many rows have missing body mass values?\n",
+    "\n",
+    "Hint: You'll need to find (or guess) the name of a helper function very similar to one we used in the lesson."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The isna method returns a boolean Series marking which values are missing\n",
+    "penguins[penguins[\"body_mass_g\"].isna()]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# With only a few matches it's easy to count the rows in the output by eye, but for larger results\n",
+    "# shape[0] gives the row count programmatically\n",
+    "penguins[penguins[\"body_mass_g\"].isna()].shape[0]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5. Get the subset of data that match ALL of the following criteria\n",
+    "\n",
+    "* Penguins of either the Gentoo or Chinstrap species\n",
+    "* Flipper length less than 200 mm\n",
+    "* Females only"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# This is mainly an exercise in getting the syntax correct\n",
+    "penguins[penguins[\"species\"].isin([\"Gentoo\", \"Chinstrap\"]) &\n",
+    "         (penguins[\"flipper_length_mm\"] < 200) &\n",
+    "         (penguins[\"sex\"] == \"female\")]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 6. If we only wanted to select the `species`, `flipper_length_mm`, and `sex` columns from the above exercise, how would we need to modify the code?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# To filter rows AND select specific columns in one step, use the .loc indexer\n",
+    "penguins.loc[penguins[\"species\"].isin([\"Gentoo\", \"Chinstrap\"]) &\n",
+    "             (penguins[\"flipper_length_mm\"] < 200) &\n",
+    "             (penguins[\"sex\"] == \"female\"), [\"species\", \"flipper_length_mm\", \"sex\"]]"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3.9.5 - rstudio",
+   "language": "python",
+   "name": "rstudio-user-3.9.5"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
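
The subsetting patterns used in the solutions can be checked offline with a small stand-in table. The sketch below is only an illustration: the `toy` DataFrame and its values are made up (they are not the real penguins data), and `isna().sum()` is shown as one possible alternative to `.shape[0]` for counting missing values, not as the notebook's own approach.

```python
import pandas as pd

# Hypothetical stand-in for the penguins data; values are invented for illustration
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Gentoo", "Chinstrap"],
    "flipper_length_mm": [181.0, 210.0, 195.0, 198.0, 192.0],
    "body_mass_g": [3750.0, None, 3500.0, 4200.0, None],
    "sex": ["male", "female", "female", "female", "male"],
})

# A single label returns a Series; a list of labels returns a DataFrame,
# even when the list holds only one name
single = toy["flipper_length_mm"]
multiple = toy[["flipper_length_mm"]]

# isna() gives a boolean Series; summing it counts the True (missing) entries
n_missing = int(toy["body_mass_g"].isna().sum())

# Build the row mask separately, then pass the mask and the column list to .loc.
# Comparisons are parenthesized because & binds more tightly than < and ==.
mask = (
    toy["species"].isin(["Gentoo", "Chinstrap"])
    & (toy["flipper_length_mm"] < 200)
    & (toy["sex"] == "female")
)
subset = toy.loc[mask, ["species", "flipper_length_mm", "sex"]]
```

Naming the mask first keeps the `.loc` call short and makes each condition easier to test on its own.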