Update student.ipynb

Drop nulls, duplicated values, and column names, and replace values.
learn-co-curriculum · Mngambi · Apr 26, 2024
commit 459371ee39d26d043f0944b249ed8bd62189e47b
diff --git a/student.ipynb b/student.ipynb
@@ -712,6 +712,25 @@
     "df.head()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Each row represents one house, and each column describes a different characteristic of that house. For example, the house with id 7129300520 is priced at 221900 and has three bedrooms, one bathroom, a living area of 1180 square feet, and a lot of 5650 square feet.\n",
+    "The other rows are read in the same way.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The function transform_and_drop_yr_renovated(df) transforms the 'yr_renovated' column of a DataFrame and then drops the original column. In place of the year of renovation, we now have a 'house_renovation' column stating whether a renovation took place.\n",
+    "\n",
+    "This transformation categorizes each house as renovated ('Yes') or not ('No'), based on the presence or absence of a renovation year in the original 'yr_renovated' column.\n",
+    "\n",
+    "The df.head() statement displays the first few rows of the transformed DataFrame so we can check the result."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -820,12 +839,143 @@
     "print (info_df)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "'waterfront' column has 2376 null values.\n",
+    "\n",
+    "'view' column has 63 null values.\n",
+    "\n",
+    "All other columns have zero null values.\n",
+    "\n",
+    "No duplicated rows found: there are no duplicated rows in the DataFrame."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Function for dropping duplicates, nulls, and columns, and replacing values\n",
+    "We will use the Python function 'dropper' to clean a DataFrame by dropping duplicates, null values, and specified columns. In the same cell, we also replace the NaN values in the 'waterfront' column with the string 'NONE'."
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 10,
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "               id        date     price  bedrooms  bathrooms  sqft_living  \\\n",
+      "0      7129300520  10/13/2014  221900.0         3       1.00         1180   \n",
+      "1      6414100192   12/9/2014  538000.0         3       2.25         2570   \n",
+      "2      5631500400   2/25/2015  180000.0         2       1.00          770   \n",
+      "3      2487200875   12/9/2014  604000.0         4       3.00         1960   \n",
+      "4      1954400510   2/18/2015  510000.0         3       2.00         1680   \n",
+      "...           ...         ...       ...       ...        ...          ...   \n",
+      "21592   263000018   5/21/2014  360000.0         3       2.50         1530   \n",
+      "21593  6600060120   2/23/2015  400000.0         4       2.50         2310   \n",
+      "21594  1523300141   6/23/2014  402101.0         2       0.75         1020   \n",
+      "21595   291310100   1/16/2015  400000.0         3       2.50         1600   \n",
+      "21596  1523300157  10/15/2014  325000.0         2       0.75         1020   \n",
+      "\n",
+      "       sqft_lot  floors waterfront  view  ...          grade sqft_above  \\\n",
+      "0          5650     1.0       NONE  NONE  ...      7 Average       1180   \n",
+      "1          7242     2.0         NO  NONE  ...      7 Average       2170   \n",
+      "2         10000     1.0         NO  NONE  ...  6 Low Average        770   \n",
+      "3          5000     1.0         NO  NONE  ...      7 Average       1050   \n",
+      "4          8080     1.0         NO  NONE  ...         8 Good       1680   \n",
+      "...         ...     ...        ...   ...  ...            ...        ...   \n",
+      "21592      1131     3.0         NO  NONE  ...         8 Good       1530   \n",
+      "21593      5813     2.0         NO  NONE  ...         8 Good       2310   \n",
+      "21594      1350     2.0         NO  NONE  ...      7 Average       1020   \n",
+      "21595      2388     2.0       NONE  NONE  ...         8 Good       1600   \n",
+      "21596      1076     2.0         NO  NONE  ...      7 Average       1020   \n",
+      "\n",
+      "       sqft_basement yr_built  zipcode      lat     long  sqft_living15  \\\n",
+      "0                0.0     1955    98178  47.5112 -122.257           1340   \n",
+      "1              400.0     1951    98125  47.7210 -122.319           1690   \n",
+      "2                0.0     1933    98028  47.7379 -122.233           2720   \n",
+      "3              910.0     1965    98136  47.5208 -122.393           1360   \n",
+      "4                0.0     1987    98074  47.6168 -122.045           1800   \n",
+      "...              ...      ...      ...      ...      ...            ...   \n",
+      "21592            0.0     2009    98103  47.6993 -122.346           1530   \n",
+      "21593            0.0     2014    98146  47.5107 -122.362           1830   \n",
+      "21594            0.0     2009    98144  47.5944 -122.299           1020   \n",
+      "21595            0.0     2004    98027  47.5345 -122.069           1410   \n",
+      "21596            0.0     2008    98144  47.5941 -122.299           1020   \n",
+      "\n",
+      "       sqft_lot15  house_renovation  \n",
+      "0            5650                No  \n",
+      "1            7639               Yes  \n",
+      "2            8062                No  \n",
+      "3            5000                No  \n",
+      "4            7503                No  \n",
+      "...           ...               ...  \n",
+      "21592        1509                No  \n",
+      "21593        7200                No  \n",
+      "21594        2007                No  \n",
+      "21595        1287                No  \n",
+      "21596        1357                No  \n",
+      "\n",
+      "[21597 rows x 21 columns]\n",
+      "NO      19075\n",
+      "NONE     2376\n",
+      "YES       146\n",
+      "Name: waterfront, dtype: int64\n"
+     ]
+    }
+   ],
+   "source": [
+    "def dropper(df, one=None, two=None, three=None):\n",
+    "    '''\n",
+    "    Input: DataFrame plus up to three requests (one, two, three).\n",
+    "    Each request can be:\n",
+    "    'duplicates' to drop duplicated rows\n",
+    "    'nulls' to drop rows containing null values\n",
+    "    a list of column names to drop, e.g. ['col_a', 'col_b']\n",
+    "    '''\n",
+    "    request = [one, two, three]\n",
+    "    if 'duplicates' in request:\n",
+    "        df = df.drop_duplicates()\n",
+    "    if 'nulls' in request:\n",
+    "        df = df.dropna()\n",
+    "    for req in request:\n",
+    "        if isinstance(req, list):\n",
+    "            df = df.drop(columns=req).reset_index(drop=True)\n",
+    "    return df\n",
+    "data_info = check_dtypes(df)\n",
+    "print(df)\n",
+    "\n",
+    "# Replace missing values in the 'waterfront' column with the string 'NONE'\n",
+    "# (dropper is only defined above; it is not called in this cell)\n",
+    "df['waterfront'] = df['waterfront'].fillna('NONE')\n",
+    "print(df['waterfront'].value_counts())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
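+    "After finding the number of null values with the previous function, we defined dropper, which drops null values with df = df.dropna() when 'nulls' is requested. It is not called in the cell above, so the printed DataFrame still has all 21597 rows."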
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this DataFrame, we have replaced the NaN values in the 'waterfront' column with the string 'NONE' using fillna().\n",
+    "The 'waterfront' column holds string values; the int64 dtype shown in the output belongs to the counts returned by value_counts().\n",
+    "\n",
+    "'NO': There are 19075 occurrences of 'NO' in the 'waterfront' column. This indicates that these properties do not have a waterfront view.\n",
+    "\n",
+    "'NONE': There are 2376 occurrences of 'NONE' in the 'waterfront' column. This matches the 2376 null values found earlier, which have been replaced with the string 'NONE'.\n",
+    "\n",
+    "'YES': There are 146 occurrences of 'YES' in the 'waterfront' column. This indicates that these properties have a waterfront view."
+   ]
   }
  ],
  "metadata": {