Repository for running Jupyter notebooks and keeping relevant files in one place.
check how to remove null values from dataframe
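One possible answer to the note above is pandas' dropna(). A minimal sketch, assuming a small made-up data frame (the column names a and b are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one null in each column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})

rows_dropped = df.dropna()                # drop every row that contains a null
cols_dropped = df.dropna(axis=1)          # drop every column that contains a null
subset_dropped = df.dropna(subset=["a"])  # only look at column "a" when dropping rows

print(rows_dropped)
print(subset_dropped)
```

dropna() returns a new data frame, so the result has to be assigned to keep it.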
pandas
df.iloc[] - select by integer row and column positions
df.loc[] - select by row index label and column NAME
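A short sketch of the difference between the two indexers, assuming a made-up data frame (the labels a, b, c and the columns price and rooms are hypothetical):

```python
import pandas as pd

# Hypothetical data with string row labels so position and label differ visibly.
df = pd.DataFrame(
    {"price": [100, 200, 300], "rooms": [2, 3, 4]},
    index=["a", "b", "c"],
)

by_position = df.iloc[1, 0]       # second row, first column, by integer position
by_label = df.loc["b", "price"]   # row labelled "b", column named "price"

print(by_position, by_label)
```

Both lookups point at the same cell here; they only differ in how the cell is addressed.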
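df.drop('Column name', axis=1) - drops the referenced column from the data frame, where axis=0 is for rows and axis=1 for columns. Returns a new data frame, so assign the result or pass inplace=True to make the column stay dropped.
df.drop(1,axis=0).reset_index() - drops row 1 and resets the index; the old indices go into a new col
df.drop(1,axis=0).reset_index(drop=True) - same, but the old indices are discarded. Assign the result: inplace=True on a chained call only changes a temporary object and the expression returns None.
df.copy() - returns a copy of the data frame, so changes to the copy do not affect the original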
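The drop and reset_index variants can be sketched on a small made-up data frame (the column names keep and unwanted are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"keep": [10, 20, 30], "unwanted": [1, 2, 3]})

# Drop a column; the result is a new data frame, df itself is unchanged.
no_col = df.drop("unwanted", axis=1)

# Drop the row with index label 1, then renumber the rows.
# reset_index() keeps the old labels in a new column called "index".
with_old_index = df.drop(1, axis=0).reset_index()

# drop=True discards the old labels instead of keeping them.
clean = df.drop(1, axis=0).reset_index(drop=True)

print(with_old_index)
print(clean)
```

Assigning the result is used here rather than inplace=True, because inplace=True at the end of a chain would modify the temporary object returned by drop() and hand back None.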
df['columnname'].apply(type).value_counts() - maps each value to its Python type, then counts how many values there are of each type
df['colname'] = df['colname'].replace(['missing','inf'],np.nan) - replaces our specified strings 'missing' and 'inf' with np.nan
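A minimal sketch of the type count, assuming a made-up column that mixes floats with the strings 'missing' and 'inf':

```python
import pandas as pd

# Hypothetical mixed column: three floats and two strings.
col = pd.Series([1.5, "missing", 2.0, "inf", 3.25])

# apply(type) turns each value into its Python type; value_counts() tallies them.
type_counts = col.apply(type).value_counts()

print(type_counts)
```

A count against str in the output is a hint that the column still contains placeholder strings that need cleaning.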
df['colname'] = df['colname'].astype(float) - convert values to float
Review note: np.nan is itself a float, so once the strings are replaced with np.nan, a column whose other entries are all numeric can be converted to float.
df.info() - rerun this after data cleaning; the Dtype of cleaned columns may have changed to, say, float64.
Check the non-null count of each column (for example in df.info()). A count lower than the number of rows means the column has missing values (empty cells).
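The replace and astype steps above can be sketched end to end. This assumes a made-up column of numeric strings plus the two placeholder strings; 'colname' is the same stand-in name used in the notes:

```python
import numpy as np
import pandas as pd

# Hypothetical raw column: numbers stored as text, plus placeholder strings.
df = pd.DataFrame({"colname": ["1.5", "missing", "2.0", "inf"]})

# Replace the placeholder strings with NaN, then convert the column to float.
df["colname"] = df["colname"].replace(["missing", "inf"], np.nan)
df["colname"] = df["colname"].astype(float)

df.info()  # the Dtype column should now report float64 for colname
```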
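One way to do this check in code is to compare each column's non-null count with the number of rows. A sketch on made-up data (column names a and b are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [1, 2, 3]})

non_null = df.count()          # non-null values per column
missing = len(df) - non_null   # rows minus non-null count = missing values

print(missing)
print(df.isna().sum())         # the same numbers, computed directly
```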
data_new = pd.read_csv('/content/drive/MyDrive/Python Course/Melbourne_Housing.csv',na_values=['missing','inf'])
- on load, the line above converts every 'missing' and 'inf' entry to NaN, so running: data_new['BuildingArea'].dtype
- gives dtype('float64'), as the column now holds only floats (NaN is itself a float value)
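The effect of na_values can be tried without the real file. The sketch below reads a few made-up rows from an in-memory string; the rows and the Suburb values are invented for illustration and are not taken from Melbourne_Housing.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file, held in memory.
csv_text = "Suburb,BuildingArea\nAlpha,120.5\nBeta,missing\nGamma,inf\nDelta,98\n"

# Treat the strings 'missing' and 'inf' as NaN while loading.
data_new = pd.read_csv(io.StringIO(csv_text), na_values=["missing", "inf"])

print(data_new["BuildingArea"].dtype)
print(data_new["BuildingArea"].isna().sum())
```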
data['BuildingArea'].unique()
- above line run before cleaning gives unique values in column as a numpy array
- so can inspect to find out which strings to remove.
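A sketch of that inspection step, assuming a made-up uncleaned column; filtering the unique values down to strings is one possible way to list what needs removing:

```python
import pandas as pd

# Hypothetical uncleaned column: floats mixed with placeholder strings.
data = pd.DataFrame({"BuildingArea": [120.5, "missing", 98.0, "inf", "missing"]})

values = data["BuildingArea"].unique()   # numpy array of the distinct values

# Keep only the string entries; these are the candidates for na_values or replace().
strings_found = sorted({v for v in values if isinstance(v, str)})

print(values)
print(strings_found)
```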
python3 -m venv .venv - in bash, and on Windows
source .venv/bin/activate - in bash
source .venv/Scripts/activate - on Windows
/workspace/jupyter-6/.venv/bin/python -m pip install --upgrade pip - in GitPod
python3 -m pip install --upgrade pip - on Windows
pip install --upgrade pip
pip install jupyter notebook
pip install matplotlib
pip install pandas
pip install seaborn
pip install numpy
pip install scipy
pip install statsmodels
pip install -U scikit-learn
pip install ipykernel
pip install nb-black
Ctrl Shift P - Create New Jupyter Notebook
Save and name notebook
Paste in necessary code
Ctrl Shift P - Python: Select Interpreter - use Python version in ./.venv/bin/python
pip freeze > requirements.txt
pip install -r requirements.txt
auto-mpg.csv
Extension: Excel Viewer - for viewing csv files in VSCode
per above, Python: Select Interpreter - 3.10.9 (.venv)
after running pip install ipykernel
on running LinearRegression_HandsOn-1.ipynb a message appears saying: it is necessary to install ipykernel
OK - installing ipykernel
Rerun LinearRegression_HandsOn-1.ipynb
after running pip install pandas - pandas not found
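One possible cause of "pandas not found" after a successful pip install is that the notebook kernel runs a different interpreter from the one pip installed into. That is an assumption, not something these notes confirm. A quick check that can be run in a notebook cell:

```python
import importlib.util
import sys

# Path of the interpreter actually running this code.
print("interpreter:", sys.executable)

# Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
in_venv = sys.prefix != sys.base_prefix
print("running inside a virtual environment:", in_venv)

# find_spec returns None when the package cannot be imported from this interpreter.
pandas_found = importlib.util.find_spec("pandas") is not None
print("pandas importable here:", pandas_found)
```

If the interpreter path is not the one under .venv, selecting the .venv interpreter as described above is the likely fix.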
create new jupyter notebook using Ctrl Shift P Create New Jupyter Notebook
jupyter-test jupyter-repo-2 jupyter-3
- LMS - Hands_on_Notebook_Week3.ipynb
- LMS - ENews_Express_Learner_Notebook%5BLow_Code_Version%5D.ipynb
- LMS - abtest.csv
- 2.13 Pandas - Accessing and Modifying DataFrames (condition-based indexing)
- Google Colab mount drive
Windows Anaconda
conda create --name .cenv
y
conda activate .cenv
python3
not installed, so the Windows Store opens; install Python 3.10
python3 -m venv .venv
command was slow at first but self-resolved
- search string: stuck on $ python3 -m venv .venv setting up environment in virtaulenv using python3 stuck on ...
- search string: installing collected packages stuck why is the pip install process stuck on "Installing collected packages" step?
- search string: sklearn
- scikit-learn | Machine Learning in Python
- Getting Started -- scikit-learn
- Citing scikit-learn
- User Guide
- Installing scikit-learn
- Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
- redirects to https://scikit-learn.org/stable/ (link 2 in this section, above) Source code, binaries, and documentation
- search string: ipykernel
- pip install ipykernel ipykernel 6.19.2
- used for first attempt at naming arbitrary number of variables
- second attempt at naming arbitrary number of variables
- Remove name, dtype from pandas output of dataframe or series
- secondary source for turning off index on pandas dataframe print out
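For the last two items, one approach that may match what those sources describe is to_string(). A minimal sketch on made-up data (the names vals, r1, r2 are hypothetical):

```python
import pandas as pd

s = pd.Series([1, 2], index=["x", "y"], name="vals")
df = pd.DataFrame({"a": [10, 20]}, index=["r1", "r2"])

# Series.to_string() leaves out the trailing "Name: ..., dtype: ..." line by default.
series_text = s.to_string()

# index=False leaves the row labels out of the data frame print out.
df_text = df.to_string(index=False)

print(series_text)
print(df_text)
```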