Now that you have a high-level overview of PCA, as well as some of the details of the algorithm itself, it's time to practice implementing PCA on your own using the NumPy package.
You will be able to:
- Implement PCA from scratch using NumPy
- Import the data stored in the file 'foodusa.csv' (set index_col=0)
- Print the first five rows of the DataFrame
import pandas as pd
data = None
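One possible way to approach the loading step is sketched here. In the lab you would pass the path 'foodusa.csv' to `pd.read_csv`; because that file's contents are not shown in this lesson, the sketch reads a small inline CSV instead. The `City`, `item_a`, and `item_b` columns and all of the values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'foodusa.csv': column names and values are invented
toy_csv = io.StringIO(
    "City,item_a,item_b\n"
    "X,1.0,2.0\n"
    "Y,3.0,4.0\n"
    "Z,5.0,9.0\n"
)

# index_col=0 makes the first column the row index rather than a feature
toy = pd.read_csv(toy_csv, index_col=0)

# head() returns up to the first five rows
print(toy.head())
```

Setting the index this way keeps the label column out of the numeric calculations that follow.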
Next, normalize your data by mean-centering it: subtract each column's mean from every value in that column.
data = None
data.head()
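A minimal sketch of mean-centering, assuming the data is a pandas DataFrame of numeric columns. The toy DataFrame and its column names `a` and `b` are hypothetical and are not taken from 'foodusa.csv'.

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})

# toy.mean() is a Series of per-column means; subtraction aligns on column names
centered = toy - toy.mean()

# After centering, every column mean is zero (up to floating-point error)
print(centered.mean())
```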
The next step is to calculate the covariance matrix for your normalized data.
cov_mat = None
cov_mat
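For mean-centered data $X$ with $n$ rows, the sample covariance matrix is $X^\top X / (n - 1)$. The sketch below, on hypothetical toy data, checks that pandas' `DataFrame.cov()` agrees with that formula; either route should work for this step.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 6.0], "b": [2.0, 1.0, 5.0, 4.0]})
centered = toy - toy.mean()

# Built-in sample covariance (divides by n - 1)
cov_pandas = centered.cov()

# The same quantity computed directly from the centered values
n = len(centered)
cov_manual = centered.values.T @ centered.values / (n - 1)

print(np.allclose(cov_pandas.values, cov_manual))
```

The result is a square, symmetric matrix with one row and one column per feature.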
Next, calculate the eigenvectors and eigenvalues for your covariance matrix.
import numpy as np
eig_values, eig_vectors = None
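As an illustration of what `np.linalg.eig` returns, the sketch below decomposes a small hand-written symmetric matrix (a stand-in for a covariance matrix, not the lab's data). The eigenvectors come back as the columns of the second return value, and column `i` pairs with eigenvalue `i`.

```python
import numpy as np

# Hypothetical 2x2 symmetric matrix standing in for a covariance matrix
cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])

vals, vecs = np.linalg.eig(cov)

# Each column vecs[:, i] satisfies cov @ v = vals[i] * v
for i in range(len(vals)):
    assert np.allclose(cov @ vecs[:, i], vals[i] * vecs[:, i])

print(vals)
print(vecs)
```

Note that `np.linalg.eig` does not promise any particular ordering of the eigenvalues, which is why a sorting step follows.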
Great! Now that you have the eigenvectors and their associated eigenvalues, sort the eigenvectors by their eigenvalues in descending order to determine the principal components!
# Get the index values of the sorted eigenvalues
e_indices = None
# Sort
eigenvectors_sorted = None
eigenvectors_sorted
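One way to do the sort, sketched on made-up values: `np.argsort` gives the indices that order the eigenvalues from smallest to largest, so reversing it gives descending order. Those indices are then applied to the columns of the eigenvector matrix. The arrays below are hypothetical and chosen only to make the reordering easy to see.

```python
import numpy as np

# Hypothetical eigenvalues and eigenvectors (columns of the identity matrix)
vals = np.array([0.5, 3.0, 1.5])
vecs = np.eye(3)

# Indices of the eigenvalues from largest to smallest
order = np.argsort(vals)[::-1]

# Reorder eigenvalues, and reorder eigenvector COLUMNS to match
vals_sorted = vals[order]
vecs_sorted = vecs[:, order]

print(order)
print(vals_sorted)
```

Indexing columns (`vecs[:, order]`) rather than rows matters here, because each eigenvector is stored as a column.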
Finally, reproject the dataset using your sorted eigenvectors. Keep only the first two eigenvectors so that the dataset is projected down to 2 dimensions.
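The projection amounts to multiplying the centered data by a matrix whose columns are the top two eigenvectors. The sketch below runs the whole pipeline on randomly generated toy data (not 'foodusa.csv'); the names `X`, `Xc`, `W`, and `projected` are illustrative choices.

```python
import numpy as np

# Toy data: 6 samples, 4 features (random, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Center, then compute the covariance matrix (columns are features)
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigendecomposition, then indices sorted by descending eigenvalue
vals, vecs = np.linalg.eig(cov)
order = np.argsort(vals)[::-1]

# Keep the eigenvectors of the two largest eigenvalues as columns
W = vecs[:, order[:2]]

# Project the centered data onto those two directions
projected = Xc @ W
print(projected.shape)
```

As a sanity check, the sample variance of each projected column should match the corresponding eigenvalue.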
Well done! You've now coded PCA on your own using NumPy! With that, it's time to look at further applications of PCA.