GitHub

README: PCA and KPCA Implementation and Analysis Hello! In this project, I've explored dimensionality reduction techniques using Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) on two datasets: TrainData and TestData. Below, I've detailed the steps I undertook and my findings.

Datasets: I utilized two datasets for this project: TrainData and TestData.

Part 1: Principal Component Analysis (PCA) 1.1 PCA Implementation from Scratch: a. I've written a custom Python code to implement PCA without relying on third-party libraries.

b. My PCA implementation was applied to the TrainData dataset to reduce its dimensionality.

c. To retain a significant amount of variance in the dataset, I carefully selected an appropriate number of principal components.

1.2 PCA using scikit-learn: a. I imported the PCA module from the popular machine learning library, sklearn.

b. Using this module, I then applied PCA to the TrainData dataset.

Part 2: Kernel PCA (KPCA) 2.1 KPCA with RBF Kernel: a. I implemented the Kernel PCA using the Radial Basis Function (RBF) kernel from scratch.

b. Subsequently, this KPCA implementation was applied to the TrainData dataset.

2.2 KPCA with Polynomial Kernel: a. I delved further into KPCA by creating a version that uses the Polynomial kernel, again crafted from scratch.

b. This version was also applied to the TrainData dataset.

2.3 KPCA with Linear Kernel: a. Lastly, I implemented KPCA with a Linear kernel from scratch.

b. And, like the previous implementations, I applied it to the TrainData dataset.

Part 3: Testing and Evaluation 3.1 Applying PCA and KPCA to the Test Dataset: a. I used the trained PCA and KPCA models (both RBF and Polynomial) from the training phase to transform the TestDataset.

b. To ensure consistency, I made certain that the dimensionality reduction performed on the test data matched the training data.

3.2 Classification Experiment: a. For classification purposes, I utilized a minimum distance classifier on the TestDataset.

b. I assessed the classification performance using accuracy metrics to determine the effectiveness of my models.

Conclusion: I summarized my results, highlighting the potency of PCA and KPCA in reducing dimensionality. Moreover, I reflected on their impact on classification performance. I also discussed the pros and cons of creating these algorithms from scratch in comparison to employing scikit-learn's pre-built functions.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.idea		.idea
Data		Data
__MACOSX/Data		__MACOSX/Data
.gitattributes		.gitattributes
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

Abrahami2/Data_HamitouPCAKPCA

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages