The Sokoto Coventry Fingerprint Dataset (SOCOFing) was selected for this project for the following reasons:
- Dataset Size: 6,000 images from 600 individuals (10 fingerprints per subject), which is enough data to train and compare the classifiers evaluated here.
- Class Diversity: Includes five classes (whorl, left loop, right loop, tented arch, and arch), supporting comprehensive classification.
- Image Quality: Consistent image resolution of 96x103 pixels at 500dpi in BMP format.
- Synthetic Variations: Includes modified versions to improve model robustness.
- Accessibility: Publicly available for research.
- Feature Usage: Patterns of ridges and valleys are extracted as input features.
- Convert to grayscale.
- Resize images to 96x96 pixels.
- Noise reduction and normalization.
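
The preprocessing steps listed above (grayscale conversion, resizing to 96x96, noise reduction, normalization) could be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `preprocess`, the luminance weights, the 3x3 median filter, and the scaling to [0, 1] are all assumptions, and a random array stands in for a real 96x103 scan.

```python
import numpy as np
from scipy import ndimage


def preprocess(image, size=(96, 96)):
    """Hypothetical helper: grayscale -> resize -> denoise -> normalize."""
    img = np.asarray(image, dtype=np.float64)

    # Convert RGB to grayscale with standard luminance weights (assumption).
    if img.ndim == 3:
        img = img[..., :3] @ np.array([0.299, 0.587, 0.114])

    # Resize to the target (height, width) with bilinear interpolation.
    factors = (size[0] / img.shape[0], size[1] / img.shape[1])
    img = ndimage.zoom(img, factors, order=1)

    # Light noise reduction; a 3x3 median filter is one possible choice.
    img = ndimage.median_filter(img, size=3)

    # Normalize 8-bit intensities to the range [0, 1].
    return np.clip(img / 255.0, 0.0, 1.0)


# Stand-in for a real scan: 103 rows x 96 columns, 3 channels.
rng = np.random.default_rng(0)
fake_scan = rng.integers(0, 256, size=(103, 96, 3), dtype=np.uint8)
processed = preprocess(fake_scan)
print(processed.shape, processed.min(), processed.max())
```

A median filter is used here because it tends to preserve ridge edges better than a mean filter; any other denoiser could be substituted.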
- Total Images: 6,000
- Subjects: 600
- Classes: 5
- Image Properties: BMP format, 96x103 pixels, 500dpi.
- Class Distribution: Slight imbalance with loops and whorls more common.
import matplotlib.pyplot as plt

classes = ['Arch', 'Tented Arch', 'Left Loop', 'Right Loop', 'Whorl']
counts = [800, 400, 1600, 1600, 1600]

plt.bar(classes, counts)
plt.title('Distribution of Fingerprint Classes')
plt.xlabel('Class')
plt.ylabel('Count')
plt.show()
- Pixel Intensity Distribution: Histogram shows a bimodal distribution, revealing clear contrast in fingerprint patterns.
- Image Quality Assessment: PSNR values confirm high-quality images.
- Feature Visualization: t-SNE visualization demonstrates separability in feature space.
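
PSNR is defined relative to a reference image, and the text does not say which reference was used. As a hedged illustration only, the sketch below computes PSNR between a stand-in "clean" image and a noisy copy of it; the function name `psnr` and the synthetic data are assumptions made for this example.

```python
import numpy as np


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Synthetic stand-ins: a random 8-bit image and a copy with Gaussian noise.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(103, 96)).astype(np.float64)
noisy = np.clip(clean + rng.normal(0, 5, clean.shape), 0, 255)

print(f"PSNR: {psnr(clean, noisy):.2f} dB")
```

Higher values mean the test image is closer to the reference; identical images give an infinite PSNR.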
- Consider class weighting or oversampling for class imbalance.
- Edge detection may enhance feature extraction.
- Minimal denoising required due to high image quality.
- t-SNE indicates separability, suggesting simpler classifiers might be effective.
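
One way to act on the class-imbalance recommendation is to derive per-class weights and pass them to a classifier. The sketch below uses scikit-learn's `compute_class_weight` with the illustrative class counts from the bar chart earlier in this section; the integer label encoding (0 to 4) is an assumption.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

classes = ["Arch", "Tented Arch", "Left Loop", "Right Loop", "Whorl"]
counts = [800, 400, 1600, 1600, 1600]  # illustrative counts from the bar chart

# Build a label vector with the stated class frequencies (labels 0..4 assumed).
class_ids = np.arange(len(classes))
y = np.repeat(class_ids, counts)

# "balanced" weights are n_samples / (n_classes * count_per_class).
weights = compute_class_weight(class_weight="balanced", classes=class_ids, y=y)
class_weight = dict(zip(class_ids.tolist(), weights.tolist()))

for name, w in zip(classes, weights):
    print(f"{name}: {w:.2f}")
```

The resulting dictionary can be passed as `class_weight=` to estimators such as `RandomForestClassifier` or `SVC`, so that errors on the rarer arch classes cost more during training.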
- Loading and Initial Preprocessing: Images were converted to grayscale and labels extracted.
- Reshaping and Scaling: Images were flattened and standardized.
- Dimensionality Reduction: PCA retained 95% of variance.
- Feature Selection: Top 100 features selected via mutual information.
- Label Encoding: Class labels were numerically encoded.
- Train-Test Split: Data split into 80% training and 20% testing.
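
The steps above can be sketched with scikit-learn as shown below. This is an illustration under stated assumptions rather than the project's code: random arrays stand in for the preprocessed images and labels, and the variable names are hypothetical. Note that this sketch splits the data before fitting the scaler, PCA, and feature selector, which differs from the order listed above; fitting those transformers on the training portion only keeps the test set from influencing them.

```python
from functools import partial

import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Placeholder data standing in for preprocessed 96x96 images and class names.
rng = np.random.default_rng(42)
class_names = np.array(["arch", "tented_arch", "left_loop", "right_loop", "whorl"])
images = rng.random((200, 96, 96))
labels = class_names[rng.integers(0, 5, size=200)]

# Flatten each image to a vector and encode labels as integers.
X = images.reshape(len(images), -1)
y = LabelEncoder().fit_transform(labels)

# 80/20 split, stratified so class proportions are preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize, then keep enough principal components for 95% of the variance.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=0.95, svd_solver="full").fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))

# Select up to 100 components by mutual information with the labels.
k = min(100, Z_train.shape[1])
selector = SelectKBest(partial(mutual_info_classif, random_state=42), k=k)
F_train = selector.fit_transform(Z_train, y_train)
F_test = selector.transform(Z_test)

print(F_train.shape, F_test.shape)
```

The `min(100, ...)` guard is there because PCA at 95% variance may return fewer than 100 components, in which case selecting the "top 100" would otherwise fail.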
Algorithms chosen for evaluation:
- CNNs: High performance in image classification.
- Random Forest: Robust with feature relevance insights.
- SVM: Effective with high-dimensional data.
- KNN: Simple but capable of capturing local patterns.
Using scikit-learn, the classical algorithms (Random Forest, SVM, and KNN) were implemented and compared by weighted F1 score on the test set:
from sklearn.metrics import classification_report, f1_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# X_train, X_test, y_train, y_test come from the 80/20 train-test split above.
algorithms = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Support Vector Machine": SVC(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

for name, algorithm in algorithms.items():
    # Fit on the training data and predict on the held-out test data.
    algorithm.fit(X_train, y_train)
    y_pred = algorithm.predict(X_test)

    # Per-class precision, recall, and F1.
    print(f"\n{name} Classification Report:")
    print(classification_report(y_test, y_pred))

    # Weighted F1 accounts for the class imbalance noted earlier.
    f1 = f1_score(y_test, y_pred, average='weighted')
    print(f"F1 Score: {f1:.4f}")
The best model was identified based on cross-validation scores and retrained on the full training set, achieving high accuracy and reliability in fingerprint classification.
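
The selection step described here, choosing by cross-validation score and then retraining on the full training set, might look like the sketch below. The number of folds, the `f1_weighted` scoring, and the synthetic data from `make_classification` are assumptions made so the example runs on its own; they are not taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic 5-class data standing in for the selected fingerprint features.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10, n_classes=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

candidates = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Support Vector Machine": SVC(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

# Mean 5-fold weighted F1 on the training set only (test set stays untouched).
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="f1_weighted").mean()
    for name, model in candidates.items()
}

# Pick the best candidate and retrain it on the full training set.
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)

# Report performance once on the held-out test set.
test_f1 = f1_score(y_test, best_model.predict(X_test), average="weighted")
print(f"Best model: {best_name} (CV F1 = {cv_scores[best_name]:.4f})")
print(f"Test F1: {test_f1:.4f}")
```

Selecting on cross-validation scores rather than test scores keeps the final test F1 an unbiased estimate of how the chosen model generalizes.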