THE MNIST DATABASE of handwritten digits

Dataset

CSV
- The MNIST Dataset of handwritten digits
- MNIST in CSV

sklearn (load from OpenML - mnist_784)

from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)

fetch_mldata (deprecated since scikit-learn 0.20 and removed in 0.22)

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')

Data Set Information

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size (28x28) image.

Abstract

Property	Value
Data Set Characteristics	Image
Attribute Characteristics	Integer (0 to 255)
Number of Attributes	784 (28x28 pixels)
Number of Instances	42000 (original: 60000)
Associated Tasks	Classification

Source

Yann LeCun, Courant Institute, NYU
Corinna Cortes, Google Labs, New York
Christopher J.C. Burges, Microsoft Research, Redmond

Result

Accuracy is measured on the test subset (30% of the instances).
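This is a minimal sketch of a 70/30 split followed by an accuracy measurement. It assumes scikit-learn's `train_test_split`, since the document does not say how the split was made. The random arrays and `random_state=0` are placeholders for the real MNIST `X` and `y`, and `accuracy` is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 "images" of 784 pixels with digit labels (replace with the real X, y).
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(100, 784))
y = rng.integers(0, 10, size=100)

# Hold out 30% of the instances as the test subset.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

print(len(X_train), len(X_test))  # 70 30
```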

I binarized all the pixel data to {-1, 1}: if a pixel value is greater than 100 (the threshold), it becomes 1; otherwise, it becomes -1.
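The thresholding step can be written as a small NumPy sketch. The helper name `binarize` is hypothetical. The threshold of 100 is the one stated above.

```python
import numpy as np

THRESHOLD = 100  # pixel threshold used in this document

def binarize(pixels, threshold=THRESHOLD):
    """Map raw 0-255 pixel values to +1 (greater than threshold) or -1 (otherwise)."""
    pixels = np.asarray(pixels)
    return np.where(pixels > threshold, 1, -1)

print(binarize([0, 100, 101, 255]))  # [-1 -1  1  1]
```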

Binary Classifier (is it 0 or not?)

I also mapped every nonzero label (digits 1 to 9) to 1 and label 0 to -1. This step is a MUST, because the SVM formulation expects labels in {-1, 1}.
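Here is a sketch of the label mapping, using a hypothetical helper `to_binary_labels`. It assumes the labels may arrive as strings, which is how `fetch_openml('mnist_784')` returns them, so it casts them to `int` first.

```python
import numpy as np

def to_binary_labels(digits):
    """Digit 0 -> -1, any other digit (1-9) -> +1, giving SVM labels in {-1, +1}."""
    digits = np.asarray(digits).astype(int)  # handles '0'..'9' strings as well as ints
    return np.where(digits == 0, -1, 1)

print(to_binary_labels(['0', '1', '7', '0']))  # [-1  1  1 -1]
```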

Using the last 50 rows as training data:

Model	Kernel	Accuracy	Parameters
Binary SVM From Scratch	Linear	1.0	C = 1, tol = 0.001

Using the last 5000 rows as training data:

Model	Kernel	Accuracy	Parameters
Binary SVM From Scratch	Linear	0.9713	C = 1, tol = 0.001
Binary SVM From Scratch	RBF	0.8960	C = 5, gamma = 0.05, tol = 0.001
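The from-scratch SVM code itself is not included here. Its parameters are `C` and `tol`, which suggests an SMO (sequential minimal optimization) solver. The sketch below implements the simplified SMO variant from Stanford's CS229 course notes, which picks the second multiplier at random. It is not necessarily the author's implementation.

The names `smo_train`, `smo_decision`, `linear_kernel`, `max_passes`, `max_iter` and `seed` are hypothetical. Any function `kernel(A, B)` that returns the Gram matrix between the rows of `A` and `B` can replace the linear one, for example an RBF kernel with a `gamma` parameter.

```python
import numpy as np

def linear_kernel(A, B):
    """Gram matrix of dot products between the rows of A and B."""
    return A @ B.T

def smo_train(X, y, kernel=linear_kernel, C=1.0, tol=1e-3, max_passes=5, max_iter=1000, seed=0):
    """Simplified SMO for the SVM dual. y must contain -1/+1. Returns (alpha, b)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    K = kernel(X, X)  # full n x n kernel matrix, precomputed once
    alpha, b = np.zeros(n), 0.0
    passes = it = 0
    while passes < max_passes and it < max_iter:
        it += 1
        changed = 0
        for i in range(n):
            E_i = (alpha * y) @ K[:, i] + b - y[i]  # prediction error on sample i
            # Only optimize alpha_i if it violates the KKT conditions by more than tol.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = rng.integers(n - 1)
            j += j >= i  # random second index j != i
            E_j = (alpha * y) @ K[:, j] + b - y[j]
            a_i, a_j = alpha[i], alpha[j]
            # Box [L, H] keeps both multipliers in [0, C] with sum(alpha * y) unchanged.
            if y[i] != y[j]:
                L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2 * K[i, j] - K[i, i] - K[j, j]  # second derivative along the pair
            if L == H or eta >= 0:
                continue
            alpha[j] = np.clip(a_j - y[j] * (E_i - E_j) / eta, L, H)
            if abs(alpha[j] - a_j) < 1e-5:
                continue
            alpha[i] = a_i + y[i] * y[j] * (a_j - alpha[j])
            # Update the bias so the KKT conditions hold for i and/or j.
            d_i, d_j = alpha[i] - a_i, alpha[j] - a_j
            b1 = b - E_i - y[i] * d_i * K[i, i] - y[j] * d_j * K[i, j]
            b2 = b - E_j - y[i] * d_i * K[i, j] - y[j] * d_j * K[j, j]
            if 0 < alpha[i] < C:
                b = b1
            elif 0 < alpha[j] < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

def smo_decision(X_train, y_train, alpha, b, X, kernel=linear_kernel):
    """f(x) = sum_k alpha_k y_k K(x_k, x) + b; its sign is the predicted label."""
    return kernel(X, X_train) @ (alpha * np.asarray(y_train, dtype=float)) + b

# Usage on two well-separated synthetic blobs (a stand-in for the binarized MNIST rows).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
alpha, b = smo_train(X, y, C=1.0, tol=1e-3)
pred = np.sign(smo_decision(X, y, alpha, b, X))
print("training accuracy:", np.mean(pred == y))
```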

P.S. Computing the RBF kernel over all training samples is too expensive for my from-scratch implementation. I haven't yet figured out why sklearn computes it so much faster.
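The from-scratch code is not shown, so the cause of the slowdown is an assumption. One likely cause is evaluating the kernel pair by pair in Python. For 5000 training rows, the full Gram matrix has 25,000,000 entries, which is 200 MB in float64.

The vectorized sketch below uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so every squared distance comes from one matrix product. It checks the result against a naive double loop and against scikit-learn's `sklearn.metrics.pairwise.rbf_kernel`. The function names are hypothetical.

A second factor is that sklearn's `SVC` is built on the compiled libsvm library. libsvm computes kernel entries on demand and keeps them in a cache, sized by the `cache_size` parameter in MB, instead of building the whole matrix up front.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf_gram_loop(X, gamma):
    """Naive version: one Python-level kernel evaluation per pair of samples."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            diff = X[i] - X[j]
            K[i, j] = np.exp(-gamma * np.dot(diff, diff))
    return K

def rbf_gram_vectorized(X, gamma):
    """Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so all distances come from one X @ X.T."""
    sq_norms = np.einsum("ij,ij->i", X, X)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative values caused by rounding
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
K_fast = rbf_gram_vectorized(X, gamma=0.05)
print(np.allclose(K_fast, rbf_gram_loop(X, gamma=0.05)))  # True
print(np.allclose(K_fast, rbf_kernel(X, gamma=0.05)))     # True
```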

Multi-class Classifier

Using the last 500 rows as training data, and the last 5500 to 500 rows (the 5000 rows just before the training rows) as testing data:

Model	Kernel	Accuracy	Parameters
OVR SVM From Scratch	Linear	0.7886	C = 1, tol = 0.001
OVR SVM From Scratch	Linear	0.7266	C = 100, tol = 0.001
OVR SVM From Scratch	RBF	0.7888	C = 1, gamma = 1.3, tol = 0.0001
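One-vs-rest (OVR) trains one binary SVM per digit (that digit vs. all others) and predicts the digit whose SVM gives the largest decision value. The sketch below makes two substitutions: scikit-learn's `SVC` stands in for the from-scratch binary SVM, and synthetic 3-class data stands in for MNIST.

The names `ovr_fit`, `ovr_predict` and `make_binary_svm` are hypothetical. The argmax rule is the standard OVR choice, not a detail confirmed by the results above.

```python
import numpy as np
from sklearn.svm import SVC

def ovr_fit(X, y, make_binary_svm):
    """Train one binary SVM per class: that class -> +1, all other classes -> -1."""
    classes = np.unique(y)
    models = [make_binary_svm().fit(X, np.where(y == c, 1, -1)) for c in classes]
    return classes, models

def ovr_predict(X, classes, models):
    """Predict the class whose binary SVM returns the largest decision value."""
    scores = np.column_stack([m.decision_function(X) for m in models])
    return classes[np.argmax(scores, axis=1)]

# Usage on synthetic 3-class data (a stand-in for the binarized MNIST rows).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)
classes, models = ovr_fit(X, y, lambda: SVC(kernel="linear", C=1, tol=1e-3))
pred = ovr_predict(X, classes, models)
print("training accuracy:", np.mean(pred == y))
```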

Example

SVM MNIST handwritten digit classification
- GitHub
