This project is implemented in Python 3. It is based on the Kaggle competition "Predicting GPA from SAT score", introduced by the University of Sapienza. The dataset used in this competition is based on a dataset provided by the StatCrunch website. URL of the dataset: https://www.statcrunch.com/app/index.php?dataid=1583665
1:- OVERVIEW
2:- REQUIREMENTS
3:- INSTALLATION
4:- DATASET SNAPSHOT
5:- DATA PLOTTING
6:- TRAINING GRAPH
7:- PREDICTING SCRIPT
The SAT is a test widely used for college admission in the USA. It is commonly assumed that students who achieve a high SAT score also earn a high GPA during college. In this competition we want to predict the average GPA accumulated by a student, based only on the SAT score. You are given a training dataset in which each row contains the student ID, the SAT score, and the average GPA accumulated during college. Use this dataset to train your model, then test the model on the test dataset, in which each row contains the student ID and the SAT score.
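The train-then-predict workflow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual script: the data is synthetic (the slope, intercept and noise level are made-up assumptions), and a plain scikit-learn `LinearRegression` stands in for whatever model the project trains. Pickle is used only to show how a trained model could be saved and reloaded by a separate predicting script.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the training set (assumption: the real file is not shown here).
rng = np.random.default_rng(0)
sat = rng.uniform(1400, 2100, size=100)                    # hypothetical SAT scores
gpa = 0.0015 * sat + 0.5 + rng.normal(0, 0.1, size=100)    # hypothetical GPA values

# Train: SAT score is the single feature, GPA is the target.
model = LinearRegression()
model.fit(sat.reshape(-1, 1), gpa)

# Serialise and restore the model, as a predicting script might do.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

# Predict GPA for test rows that contain only a SAT score.
test_sat = np.array([[1500.0], [1800.0], [2000.0]])
predicted_gpa = restored.predict(test_sat)
print(predicted_gpa)
```

With real data, the two synthetic arrays would be replaced by the SAT and GPA columns of the training file, for example loaded with pandas.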
1:- SKLEARN
2:- NUMPY
3:- PANDAS
4:- MATPLOTLIB
5:- PICKLE
6:- MLXTEND
It is suggested to use pip3, the Python 3 version of pip, to install the packages.
If you do not have pip3 installed on your system, use these commands:
sudo apt-get install python3-pip --upgrade # for pip3
sudo apt-get install python-pip --upgrade #for pip
Then install each package with pip3, e.g.:
pip3 install scikit-learn #for sklearn
Fitting Regression in Testing Set
Fitting Regression in Training Set
Fitting Decision Tree in Testing Set
Fitting Decision Tree in Training Set
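The four fits listed above (regression and decision tree, each on the training and testing set) could be reproduced along these lines. This is a hedged sketch on synthetic data: the 75/25 split, the random seeds and the use of the R² score from `model.score` are assumptions made for illustration, not settings taken from the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic SAT/GPA data (assumption: made-up relationship plus noise).
rng = np.random.default_rng(1)
sat = rng.uniform(1400, 2100, size=200).reshape(-1, 1)
gpa = 0.0015 * sat.ravel() + 0.5 + rng.normal(0, 0.15, size=200)

# Hold out a quarter of the rows as a testing set.
X_train, X_test, y_train, y_test = train_test_split(
    sat, gpa, test_size=0.25, random_state=0
)

models = [
    ("regression", LinearRegression()),
    ("decision tree", DecisionTreeRegressor(random_state=0)),
]

# Fit each model on the training set and record R^2 on both sets.
scores = {}
for name, model in models:
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train R^2 = {scores[name][0]:.3f}, test R^2 = {scores[name][1]:.3f}")
```

Comparing the training and testing scores side by side is one simple way to see how well each model generalises. A decision tree with no depth limit tends to fit the training set almost perfectly, so its training score alone says little.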
Note: A point that lies outside the cluster is called an outlier. These points are to be neglected.
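One possible way to neglect such points before training is sketched below. The README does not say how outliers are detected, so the rule used here is an assumption: fit a straight line, then drop the points whose residual falls outside 1.5 times the interquartile range. The helper name `drop_outliers` and the data are hypothetical.

```python
import numpy as np


def drop_outliers(x, y, k=1.5):
    """Return x and y without points whose residual from a fitted line is extreme."""
    # Fit a straight line y = slope * x + intercept through all points.
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    # Keep points whose residual lies within k * IQR of the quartiles.
    q1, q3 = np.percentile(residuals, [25, 75])
    iqr = q3 - q1
    keep = (residuals >= q1 - k * iqr) & (residuals <= q3 + k * iqr)
    return x[keep], y[keep]


# Synthetic data with one injected outlier (assumption: made-up values).
rng = np.random.default_rng(2)
sat = rng.uniform(1400, 2100, size=50)
gpa = 0.0015 * sat + 0.5 + rng.normal(0, 0.05, size=50)
gpa[0] = 0.5  # a point far below the main cluster

clean_sat, clean_gpa = drop_outliers(sat, gpa)
print(len(sat), "points before,", len(clean_sat), "points after")
```

The threshold `k` is a judgment call: a larger value keeps more borderline points, a smaller one removes more.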
Now that we have completed our assessment, one thing I need to discuss about our algorithm is its challenges: what challenges does one face during the implementation? To understand that, we need to go a little deeper into the algorithm and its behaviour.
Please click here to go to that file.