This is a simple implementation of linear regression using gradient descent: given a multidimensional dataset that supports linear regression, the algorithm returns the parameters of the best-fitting line.
- Fallback to the normal equation method for datasets with fewer than 10,000 features
- Automatic feature scaling
- Customizable parameters (learning rate, iterations)
- Multidimensional data support
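For small feature counts, the normal-equation fallback mentioned above solves least squares in closed form instead of iterating. A minimal sketch (this is the textbook formula, not necessarily the exact code in linreg):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least squares: theta = (X^T X)^{-1} X^T y.
    X is assumed to already include a column of ones for the intercept."""
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)

# Tiny example: data generated from y = 2 + 3*x is recovered exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x
theta = normal_equation(X, y)
print(theta)  # approximately [2. 3.]
```

Using the pseudo-inverse (`pinv`) rather than a plain inverse keeps the formula well defined even when `X.T @ X` is singular.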
Data files must be in CSV format, with the following structure:
x1,x2,y
200,300,32
203,231,42
231,232,13
Here x1,x2,y is the header row of the data. The project supports multidimensional data, so feel free to use any number of feature columns.
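A dataset in this layout can be loaded with NumPy alone; a sketch (the column split into features and target assumes the target is always the last column, as in the example above):

```python
import io
import numpy as np

# Inline copy of the example CSV from above; a real file path works the same.
csv_text = "x1,x2,y\n200,300,32\n203,231,42\n231,232,13\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
features = data[:, :-1]   # all columns except the last
values = data[:, -1:]     # last column, kept as an (m, 1) vector
print(features.shape, values.shape)  # (3, 2) (3, 1)
```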
This project relies on the Python 3 package NumPy. To install the requirements, use pip:
$ pip install -r requirements.txt
I recommend using a virtual environment when installing the dependencies.
This project can be used either from the terminal or as an imported module.
$ python linreg.py mydataset.csv
Found the following parameters that best fit the data:
intercept = 2.0, size = 6499.998156236331

The following arguments are available:
- -h, --help: Display help on usage
- -a, --alpha: Set the learning rate manually (default is 0.01)
- -i, --iterations: Set the number of iterations manually (default is 1500)
- -f, --force: Force gradient descent and skip the normal equation method
- -ns, --noscaling: Turn off feature scaling (there is no feature scaling when using the normal equation method)
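The feature scaling that the --noscaling flag disables is, in spirit, mean normalization; a sketch of that idea (the exact scaling scheme used by linreg is an assumption here):

```python
import numpy as np

def scale_features(X):
    """Scale each column to zero mean and roughly unit range (mean
    normalization). Returns the scaled copy of X."""
    mean = X.mean(axis=0)
    spread = X.max(axis=0) - X.min(axis=0)
    spread[spread == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / spread

# The feature columns from the CSV example above.
X = np.array([[200.0, 300.0],
              [203.0, 231.0],
              [231.0, 232.0]])
scaled = scale_features(X)
print(scaled.mean(axis=0))  # close to [0. 0.]
```

Scaling matters because gradient descent converges slowly when features differ wildly in magnitude; the normal equation is unaffected, which is why scaling is skipped on that path.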
import linreg
import numpy as np
features = np.asmatrix(np.random.rand(3, 3))
values = np.random.rand(3, 1)
# Feature scaling
scales = linreg.scalefeatures(features)
# Gradient descent
iterations = 1500
alpha = 0.01
print(linreg.gradientdescent(features, values, iterations, alpha))
# Cost
parameters = np.random.rand(3, 1)
print(linreg.cost(features, values, parameters))

Python 3+ is required, as well as the NumPy package.
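For reference, batch gradient descent with the cost it minimizes can be sketched as follows. This is a generic least-squares implementation under the same `iterations`/`alpha` parameters; the actual `linreg.gradientdescent` and `linreg.cost` may differ in details:

```python
import numpy as np

def cost(X, y, theta):
    """Mean squared error cost J(theta) = (1/2m) * sum((X@theta - y)^2)."""
    m = len(y)
    residual = X @ theta - y
    return float(residual.T @ residual) / (2 * m)

def gradient_descent(X, y, iterations=1500, alpha=0.01):
    """Start from theta = 0 and repeatedly step against the cost gradient."""
    m, n = X.shape
    theta = np.zeros((n, 1))
    for _ in range(iterations):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))
    return theta

# Recover y = 2 + 3*x from noiseless data.
x = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
X = np.hstack([np.ones_like(x), x])
y = 2.0 + 3.0 * x
theta = gradient_descent(X, y, iterations=5000, alpha=0.5)
print(theta.ravel())  # approaches [2. 3.]
```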
Code copyright 2018 Søren Qvist Christensen. Code released under the MIT license.
