Date: 04-04-2022
Students: Xuan Gao & Kunlei Yu
Analysing the price of a commodity based on a pool of potential factors is a common task. Avocados have become an increasingly popular fruit, yet the dynamics of their price are hidden in the data. In this project, we aim to analyse the patterns in avocado price data. You should try different regression methods, tune the necessary parameters, and compare their performance. Suggested methods include linear regression, (empirical) Bayesian linear regression, SVM regression, RVM regression, neural networks, and any other methods you prefer. The link below gives further descriptions and example notebooks that you can refer to when formulating your solution.
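As a rough sketch of how the suggested methods could be compared, the Python snippet below fits several scikit-learn regressors on an 8:2 split and reports test MSE and R². It is an illustration under assumptions, not the code in results.ipynb: synthetic data from `make_regression` stands in for the avocado features, the helper `compare_regressors` is hypothetical, `BayesianRidge` plays the role of empirical Bayesian linear regression (it estimates its prior and noise precisions by maximising the marginal likelihood), and the hyperparameters (`C`, `hidden_layer_sizes`, `max_iter`) are untuned placeholders. scikit-learn has no built-in RVM, so it is left out here.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def compare_regressors(X, y, test_size=0.2, random_state=0):
    """Hypothetical helper: fit several regressors on an 80/20 split
    and return test-set MSE and R^2 for each model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # Scaling matters for SVR and the MLP; it is harmless for the linear models.
    # Hyperparameters below are placeholders, not tuned values.
    models = {
        "linear": make_pipeline(StandardScaler(), LinearRegression()),
        "bayesian_ridge": make_pipeline(StandardScaler(), BayesianRidge()),
        "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0)),
        "mlp": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                         random_state=random_state),
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "mse": mean_squared_error(y_test, pred),
            "r2": r2_score(y_test, pred),
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in data; replace with the avocado features and prices.
    X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
    for name, m in compare_regressors(X, y).items():
        print(f"{name:15s} MSE={m['mse']:.2f}  R2={m['r2']:.3f}")
```

On synthetic data like this the untuned SVR will usually trail the linear models; tuning `C` and `epsilon` (for example with `GridSearchCV`) is the kind of parameter tuning the task asks for.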
- Problem statement
- Data pre-processing
- Exploratory data analysis
- Split data into training data and testing data (e.g., 8:2)
- Select models
- Test models
- Compare models/results
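A minimal sketch of the pre-processing and 8:2 split steps listed above, assuming the column layout of the public Kaggle avocado file (`Date`, `AveragePrice`, `Total Volume`, and categorical `type` and `region`). Those column names, the hypothetical helpers `preprocess` and `make_placeholder_frame`, and the chosen features (month parsed with `pd.to_datetime`, one-hot encoded categories) are assumptions and not necessarily what our notebooks do. The placeholder frame holds random numbers, not real prices; swap in `pd.read_csv("avocado.csv")` to use the actual data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def preprocess(df: pd.DataFrame, target: str = "AveragePrice"):
    """Hypothetical helper: turn an avocado-style table into numeric X and y.

    Assumed layout: a 'Date' column, a numeric target column, and
    categorical 'type' and 'region' columns, as in the public Kaggle file.
    """
    df = df.copy()
    # Drop a leftover index column (e.g. 'Unnamed: 0') if the CSV has one.
    df = df.drop(columns=[c for c in df.columns if str(c).startswith("Unnamed")])
    # Parse the date strings and keep the month as a seasonal feature.
    df["Date"] = pd.to_datetime(df["Date"])
    df["month"] = df["Date"].dt.month
    df = df.drop(columns=["Date"])
    # One-hot encode the categorical columns as 0/1 floats.
    df = pd.get_dummies(df, columns=["type", "region"], drop_first=True, dtype=float)
    y = df.pop(target)
    return df, y


def make_placeholder_frame(n: int = 20, seed: int = 0) -> pd.DataFrame:
    """Hypothetical random stand-in with the assumed columns (NOT real data)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "Date": pd.date_range("2015-01-04", periods=n, freq="7D").strftime("%Y-%m-%d"),
        "AveragePrice": rng.uniform(0.8, 2.0, n),
        "Total Volume": rng.uniform(1e4, 1e5, n),
        "type": rng.choice(["conventional", "organic"], n),
        "region": rng.choice(["RegionA", "RegionB"], n),
    })


if __name__ == "__main__":
    raw = make_placeholder_frame()  # replace with pd.read_csv("avocado.csv")
    X, y = preprocess(raw)
    # 8:2 train/test split; random_state fixes the shuffle for reproducibility.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    print(X_train.shape, X_test.shape)
```

Because the rows are time-stamped, a chronological split (train on earlier weeks, test on later ones) is a reasonable alternative to the random shuffle shown here.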
To reproduce our results, follow the steps below.
git clone https://github.com/xuangao6/Avocado-Price.git
The repository contains three files: the dataset avocado.csv, Exploratory Data Analysis.ipynb, and results.ipynb.
For exploratory data analysis, open Exploratory Data Analysis.ipynb located at .\Avocado-Price\Exploratory Data Analysis.ipynb.
For general results, open results.ipynb located at .\Avocado-Price\results.ipynb.
(Our unsorted daily working code is in the .\Avocado-Price\daily update codes folder.)
Running Exploratory Data Analysis.ipynb and results.ipynb reproduces the results shown in our report.
- pandas documentation (pandas.to_datetime): https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html?highlight=pd%20to_datetime
- scikit-learn documentation (LinearRegression): https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#examples-using-sklearn-linear-model-linearregression
- matplotlib documentation (pyplot.figure): https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html
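To illustrate the matplotlib `figure` API linked above, here is a small sketch of a predicted-vs-actual scatter plot, a common way to eyeball regression quality. The helper name `plot_predicted_vs_actual` and the axis labels are hypothetical, and the figures in results.ipynb may be produced differently.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_predicted_vs_actual(y_true, y_pred, title="Predicted vs. actual price"):
    """Hypothetical helper: scatter predictions against targets with a y = x line."""
    fig = plt.figure(figsize=(5, 5))
    ax = fig.add_subplot(1, 1, 1)
    ax.scatter(y_true, y_pred, s=10, alpha=0.6)
    # Points on the dashed diagonal would be perfect predictions.
    lo = min(np.min(y_true), np.min(y_pred))
    hi = max(np.max(y_true), np.max(y_pred))
    ax.plot([lo, hi], [lo, hi], "k--", linewidth=1)
    ax.set_xlabel("Actual AveragePrice")
    ax.set_ylabel("Predicted AveragePrice")
    ax.set_title(title)
    return fig
```

For example, after fitting any of the models: `plot_predicted_vs_actual(y_test, model.predict(X_test)).savefig("pred_vs_actual.png")`.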