Analysis of a Time-Series Data Set from a Kaggle Competition.

Reference: Acea Smart Water Analytics (Kaggle competition)

This is a machine learning project that predicts water levels in aquifers.

The project began with a review of Kaggle notebooks covering data cleaning, preparation, Exploratory Data Analysis (EDA), model selection, and evaluation, ranked by Kaggle votes. There were four datasets, one per aquifer: Auser, Doganella, Luco, and Petrignano. From the many notebooks available, I selected the five with the most votes and compared their EDA approaches, which used both R and Python. Notably, the winning team used R and produced visually appealing plots.

I then studied the winning team's approach and adapted the analysis to cover visualization, data cleaning (handling missing values and outliers), feature engineering (transformation and differencing), and checks for stationarity and seasonality. Next, I developed models for trend analysis and water level forecasting using two methods: averaging the outputs from multiple wells, and using the data from each well directly.
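As a rough illustration of two of the steps above, the sketch below averages several well series into one target and then applies first-order differencing. It is not taken from the notebooks: the function names, the sample readings, and the reading of "averaging" as a per-time-step mean of well measurements (skipping missing values) are all assumptions made for this example.

```python
from statistics import mean
from typing import List, Optional, Sequence

Reading = Optional[float]  # None marks a missing measurement


def average_wells(wells: Sequence[Sequence[Reading]]) -> List[Reading]:
    """Average several well series per time step, ignoring missing readings."""
    averaged: List[Reading] = []
    for readings in zip(*wells):
        present = [r for r in readings if r is not None]
        # If every well is missing at this step, the average is missing too.
        averaged.append(mean(present) if present else None)
    return averaged


def difference(series: Sequence[Reading], lag: int = 1) -> List[Reading]:
    """Lagged differencing; a result is None when either endpoint is missing."""
    return [
        None if cur is None or prev is None else cur - prev
        for prev, cur in zip(series, series[lag:])
    ]


# Hypothetical depth-to-groundwater readings (metres) for two wells.
well_a = [-25.0, -25.5, None, -26.0]
well_b = [-27.0, -27.5, -28.0, None]

level = average_wells([well_a, well_b])  # one averaged series
diffed = difference(level)               # step-to-step change
```

Differencing shortens the series by `lag` points and removes a linear trend, which is usually why it is tried before a stationarity check. The alternative method, modelling each well directly, would skip `average_wells` and pass each well's series to `difference` on its own.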

To enhance model performance, I explored various algorithms, including Prophet, ARIMA, LSTM, Multivariate Prophet, XGBoost, and Random Forest.
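When comparing algorithms like these, a time-ordered holdout and a naive baseline give each model a common reference point. The sketch below is only an assumption about how such a comparison could be set up, not the evaluation used in the notebooks; the helper names and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], test_size: int
) -> Tuple[List[float], List[float]]:
    """Hold out the last `test_size` points; never shuffle a time series."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return list(series[:-test_size]), list(series[-test_size:])


def naive_forecast(train: Sequence[float], horizon: int) -> List[float]:
    """Baseline: repeat the last observed value for every future step."""
    return [train[-1]] * horizon


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical averaged water levels (metres).
levels = [-26.0, -26.5, -27.0, -27.5, -28.0, -28.5]
train, test = chronological_split(levels, test_size=2)
baseline = naive_forecast(train, horizon=len(test))

baseline_mae = mae(test, baseline)
baseline_rmse = rmse(test, baseline)
```

A candidate model's forecasts for the same held-out window can be passed to `mae` and `rmse` in place of `baseline`. A model that does not beat the naive baseline is probably not worth tuning further.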

Enjoy.

References are included in the corresponding notebooks.