Stock market prediction using PySpark📈

Stock market prediction is the act of trying to determine the future value of a company stock or other financial instrument traded on an exchange. The successful prediction of a stock's future price could yield significant profit.

Using machine learning techniques, I will try to:

- predict the stock value of more than 700 companies over a period of 20 years
- predict the stock value of more than 1200 companies over a period of 5 years, also using financial indicators

After training the models (linear regression and random forest regression), I will test them on the COVID crisis period, a major financial crisis in which even the best stocks lost value, and compare their performance with a normal period. Finally, I will train a neural network and see how it compares with these simpler models.
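The idea of training on a normal period and testing on the crisis period can be sketched as a chronological split. This is only an illustration in plain Python: the prices, the `CRISIS_START` and `CRISIS_END` dates, and the `split_by_period` helper are all assumptions, not values or code taken from the notebook.

```python
from datetime import date

# Hypothetical daily closing prices for one ticker (made-up values).
rows = [
    (date(2019, 12, 30), 100.0),
    (date(2020, 1, 15), 103.0),
    (date(2020, 2, 14), 105.0),
    (date(2020, 3, 16), 78.0),
    (date(2020, 4, 15), 85.0),
]

# Assumed period boundaries; the notebook's configuration may use other dates.
CRISIS_START = date(2020, 2, 20)
CRISIS_END = date(2020, 6, 30)


def split_by_period(rows, start, end):
    """Return (train, test): rows dated before `start`, and rows in [start, end]."""
    train = [r for r in rows if r[0] < start]
    test = [r for r in rows if start <= r[0] <= end]
    return train, test


train_rows, test_rows = split_by_period(rows, CRISIS_START, CRISIS_END)
```

Splitting by date rather than at random keeps every training row earlier than every test row, so the model is never evaluated on a period it has already seen.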

This notebook is organized as follows:

- Configuring the environment: installs the dependencies and PySpark, and sets the configuration that controls how the rest of the notebook runs. Check these settings to tailor the notebook to your needs.
- Building the datasets: creates the datasets that will later be used to train the models.
- Exploring the datasets: Colab's tools let you visualize the datasets and extract useful insights.
- Feature engineering: we can't feed the model the raw dataset, so we create features that are useful for the prediction task.
- Learning pipeline: the actual training happens here. Several models are available, so we can see which performs best.
- Hyperparameter tuning: appropriate hyperparameters can significantly improve the performance of our models, so we try several combinations and keep the best one.
- Testing the models: we choose a period and check how well the model would have predicted it.
- Plotting results: we visualize the results of the predictions.
- Conclusion: the project is over; what have we learned?
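As a rough illustration of the feature engineering and hyperparameter tuning steps, the sketch below builds lagged-price features and picks the lag window with the lowest error. It is plain Python with made-up prices. The helper names (`make_lag_features`, `moving_average_forecast`, `rmse`) and the moving-average baseline are assumptions chosen for brevity, not the notebook's models. In PySpark, lagged columns are commonly built with `pyspark.sql.functions.lag` over a `Window`, and grids are searched with `pyspark.ml.tuning.CrossValidator`. Whether the notebook does exactly this is not stated here.

```python
import math


def make_lag_features(prices, n_lags):
    """Turn a price series into (features, target) pairs.

    Each feature list holds the previous `n_lags` prices; the target is
    the price that immediately follows them.
    """
    samples = []
    for i in range(n_lags, len(prices)):
        samples.append((prices[i - n_lags:i], prices[i]))
    return samples


def moving_average_forecast(features):
    """Baseline predictor (an assumption): the mean of the lagged prices."""
    return sum(features) / len(features)


def rmse(samples):
    """Root mean squared error of the baseline over (features, target) pairs."""
    errors = [(moving_average_forecast(f) - y) ** 2 for f, y in samples]
    return math.sqrt(sum(errors) / len(errors))


# Made-up closing prices, for illustration only.
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]

# Tiny grid search: try several window sizes and keep the one with lowest RMSE.
scores = {n: rmse(make_lag_features(prices, n)) for n in (1, 2, 3)}
best_n_lags = min(scores, key=scores.get)
```

For simplicity the grid is scored on the same series it was built from. A real run would score each candidate on a held-out validation period instead.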

To run the notebook correctly, execute the steps in order, because each step depends on the previous one.

itsbenigno/stock_market_predictions
