This project centers on predicting future price movements for S&P 500 stocks—whether prices are likely to move up or down the following day (or the following month). It integrates data collection, feature engineering (like Bollinger Bands, RSI, and moving averages), and training multiple machine learning models to help guide decision‐making in the stock market.
Our project addresses the challenge of forecasting stock price direction for companies in the S&P 500 index. By blending historical stock data (spanning from 2007 onward) with various technical indicators (like moving averages, volatility, etc.), we aimed to produce “buy” or “sell” signals—helping us decide if a particular stock might move Up (+1) or Down (−1) on the following trading day.
This project applies machine learning techniques to stock market data to analyze and predict trends. It leverages several models, including Logistic Regression, Random Forest, and XGBoost, for classification tasks. Our goal is to evaluate model performance on stock data and compare results across these algorithms.
Accuracy of the S&P 500 Forecast (1-Month Horizon)
Model Evaluations
Cross-Validation Summaries
- Overview
- Model Results Preview
- Data Collection
- Data Preparation & Feature Engineering
- Model Selection & Training
- Evaluation & Results
- Key Findings
- Future Steps
- Disclaimers
- Contributors
- Data Sources
- Scope of Data: We downloaded daily “adjusted close” prices for all S&P 500 stocks from January 2007 to the present.
- Coverage: Approximately 500 tickers were used; we also gathered the current day’s closing price for real-time predictions.
- Handling Missing or Delisted Stocks: We logged errors for any ticker that was delisted or had incomplete data, then excluded or treated those series as needed.
- Removing Outliers: We dropped rows where stock prices were zero or obviously incorrect.
- Dealing with Missing Values: We filled small gaps in certain technical indicators by carrying the last known value forward or backward.
- Filtering Early Dates: We excluded the earliest months so that rolling indicators (such as moving averages) had a full historical window.
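The cleaning steps above can be sketched with pandas. The `clean_prices` helper, the column name, and the 200-day warm-up default are illustrative assumptions, not the project's actual code:

```python
import numpy as np
import pandas as pd

def clean_prices(df: pd.DataFrame, warmup_days: int = 200) -> pd.DataFrame:
    """Apply the cleaning steps described above to a price DataFrame
    indexed by date with one column per ticker."""
    # Remove outliers: treat zero or negative prices as invalid.
    df = df.mask(df <= 0)
    # Fill small gaps by carrying the last known value forward, then backward.
    df = df.ffill().bfill()
    # Drop the earliest rows so rolling indicators have a full history window.
    return df.iloc[warmup_days:]

# Tiny illustrative frame: five days, one ticker, one bad zero price, one gap.
prices = pd.DataFrame({"AAPL": [100.0, 0.0, 102.0, np.nan, 104.0]})
cleaned = clean_prices(prices, warmup_days=0)
print(cleaned["AAPL"].tolist())
```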
We enriched the data with widely used technical indicators, including:
- Moving Averages (50, 100, 200 days)
- Relative Strength Index (RSI)
- Volatility Measures (daily standard deviations)
- Bollinger Bands (upper and lower price boundaries)
- Support & Resistance (based on recent price minima and maxima)
Finally, we defined target labels indicating “up” or “down” for each stock on the next trading day.
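A minimal sketch of these indicators in pandas, assuming a single ticker's close-price series. The `add_indicators` helper is hypothetical, and the 14-day RSI and 20-day volatility/Bollinger windows are common defaults rather than the project's confirmed parameters:

```python
import numpy as np
import pandas as pd

def add_indicators(close: pd.Series) -> pd.DataFrame:
    """Compute the indicators listed above for one ticker's close prices."""
    feats = pd.DataFrame({"close": close})
    # Moving averages over 50/100/200 trading days.
    for w in (50, 100, 200):
        feats[f"ma_{w}"] = close.rolling(w).mean()
    # Daily volatility: rolling standard deviation of daily returns.
    feats["volatility"] = close.pct_change().rolling(20).std()
    # RSI over a standard 14-day window.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    feats["rsi"] = 100 - 100 / (1 + gain / loss)
    # Bollinger Bands: 20-day mean plus/minus two standard deviations.
    ma20, sd20 = close.rolling(20).mean(), close.rolling(20).std()
    feats["bb_upper"] = ma20 + 2 * sd20
    feats["bb_lower"] = ma20 - 2 * sd20
    # Support and resistance from recent price minima and maxima.
    feats["support"] = close.rolling(20).min()
    feats["resistance"] = close.rolling(20).max()
    # Target label: +1 if the next day's close is higher, else -1.
    # (The final row has no "next day" and should be dropped before training.)
    feats["target"] = (close.shift(-1) > close).map({True: 1, False: -1})
    return feats

# Demo on a synthetic uptrend; in practice `close` is a real price series.
features = add_indicators(pd.Series(np.linspace(100, 130, 250))).dropna()
print(features.columns.tolist())
```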
We tested three machine learning models:
- XGBoost Classifier
- Random Forest Classifier
- Logistic Regression
Each model was trained on historical data (with all the engineered features) and then evaluated on a test set to gauge how well it could predict unseen outcomes.
- XGBoost: Known for high performance in structured data scenarios.
- Random Forest: A robust, easy‐to‐interpret ensemble method.
- Logistic Regression: A simpler, baseline model that can be useful for interpretability.
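The training loop might look like the following sketch on synthetic data. `GradientBoostingClassifier` stands in for XGBoost here so the example runs with scikit-learn alone; swap in `xgboost.XGBClassifier` if the library is installed. The feature matrix and labels are fabricated for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered feature matrix and +1/-1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = np.where(X[:, 0] + rng.normal(scale=0.5, size=500) > 0, 1, -1)

# Chronological split (no shuffling) so the test set is strictly "future" data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

models = {
    # Gradient boosting stands in for the XGBoost classifier.
    "boosting": GradientBoostingClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))
```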
We measured:
- Accuracy: The percentage of correct “up vs. down” predictions.
- Precision & Recall: How well the model correctly identifies each class (up or down).
- F1 Score: The harmonic mean of precision and recall, offering a balanced measure.
- Confusion Matrix: The exact count of correct and incorrect predictions for each category (up vs. down).
Additionally, we performed cross-validation (splitting the data into multiple subsets for repeated training and testing) to ensure the results were not overly reliant on any one time period.
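A sketch of this evaluation with scikit-learn, on fabricated data. `TimeSeriesSplit` is one way to keep the cross-validation folds in chronological order; the project's exact splitting scheme isn't specified, so treat it as an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic features and +1/-1 labels standing in for the engineered dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))
y = np.where(X[:, 0] > 0, 1, -1)

model = LogisticRegression(max_iter=1000)
# Each fold trains on the past and tests on a strictly later window.
scores = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))
print("fold accuracies:", np.round(scores, 3))

# Per-class precision/recall/F1 and the confusion matrix on one split.
split = 300
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
cm = confusion_matrix(y[split:], pred)
print(cm)
print(classification_report(y[split:], pred, digits=3))
```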
- XGBoost Performed Best
  - It achieved an accuracy of around 63% on the test set, meaning it correctly predicted the up/down direction roughly 63% of the time.
- Random Forest & Logistic Regression
  - Both performed respectably but trailed XGBoost slightly in overall accuracy and consistency.
- Importance of Feature Engineering
  - Indicators like RSI and volatility were particularly beneficial in improving model accuracy.
- Hyperparameter Tuning
  - Fine-tuning XGBoost (e.g., adjusting the maximum depth, number of trees, and learning rate) led to measurable performance gains.
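Tuning of this kind can be sketched as a grid search over those parameters. As before, `GradientBoostingClassifier` stands in for XGBoost so the sketch runs with scikit-learn alone, and both the data and the parameter grid are illustrative:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in data; the parameter names mirror the max depth,
# number of trees, and learning rate mentioned above.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=300) > 0).astype(int)

grid = GridSearchCV(
    GradientBoostingClassifier(),
    param_grid={
        "max_depth": [2, 3],
        "n_estimators": [50, 100],
        "learning_rate": [0.05, 0.1],
    },
    cv=TimeSeriesSplit(n_splits=3),  # chronological folds, as in evaluation
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```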
- Longer-term Predictions
  - Instead of just next-day moves, explore weekly or monthly returns for a broader trading strategy.
- Additional Data
  - Incorporate macroeconomic factors (interest rates, GDP data) or investor sentiment from social media and news.
- Probabilistic Predictions
  - Instead of a strict “up/down” call, provide the probability of an upward or downward move to aid risk management.
- Ensemble Stacking
  - Combine the three models into a “meta-model” that could potentially outperform any single one.
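The probabilistic-prediction idea maps directly onto scikit-learn's `predict_proba`; the 0.6/0.4 confidence thresholds below are arbitrary illustrations, not a recommended trading rule:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fabricated features and +1/-1 labels for illustration.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = np.where(X[:, 0] > 0, 1, -1)

model = LogisticRegression(max_iter=1000).fit(X, y)
# predict_proba returns one column per class in sorted label order,
# so column 1 is P(up = +1); a rule could act only on confident calls.
proba_up = model.predict_proba(X[:5])[:, 1]
signals = np.where(
    proba_up > 0.6, "buy", np.where(proba_up < 0.4, "sell", "hold")
)
print(np.round(proba_up, 2), signals)
```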
- Educational Purpose Only: This project does not constitute financial advice.
- Past Performance ≠ Future Results: Though our model may show promising results in certain windows, real‐world market behavior can differ significantly.
- Data Limitations: Some stocks in the S&P 500 may have incomplete histories or confounding corporate actions (like splits or mergers) that are not fully accounted for.
- Christian Palacios (@rune-encoder)
- Corey Holton (@corey-holton)
- Edwin Lovera (@ed-lovera)
- Vickram Dass (@DassV24)
- Montre Davis (@tredavis)
This project utilizes financial data from Yahoo Finance, accessed via the yfinance library. We acknowledge Yahoo Finance as the primary source of our historical stock data.