This repository contains the data and code for building and evaluating linear regression models to analyze the relationship between the returns of a growth-focused ETF and key financial indicators.
The primary goal of this project is to quantify and analyze the relationship between the daily returns of the iShares S&P 500 Growth ETF (IVW) and three key market predictors: the S&P 500 (SPY), the Volatility Index (VIX), and Gold prices.
How sensitive are IVW returns to movements in the broader equity market (SPY returns)?
Does market volatility (VIX returns) play a significant role in influencing IVW performance?
Do alternative assets (Gold returns) act as a potential hedge for IVW?
What is the overall predictive power of the chosen variables in explaining variations in IVW returns?
Data Source: The dataset, consisting of daily returns, was sourced directly from Yahoo Finance.
| Variable | Role | Nature |
| --- | --- | --- |
| IVW Return | Response Variable (Y) | Numerical (continuous) |
| SPY Return | Predictor Variable (X₁) | Numerical (continuous) |
| VIX Return | Predictor Variable (X₂) | Numerical (continuous) |
| Gold Return | Predictor Variable (X₃) | Numerical (continuous) |
Data Preprocessing
Minimal preprocessing was required. Key steps included:
Handling Missing Values: No missing values were present.
Variable Type: All variables are numerical (daily returns).
Scaling: No explicit scaling was performed as all variables are already in comparable daily return percentages.
Transformation: Daily returns were used instead of prices to help stabilize variance and reduce strong trends, partially mitigating time series dependence issues.
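The price-to-return transformation mentioned above can be sketched as follows. This is a minimal illustration, assuming simple percentage returns computed from closing prices; the source does not say which return definition was used, and the prices and column names below are hypothetical placeholders rather than project data.

```python
import pandas as pd

# Hypothetical closing prices; real data would come from a Yahoo Finance export.
prices = pd.DataFrame(
    {"IVW": [100.0, 101.0, 99.99], "SPY": [400.0, 404.0, 400.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Simple daily return: (P_t - P_{t-1}) / P_{t-1}.
# The first row has no previous price, so it is dropped.
returns = prices.pct_change().dropna()
print(returns)
```

Working with returns rather than price levels removes most of the long-run trend, which is the motivation given in the transformation step.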
Model Types
Two linear regression models were fitted to the training data (80% of the dataset):
IVW Return=
IVW Return = β₀ + β₁(SPY Return) + β₂(VIX Return) + β₃(Gold Return) + ε
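Fitting a three-predictor regression on an 80% training split could look roughly like the sketch below. It uses synthetic data and plain NumPy least squares; the chronological split, the `fit_ols` helper, and all numbers are assumptions for illustration, since the source does not describe how the split was made or which library was used.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for SPY, VIX and Gold daily returns (not the project's data).
X = rng.normal(0.0, 0.01, size=(n, 3))
true_beta = np.array([0.0, 1.1, 0.01, 0.0])  # hypothetical intercept and slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0.0, 0.002, size=n)

# 80/20 split, kept in time order (an assumption).
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]


def fit_ols(X, y):
    """Return [intercept, slope_1, ..., slope_k] from ordinary least squares."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


beta_multi = fit_ols(X_train, y_train)
print(beta_multi)
```

Keeping the split in time order avoids training on observations that come after the test period, which matters for financial time series.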
The reliability of the models was confirmed by checking the following assumptions on the residuals:
Linearity and Homoscedasticity: Checked via Residuals vs. Fitted Plot.
Normality of Residuals: Confirmed to be approximately bell-shaped and centered around zero (Figure 8 in the project document).
Independence of Errors: Confirmed by a Durbin-Watson statistic of 1.8182 (p-value 0.1116), indicating no significant autocorrelation.
Multicollinearity: Assessed using the Variance Inflation Factor (VIF). All VIF values were well below the common threshold of 5 (e.g., SPY VIF = 2.18, VIX VIF = 2.15), confirming low multicollinearity.
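Two of the diagnostics above, the Durbin-Watson statistic and the VIF, have simple closed forms. The sketch below computes both with NumPy on synthetic inputs; the function names are illustrative, and it does not reproduce the p-value reported for the Durbin-Watson test, which requires a separate procedure.

```python
import numpy as np


def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest no autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))  # synthetic, roughly uncorrelated predictors
e = rng.normal(size=300)       # synthetic white-noise residuals
print(durbin_watson(e), vif(X))
```

A VIF of 1 means a predictor is uncorrelated with the others; the reported values of roughly 2.2 for SPY and VIX indicate moderate but acceptable overlap.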
Multiple Linear Regression Equation
The final fitted equation for the multiple regression model is:
ivw_return = −0.0004681 + 1.0823(spy_return) + 0.01376(vix_return) + 0.01095(gold_return)
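As a worked example, the reported coefficients can be plugged into a small function to see how each predictor contributes to a prediction. The input returns below are hypothetical and expressed as decimal fractions (0.01 = 1%), which is an assumption about the units; the function and constant names are illustrative.

```python
# Coefficients reported for the fitted multiple regression model.
INTERCEPT = -0.0004681
B_SPY, B_VIX, B_GOLD = 1.0823, 0.01376, 0.01095


def predict_ivw_return(spy, vix, gold):
    """Predicted IVW daily return for given SPY, VIX and Gold daily returns."""
    return INTERCEPT + B_SPY * spy + B_VIX * vix + B_GOLD * gold


# Hypothetical day: SPY +1%, VIX -5%, Gold +0.2%.
pred = predict_ivw_return(spy=0.01, vix=-0.05, gold=0.002)
print(f"{pred:.6f}")  # 0.009689
```

In this example the SPY term contributes 0.010823, while the VIX and Gold terms together contribute less than 0.001 in magnitude, which matches the conclusion that SPY dominates.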
Coefficient Interpretation
| Predictor | Coefficient (β) | Significance | Interpretation |
| --- | --- | --- | --- |
| VIX Return | 0.01376 | Statistically significant (p = 0.028) | Marginal impact. Increased market volatility is associated with slightly higher IVW returns, but the effect is small relative to SPY. |
| Gold Return | 0.01095 | Insignificant (p = 0.616) | No meaningful relationship. Gold price movements do not meaningfully predict IVW performance. |
Model Performance
The model demonstrated strong predictive power and generalization to unseen data.
| Metric | Training Data (Multiple Regression) | Test Data (Multiple Regression) |
| --- | --- | --- |
| R² | 0.9272 | 0.9148 |
| Adjusted R² | 0.9257 | N/A |
| Test RMSE | N/A | 0.01225 |
The high
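For reference, the three performance metrics reported above follow standard formulas, sketched here with NumPy. The sample values are hypothetical and only demonstrate the calculations; they are not the project's predictions.

```python
import numpy as np


def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for the number of predictors p given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def rmse(y, y_hat):
    """Root mean squared error, in the same units as the response."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


# Hypothetical actual and predicted daily returns.
y = np.array([0.010, -0.020, 0.005, 0.015])
y_hat = np.array([0.012, -0.018, 0.004, 0.013])

print(r_squared(y, y_hat), rmse(y, y_hat))
print(adjusted_r_squared(0.5, n=11, p=2))
```

Adjusted R² is only slightly below R² when the number of predictors is small relative to the sample size, as in the training results (0.9257 versus 0.9272).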
The analysis strongly concludes that SPY returns are the overwhelming primary driver of IVW performance. While VIX adds minor, statistically significant explanatory power, Gold returns are insignificant. Linear regression proved to be a simple, yet highly effective, method for capturing the short-term co-movements between this growth-focused ETF and major market indicators.