This project aims to develop a predictive model for forecasting stock prices in the Tunisian stock market using historical data and machine learning techniques.
The project is organized as follows:
- `notebooks/`: Contains Jupyter notebooks for data analysis and preprocessing.
  - `check_data.ipynb`: Notebook for initial data checking.
  - `Data_Preprocessing.ipynb`: Notebook for data cleaning and preprocessing.
- `data/`: Directory containing the various stages of stock market data.
  - `weekly_stock_market.csv`: Raw weekly stock market data.
  - `checked_weekly_stock_market.csv`: Data after initial checks.
  - `cleaned_weekly_stock_market.csv`: Data after cleaning.
  - `normalized_weekly_stock_market.csv`: Data after normalization.
- `stock_scraper/`: Contains the web scraping scripts used to collect stock market data.
  - `companies_data/`: JSON files with data for individual companies.
  - `companies.json`: List of companies to scrape.
  - `import_test.py`: Script for testing data import functionality.
  - `scrapy.cfg`: Configuration file for Scrapy.
- `README.md`: This file, containing the project documentation.
- `requirements.txt`: List of Python libraries required for the project.
To ensure you have all the necessary dependencies for the Tunisia Stock Market Prediction project, use the `requirements.txt` file provided in the repository. It includes all the required libraries and frameworks for data analysis, machine learning, deep learning, web scraping, and web development.
- **Clone the Repository:**

  First, clone the repository to your local machine:

  ```bash
  git clone https://github.com/yourusername/tunisia-stock-market-prediction.git
  cd tunisia-stock-market-prediction
  ```

- **Create a Virtual Environment:**

  Next, create and activate a new virtual environment using Python 3's built-in `venv` module:

  ```bash
  python3 -m venv env
  source env/bin/activate
  ```

- **Install the Required Libraries:**

  Install the required libraries with:

  ```bash
  pip install -r requirements.txt
  ```
The data for the Tunisian stock market is visualized on the website through canvas graphs, which do not allow for direct scraping of the data from the webpage's HTML. To overcome this challenge, we adopted a more technical approach by inspecting the network traffic to identify the server requests that fetch the stock data.
- **Inspect Network Traffic:**
  - Open the website where the stock market data is displayed.
  - Open the browser's Developer Tools (usually by pressing `F12` or by right-clicking and selecting "Inspect") to monitor the network traffic.
  - Navigate to the "Network" tab and filter by XHR (XMLHttpRequest) to observe the API calls made by the webpage.
- **Identify Data Requests:**
  - Look for the requests that fetch the stock data. These are typically calls to an API endpoint that return data in JSON format.
  - Note the request headers, the method (GET or POST), and any query parameters or payloads used to retrieve the data.
- **Craft Custom Requests with Scrapy:**
  - Using Scrapy, a popular Python web scraping framework, create a spider that mimics the identified requests (see the sketch after this list).
  - Include any headers, cookies, or parameters identified in the previous step, so the server accepts and processes the request as if it came from a legitimate browser session.
- **Parse and Save the Data:**
  - Once the data is retrieved, parse the JSON response to extract the relevant fields.
  - Save the parsed data in a structured format such as CSV or a database, turning the raw responses into a dataset ready for analysis, machine learning models, or any other intended use case.
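As a reference for steps 3 and 4, here is a minimal spider sketch. The endpoint URL, the request headers, and the JSON field names are placeholders standing in for whatever the Network tab actually reveals; only `companies.json` corresponds to a file in this repository, and its assumed structure (a list of objects with a `symbol` key) is illustrative.

```python
import json

import scrapy


class StockSpider(scrapy.Spider):
    """Replays the XHR calls discovered in the browser's Network tab."""

    name = "stocks"

    # Hypothetical endpoint: substitute the URL observed in DevTools.
    api_url = "https://www.example-bourse.tn/api/stock-data?symbol={symbol}"

    def start_requests(self):
        # companies.json holds the list of companies to scrape.
        with open("companies.json", encoding="utf-8") as f:
            companies = json.load(f)
        for company in companies:
            yield scrapy.Request(
                url=self.api_url.format(symbol=company["symbol"]),
                # Mirror the headers the browser sent so the server
                # answers as it would for a legitimate page visit.
                headers={
                    "Accept": "application/json",
                    "X-Requested-With": "XMLHttpRequest",
                },
                callback=self.parse,
                cb_kwargs={"symbol": company["symbol"]},
            )

    def parse(self, response, symbol):
        # The endpoint returns JSON, not HTML, so decode it directly.
        payload = json.loads(response.text)
        # Field names below are assumptions; adjust them to the real schema.
        for row in payload.get("data", []):
            yield {
                "symbol": symbol,
                "date": row.get("date"),
                "openingPrice": row.get("open"),
                "closingPrice": row.get("close"),
                "highestPrice": row.get("high"),
                "lowestPrice": row.get("low"),
                "volume": row.get("volume"),
            }
```

Items yielded by the spider can be written straight to a structured file with Scrapy's feed exports, e.g. `scrapy crawl stocks -o weekly_stock_market.csv`.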
Feature engineering is crucial in preparing raw data for machine learning models by creating meaningful input features. For the weekly Tunisian stock market data, the following steps were implemented (a pandas sketch follows the list):
- Converted the date column to datetime format to facilitate extraction of temporal features.
- Extracted features such as `year`, `month`, `day_of_month`, and `week_of_year` to capture seasonal and time-related patterns.
- Calculated `price_range` as the difference between `highestPrice` and `lowestPrice` to capture weekly price volatility.
- Derived `price_change` as the difference between `closingPrice` and `openingPrice` to gauge weekly price movement.
- Computed `weekly_return` as the percentage change from `openingPrice` to `closingPrice`, normalized to account for weekly fluctuations.
- Applied a logarithmic transformation (`log_volume`) to the volume column to normalize the distribution and reduce skewness, making the volume data more suitable for modeling.
- Calculated `moving_avg_4` as the 4-week rolling average of `closingPrice` to smooth out short-term fluctuations and identify long-term trends.
- Used a 4-week exponential moving average (`ema_4`), which gives more weight to recent prices and so reflects recent market sentiment.
- Determined `volatility_4` as the 4-week rolling standard deviation of `closingPrice` to quantify the weekly price fluctuation, or risk.
- Handled missing values in the rolling statistics by filling the initial NaN values appropriately, ensuring continuity in the feature columns.
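A minimal pandas sketch of these steps, assuming the column names used above (`date`, `openingPrice`, `closingPrice`, `highestPrice`, `lowestPrice`, `volume`) and a single price series; the NaN-filling strategy shown (backfill) is one reasonable choice among several:

```python
import numpy as np
import pandas as pd

# Load the cleaned weekly data; column names follow those described above.
df = pd.read_csv("data/cleaned_weekly_stock_market.csv")

# Temporal features extracted from the date column.
df["date"] = pd.to_datetime(df["date"])
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day_of_month"] = df["date"].dt.day
df["week_of_year"] = df["date"].dt.isocalendar().week

# Price-based features.
df["price_range"] = df["highestPrice"] - df["lowestPrice"]
df["price_change"] = df["closingPrice"] - df["openingPrice"]
df["weekly_return"] = (df["closingPrice"] - df["openingPrice"]) / df["openingPrice"]

# Log transformation of volume; log1p also handles weeks with zero volume.
df["log_volume"] = np.log1p(df["volume"])

# 4-week rolling statistics on the closing price.
df["moving_avg_4"] = df["closingPrice"].rolling(window=4).mean()
df["ema_4"] = df["closingPrice"].ewm(span=4, adjust=False).mean()
df["volatility_4"] = df["closingPrice"].rolling(window=4).std()

# The first weeks of each rolling window are NaN; backfill them so the
# feature columns stay complete for modeling.
df[["moving_avg_4", "volatility_4"]] = df[["moving_avg_4", "volatility_4"]].bfill()
```

Note that if the CSV mixes several companies, the rolling and EMA features should be computed per company (e.g. via `groupby` with `transform`) so that one company's history does not leak into another's.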
These engineered features are designed to enhance the predictive capability of machine learning models by providing meaningful insights into the dynamics of weekly stock market behavior. The processed dataset, containing these engineered features alongside the target variable (`closingPrice`), is then used for training and evaluating predictive models in subsequent steps.
Robust feature engineering of this kind aims to improve model accuracy and effectiveness in forecasting stock prices from historical data patterns.