This repository contains a Python script for financial anomaly detection using the Isolation Forest algorithm. The script includes data loading, preprocessing, model training, and evaluation steps.
The financial anomaly dataset is stored in 'financial_anomaly_data.csv'. It contains information about transactions, including timestamps, transaction amounts, and types.
Make sure you have the following libraries installed:
- pandas
- scikit-learn
- matplotlib
- seaborn
You can install them using:

```bash
pip install pandas scikit-learn matplotlib seaborn
```
Load the Dataset:
- Load the financial anomaly dataset using Pandas.
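
A minimal sketch of the loading step. The column names (`Timestamp`, `Amount`, `TransactionType`) are assumptions based on the dataset description above, and a small inline sample stands in for 'financial_anomaly_data.csv' so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'financial_anomaly_data.csv'; column names are assumed.
sample_csv = io.StringIO(
    "Timestamp,Amount,TransactionType\n"
    "2023-01-01 08:15:00,120.50,Purchase\n"
    "2023-01-01 09:40:00,75.00,Transfer\n"
    "2023-01-01 23:05:00,9800.00,Withdrawal\n"
)

# With the real file, pass the path instead of sample_csv:
# pd.read_csv("financial_anomaly_data.csv", parse_dates=["Timestamp"])
df = pd.read_csv(sample_csv, parse_dates=["Timestamp"])

print(df.dtypes)
print(df.head())
```

Parsing the timestamp column at load time keeps later date-based feature engineering simple.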
Check and Impute Missing Values:
- Check for missing values in the dataset.
- Impute missing values using the mean strategy.
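
The check-and-impute step could look roughly like this. It is a sketch on toy data with an assumed `Amount` column, using scikit-learn's `SimpleImputer`; the script itself may impute differently.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data with one missing value; the column name is an assumption.
df = pd.DataFrame({"Amount": [100.0, np.nan, 300.0, 200.0]})

# Count missing values per column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Replace missing values with the column mean.
imputer = SimpleImputer(strategy="mean")
df[["Amount"]] = imputer.fit_transform(df[["Amount"]])
print(df)
```

Mean imputation only applies to numeric columns; a categorical column such as a transaction type would need a different strategy, for example `strategy="most_frequent"`.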
Normalize Data:
- Scale the numeric features if needed, for example by standardizing them to zero mean and unit variance.
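
One way to scale a numeric column, assuming standardization with scikit-learn's `StandardScaler`. The `Amount` column and the `Amount_scaled` name are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data; the column name is an assumption.
df = pd.DataFrame({"Amount": [10.0, 20.0, 30.0, 40.0]})

# Rescale to zero mean and unit variance.
scaler = StandardScaler()
df["Amount_scaled"] = scaler.fit_transform(df[["Amount"]]).ravel()
print(df)
```

Isolation Forest splits on one feature at a time, so scaling tends to matter less here than for distance-based methods.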
Train Isolation Forest Model:
- Use the Isolation Forest algorithm to detect potential fraudulent transactions.
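
A sketch of the training step on synthetic data. The `Amount` column, the injected extreme values, and the hyperparameters (`n_estimators`, `contamination`) are assumptions for illustration, not the script's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic amounts: 200 ordinary transactions plus two extreme ones (rows 200 and 201).
rng = np.random.default_rng(42)
amounts = np.concatenate([rng.normal(100, 15, size=200), [5000.0, 7500.0]])
df = pd.DataFrame({"Amount": amounts})

# contamination is the expected share of anomalies; 0.01 is an assumed value.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(df[["Amount"]])

# decision_function: lower scores mean more anomalous; negative values are flagged.
df["anomaly_score"] = model.decision_function(df[["Amount"]])
print(df.sort_values("anomaly_score").head())
```

The `contamination` value sets the score threshold, so it is worth tuning against whatever is known about the real fraud rate.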
Identify Potential Fraudulent Transactions:
- Use the trained model's predictions to flag potentially fraudulent transactions.
- Example feature engineering: Extract the hour from the timestamp.
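
The two bullets above might be combined as follows. This sketch uses synthetic data, and the column names (`Timestamp`, `Amount`, `Hour`, `prediction`) are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic hourly transactions with two injected extreme amounts (rows 50 and 150).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Timestamp": pd.Timestamp("2023-01-01") + pd.to_timedelta(np.arange(n), unit="h"),
    "Amount": rng.normal(100, 15, size=n),
})
df.loc[[50, 150], "Amount"] = [6000.0, 9000.0]

# Feature engineering: extract the hour of day from the timestamp.
df["Hour"] = df["Timestamp"].dt.hour

# fit_predict returns -1 for anomalies and 1 for normal points.
model = IsolationForest(contamination=0.01, random_state=0)
df["prediction"] = model.fit_predict(df[["Amount", "Hour"]])

potential_fraud = df[df["prediction"] == -1]
print(potential_fraud)
```

A flagged row is a candidate for review, not a confirmed fraud case; the model only reports how unusual a transaction looks.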
Visualize Anomaly Scores:
- Create scatter plots, histograms, box plots, and violin plots to visualize anomaly scores and potential fraudulent transactions.
Evaluate Model Performance:
- Split the data into training and testing sets.
- Train the model on the training set and evaluate its performance on the test set.
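
A sketch of a train/test evaluation. It assumes ground-truth labels are available (here a hypothetical `is_anomaly` column on synthetic data), which the dataset description above does not confirm. Without labels, only the share of flagged test transactions can be inspected.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic data: 490 ordinary amounts and 10 labeled anomalies (assumed labels).
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "Amount": np.concatenate([rng.normal(100, 15, size=490), rng.uniform(5000, 50000, size=10)]),
    "is_anomaly": np.concatenate([np.zeros(490, dtype=int), np.ones(10, dtype=int)]),
})

# Stratify so both splits keep the same anomaly share.
X_train, X_test, y_train, y_test = train_test_split(
    df[["Amount"]], df["is_anomaly"],
    test_size=0.3, stratify=df["is_anomaly"], random_state=7,
)

# Fit on the training set only; the labels are not used for training.
model = IsolationForest(contamination=0.02, random_state=7)
model.fit(X_train)

# Map the model's -1/1 output to 1/0 so it lines up with is_anomaly.
y_pred = (model.predict(X_test) == -1).astype(int)

precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
flagged_share = y_pred.mean()
print(f"precision={precision:.2f} recall={recall:.2f} flagged={flagged_share:.2%}")
```

Precision and recall are more informative than accuracy here, because anomalies are rare and a model that flags nothing would still score high accuracy.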
Pair Plot of Relevant Variables:
- Create a pair plot to visualize the relationships between variables, including anomaly scores.
Clone the repository and run the Python script. Make sure to have the required libraries installed.
Feel free to customize the script or extend it based on your specific requirements.