Automated Insurance Claim Fraud Detection System

Project Overview

This project aims to detect fraudulent insurance claims using machine learning techniques. The system enhances the security and efficiency of the insurance claims process by identifying potentially fraudulent claims.

Key Features

Data Ingestion and Preprocessing

Local Dataset: Utilizes a simulated dataset of insurance claims stored in JSON files within the Simulation/Data directory. The dataset includes features such as claim amount, claimant history, claim type, and timestamps.
Data Cleaning and Preparation: Employs Python and Pandas to clean and preprocess the data, handle missing values, normalize features, and create new features like claim frequency.
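
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's Loader code: the field names (`claim_id`, `claimant_id`, `claim_amount`, `claim_type`, `timestamp`) and the choice of median imputation and min-max scaling are assumptions made for the example.

```python
import pandas as pd

# Hypothetical claim records; the real fields in Simulation/Data may differ.
claims = [
    {"claim_id": 1, "claimant_id": "A", "claim_amount": 1200.0, "claim_type": "auto", "timestamp": "2024-01-05"},
    {"claim_id": 2, "claimant_id": "A", "claim_amount": None, "claim_type": "auto", "timestamp": "2024-02-11"},
    {"claim_id": 3, "claimant_id": "B", "claim_amount": 800.0, "claim_type": None, "timestamp": "2024-02-20"},
    {"claim_id": 4, "claimant_id": "A", "claim_amount": 5000.0, "claim_type": "home", "timestamp": "2024-03-02"},
]


def preprocess(records):
    """Clean raw claim records and derive a simple claim-frequency feature."""
    df = pd.DataFrame(records)
    df["timestamp"] = pd.to_datetime(df["timestamp"])

    # Handle missing values: median for the numeric field, a placeholder for the categorical one.
    df["claim_amount"] = df["claim_amount"].fillna(df["claim_amount"].median())
    df["claim_type"] = df["claim_type"].fillna("unknown")

    # Min-max normalize the claim amount into [0, 1].
    lo, hi = df["claim_amount"].min(), df["claim_amount"].max()
    df["claim_amount_norm"] = (df["claim_amount"] - lo) / (hi - lo) if hi > lo else 0.0

    # Claim frequency: number of claims filed by the same claimant.
    df["claim_frequency"] = df.groupby("claimant_id")["claim_id"].transform("count")
    return df


cleaned = preprocess(claims)
print(cleaned[["claim_id", "claim_amount", "claim_amount_norm", "claim_frequency"]])
```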

Machine Learning Integration

Model Training: Trains a neural network model using the simulated claims dataset to predict the likelihood of a claim being fraudulent. The model, named EnhancedFraudDetectionModel, is a multi-layer feedforward neural network with 5 layers. The training process involves the following steps:
1. Data Loading and Preprocessing: The training data is loaded from JSON files, and features are extracted from each claim. The dataset is then balanced using SMOTE (Synthetic Minority Over-sampling Technique) to handle class imbalance.
2. Model Initialization: The EnhancedFraudDetectionModel is initialized, which includes defining the architecture with layers, batch normalization, and dropout for regularization.
3. Loss Function and Optimizer: The model uses Cross-Entropy Loss as the loss function and the Adam optimizer for training.
4. Training Loop: The model is trained for 100 epochs. In each epoch, the optimizer resets gradients, performs a forward pass to compute predictions, calculates the loss, performs a backward pass to compute gradients, and updates the model parameters.
5. Model Saving: After training, the model's state dictionary is saved to a specified path for later use.
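
Steps 2 through 5 can be sketched in PyTorch as shown here. This is a hedged illustration rather than the project's actual EnhancedFraudDetectionModel: the class name `FraudNetSketch`, the layer widths, dropout rate, learning rate, synthetic data, and output path are all assumptions, and "5 layers" is interpreted as five linear layers. SMOTE balancing from step 1 is omitted.

```python
import os
import tempfile

import torch
import torch.nn as nn


class FraudNetSketch(nn.Module):
    """Illustrative stand-in for EnhancedFraudDetectionModel (sizes are assumptions)."""

    def __init__(self, n_features, n_classes=2, dropout=0.3):
        super().__init__()
        sizes = [n_features, 64, 32, 16, 8]
        layers = []
        for in_dim, out_dim in zip(sizes[:-1], sizes[1:]):
            # Hidden block: linear layer, batch normalization, activation, dropout.
            layers += [nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(), nn.Dropout(dropout)]
        layers.append(nn.Linear(sizes[-1], n_classes))  # fifth linear layer: class logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
X = torch.randn(256, 8)                 # synthetic features, not real claims
y = (X[:, 0] + X[:, 1] > 0).long()      # synthetic binary labels

model = FraudNetSketch(n_features=8)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

model.train()
losses = []
for epoch in range(100):
    optimizer.zero_grad()               # reset gradients
    logits = model(X)                   # forward pass
    loss = criterion(logits, y)         # compute loss
    loss.backward()                     # backward pass
    optimizer.step()                    # update parameters
    losses.append(loss.item())

# Save the state dictionary, then reload it into a fresh model for inference.
path = os.path.join(tempfile.gettempdir(), "fraud_model_sketch.pt")
torch.save(model.state_dict(), path)
restored = FraudNetSketch(n_features=8)
restored.load_state_dict(torch.load(path))
restored.eval()
with torch.no_grad():
    probs = torch.softmax(restored(X), dim=1)[:, 1]  # probability of the "fraud" class
print(f"first loss={losses[0]:.3f}, last loss={losses[-1]:.3f}")
```

Calling `eval()` before inference matters here because batch normalization and dropout behave differently in training and evaluation modes.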
SHAP Analysis: Uses SHAP (SHapley Additive exPlanations) to interpret the model's predictions and understand the impact of each feature on the fraud likelihood score.
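
To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a toy scoring function by enumerating every feature subset. It does not use the `shap` library or the project's model: `toy_score`, its three features, and the all-zero baseline are hypothetical, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a subset are set to their baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the subset drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i when added to this subset.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(features):
    """Hypothetical fraud score with one interaction term (not the trained model)."""
    amount, frequency, night_filing = features
    return 0.5 * amount + 0.3 * frequency + 0.2 * amount * night_filing


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The attributions sum to the difference between the score for `x` and the score for the baseline, which is the additivity property that SHAP summary plots rely on.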

Small Sample SHAP Summary Plot

Varied-Distribution Sample SHAP Summary Plot

System Architecture

File-Based Design: The system uses a file-based approach for data processing and model predictions, ensuring simplicity and ease of use.
Python Scripts: All functionalities, including data ingestion, rule-based analysis, and machine learning predictions, are implemented using Python scripts.
CLI Interface: Exposes the fraud detection capabilities via a Command Line Interface (CLI), allowing users to run scripts for data processing and model training.

Frontend Interface

Dashboard: A web dashboard built with ASP.NET MVC (C#) and HTML/JavaScript displays the results of the fraud detection process, including visualizations of flagged claims and their associated fraud scores.

Menu Overview

Example Visualization

Key Technologies

Languages: Python, C#
Frameworks: ASP.NET MVC for the web dashboard, PyTorch and SHAP for machine learning
Data Management: Pandas for data processing, JSON for local data storage
Web Development: HTML5, CSS, JavaScript

Outcome

The Automated Insurance Claim Fraud Detection System combines a simulated claims dataset, a neural network fraud classifier, SHAP-based explanations, and a web dashboard into a single workflow for flagging potentially fraudulent claims. It demonstrates how Python and machine learning can be applied to improve the security of the insurance claims process.

User Guide

Getting Started

Clone the Repository:
```
git clone https://github.com/tylermaginnis/AutomatedInsuranceClaimFraudDetection.git
cd AutomatedInsuranceClaimFraudDetection
```

Set Up the Environment: Ensure you have Python installed. Create a virtual environment and install the required dependencies:
```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
```

Prepare the Data: Generate the simulated dataset:
```
python Simulation/Generator.py -n 1000 -p 100
python Simulation/Generator.py -a -n 1000 -p 100
```

Run the Web Dashboard: Navigate to the MLDashboard directory and run the ASP.NET application:
```
cd MLDashboard
dotnet run
```

Using the System

Access the Dashboard: Open your web browser and go to http://127.0.0.1:5000. You will see the main dashboard displaying key metrics and visualizations.
View Detailed Visualizations: Click on the different sections of the visualizations menu to explore various charts and graphs, such as claims by coverage type, claims over time, and fraud risk analysis.
Review Fraud Scores: In the "Claims Fraud Risk" section, review the list of claims along with their fraud likelihood scores. Click on "View Details" to see more information about a specific claim.

Customizing Rules

Update the Machine Learning Model: If you want to retrain the machine learning model with new data or different parameters, modify the Generator.py or MLTool.py script and run it to update the model.
Normalize the Data: Run the Loader.py script to clean and preprocess the data. This will handle missing values, normalize numeric features, and create additional fields inferred by machine learning:
```
python Loader/Loader.py -d MLTool/Insights
```
By default, Loader.py will take the data from Simulation/Data and save the cleaned data to Loader/Data/normalized.json. By passing -d, you can instruct the script to take data from a different directory.
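
The default-versus-override behavior of the `-d` option could be implemented along these lines with `argparse`. This is only a sketch of the pattern, not Loader.py's actual code: the long option name `--data-dir`, the `build_parser` helper, and the `OUTPUT_PATH` constant are assumptions; only the `-d` flag and the two paths come from the description above.

```python
import argparse
from pathlib import Path

# Output location described in the guide; treated here as a fixed constant.
OUTPUT_PATH = Path("Loader/Data/normalized.json")


def build_parser():
    """Build a parser with an optional input-directory override (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Clean and normalize claim data (illustrative sketch).")
    parser.add_argument(
        "-d", "--data-dir",
        type=Path,
        default=Path("Simulation/Data"),
        help="directory containing the input JSON files",
    )
    return parser


# No arguments: fall back to the default input directory.
default_args = build_parser().parse_args([])
# Equivalent to: python Loader/Loader.py -d MLTool/Insights
override_args = build_parser().parse_args(["-d", "MLTool/Insights"])

print(default_args.data_dir, "->", OUTPUT_PATH)
print(override_args.data_dir, "->", OUTPUT_PATH)
```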

Additional Resources

Support: If you encounter any issues or have questions, please open an issue on the GitHub repository or contact the project maintainers.

By following this user guide, you will be able to set up, use, and customize the Automated Insurance Claim Fraud Detection System effectively.
