Brownie point: This project won the Best Women’s Team award at IEEE Kalpana 6.0 – Hack for Humanity.
Verifyy is a real-time multimodal deepfake detection system designed to restore trust in digital media. It analyzes video, audio, and metadata to generate a calibrated trust score indicating whether content is likely real or manipulated.
The system is built with a scalable backend architecture and a browser extension interface, making deepfake detection accessible and practical for real-world use.
Important Note
This extension currently detects deepfakes based on patterns learned from the DFDC (Kaggle) dataset used during training.
Detection performance may vary significantly on content outside this dataset due to domain shift. If you have access to a more diverse or representative dataset, you are encouraged to fork the project and retrain the models accordingly.
Current observed accuracy: ~80–85% under DFDC-like conditions.
This project is an evolving research prototype and not a production-grade forensic system.
Verifyy performs detection across three independent layers:
- Video Analysis – Detects facial manipulation artifacts.
- Audio Analysis – Identifies synthetic or cloned voice patterns.
- Metadata Analysis – Examines file-level forensic signals.
The outputs from these components are fused into a final trust score that reflects the overall likelihood of manipulation.
- Multimodal deepfake detection (video + audio + metadata)
- Real-time inference using asynchronous backend services
- Explainable using Grad-CAM heatmaps
- Browser extension interface
- REST API and WebSocket support for live updates
[User Layer]
↓
[Browser Extension]
↓
[API Gateway]
↓
[Processing Engine]
↓
[Trust Scoring System]
↓
[Database]
- Python
- FastAPI
- Celery
- Redis
- PyTorch
- FFmpeg
- Librosa
- ImageHash
- Chrome Extension (Manifest V3)
- HTML, CSS, JavaScript
- ResNet18 (Video classification)
- CRNN (Audio spoof detection)
- Extracts key frames using FFmpeg.
- Detects faces using MTCNN.
- Crops and preprocesses facial regions.
- Runs inference using a fine-tuned ResNet18 model.
- Generates probability of facial manipulation.
- Optionally produces Grad-CAM heatmaps for explainability.
- Extracts audio from video at 16 kHz mono.
- Converts waveform to Mel-spectrogram.
- Processes through a CRNN model.
- Outputs probability of voice spoofing.
- Computes SHA-256 hash for duplicate detection.
- Generates perceptual hash (pHash) for similarity detection.
- Extracts encoding metadata using FFprobe.
- Produces metadata risk score.
Final trust score is computed using weighted fusion:
Final Score = 0.6 × Video Score + 0.4 × Audio Score
Metadata acts as an override signal in cases of detected recycling or duplication.
The result is categorized as:
- Likely Real
- Likely Fake
git clone <repository-url>
cd Verifyy
python3 -m venv venv
source venv/bin/activate
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Ensure both ffmpeg and ffprobe are installed and available in PATH.
Verify:
ffmpeg -version
ffprobe -version
a)
cd backend/audio_service
python train.py
b)
cd backend/video_service
python train.py
redis-server
cd backend
celery -A backend.celery_app worker --loglevel=info -P solo
In a new terminal:
cd backend
uvicorn main:app --reload
API available at:
http://127.0.0.1:8000
To test the full pipeline locally:
python test_full_pipeline.py
Expected results:
- Open Chrome.
- Go to
chrome://extensions. - Enable Developer Mode.
- Click “Load Unpacked”.
- Select the extension folder.
Ensure the backend server is running before testing.
spock--/
├── .git/
├── .gitignore
├── README.md
├── requirements.txt
├── mac-requirements.txt
├── fedora-requirements-freeze.txt
├── frontend/
│ ├── index.html
│ ├── app.js
│ ├── styles.css
│ ├── popup.html
│ ├── popup.js
│ ├── analyzer.html
│ └── analyzer.js
├── backend/
│ ├── __init__.py
│ ├── main.py
│ ├── celery_app.py
│ ├── metadata_service.py
│ ├── scoring.py
│ ├── utils.py
│ ├── test_full_pipeline.py
│ ├── audio_service/
│ │ ├── __init__.py
│ │ ├── dataset.py
│ │ ├── model.py
│ │ ├── service.py
│ │ └── train.py
│ ├── video_service/
│ │ ├── __init__.py
│ │ ├── model.py
│ │ ├── service.py
│ │ └── train.py
│ ├── tests/
│ │ ├── __init__.py
│ │ ├── test_audio_module.py
│ │ ├── test_video_module.py
│ │ ├── test_scoring.py
│ │ ├── test_celery.py
│ │ └── test_ws.py
│ ├── weights/
│ ├── media/
│ ├── audio_dataset/
│ │ ├── fake/
│ │ └── real/
│ └── video_dataset/
│ ├── fake/
│ └── real/
- Performance may vary across datasets (domain shift).
- Lip-sync consistency is not currently verified.
- Low-resolution faces may reduce detection accuracy.
- Highly advanced generative models may evade detection.
- Transformer-based multimodal fusion
- Lip-sync detection module
- Larger cross-dataset training
- Cloud deployment with scalable worker pools