AI Tamil Hate Speech Detector

Introduction

This project is a collaboration between the social enterprise DreamSpace Academy (DSA), the NYU Center on International Cooperation (NYU CIC), and Omdena. It was funded by NYU CIC, and the challenge is supported by NYU CIC and the Netherlands Ministry of Foreign Affairs. The goal is to detect hate speech on social media platforms in Tamil, English, or Tanglish (Tamil written in Latin script, often mixed with English). A global team of 50 AI changemakers took on the task.

The focus is on the following hate-speech related categories:

- Community-based hate speech
- Religion-based hate speech
- Gender-based hate speech
- Political hate speech

Solution

The solution is an AI model served through a FastAPI backend with a Streamlit frontend, so the complete code base is in Python.

Project Setup and Documentation

Clone the Repo.
Run the backend service. (Make sure Docker is running.)
- Go to the backend folder
- Run the Docker Compose command
```
$ cd backend
backend:~$ sudo docker-compose up -d
```
Run the frontend service.
- Go to the frontend folder
- Run the app with the streamlit run command
```
$ cd frontend
frontend:~$ streamlit run NLPfile.py
```
Access the FastAPI documentation:
- Hate Classification: http://localhost:8080/api/v1/classification/docs
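
Because `docker-compose up -d` returns before the service is necessarily ready, it can help to poll the documentation URL before starting the frontend. The following is a minimal sketch, not part of the repository: the helper name `wait_for_backend` and its parameters are hypothetical, and only the URL comes from this README.

```python
import time
import urllib.error
import urllib.request

# URL taken from this README; the helper itself is a hypothetical convenience.
DOCS_URL = "http://localhost:8080/api/v1/classification/docs"


def wait_for_backend(url=DOCS_URL, attempts=10, delay=2.0,
                     opener=urllib.request.urlopen):
    """Return True once `url` answers with HTTP 200, False after `attempts` tries."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # the container may still be starting
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next try
    return False
```

Calling `wait_for_backend()` after the Docker Compose step returns `True` as soon as the docs page responds, or `False` if the backend stays unreachable.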

Project Details

Screenshot

Directory Details

Front End: The Streamlit code is in the frontend folder, along with its Dockerfile and requirements.txt.
Back End: The FastAPI code is in the backend folder.
- The backend is implemented as a microservice with its own FastAPI server, requirements, and Dockerfile.
- The directory tree is shown below:
```
- classification
    > app
        > api
            > bert_model_artifacts
                - model.bin
                - network.py
```
- Each model folder needs the following files:
  - model.bin: the saved model after training.
  - network.py: the class definition for a customised model.
- config.json: contains the details of the models in the backend and the datasets they were trained on.
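
The file requirements above can be checked before starting the backend. This is a small sketch under stated assumptions: the function `missing_artifacts` is hypothetical and not part of the repository, and only the file names `model.bin` and `network.py` come from the directory tree.

```python
from pathlib import Path

# File names come from the directory tree above; the check is illustrative only.
REQUIRED_FILES = ("model.bin", "network.py")


def missing_artifacts(model_dir):
    """Return the required file names that are absent from `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
```

For example, `missing_artifacts("classification/app/api/bert_model_artifacts")` returns an empty list when both files are present.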

Test API with Python Script

Run the following script, setting the data variable to your desired text input:

$ cd backend
backend:~$ python backend\test_api.py
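
For readers who want to call the API from their own code, a request could look roughly like the sketch below. It is not the contents of test_api.py. The endpoint path `/predict`, the `{"data": ...}` payload shape, and the function names are assumptions made for illustration; the FastAPI documentation page lists the actual route and schema.

```python
import json
import urllib.request

# ASSUMPTION: the endpoint path and the {"data": ...} payload shape are
# hypothetical; check the FastAPI docs page for the real route and schema.
PREDICT_URL = "http://localhost:8080/api/v1/classification/predict"


def build_request(text, url=PREDICT_URL):
    """Build a JSON POST request carrying `text` in the `data` field."""
    body = json.dumps({"data": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text, url=PREDICT_URL):
    """Send `text` to the backend and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, url), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Encoding the body as UTF-8 with `ensure_ascii=False` keeps Tamil script readable in the request payload rather than escaping it.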

License

This project is licensed under the Apache License 2.0. You may not use any trademarks associated with the software without permission. The full text of the license can be found in the LICENSE file.
