Vietnamese hate speech detection with real-time data from streaming platforms such as YouTube, Facebook, and TikTok.

bakansm/ViTHSD

ViTHSD-Vietnamese-Targeted-Hate-Speech-Detection

Vietnamese Targeted Hate Speech Detection on Social Media Texts.
Contact information: Mr. Son T. Luu
Email: [email protected] (Alternative: [email protected])

Data

The dataset contains 10,000 comments. Each comment is annotated for 5 targets, and each target is assigned one of three hate levels.
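
As a rough illustration of that label layout, the sketch below models one annotated comment as five per-target levels, each drawn from three values. The target names (`target_1` to `target_5`) and the integer encoding 0 to 2 are placeholders assumed for illustration; the actual names and encoding are defined by the dataset and the paper.

```python
from dataclasses import dataclass

# Placeholder target names and level encoding: assumptions for illustration,
# not the dataset's actual column names or values.
TARGETS = ("target_1", "target_2", "target_3", "target_4", "target_5")
LEVELS = (0, 1, 2)  # three possible levels per target


@dataclass(frozen=True)
class LabeledComment:
    text: str
    levels: tuple  # one level per target, in TARGETS order

    def __post_init__(self):
        # Reject label vectors that do not match the 5-target, 3-level layout.
        if len(self.levels) != len(TARGETS):
            raise ValueError(f"expected {len(TARGETS)} levels, got {len(self.levels)}")
        if any(level not in LEVELS for level in self.levels):
            raise ValueError(f"each level must be one of {LEVELS}")

    def as_dict(self):
        """Map each target name to its level."""
        return dict(zip(TARGETS, self.levels))


example = LabeledComment("a sample comment", (0, 2, 0, 0, 1))
print(example.as_dict())
```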

Publication

https://arxiv.org/abs/2404.19252
(Please cite this paper when using the dataset)

Citation:
Vo, C.N., Huynh, K.B., Luu, S.T. et al. ViTHSD: exploiting hatred by targets for hate speech detection on Vietnamese social media texts. J Comput Soc Sc 8, 30 (2025). https://doi.org/10.1007/s42001-024-00348-6

@article{vo2025vithsd,
  title={ViTHSD: exploiting hatred by targets for hate speech detection on Vietnamese social media texts},
  author={Vo, Cuong Nhat and Huynh, Khanh Bao and Luu, Son T and Do, Trong-Hop},
  journal={Journal of Computational Social Science},
  volume={8},
  number={2},
  pages={30},
  year={2025},
  publisher={Springer}
}

Model

To be updated.

Streaming

Technologies

  • Apache Kafka
  • Apache Spark Structured Streaming
  • QuestDB (used as the data sink)
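
To show how the first two components can fit together, here is a minimal sketch of a Spark Structured Streaming job that reads a Kafka topic. It is not the repository's `sparkStreaming.py`: the function names, the app name, and the console output are assumptions for illustration, and it covers only the read side. The topic name `youtube` and the broker address `localhost:9092` match the commands in the run steps.

```python
def kafka_source_options(topic="youtube", bootstrap_servers="localhost:9092"):
    """Options for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def build_comment_stream(spark, options):
    """Return a streaming DataFrame with a single string column, `comment`."""
    raw = spark.readStream.format("kafka").options(**options).load()
    # Kafka delivers `value` as bytes; cast it to a string for downstream use.
    return raw.selectExpr("CAST(value AS STRING) AS comment")


def main():
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("vithsd-stream-sketch").getOrCreate()
    stream = build_comment_stream(spark, kafka_source_options())
    # Print to the console instead of a real sink, for inspection only.
    query = stream.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()

# main() is intentionally not called here; call it when launching the job.
```

A job like this needs the Kafka source package on the classpath, which is what the `--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.2` argument to `spark-submit` provides.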

Requirements

How to run

  • Step 1: Start the ZooKeeper server and the Kafka server

    • Start zookeeper server
      bin/zookeeper-server-start.sh config/zookeeper.properties

    • Start kafka server
      bin/kafka-server-start.sh config/server.properties

  • Step 2: Create topic

    • Create topic named "youtube"
      bin/kafka-topics.sh --create --topic youtube --bootstrap-server localhost:9092
  • Step 3: Start QuestDB and connect it to the Kafka topic

    • Start questdb
      sudo questdb start
    • Connect questdb connector to kafka topic
      bin/connect-standalone.sh config/connect-standalone.properties config/questdb-connector.properties
  • Step 4: Submit the Spark Structured Streaming job, which connects to the Kafka topic
    spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.2 sparkStreaming.py

  • Step 5: Start producer and consumer

    • Producer
      python3 youtubeLiveData.py

    • Consumer
      python3 consumer.py
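
For orientation, the producer and consumer in Step 5 could look roughly like the sketch below. This is not the code in `youtubeLiveData.py` or `consumer.py`: it assumes the `kafka-python` client library, and the message fields `author` and `text` are hypothetical. How comments are fetched from the platform is left out.

```python
import json


def encode_comment(author, text):
    """Serialize one chat comment to UTF-8 JSON bytes for Kafka."""
    record = {"author": author, "text": text}
    # ensure_ascii=False keeps Vietnamese characters readable in the payload.
    return json.dumps(record, ensure_ascii=False).encode("utf-8")


def decode_comment(payload):
    """Inverse of encode_comment: bytes back to a dict."""
    return json.loads(payload.decode("utf-8"))


def produce(comments, topic="youtube", bootstrap_servers="localhost:9092"):
    """Send (author, text) pairs to the topic."""
    from kafka import KafkaProducer  # kafka-python, an assumed client library

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    for author, text in comments:
        producer.send(topic, value=encode_comment(author, text))
    producer.flush()  # block until buffered messages are delivered
    producer.close()


def consume(topic="youtube", bootstrap_servers="localhost:9092"):
    """Print every comment on the topic, starting from the earliest offset."""
    from kafka import KafkaConsumer  # kafka-python, an assumed client library

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        auto_offset_reset="earliest",
    )
    for message in consumer:
        print(decode_comment(message.value))
```

Keeping serialization in two small functions makes the message format easy to check without a running broker.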

You can now view the data in QuestDB here
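
The stored rows can also be read programmatically. The sketch below assumes QuestDB's HTTP REST endpoint `/exec` on its default port 9000. The table name `youtube` is a guess, since the actual name depends on the connector configuration, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_exec_url(sql, host="localhost", port=9000):
    """Build a URL for QuestDB's REST /exec endpoint with the SQL URL-encoded."""
    query = urlencode({"query": sql})
    return f"http://{host}:{port}/exec?{query}"


def fetch_rows(sql, host="localhost", port=9000):
    """Run a query and return the result as a list of dicts keyed by column name."""
    with urlopen(build_exec_url(sql, host, port), timeout=10) as response:
        body = json.load(response)
    names = [column["name"] for column in body["columns"]]
    return [dict(zip(names, row)) for row in body["dataset"]]


# Table name is an assumption; adjust it to match the connector configuration.
print(build_exec_url("SELECT * FROM youtube LIMIT 5"))
```

With QuestDB running, `fetch_rows("SELECT * FROM youtube LIMIT 5")` would return the first few rows, provided a table with that name exists.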

Application

Work in progress; updates are posted here
