boostcampaitech4nlp1/level1_semantictextsimilarity_nlp-level1-nlp-09

👋 Hello there!

This is 🚀 Team NLP-09's repository for the 1st boostcamp AI Tech competition (2022.10.26 ~ 2022.11.03 19:00).

The competition task is sentence-level Semantic Textual Similarity (STS).

Data

We are given a fixed train/dev/test split with a ratio of 85:5:10.

  • Total # of data: 10,974 sentence pairs
    • # of train data: 9,324
    • # of dev data: 550
    • # of test data: 1,100

The data is from 3 sources:

  1. petition (title data from the Korean national petition board)
  2. NSMC (Naver Sentiment Movie Corpus, a Naver movie-review sentiment analysis corpus)
  3. slack (Upstage Slack data)

Each sentence pair in the train and dev datasets also includes two labels: (1) a similarity score from 0 to 5, where 0 indicates no similarity and 5 indicates that the two sentences match in both their core and supplementary content, and (2) a binary label, which is 0 if the similarity score is less than 3 and 1 otherwise.
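
The mapping from the 0 to 5 score to the binary label can be sketched as follows. This is a minimal illustration rather than code from this repository: the function name `to_binary_label` and the `threshold` parameter are hypothetical, and the range check is an added assumption.

```python
def to_binary_label(score: float, threshold: float = 3.0) -> int:
    """Map a 0-5 similarity score to a binary label.

    Scores below the threshold map to 0; all others map to 1.
    The name and the range check are illustrative assumptions.
    """
    if not 0.0 <= score <= 5.0:
        raise ValueError(f"score must be in [0, 5], got {score}")
    return 0 if score < threshold else 1


# Example scores, including the boundary value 3.0.
scores = [0.0, 2.9, 3.0, 5.0]
labels = [to_binary_label(s) for s in scores]
print(labels)  # [0, 0, 1, 1]
```

Note that a score of exactly 3 falls on the "1" side, since only scores strictly below 3 are mapped to 0.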

About

level1_semantictextsimilarity_nlp-level1-nlp-09 created by GitHub Classroom
