Caviar

A Chinese word segmenter named Caviar, who does a stupid job.

(Caviar) 请输入中文后回车:
(User) 你好笨
(Caviar) [你好, 笨]

About Caviar

The idea of making a Chinese word segmenter comes from the book The Beauty of Mathematics in Computer Science by Jun Wu. Segmenting a Chinese sentence into words is the very first step of Chinese NLP, because written Chinese has no spaces between words. And I happened to know a little about Markov chains and the Viterbi algorithm.

Thanks to Caviar for its contribution to my college application.

Limitations

There are a lot of problems in this project, including but not limited to:

  • a biased and outdated corpus
  • a brute-force method to build the tree
  • the dictionary is not cleaned and may include meaningless entries such as numbers
  • it uses the simplest (first-order) Markov model, so the context is limited to the one segment immediately before the current segment
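
To make the last limitation concrete, here is a minimal sketch of first-order (bigram) Viterbi segmentation in Python. It is not Caviar's actual code: the counts in `UNIGRAM` and `BIGRAM` are made-up toy numbers, the add-one smoothing is an assumption, and every name (`log_prob`, `segment`, `START`) is hypothetical. It only shows how each word's score depends on nothing but the single word before it.

```python
import math

# Hypothetical toy counts; a real model would estimate them from a corpus.
START = "<s>"  # sentence-start marker
UNIGRAM = {"你": 5, "好": 5, "你好": 8, "笨": 3, "好笨": 1}
BIGRAM = {(START, "你好"): 4, ("你好", "笨"): 2, (START, "你"): 1, ("你", "好笨"): 1}
START_COUNT = sum(c for (prev, _), c in BIGRAM.items() if prev == START)
VOCAB_SIZE = len(UNIGRAM) + 1  # +1 for unknown words
MAX_WORD_LEN = max(len(w) for w in UNIGRAM)


def log_prob(prev, word):
    """Add-one smoothed log P(word | prev); the smoothing is an assumption."""
    prev_count = START_COUNT if prev == START else UNIGRAM.get(prev, 0)
    return math.log((BIGRAM.get((prev, word), 0) + 1) / (prev_count + VOCAB_SIZE))


def segment(text):
    """Return the highest-scoring segmentation of text under the bigram model."""
    if not text:
        return []
    # best[i][w] = (score, previous word) for the best way to split text[:i]
    # so that the last word is w. Only one word of context is ever kept.
    best = [{} for _ in range(len(text) + 1)]
    best[0][START] = (0.0, None)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            # Unknown single characters are allowed so every position is reachable.
            if word not in UNIGRAM and len(word) > 1:
                continue
            for prev, (score, _) in best[start].items():
                candidate = score + log_prob(prev, word)
                if word not in best[end] or candidate > best[end][word][0]:
                    best[end][word] = (candidate, prev)
    # Backtrack from the best final word to the start marker.
    pos = len(text)
    word = max(best[pos], key=lambda w: best[pos][w][0])
    words = []
    while pos > 0:
        words.append(word)
        prev = best[pos][word][1]
        pos -= len(word)
        word = prev
    return words[::-1]


print(segment("你好笨"))  # ['你好', '笨'] with the toy counts above
```

Because `best` is keyed by the last word only, a longer context (for example, the two previous segments) would need a larger state, which is the trade-off the last bullet describes.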

Made in August 2021.