
Chinese Intent Match 2018-11

1.preprocess

prepare() deduplicates, in place, the data stored in per-class files; augment() performs data augmentation.

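gather() collects and shuffles the data and saves it in (text, label) format. For each sample, make_pair() draws samples from the same class and from other classes to form positive and negative examples, saved in (text1, text2, flag) format.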
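The pair construction described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `make_pairs`, its signature, the one-positive-plus-one-negative sampling per sample, and the toy data are all assumptions.

```python
import random

def make_pairs(samples, seed=0):
    """Build (text1, text2, flag) triples from (text, label) samples.

    Hypothetical sketch: for each sample, draw one partner from the same
    class (flag 1) and one from a different class (flag 0), if available.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append(text)
    pairs = []
    for text, label in samples:
        same = [t for t in by_label[label] if t != text]
        diff = [t for l, ts in by_label.items() if l != label for t in ts]
        if same:
            pairs.append((text, rng.choice(same), 1))  # positive example
        if diff:
            pairs.append((text, rng.choice(diff), 0))  # negative example
    return pairs

# Toy data, invented for illustration only.
samples = [("查一下余额", "balance"), ("我的余额是多少", "balance"),
           ("帮我转账", "transfer"), ("我要转钱", "transfer")]
pairs = make_pairs(samples)
```

How many positives and negatives the real make_pair() draws per sample is not stated in this README; the ratio here is only a placeholder.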

2.explore

Count the frequencies of words, sentence lengths, and classes, visualize them as bar charts, and compute the sent / word_per_sent metrics.
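A rough sketch of these statistics, assuming the input is a list of (words, label) pairs that are already segmented. The function name `explore`, the toy data, and the reading of `sent` as the sentence count and `word_per_sent` as the mean number of words per sentence are assumptions; plotting is left out.

```python
from collections import Counter

def explore(samples):
    """Count word, length, and class frequencies over (words, label) samples."""
    word_freq, len_freq, label_freq = Counter(), Counter(), Counter()
    for words, label in samples:
        word_freq.update(words)      # vocabulary frequency
        len_freq[len(words)] += 1    # sentence-length frequency
        label_freq[label] += 1       # class frequency
    sent = len(samples)              # assumed meaning: number of sentences
    word_per_sent = sum(word_freq.values()) / sent if sent else 0.0
    return word_freq, len_freq, label_freq, sent, word_per_sent

# Toy, pre-segmented data, invented for illustration only.
samples = [(["查", "余额"], "balance"),
           (["余额", "多少"], "balance"),
           (["帮", "我", "转账"], "transfer")]
word_freq, len_freq, label_freq, sent, word_per_sent = explore(samples)
```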

3.represent

vectorize() and vectorize_pair() each perform vectorization; label2ind() builds the label index.
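As an illustration of what vectorization and a label index could look like, here is a small sketch. The names `label2ind` and `vectorize` come from this README, but the signatures, the helper `build_vocab`, the reserved indices (0 for padding, 1 for unknown words), and the post-padding choice are all assumptions.

```python
def label2ind(labels):
    """Map each distinct label to an integer index, in sorted order."""
    return {label: ind for ind, label in enumerate(sorted(set(labels)))}

def build_vocab(texts):
    """Hypothetical helper: assign word ids from 2 upward (0 = pad, 1 = unknown)."""
    vocab = {}
    for words in texts:
        for word in words:
            vocab.setdefault(word, len(vocab) + 2)
    return vocab

def vectorize(texts, vocab, seq_len):
    """Turn segmented texts into fixed-length id sequences (truncate, then post-pad)."""
    seqs = []
    for words in texts:
        ids = [vocab.get(word, 1) for word in words][:seq_len]
        seqs.append(ids + [0] * (seq_len - len(ids)))
    return seqs

# Toy, pre-segmented data, invented for illustration only.
texts = [["查", "余额"], ["帮", "我", "转账"]]
vocab = build_vocab(texts)
seqs = vectorize(texts, vocab, seq_len=4)
label_inds = label2ind(["transfer", "balance", "transfer"])
```

A pair version would apply the same `vectorize` to text1 and text2 separately and keep the flag alongside.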

4.build

Split the data into 80% train / 20% dev, and build matching models with dnn, cnn, and rnn respectively.
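The 80% / 20% split could be done along these lines. The function name `split_train_dev`, the shuffling before the split, and the fixed seed are assumptions made for the sketch; the README only gives the proportions.

```python
import random

def split_train_dev(samples, dev_rate=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a dev_rate fraction as dev."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_dev = int(round(len(shuffled) * dev_rate))
    bound = len(shuffled) - n_dev
    return shuffled[:bound], shuffled[bound:]

# Ten placeholder samples stand in for the real data.
train, dev = split_train_dev(list(range(10)))
```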

5.encode

Define the encoding part of the model, load the corresponding weights by layer name, and pre-encode the training data.

6.match

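Define the matching part of the model, load the corresponding weights by layer name, and read the cached data.

predict() runs interactively in real time: it takes a single sentence as input, cleans it, runs prediction, and outputs the 5 sentences with the highest similarity probability.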
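The final ranking step of predict() might look like the sketch below. It assumes the matching model has already produced one similarity probability per cached sentence; the function name `top_k` and the toy values are hypothetical.

```python
import heapq

def top_k(probs, sents, k=5):
    """Return the k (sentence, probability) pairs with the highest probability."""
    return heapq.nlargest(k, zip(sents, probs), key=lambda pair: pair[1])

# Placeholder sentences and made-up probabilities, for illustration only.
sents = ["s0", "s1", "s2", "s3", "s4", "s5"]
probs = [0.1, 0.9, 0.4, 0.8, 0.3, 0.7]
best = top_k(probs, sents)
```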

7.eval

Labels are obtained by nearest-neighbor decision; test_pair() and test() evaluate matching and classification respectively.
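A minimal sketch of the nearest-neighbor decision and a classification accuracy, assuming each test sentence comes with a list of similarity probabilities against the cached training sentences. The names `nearest_label` and `accuracy` and the toy values are assumptions, and the metrics the repository actually reports are not stated here.

```python
def nearest_label(probs, labels):
    """Return the label of the cached sentence with the highest similarity."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]

def accuracy(preds, golds):
    """Fraction of predictions that equal the gold labels."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

# Labels of three cached training sentences, and made-up similarity rows
# for two test sentences.
cache_labels = ["balance", "transfer", "balance"]
sims = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.9]]
golds = ["transfer", "transfer"]
preds = [nearest_label(row, cache_labels) for row in sims]
acc = accuracy(preds, golds)
```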

cache
dict
feat
metric
model
stat
.gitignore
README.md
build.py
encode.py
eval.py
explore.py
match.py
nn_arch.py
preprocess.py
represent.py
util.py

CyanYoung/chinese_intent_match_4
