Problem with 图3-9-3 单篇文章的词云图.R (Figure 3-9-3, word cloud of a single article) #6

Open
shixiangbupt opened this issue May 28, 2022 · 1 comment

Comments

@shixiangbupt

commonality.cloud
library(tm)
library(wordcloud)

Paper1<-paste(scan("Paper1.txt", what = character(0),sep =""),collapse = " ") # read TXT document 1
Paper2<-paste(scan("Paper2.txt", what = character(0),sep =""),collapse = " ") # read TXT document 2
tmpText<- data.frame(c(Paper1,Paper2),row.names=c("Text1","Text2"))
df_title <- data.frame(doc_id=row.names(tmpText),text=tmpText$c.Paper1..Paper2.)
ds <- DataframeSource(df_title)
# Create a data-frame source: the first column is the document id (doc_id), the second is the document text
corp = VCorpus(ds)
# Load the texts from the source and build the corpus
corp = tm_map(corp,removePunctuation) # remove punctuation from the corpus
corp = tm_map(corp,PlainTextDocument) # convert to plain text
corp = tm_map(corp,removeNumbers) # remove digits
corp = tm_map(corp,function(x){removeWords(x,stopwords())}) # filter out stop words
term.matrix <- TermDocumentMatrix(corp)
# TermDocumentMatrix() tokenizes the processed corpus and builds the term-frequency matrix
term.matrix <- as.matrix(term.matrix) # frequencies
colnames(term.matrix) <- c("Paper1","Paper2")
#comparison.cloud(term.matrix,max.words=300,random.order=FALSE,colors=c("#00B2FF", "red"))
# Figure 3-9-4(a)
commonality.cloud(term.matrix,max.words=100,random.order=FALSE,color="#E7298A")
# Figure 3-9-4(b)
df<-data.frame(term.matrix)
#wordcloud(row.names(df) , df$Paper1 ,min.freq=10,col=brewer.pal(8,"Dark2"),rot.per=0.3 )
# Figure 3-9-3(a)
#wordcloud(row.names(df) , df$Paper2 , min.freq=10,col=brewer.pal(8,"Dark2"),rot.per=0.3 )
# Figure 3-9-3(b)

The error is:
> commonality.cloud(term.matrix,max.words=100,random.order=FALSE,color="#E7298A")
Error in is_overlap(x1, y1, sw1, sh1, boxes) :
function 'Rcpp_precious_remove' not provided by package 'Rcpp'
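
An error of this form usually suggests that the installed Rcpp is older than the one wordcloud was compiled against, so updating Rcpp (or reinstalling wordcloud) is a reasonable first thing to try. Below is a minimal sketch of a version check. The helper name `rcpp_is_outdated` is hypothetical, and the `"1.0.7"` threshold is an assumption about when the `Rcpp_precious_*` routines were introduced; nothing in this thread confirms it.

```r
# Hypothetical helper: TRUE when the installed version is older than the minimum.
# The default "1.0.7" is an assumption, not something stated in this thread.
rcpp_is_outdated <- function(installed, minimum = "1.0.7") {
  package_version(installed) < package_version(minimum)
}

# Compare against the locally installed Rcpp only if it is available.
if (requireNamespace("Rcpp", quietly = TRUE)) {
  current <- as.character(utils::packageVersion("Rcpp"))
  if (rcpp_is_outdated(current)) {
    message("Rcpp ", current, " looks old; consider install.packages(\"Rcpp\") in a fresh session.")
  } else {
    message("Rcpp ", current, " is at least the assumed minimum.")
  }
}
```

If the check reports an old version, restart R before updating so that no loaded package is still holding on to the old Rcpp.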

@chenwenqi228

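I kept hitting errors too. After repeated revisions with the help of ChatGPT, I found that the main problems were:
1) The RStudio version and the package versions were incompatible; check them and update whatever needs updating;
2) Paper1 and Paper2 are not UTF-8 encoded, so the encoding has to be set explicitly;
3) The stop-word filtering command has a bug and needs the following addition: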
corp <- tm_map(corp, content_transformer(function(x) iconv(enc2utf8(x), sub = "byte")))
corp <- tm_map(corp, function(x){removeWords(x,stopwords("en"))})
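
To see what the `iconv(..., sub = "byte")` step in point 3 does, here is a minimal base-R sketch on a single string. The sample text and the explicit `from`/`to` arguments are illustrative assumptions and are not part of the fix above. The idea is that bytes that are not valid UTF-8 get rewritten as `<xx>` hex codes, so later text functions no longer see an invalid string.

```r
# A string holding a lone Latin-1 byte (0xE9), which is not valid UTF-8.
# The sample text is made up for illustration.
raw_text <- "caf\xe9 latte"
print(validUTF8(raw_text))   # checks whether the bytes form valid UTF-8

# Re-encode as UTF-8; bytes that cannot be converted are written as <xx>
# hex codes instead of aborting the conversion.
cleaned <- iconv(raw_text, from = "UTF-8", to = "UTF-8", sub = "byte")
print(cleaned)
print(validUTF8(cleaned))
```

Reading the files with the correct encoding in the first place is still preferable; the substitution only keeps stray bytes from stopping the pipeline.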

This version now runs:
library(tm)
library(wordcloud)
Paper1 <- paste(scan("R语言可视化之美/第3章_类别比较型图表/Paper1.txt", what = character(0), sep = "", encoding = "UTF-8"), collapse = " ")
Paper2 <- paste(scan("R语言可视化之美/第3章_类别比较型图表/Paper2.txt", what = character(0), sep = "", encoding = "UTF-8"), collapse = " ")
# Added encoding = "UTF-8" so the file encoding is not misdetected

tmpText<- data.frame(c(Paper1, Paper2),row.names=c("Text1","Text2"))
df_title <- data.frame(doc_id=row.names(tmpText),
text=tmpText$c.Paper1..Paper2.)
ds <- DataframeSource(df_title)
# Create a data-frame source: the first column is the document id (doc_id), the second is the document text
corp <- VCorpus(ds)
# Load the texts from the source and build the corpus
corp <- tm_map(corp,removePunctuation) # remove punctuation from the corpus
corp <- tm_map(corp,PlainTextDocument) # convert to plain text
corp <- tm_map(corp,removeNumbers) # remove digits

corp <- tm_map(corp, content_transformer(function(x) iconv(enc2utf8(x), sub = "byte")))
corp <- tm_map(corp, function(x){removeWords(x,stopwords("en"))}) # filter out stop words
term.matrix <- TermDocumentMatrix(corp)
# TermDocumentMatrix() tokenizes the processed corpus and builds the term-frequency matrix

term.matrix <- as.matrix(term.matrix) # frequencies
colnames(term.matrix) <- c("Paper1","Paper2")
df<-data.frame(term.matrix)
write.csv(df,'term_matrix.csv') # export the term frequencies of the two articles

#---------------------------------------Import the data------------------------------------------
df<-read.csv('term_matrix.csv',header=TRUE,row.names=1)

#----------------------------------------Comparing the two articles-------------------------------------------------------------
comparison.cloud(df, scale=c(4,0.4), max.words=300, random.order=FALSE, rot.per=.15, title.size=1.4)
comparison.cloud(df,max.words=300,random.order=FALSE,colors=c("#00B2FF", "red"))
commonality.cloud(df,max.words=100,random.order=FALSE,color="#E7298A")

comparison cloud

comparison.cloud(df, random.order=FALSE,
colors = c("#00B2FF", "red", "#FF0099", "#6600CC"),
title.size=1.5, max.words=500)
#-------------------------------------Displaying a single article-----------------------------------------------------------------
#Colors<-colorRampPalette(rev(brewer.pal(9,'RdBu')))(length(df$Paper1>10))
wordcloud(row.names(df) , df$Paper1 , min.freq=10,col=brewer.pal(8, "Dark2"), rot.per=0.3 )
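
For anyone unsure what shape `term.matrix` (and the `df` built from it) has, here is a small base-R sketch that builds the same kind of table by hand, without tm. The two toy documents are made up for illustration, and the hand-rolled counting is only a stand-in for `TermDocumentMatrix()`, not how tm implements it. Rows are terms and columns are documents, which is why `df$Paper1` can be passed to `wordcloud()` as the frequency vector.

```r
# Two toy "documents"; the text is made up purely for illustration.
docs <- c(Paper1 = "data plot plot color", Paper2 = "data model model plot")

# Split each document into words and count them against a shared vocabulary.
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))
toy_matrix <- sapply(tokens, function(words) table(factor(words, levels = vocab)))

# Rows are terms, columns are documents.
print(toy_matrix)

# Terms that occur in every document, i.e. the candidates a commonality
# cloud can draw from.
shared_terms <- rownames(toy_matrix)[rowSums(toy_matrix > 0) == ncol(toy_matrix)]
print(shared_terms)
```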
