Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Twitter Data Analysis - Error in Term Document Matrix

I am trying to analyze Twitter data. I downloaded the tweets and created a corpus from their text as follows:

# Creating a Corpus
wim_corpus = Corpus(VectorSource(wimbledon_text)) 

When I try to create a TermDocumentMatrix as below, I get an error and several warnings.

tdm = TermDocumentMatrix(wim_corpus, 
                       control = list(removePunctuation = TRUE, 
                                      stopwords =  TRUE, 
                                      removeNumbers = TRUE, tolower = TRUE)) 

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),    : 'i, j, v' different lengths


In addition: Warning messages:
1: In parallel::mclapply(x, termFreq, control) :
 all scheduled cores encountered errors in user code
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
3: In TermDocumentMatrix.VCorpus(corpus) : invalid document identifiers
4: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),  :
NAs introduced by coercion

Can anyone explain what this error indicates? Could it be related to the tm package?

The tm library has been loaded. I am using R 3.0.1 and RStudio 0.97.

1 Answer

I had the same problem, and it turned out to be a package compatibility issue. Try installing SnowballC:

install.packages("SnowballC")

and load it with

library(SnowballC)

before calling TermDocumentMatrix (or DocumentTermMatrix).

It solved my problem.
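
For reference, here is a minimal sketch of the whole sequence with SnowballC loaded first. The two sample strings are hypothetical stand-ins for the real tweet text in wimbledon_text, and this assumes both tm and SnowballC are already installed; whether it clears the error may depend on your tm version.

```r
# Load SnowballC before building the matrix, then tm
library(SnowballC)
library(tm)

# Hypothetical sample data standing in for the downloaded tweets
wimbledon_text <- c("Great match at Wimbledon today!",
                    "Centre Court was packed for the 2013 final.")

# Creating a corpus, as in the question
wim_corpus <- Corpus(VectorSource(wimbledon_text))

# Same control options as in the question
tdm <- TermDocumentMatrix(wim_corpus,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE,
                                         removeNumbers = TRUE,
                                         tolower = TRUE))

# Rows are terms, columns are documents
dim(tdm)
```

If the matrix builds, inspect(tdm) prints its contents so you can check that punctuation, numbers, and stopwords were removed as expected.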

