Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
597 views
in Technique[技术] by (71.8m points)

data.table - Splitting text column into ragged multiple new columns in a data table in R

I have a data table containing 20000+ rows and one column. The string in each column has different number of words. I want to split the words and put each of them in a new column. I know how I can do it word by word:

Data [ , Word1 := as.character(lapply(strsplit(as.character(Data$complaint), split=" "), "[", 1))]

(Data is my data table and complaint is the name of the column)

Obviously, this is not efficient because each cell in each row has different number of words.

Could you please tell me about a more efficient way to do this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Two functions, transpose() and tstrsplit(), are available since version 1.9.6 on CRAN.

With this we can do:

require(data.table)
setDT(tstrsplit(as.character(df$x), " ", fixed=TRUE))[]
#      V1       V2          V3  V4
# 1: This       is interesting  NA
# 2: This actually          is not

tstrsplit is a wrapper for transpose(strsplit(...)).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...