Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
317 views
in Technique[技术] by (71.8m points)

r - split character columns and get names of field in string

I need to split a column that contains information into several columns.
I'd use tstrsplit but the same kind of information is not in the same order among the rows and I need to extract the name of the new column within the variable. Important to know: there can be many pieces of information (fields to become new variables) and I don't know all of them, so I don't want a "field by field" solution.

Below is an example of what I have:

library(data.table)

myDT <- structure(list(chr = c("chr1", "chr2", "chr4"), pos = c(123L,
                  435L, 120L), info = c("type=3;end=4", "end=6", "end=5;pos=TRUE;type=2"
                  )), class = c("data.table", "data.frame"), row.names = c(NA,-3L))

#    chr pos                  info
#1: chr1 123          type=3;end=4
#2: chr2 435                 end=6
#3: chr4 120 end=5;pos=TRUE;type=2

And I'd like to get:

#    chr pos end  pos type
#1: chr1 123   4 <NA>    3
#2: chr2 435   6 <NA> <NA>
#3: chr4 120   5 TRUE    2

A most straightforward way to get that would be much appreciated! (Note: I'm not willing to go with a dplyr/tidyr way)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Using regex and the stringi packages:

setDT(myDT) # After creating data.table from structure()

library(stringi)

fields <- unique(unlist(stri_extract_all(regex = "[a-z]+(?==)", myDT$info)))
patterns <- sprintf("(?<=%s=)[^;]+", fields)
myDT[, (fields) := lapply(patterns, function(x) stri_extract(regex = x, info))]
myDT[, !"info"]

    chr  pos type end
1: chr1 <NA>    3   4
2: chr2 <NA> <NA>   6
3: chr4 TRUE    2   5

Edit: To get the correct type it seems (?) type.convert() can be used:

myDT[, (fields) := lapply(patterns, function(x) type.convert(stri_extract(regex = x, info), as.is = TRUE))]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...