Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

stringr - How do I import a txt file into R and separate text into columns based on certain criteria

I have some job descriptions saved in a txt file. The team names, job titles, and job descriptions are all lumped together, and I am trying to separate them into columns. The text is about 5 pages long. Here is a sample of how the text is structured:

EXECUTIVE LEVEL
001 Chief Executive Officer: Job description of CEO.
040 Area Director: This line contains job description of the Area Director.

FINANCE TEAM
025 Chief Operating Officer: This line contains job description of the Chief Operating Officer
055 Chief Financial Officer: This person controls operations of the company and reports to the COO

MARKETING TEAM
056 Marketing Director: This person is in charge of the marketing team. Blab la bla

I would like to create a data frame (or is it called a tibble these days?) with 4 columns:

column 1 - The team name (Executive Level, Finance Team, Marketing Team, etc)

column 2 - Team number (001, 040, 025, 055, etc)

column 3 - The job title (Chief Executive Officer, Chief Operating Officer, etc)

column 4 - The job description

Thanks in advance

question from:https://stackoverflow.com/questions/65927895/how-do-i-import-a-txt-file-into-r-and-separate-text-into-columns-based-on-certai

1 Answer

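# Drop empty lines, then start a new group at every line that begins with a capital letter (the team headers).
x2 <- x[nzchar(x)]
x3 <- split(x2, cumsum(grepl("^[A-Z]", x2)))
# In each group, z[1] is the team name; parse the remaining lines into number, title, and description.
x4 <- lapply(x3, function(z) transform(strcapture("^([0-9]+)\\s+([^:]+):\\s*(.*)$", z[-1], list(num="", title="", desc="")), name=z[1]))
# Stack the per-team data frames into one.
x5 <- do.call(rbind, x4)
x5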
#     num                   title                                                                  desc            name
# 1.1 001 Chief Executive Officer                                               Job description of CEO. EXECUTIVE LEVEL
# 1.2 040           Area Director              This line contains job description of the Area Director. EXECUTIVE LEVEL
# 2.1 025 Chief Operating Officer     This line contains job description of the Chief Operating Officer    FINANCE TEAM
# 2.2 055 Chief Financial Officer This person controls operations of the company and reports to the COO    FINANCE TEAM
# 3   056      Marketing Director           This person is in charge of the marketing team. Blab la bla  MARKETING TEAM
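
If the columns should come out in the order the question asks for (team, number, title, description) and with plain row names, one possible variant is to carry each team header down to its rows with cumsum() instead of using split() and rbind(). This is only a sketch in base R, not the answer's method: the column names team, number, title, and description are made up for illustration, and it assumes the first non-empty line is a team header and that only header lines start with a capital letter.

```r
# A few sample lines copied from the question.
x <- c("EXECUTIVE LEVEL",
       "001 Chief Executive Officer: Job description of CEO.",
       "",
       "FINANCE TEAM",
       "025 Chief Operating Officer: This line contains job description of the Chief Operating Officer")

x2 <- x[nzchar(x)]                          # drop empty lines
is_header <- grepl("^[A-Z]", x2)            # assumption: only team headers start with a capital letter
team <- x2[is_header][cumsum(is_header)]    # repeat each header for the lines that follow it

# Parse the job lines; the prototype sets the (hypothetical) column names and keeps everything as character.
jobs <- strcapture("^([0-9]+)\\s+([^:]+):\\s*(.*)$", x2[!is_header],
                   data.frame(number = character(), title = character(),
                              description = character(), stringsAsFactors = FALSE))

out <- data.frame(team = team[!is_header], jobs, stringsAsFactors = FALSE)
out
```

Keeping number as character preserves the leading zeros (001, 025); converting it with as.integer() would drop them.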

Sample data, which is likely what x <- readLines(path_to_file) would return:

x <- c("EXECUTIVE LEVEL",
       "001 Chief Executive Officer: Job description of CEO.",
       "040 Area Director: This line contains job description of the Area Director.",
       "",
       "FINANCE TEAM",
       "025 Chief Operating Officer: This line contains job description of the Chief Operating Officer",
       "055 Chief Financial Officer: This person controls operations of the company and reports to the COO",
       "",
       "MARKETING TEAM",
       "056 Marketing Director: This person is in charge of the marketing team. Blab la bla")
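
For the import step itself, a minimal sketch of getting such a vector from a file on disk might look like the following. The temporary file is a hypothetical stand-in for the real txt file, and the trimws() call is an extra precaution that assumes lines may carry stray leading or trailing spaces.

```r
# Hypothetical stand-in for the real file: write a few sample lines to a temp file.
path_to_file <- tempfile(fileext = ".txt")
writeLines(c("EXECUTIVE LEVEL",
             "001 Chief Executive Officer: Job description of CEO.",
             "",
             "MARKETING TEAM",
             "056 Marketing Director: This person is in charge of the marketing team. Blab la bla"),
           path_to_file)

# readLines() returns a character vector with one element per line of the file.
# warn = FALSE silences the warning about a missing final newline.
x <- trimws(readLines(path_to_file, warn = FALSE))
length(x)
```

Blank lines come back as empty strings, which is why filtering with nzchar() is a convenient first step afterwards.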
