1. adorn_totals
library(purrr)
library(janitor)
library(tidyr)
library(dplyr)
df %>%
dplyr::select(sex, everything()) %>%
dplyr::group_split(year, state, age) %>%
purrr::map_dfr(janitor::adorn_totals) %>%
dplyr::mutate(across(c(year, state, age), ~ ifelse(sex == "Total", NA, .))) %>%
tidyr::fill(everything(), .direction = "down") %>%
dplyr::select(year, state, everything())
FYI, across
is from the dplyr
library.
Output
year state sex age population deaths
1970 Cal Male 12 134 45
1970 Cal Female 12 100 20
1970 Cal Total 12 234 65
1980 Pen Male 13 200 10
1980 Pen Female 13 150 50
1980 Pen Total 13 350 60
How it works
- The first
select
statement is to accommodate the behavior of janitor::adorn_totals
, which puts the "Total" label in the first column. So this puts sex
as the first column.
group_split
splits the dataframe into a list of dataframes by same year
, state
, and age
.
purrr::map_dfr
will map janitor::adorn_totals
over each dataframe in the list, which essentially is creating subtotals, and then row binds back into one dataframe, hence the r
in dfr
.
- The remainder of the pipes are to format the data how you requested. So that the
year
, state
, and age
columns will have the same value in the "Total" row. The last select
reorders everything back to the original order.
2. Pivot
Alternatively, since janitor::adorn_totals
is a bit buggy you can do this using some pivots to create the exact same output:
library(tidyr)
library(dplyr)
df %>%
tidyr::pivot_wider(id_cols = c(year, state, age),
names_from = sex,
values_from = population:deaths) %>%
dplyr::mutate(population_Total = rowSums(across(starts_with("population"))),
deaths_Total = rowSums(across(starts_with("deaths")))) %>%
tidyr::pivot_longer(cols = contains("_"),
names_to = c(".value", "sex"),
names_pattern = c("(.*)_(.*)$"))
FYI, across
, contains
, and starts_with
are exported or from the dplyr
library.
How it works
pivot_wider
re-orients the data so there is one row per year
, state
, and age
. New columns are created appending sex
to the end of the pivoted columns, which are population
and deaths
in this case. This creates columns population_Male
, population_Female
, deaths_Male
, and so on.
- We then sum across rows using
tidyselect
syntax to sum all the rows starting with "population" and likewise for "deaths", storing them in appropriately named columns appended with "_Total".
- The final pivot maintains the
year
, state
, and age
as the ID columns. It pivots all columns with an underscore: population_Male
, population_Female
, population_Total
, etc. Using regular expressions the column names are parsed by the underscore: population_Total
is parsed into the strings population
and Total
. The former becomes a column name storing values, the latter is a value stored in a column called "sex".
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…