Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - How to use info in a dataframe and apply it to another dataframe? (Counties are urban/rural)?

I am new to Stack Overflow, so I apologize in advance if my question isn't completely clear.

I am using R.

I have two data frames. One is Census Bureau data containing all counties in the United States and whether each is classified as rural or urban. The other is iNaturalist occurrence data for a moth species, with the county and state of each occurrence.

I want to add (mutate) a new column to the iNaturalist data frame that classifies each occurrence's county as urban or rural, using the Census Bureau data. However, I don't know how to match the counties in my data to the urban/rural classification in the Census Bureau data. I've included code for the heads of both data frames. Thanks in advance for the help!

head_of_iNat_data <- structure(list(id = c(1031950L, 2377237L, 2377432L, 4284321L, 
4343263L, 4378730L), observed_on = c("2014-10-23", "2015-11-13", 
"2015-11-13", "2016-10-06", "2016-10-13", "2016-10-10"), Year = c("2014", 
"2015", "2015", "2016", "2016", "2016"), Month = c("10", "11", 
"11", "10", "10", "10"), Day = c("23", "13", "13", "06", "13", 
"10"), quality_grade = c("research", "research", "research", 
"research", "research", "research"), url = c("http://www.inaturalist.org/observations/1031950", 
"http://www.inaturalist.org/observations/2377237", "http://www.inaturalist.org/observations/2377432", 
"http://www.inaturalist.org/observations/4284321", "http://www.inaturalist.org/observations/4343263", 
"http://www.inaturalist.org/observations/4378730"), captive_cultivated = c(FALSE, 
FALSE, FALSE, FALSE, FALSE, FALSE), latitude = c(32.586924, 32.58703748, 
32.586952, 30.27109297, 33.15875283, 33.17152287), longitude = c(-97.102204, 
-97.102051, -97.101858, -97.72142226, -97.0424805, -97.15339088
), coordinates_obscured = c(FALSE, FALSE, FALSE, FALSE, FALSE, 
TRUE), scientific_name = c("Amorpha juglandis", "Amorpha juglandis", 
"Amorpha juglandis", "Amorpha juglandis", "Amorpha juglandis", 
"Amorpha juglandis"), taxon_id = c(84023L, 84023L, 84023L, 84023L, 
84023L, 84023L), state = c("texas", "texas", "texas", "texas", 
"texas", "texas"), county = c("tarrant", "tarrant", "tarrant", 
"travis", "denton", "denton")), row.names = c(NA, 6L), class = "data.frame")
head_of_census_bureau <- structure(list(i_2015_geoid = c(1001L, 1003L, 1005L, 1007L, 1009L, 
1011L), x2010_census_total_population = c("54,571", "182,265", 
"27,457", "22,915", "57,322", "10,914"), county = c("Autauga County", 
"Baldwin County", "Barbour County", "Bibb County", "Blount County", 
"Bullock County"), state = c(" Alabama", " Alabama", " Alabama", 
" Alabama", " Alabama", " Alabama"), x2010_census_urban_population = c("31,650", 
"105,205", "8,844", "7,252", "5,760", "5,307"), x2010_census_rural_population = c("22,921", 
"77,060", "18,613", "15,663", "51,562", "5,607"), x2010_census_percent_rural = c(42, 
42.3, 67.8, 68.4, 90, 51.4), classification = c("mostly urban", 
"completely rural", "completely rural", "completely rural", "mostly rural", 
"completely rural")), row.names = c(NA, 6L), class = "data.frame")
question from:https://stackoverflow.com/questions/65942100/how-to-use-info-in-a-dataframe-and-apply-it-to-another-dataframe-counties-are


1 Answer


merge() and the dplyr::*_join verbs match rows only where the key values are exactly equal, and your data lacks that for several reasons.

  • "Blount County" != "Blount" (the " County" suffix)
  • "Blount" != "blount" (case)
  • " Alabama" != "Alabama" (leading blank space)

It is perfectly reasonable to modify the state and county values in place. However, I often find it useful not to replace them but to add fields, so that (if needed) I still have the original values untouched. So for now, new columns.
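The mismatches listed above can be checked directly at the console. This is only a minimal sketch: `normalize_county` is a hypothetical helper name introduced here for illustration, not part of the original answer.

```r
# Hypothetical helper: lower-case, trim whitespace, drop a trailing " county".
normalize_county <- function(x) {
  trimws(sub(" county$", "", trimws(tolower(x))))
}

"Blount County" == "blount"                    # FALSE: suffix and case differ
" Alabama" == "alabama"                        # FALSE: leading space and case differ
normalize_county("Blount County") == "blount"  # TRUE
trimws(tolower(" Alabama")) == "alabama"       # TRUE
```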

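# Short names used below; assumed to refer to the two data frames from the question.
census <- head_of_census_bureau
iNat <- head_of_iNat_data
# Add normalized (lower-case, trimmed) key columns; the original columns stay untouched.
census$state_lc <- trimws(tolower(census$state))
census$county_lc <- trimws(sub(" county$", "", trimws(tolower(census$county))))
iNat$state_lc <- trimws(tolower(iNat$state))
iNat$county_lc <- trimws(tolower(iNat$county))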

The above might be sufficient for your real data, but it produces no matches with the sample data you supplied: the census sample contains only Alabama counties, while the iNat sample contains only Texas counties. (Please, when providing sample data, make sure at least one state and county appears in both frames.) To demonstrate the merge, I will arbitrarily change the first row of census to match a county in iNat.

census$state_lc[1] <- iNat$state_lc[1]
census$county_lc[1] <- iNat$county_lc[1]
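Before merging real data, it may help to check how many keys the two frames actually share. The sketch below uses small toy frames, `census_keys` and `inat_keys`, which are hypothetical names holding values copied from the samples in the question.

```r
# Toy stand-ins for the normalized key columns (values taken from the question's samples).
census_keys <- data.frame(state_lc = c("alabama", "alabama"),
                          county_lc = c("autauga", "baldwin"))
inat_keys <- data.frame(state_lc = c("texas", "texas", "texas"),
                        county_lc = c("tarrant", "travis", "denton"))

# Build one "state|county" key per row.
census_key <- paste(census_keys$state_lc, census_keys$county_lc, sep = "|")
inat_key <- paste(inat_keys$state_lc, inat_keys$county_lc, sep = "|")

sum(inat_key %in% census_key)   # number of iNat rows that would find a match
setdiff(inat_key, census_key)   # keys that would get NA after the merge
```

If the first number is far below `nrow()` of the occurrence data, the county names probably need more cleaning before the join.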

From here, run the merge. all.x = TRUE keeps every iNat row and leaves classification as NA where no county matched.

merge(iNat, subset(census, select = c(state_lc, county_lc, classification)), by = c("state_lc", "county_lc"), all.x = TRUE)
#   state_lc county_lc      id observed_on Year Month Day quality_grade                                             url captive_cultivated latitude longitude coordinates_obscured   scientific_name taxon_id state  county classification
# 1    texas    denton 4343263  2016-10-13 2016    10  13      research http://www.inaturalist.org/observations/4343263              FALSE 33.15875 -97.04248                FALSE Amorpha juglandis    84023 texas  denton           <NA>
# 2    texas    denton 4378730  2016-10-10 2016    10  10      research http://www.inaturalist.org/observations/4378730              FALSE 33.17152 -97.15339                 TRUE Amorpha juglandis    84023 texas  denton           <NA>
# 3    texas   tarrant 1031950  2014-10-23 2014    10  23      research http://www.inaturalist.org/observations/1031950              FALSE 32.58692 -97.10220                FALSE Amorpha juglandis    84023 texas tarrant   mostly urban
# 4    texas   tarrant 2377237  2015-11-13 2015    11  13      research http://www.inaturalist.org/observations/2377237              FALSE 32.58704 -97.10205                FALSE Amorpha juglandis    84023 texas tarrant   mostly urban
# 5    texas   tarrant 2377432  2015-11-13 2015    11  13      research http://www.inaturalist.org/observations/2377432              FALSE 32.58695 -97.10186                FALSE Amorpha juglandis    84023 texas tarrant   mostly urban
# 6    texas    travis 4284321  2016-10-06 2016    10  06      research http://www.inaturalist.org/observations/4284321              FALSE 30.27109 -97.72142                FALSE Amorpha juglandis    84023 texas  travis           <NA>
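Note that merge() sorts the result by the key columns by default, which is why the denton rows come first in the output above. If the original row order matters, one possible alternative is a lookup with match(), sketched here on toy frames. The frame names and the "mostly urban" value are illustrative only, not real Census values, and the sketch assumes each state/county pair appears once in the census frame.

```r
# Toy frames with already-normalized keys; values are illustrative.
inat <- data.frame(id = 1:3,
                   state_lc = c("texas", "texas", "texas"),
                   county_lc = c("tarrant", "travis", "tarrant"),
                   stringsAsFactors = FALSE)
census <- data.frame(state_lc = "texas", county_lc = "tarrant",
                     classification = "mostly urban",
                     stringsAsFactors = FALSE)

# Find each iNat row's position in census using a combined "state|county" key.
idx <- match(paste(inat$state_lc, inat$county_lc, sep = "|"),
             paste(census$state_lc, census$county_lc, sep = "|"))

# Add the column in place; rows keep their order, unmatched counties become NA.
inat$classification <- census$classification[idx]

# Rows that found no county, worth inspecting for spelling differences.
inat[is.na(inat$classification), c("state_lc", "county_lc")]
```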


...