Data wrangle methods
library(tidyverse)
The purpose of this page is to provide details on the methods of data wrangling.
Observations of western redcedar were downloaded from iNaturalist and urban heat data were downloaded from open data portals or provided by contacts in the City of Tacoma, King County (Washington) and Portland. Trees in WA were also evaluated based on EHD Ranks. HOLC data were also investigated for each city.
Note that the temperature data differ among datasets and may have been collected slightly differently. Temperatures therefore need to be standardized (expressed as the difference from each dataset's mean) before they can be compared region-wide.
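To illustrate why the difference from the mean makes datasets comparable, here is a minimal sketch with made-up temperatures. The values and the `standardize()` helper are hypothetical and are not part of the study data or code.

```r
# Hypothetical helper: express each value as its difference from the dataset mean.
standardize <- function(x) x - mean(x, na.rm = TRUE)

# Two toy datasets in the same units, collected on days with different baselines.
city_a <- c(80, 84, 88)  # warmer campaign day
city_b <- c(70, 74, 78)  # cooler campaign day

dev_a <- standardize(city_a)
dev_b <- standardize(city_b)

# Raw values differ by 10 degrees, but the deviations line up.
print(dev_a)
print(dev_b)
```

After standardization, a tree's value describes how warm its location is relative to the rest of its own dataset, not how warm the sampling day was.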
Note that the QGIS methods below also join Washington Environmental Health Disparities (EHD) data and Home Owners' Loan Corporation (HOLC) data. However, these data were not analyzed in the present study.
City tree data were ‘joined by attribute’ separately so we have 3 different tree datasets to work with or merge.
Also, because the UHI data were extracted as shapefiles, column names are limited to 10 characters. The exported UHI data therefore include only the iNat ID numbers and the UHI values. These were then re-merged (see below) with the iNat data to recover the remaining columns with their proper names.
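A rough sketch of why the long iNat column names do not survive a shapefile export. Plain truncation with `substr()` stands in for the 10-character limit here; actual GIS software may disambiguate truncated names differently, so treat this as an illustration only.

```r
# A few column names as they appear after read.csv().
long_names <- c("id",
                "place_county_name",
                "field.optional...site.type",
                "field.optional...tree.size")

# Simulate the 10-character limit on shapefile field names.
short_names <- substr(long_names, 1, 10)
print(short_names)

# Several long names collapse to the same short name.
has_collisions <- any(duplicated(short_names))
print(has_collisions)
```

Keeping only `id` plus the UHI columns in the export and re-joining on `id` sidesteps the problem.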
inat.full <- read.csv("./WRDM-full-data-1.13.24.csv")
#tacoma.uhi <- read.csv("./WRC-TAC-UHI-Values.csv")
Filter data to only include necessary columns
[1] “id” [23] “latitude” [24] “longitude” [35] “place_town_name” [36] “place_county_name”
inat.full.qgis <- inat.full[c(1,23,24,35,36)]
kc.wrc.qgis <- inat.full.qgis %>% filter(place_county_name=="King")
port.wrc.qgis <- inat.full.qgis %>% filter(place_town_name=="Portland")
tac.wrc.qgis <- inat.full.qgis %>% filter(place_town_name=="Tacoma")
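Selecting columns by position, as above, breaks silently if the export ever gains or loses a column. A hedged alternative is to select by name and check that the names exist first. The toy data frame below is invented for illustration.

```r
# Toy stand-in for the iNat export (values are made up).
inat_toy <- data.frame(
  id = 1:3,
  latitude = c(47.25, 45.52, 47.61),
  longitude = c(-122.44, -122.68, -122.33),
  place_town_name = c("Tacoma", "Portland", "Seattle"),
  place_county_name = c("Pierce", "Multnomah", "King"),
  notes = c("a", "b", "c")
)

# Columns needed for the QGIS joins, selected by name.
keep <- c("id", "latitude", "longitude", "place_town_name", "place_county_name")
stopifnot(all(keep %in% names(inat_toy)))  # fail early if a column is missing

qgis_toy <- inat_toy[keep]
king_toy <- qgis_toy[qgis_toy$place_county_name == "King", ]
```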
Export data for QGIS data joins
#write.csv(inat.full.qgis,file="./WRDM-full-data-1.13.24-qgis.csv")
#write.csv(kc.wrc.qgis,file="./WRDM-King-County-1.13.24-qgis.csv")
#write.csv(port.wrc.qgis,file="./WRDM-Portland-1.13.24-qgis.csv")
#write.csv(tac.wrc.qgis,file="./WRDM-Tacoma-1.13.24-qgis.csv")
Export final .shp files as .csv
tacoma.uhi.holc.ehd <- read.csv("./WRC.Tacoma.UHI.HOLC.EHD-1.13.24.csv")
#king.county.uhi <- read.csv("./WRC-KC-UHI-Values.csv")
king.county.uhi.holc.ehd <- read.csv("./WRC.KingCounty.UHI.HOLC.EHD-1.13.24.csv")
portland.uhi.holc <- read.csv("./WRC.Portland.UHI.HOLC-1.13.24.csv") # does not include ehd data because it is outside of WA
kc.wrc <- left_join(king.county.uhi.holc.ehd,inat.full,by="id")
tac.wrc <- left_join(tacoma.uhi.holc.ehd,inat.full,by="id")
pl.wrc <- left_join(portland.uhi.holc,inat.full,by="id")
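Before trusting a join on `id`, it can help to confirm that the key is unique in the lookup table and that every UHI row finds a match. The sketch below uses toy data and base `merge()` with `all.x = TRUE` as a stand-in for `left_join()`.

```r
# Toy UHI export and toy iNat table (values are made up).
uhi_toy  <- data.frame(id = c(1, 2, 4), DN_AF1 = c(80, 85, 90))
inat_toy <- data.frame(id = c(1, 2, 3),
                       scientific_name = c("Thuja plicata", "Thuja plicata", "Pinales"))

# Duplicate ids in the lookup table would duplicate rows in the join.
dup_ids <- inat_toy$id[duplicated(inat_toy$id)]

# Ids present in the UHI export but missing from the iNat table.
unmatched <- setdiff(uhi_toy$id, inat_toy$id)

# Left join: keep every UHI row; unmatched rows get NA in the iNat columns.
joined <- merge(uhi_toy, inat_toy, by = "id", all.x = TRUE)
```

If `dup_ids` is empty, the joined table has the same number of rows as the UHI export.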
Note that we needed to convert the Tacoma and Portland temperatures to degrees F to match King County.
tac.wrc <- tac.wrc %>% mutate(DN_AM1=((DN_AM1*1.8)+32),DN_AF1=((DN_AF1*1.8)+32),DN_PM1=((DN_PM1*1.8)+32)) %>% mutate(mean.temp.daily=((DN_AM1+DN_AF1+DN_PM1)/3)) %>% mutate(dist.from.mean.daily=mean.temp.daily-(mean(mean.temp.daily,na.rm=TRUE))) %>% mutate(Area="Tacoma")
kc.wrc <- kc.wrc %>% mutate(mean.temp.daily=((DN_AM1+DN_AF1+DN_PM1)/3)) %>% mutate(dist.from.mean.daily=mean.temp.daily-(mean(mean.temp.daily,na.rm=TRUE))) %>% mutate(Area="King County")
pl.wrc <- pl.wrc %>% mutate(DN_AM1=((DN_AM1*1.8)+32),DN_AF1=((DN_AF1*1.8)+32),DN_PM1=((DN_PM1*1.8)+32)) %>% mutate(mean.temp.daily=((DN_AM1+DN_AF1+DN_PM1)/3)) %>% mutate(dist.from.mean.daily=mean.temp.daily-(mean(mean.temp.daily,na.rm=TRUE))) %>% mutate(Area="Portland")
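The conversion and daily mean used above can be sketched on toy values as follows. The `c_to_f()` helper and the numbers are illustrative assumptions; the column names match the UHI columns in the code above.

```r
# Celsius to Fahrenheit, the same arithmetic as (x * 1.8) + 32 above.
c_to_f <- function(celsius) celsius * 1.8 + 32

# Toy morning, afternoon and evening temperatures in degrees C.
temps_c <- data.frame(DN_AM1 = c(15, 16),
                      DN_AF1 = c(25, 27),
                      DN_PM1 = c(20, 23))

# Convert every column, then average the three readings per tree.
temps_f <- as.data.frame(lapply(temps_c, c_to_f))
temps_f$mean.temp.daily <- rowMeans(temps_f[c("DN_AM1", "DN_AF1", "DN_PM1")])
print(temps_f)
```

Because the conversion overwrites the original columns, it should run only once on data that are still in degrees C.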
tac.wrc <- tac.wrc %>% mutate(dist.from.mean.am=DN_AM1-(mean(DN_AM1,na.rm=TRUE)))
kc.wrc <- kc.wrc %>% mutate(dist.from.mean.am=DN_AM1-(mean(DN_AM1,na.rm=TRUE)))
pl.wrc <- pl.wrc %>% mutate(dist.from.mean.am=DN_AM1-(mean(DN_AM1,na.rm=TRUE)))
tac.wrc <- tac.wrc %>% mutate(dist.from.mean.af=DN_AF1-(mean(DN_AF1,na.rm=TRUE)))
kc.wrc <- kc.wrc %>% mutate(dist.from.mean.af=DN_AF1-(mean(DN_AF1,na.rm=TRUE)))
pl.wrc <- pl.wrc %>% mutate(dist.from.mean.af=DN_AF1-(mean(DN_AF1,na.rm=TRUE)))
tac.wrc <- tac.wrc %>% mutate(dist.from.mean.pm=DN_PM1-(mean(DN_PM1,na.rm=TRUE)))
kc.wrc <- kc.wrc %>% mutate(dist.from.mean.pm=DN_PM1-(mean(DN_PM1,na.rm=TRUE)))
pl.wrc <- pl.wrc %>% mutate(dist.from.mean.pm=DN_PM1-(mean(DN_PM1,na.rm=TRUE)))
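The nine near-identical lines above could be collapsed into one helper applied to each city's data frame. This is a sketch under that assumption: `add_deviation_columns()` is a hypothetical name, and the toy values are made up.

```r
# Hypothetical helper: add a difference-from-mean column for each time of day.
add_deviation_columns <- function(df,
                                  cols = c("DN_AM1", "DN_AF1", "DN_PM1"),
                                  suffixes = c("am", "af", "pm")) {
  for (i in seq_along(cols)) {
    new_name <- paste0("dist.from.mean.", suffixes[i])
    df[[new_name]] <- df[[cols[i]]] - mean(df[[cols[i]]], na.rm = TRUE)
  }
  df
}

# Toy data for one city, including a missing morning reading.
toy <- data.frame(DN_AM1 = c(60, 62, NA),
                  DN_AF1 = c(80, 90, 85),
                  DN_PM1 = c(70, 74, 72))
toy <- add_deviation_columns(toy)
```

The helper has to be applied to each city separately, before `bind_rows()`, so that each mean is computed within its own dataset.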
data <- bind_rows(pl.wrc,tac.wrc,kc.wrc)
Some of the iNat project questions have changed since the project was created, so we need to adjust some of the answers to make them consistent throughout the project.
data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight. <- as.factor(data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.)
data$field.optional...what..other.factors..were.observed. <- as.factor(data$field.optional...what..other.factors..were.observed.)
data$field.tree.canopy.symptoms <- as.factor(data$field.tree.canopy.symptoms)
data$field.optional...slope.position <- as.factor(data$field.optional...slope.position)
data$field.optional...site.type <- as.factor(data$field.optional...site.type)
data$field.optional...site.location.description <- as.factor(data$field.optional...site.location.description )
data$field.optional...tree.size <-as.factor(data$field.optional...tree.size)
data$field.other.factors...are.there.signs.or.symptoms.of.insect..diseases..or.other.damage.[data$field.other.factors...are.there.signs.or.symptoms.of.insect..diseases..or.other.damage.==""] <- "Not sure"
data$field.other.factors...are.there.signs.or.symptoms.of.insect..diseases..or.other.damage.[data$field.other.factors...are.there.signs.or.symptoms.of.insect..diseases..or.other.damage.=="Unsure"] <- "Not sure"
data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.[data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.=="4"] <- "4-6"
data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.[data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.=="5"] <- "4-6"
data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.[data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.=="2"] <- "2-3"
data$field.tree.canopy.symptoms[data$field.tree.canopy.symptoms=="Multiple Symptoms"] <-"Multiple Symptoms (please list in Notes)"
data$field.tree.canopy.symptoms[data$field.tree.canopy.symptoms=="multiple symptoms"] <-"Multiple Symptoms (please list in Notes)"
data$field.tree.canopy.symptoms[data$field.tree.canopy.symptoms=="thinning foliage"] <-"Thinning Canopy"
data$field.tree.canopy.symptoms[data$field.tree.canopy.symptoms=="healthy"] <-"Healthy"
data$field.tree.canopy.symptoms[data$field.tree.canopy.symptoms=="dead top"] <-"Old Dead Top (needles already gone)"
data$field.optional...what..other.factors..were.observed.[data$field.optional...what..other.factors..were.observed.=="Fungal Activitiy (mycelial fans, mushrooms at base, or conks on trunk)"] <-"Fungal Activitiy (mycelial fans, bleeding cankers, mushrooms at base, or conks on trunk)"
data$field.optional...what..other.factors..were.observed.[data$field.optional...what..other.factors..were.observed.=="Needle disease (dieback, checking, blight, etc.)"] <- "Needle or leaf disease (dieback, checking, blight, etc.)"
data$field.optional...slope.position[data$field.optional...slope.position=="Upper 1/3rd of a slope"] <-"Top of slope"
data$field.optional...site.type[data$field.optional...site.type=="Urban Natural"] <-"Urban"
data$field.optional...site.type[data$field.optional...site.type=="Urban Landscaped"] <-"Urban"
data$field.optional...site.type[data$field.optional...site.type=="Suburban Natural"] <-"Suburban"
data$field.optional...site.type[data$field.optional...site.type=="Suburban Lanscaped"] <-"Suburban"
data$field.optional...site.type[data$field.optional...site.type=="Natural Forest"] <-"Rural"
data$field.optional...tree.size[data$field.optional...tree.size=="Large"] <- "Large (too big to wrap arms around trunk)"
data$field.optional...tree.size[data$field.optional...tree.size=="Medium"] <- "Medium (can wrap arms around trunk)"
data$field.optional...tree.size[data$field.optional...tree.size=="Small"] <- "Small (can wrap hands around trunk)"
data$field.optional...tree.size[data$field.optional...tree.size=="Very Large"] <- "Very Large (would take many people to wrap arms around trunk)"
data$field.optional...tree.size[data$field.optional...tree.size=="Very small (can wrap a single hand around stem)"] <- "Very Small (can wrap a single hand around stem)"
## Warning in `[<-.factor`(`*tmp*`, data$field.optional...tree.size == "Very small
## (can wrap a single hand around stem)", : invalid factor level, NA generated
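The warning above means the replacement label was not an existing level of the factor, so the matching rows were set to NA instead of being relabeled. A toy sketch of the behavior, and of one possible way around it (renaming the level itself), follows. Whether this is the right fix for the project data is an assumption.

```r
# Toy factor with a mis-capitalized label.
size <- factor(c("Very small", "Small", "Very small"))

# Assigning a value that is not an existing level produces NA (plus a warning).
broken <- size
suppressWarnings(broken[broken == "Very small"] <- "Very Small")

# Renaming the level changes the label without losing any rows.
fixed <- size
levels(fixed)[levels(fixed) == "Very small"] <- "Very Small"

print(broken)
print(fixed)
```

If the target label already exists as a level, renaming to it merges the two levels.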
data$field.optional...site.location.description [data$field.optional...site.location.description =="Yard or open park grounds"] <- "Urban yard or open park grounds"
data$field.percent.canopy.affected.... <- as.factor(data$field.percent.canopy.affected....)
data$field.percent.canopy.affected....[data$field.percent.canopy.affected....=="1-25% of the crown is unhealthy"] <- "1-29% of the canopy is unhealthy"
data$field.percent.canopy.affected....[data$field.percent.canopy.affected....=="Healthy (0%)"] <- "Healthy, no dieback(0%)"
data$field.percent.canopy.affected....[data$field.percent.canopy.affected....=="Healthy (0% is unhealthy)"] <- "Healthy, no dieback(0%)"
data$field.percent.canopy.affected....[data$field.percent.canopy.affected....=="more than 75% of the crown is unhealthy"] <- "60-99% of the canopy is unhealthy"
data <- data %>% droplevels()
data$Percent.Dieback.Modified[data$field.percent.canopy.affected....=="Healthy, no dieback(0%)"] <- 0
data$Percent.Dieback.Modified[data$field.percent.canopy.affected....=="1-29% of the canopy is unhealthy"] <- 1
data$Percent.Dieback.Modified[data$field.percent.canopy.affected....=="30-59% of the canopy is unhealthy"] <- 30
data$Percent.Dieback.Modified[data$field.percent.canopy.affected....=="60-99% of the canopy is unhealthy"] <- 60
data$Percent.Dieback.Modified[data$field.percent.canopy.affected....=="tree is dead"] <- 100
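The five assignments above map each canopy category to the lower bound of its range. The same mapping can be sketched as a named lookup vector; the `dieback_lower_bound` name and the toy `category` values are assumptions for illustration.

```r
# Category label -> lower bound of its percent range, as in the code above.
dieback_lower_bound <- c(
  "Healthy, no dieback(0%)" = 0,
  "1-29% of the canopy is unhealthy" = 1,
  "30-59% of the canopy is unhealthy" = 30,
  "60-99% of the canopy is unhealthy" = 60,
  "tree is dead" = 100
)

# Toy answers, including one missing value.
category <- factor(c("tree is dead",
                     "Healthy, no dieback(0%)",
                     "30-59% of the canopy is unhealthy",
                     NA))

# Look up by label; missing answers stay NA.
percent <- unname(dieback_lower_bound[as.character(category)])
print(percent)
```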
#data$field.dieback.percent[is.na(data$field.dieback.percent)] <- data$Percent.Dieback.Modified
data$field.percent.canopy.affected....[data$Percent.Dieback.Modified==100] <- "tree is dead" #there were a couple healthy trees with 100 dieback somehow..
has.dieback.percent <- data %>% filter(!is.na(field.dieback.percent))
does.not.have.dieback.percent <- data %>% filter(is.na(field.dieback.percent))
has.dieback.percent$user.estimated.dieback <- "Yes"
does.not.have.dieback.percent$user.estimated.dieback <- "No"
does.not.have.dieback.percent$field.dieback.percent <- does.not.have.dieback.percent$Percent.Dieback.Modified
data <- rbind(has.dieback.percent,does.not.have.dieback.percent)
data$field.dieback.percent[data$field.dieback.percent<0] <- 0 # not sure why but there was one -2 percent dieback value
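The split, fill and `rbind()` steps above can also be written without splitting the data, which keeps the original row order. A sketch on toy values, using the same column names:

```r
# Toy data: user estimates (some missing, one negative) and category-based values.
toy <- data.frame(field.dieback.percent    = c(40, NA, -2, NA),
                  Percent.Dieback.Modified = c(30, 60, 0, 0))

# Flag whether the user supplied an estimate.
toy$user.estimated.dieback <- ifelse(is.na(toy$field.dieback.percent), "No", "Yes")

# Fall back to the category-based value where the estimate is missing.
toy$field.dieback.percent <- ifelse(is.na(toy$field.dieback.percent),
                                    toy$Percent.Dieback.Modified,
                                    toy$field.dieback.percent)

# Clamp negative estimates to zero.
toy$field.dieback.percent <- pmax(toy$field.dieback.percent, 0)
print(toy)
```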
data <- data %>% mutate(tree.size.simplified=field.optional...tree.size)
tree_size_level_key <- c("Very Small (can wrap a single hand around stem)" = "Small", "Small (can wrap hands around trunk)" = "Small", "Medium (can wrap arms around trunk)" = "Medium", "Large (too big to wrap arms around trunk)" = "Large", "Very Large (would take many people to wrap arms around trunk)" = "Large","Other"="Other","No selection"="No Selection")
data$tree.size.simplified <- recode_factor(data$tree.size.simplified, !!!tree_size_level_key)
data$tree.size.simplified <- as.factor(data$tree.size.simplified)
data <- data %>% filter(field.tree.canopy.symptoms!="Candelabra top or very old spike top (old growth)") %>% mutate(binary.tree.canopy.symptoms=field.tree.canopy.symptoms) %>% mutate(ordinal.tree.canopy.symptoms=field.tree.canopy.symptoms) %>% mutate(reclassified.tree.canopy.symptoms=field.tree.canopy.symptoms) %>% droplevels()
binary_level_key <- c("Healthy" = "Healthy", "Thinning Canopy" = "Unhealthy", "New Dead Top (red or brown needles still attached)" = "Unhealthy", "Old Dead Top (needles already gone)" = "Unhealthy", "Tree is dead" = "Unhealthy", "Multiple Symptoms (please list in Notes)" = "Unhealthy", "Extra Cone Crop" = "Unhealthy", "Browning Canopy" = "Unhealthy","Branch Dieback or 'Flagging'" = "Unhealthy", "Other (please describe in Notes)" = "Unhealthy", "Yellowing Canopy" = "Unhealthy")
data$binary.tree.canopy.symptoms <- recode_factor(data$binary.tree.canopy.symptoms, !!!binary_level_key)
#levels(binary$field.tree.canopy.symptoms)
data$binary.tree.canopy.symptoms <- as.factor(data$binary.tree.canopy.symptoms)
ordinal_level_key <- c("Healthy" = "Healthy", "Thinning Canopy" = "Unhealthy", "New Dead Top (red or brown needles still attached)" = "Unhealthy", "Old Dead Top (needles already gone)" = "Unhealthy", "Tree is dead" = "Dead", "Multiple Symptoms (please list in Notes)" = "Unhealthy", "Extra Cone Crop" = "Unhealthy", "Browning Canopy" = "Unhealthy","Branch Dieback or 'Flagging'" = "Unhealthy", "Other (please describe in Notes)" = "Unhealthy", "Yellowing Canopy" = "Unhealthy")
data$ordinal.tree.canopy.symptoms <- recode_factor(data$ordinal.tree.canopy.symptoms, !!!ordinal_level_key)
#levels(binary$field.tree.canopy.symptoms)
data$ordinal.tree.canopy.symptoms <- as.factor(data$ordinal.tree.canopy.symptoms)
reclassified_level_key <- c("Healthy" = "Healthy", "Thinning Canopy" = "Thinning Canopy", "New Dead Top (red or brown needles still attached)" = "Dead Top", "Old Dead Top (needles already gone)" = "Dead Top", "Tree is dead" = "Tree is Dead", "Multiple Symptoms (please list in Notes)" = "Other", "Extra Cone Crop" = "Other", "Browning Canopy" = "Other","Branch Dieback or 'Flagging'" = "Other", "Other (please describe in Notes)" = "Other", "Yellowing Canopy" = "Other")
data$reclassified.tree.canopy.symptoms <- recode_factor(data$reclassified.tree.canopy.symptoms, !!!reclassified_level_key)
#levels(binary$field.tree.canopy.symptoms)
data$reclassified.tree.canopy.symptoms <- as.factor(data$reclassified.tree.canopy.symptoms)
Create Binary response for dead top
data$top.dieback[data$reclassified.tree.canopy.symptoms=="Dead Top"] <- "Yes"
data$top.dieback[data$reclassified.tree.canopy.symptoms!="Dead Top"] <- "No"
data$top.dieback <- as.factor(data$top.dieback)
levels(data$top.dieback)
## [1] "No" "Yes"
summary(data$top.dieback)
## No Yes
## 1217 118
Create Binary response for thinning
data$thinning[data$reclassified.tree.canopy.symptoms=="Thinning Canopy"] <- "Yes"
data$thinning[data$reclassified.tree.canopy.symptoms!="Thinning Canopy"] <- "No"
data$thinning <- as.factor(data$thinning)
levels(data$thinning)
## [1] "No" "Yes"
summary(data$thinning)
## No Yes
## 1159 176
Create Binary response for dead
data$dead[data$reclassified.tree.canopy.symptoms=="Tree is Dead"] <- "Yes"
data$dead[data$reclassified.tree.canopy.symptoms!="Tree is Dead"] <- "No"
data$dead <- as.factor(data$dead)
levels(data$dead)
## [1] "No" "Yes"
summary(data$dead)
## No Yes
## 1280 55
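The three binary responses above follow the same pattern, so they could come from one small helper. `make_flag()` is a hypothetical name and the toy symptoms are made up.

```r
# Hypothetical helper: "Yes" where x equals the target label, otherwise "No".
# Rows where x is NA stay NA.
make_flag <- function(x, target) {
  factor(ifelse(x == target, "Yes", "No"), levels = c("No", "Yes"))
}

# Toy reclassified symptoms.
symptoms <- factor(c("Dead Top", "Healthy", "Thinning Canopy",
                     "Tree is Dead", "Dead Top"))

top.dieback <- make_flag(symptoms, "Dead Top")
thinning    <- make_flag(symptoms, "Thinning Canopy")
dead        <- make_flag(symptoms, "Tree is Dead")

print(table(top.dieback))
```

Fixing the levels to `c("No", "Yes")` keeps the level order stable even if a subset contains no "Yes" rows.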
data <- data %>% mutate(field.dieback.prop = (field.dieback.percent/100))
daily.error <- data %>% filter(is.na(data$mean.temp.daily))
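33 trees did not have temperature data (stored in daily.error). Rows with a missing afternoon temperature are removed by the afternoon-temperature filter below, since filter() drops rows where the comparison evaluates to NA.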
A few observations were not identified to species or were determined to be other species by the iNat community.
levels(as.factor(data$scientific_name))
## [1] "" "Callitropsis nootkatensis"
## [3] "Chamaecyparis lawsoniana" "Cupressaceae"
## [5] "Cupressoideae" "Pinales"
## [7] "Plantae" "Pseudotsuga menziesii"
## [9] "Sequoia sempervirens" "Thuja plicata"
## [11] "Tracheophyta"
other.species <- data %>% filter(scientific_name!="Thuja plicata") %>% droplevels()
data <- data %>% filter(scientific_name=="Thuja plicata") %>% droplevels()
41 observations were identified as other species by the iNaturalist community or were not identified to species.
Afternoon temperature had a handful of outlying observations more than 8 degrees F below the mean.
ggplot(data,aes(dist.from.mean.af,field.dieback.percent))+geom_point()+theme_bw()
## Warning: Removed 33 rows containing missing values or values outside the scale range
## (`geom_point()`).
Three trees were removed because they came from areas more than 8 degrees F below the mean afternoon temperature. Perhaps the georeferences were off.
data <- data %>% filter(dist.from.mean.af>(-8))
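Note that `filter()` keeps only rows where the condition is TRUE, so rows with a missing afternoon value are dropped together with the outliers. Base `subset()` behaves the same way and is used in the toy sketch below so that it runs without extra packages.

```r
# Toy data: one outlier below the -8 threshold and one missing value.
toy <- data.frame(id = 1:5,
                  dist.from.mean.af = c(-9.5, -2, 0.5, NA, 3))

# Keep rows above the threshold; the NA row is dropped as well.
kept <- subset(toy, dist.from.mean.af > -8)

# Rows that did not survive the filter.
dropped <- toy[!(toy$id %in% kept$id), ]
print(dropped)
```

Comparing row counts before and after the filter shows how many rows were lost to the threshold and how many to missing values.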
write.csv(data,file="./urban-data-modified.csv")
Please use this dataset in analyses. Please make any changes or corrections to the data in this R Markdown file so that everyone uses the same dataset in the analyses.
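One detail to keep in mind when reading the exported file back in: `write.csv()` writes row names as an unnamed first column by default, which `read.csv()` then imports as a column named `X`. A sketch of a round trip without that extra column, using a temporary file and toy data:

```r
# Toy data and a temporary file path (not the project file).
toy <- data.frame(id = c(11, 12), Area = c("Tacoma", "Portland"))
path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the row index out of the file.
write.csv(toy, file = path, row.names = FALSE)

round_trip <- read.csv(path)
print(names(round_trip))
```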