
Data wrangle methods

library(tidyverse)

Purpose

The purpose of this page is to describe the methods used to wrangle the data.

Data Descriptions

Observations of western redcedar were downloaded from iNaturalist, and urban heat data were downloaded from open data portals or provided by contacts in the City of Tacoma, King County (Washington), and Portland. Trees in Washington were also evaluated based on Environmental Health Disparities (EHD) ranks, and Home Owners' Loan Corporation (HOLC) data were investigated for each city.

iNaturalist

  • iNaturalist data were downloaded from the Western Redcedar Dieback Map project on 1.13.24:
    • full data (query: quality_grade=any&identifications=any&projects%5B%5D=western-redcedar-dieback-map)
      • Note: all place-related fields and all project-related fields had to be selected in the export

Urban Heat Data

  • Urban Heat Island (UHI) data were obtained for Tacoma, King County, and Portland

Note: temperature data differ among datasets and may have been collected with slightly different methods. Temperatures therefore need to be standardized within each dataset (as the difference from that dataset's mean) before they can be compared region-wide.
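A minimal sketch of that within-dataset standardization, assuming the datasets have been stacked into one data frame with an `Area` column. The area names mirror this project, but the temperatures are invented toy values. Grouping by `Area` subtracts each area's own mean, which is what the per-area `mutate()` calls in the Post-GIS section do one data frame at a time.

```r
library(dplyr)

# Hypothetical toy data: two areas whose raw temperatures sit on different baselines
temps <- tibble(
  Area = c("Tacoma", "Tacoma", "Tacoma", "Portland", "Portland"),
  temp.f = c(80, 84, NA, 90, 96)
)

# Standardize within each area: difference from that area's mean (NAs ignored)
temps <- temps %>%
  group_by(Area) %>%
  mutate(dist.from.mean = temp.f - mean(temp.f, na.rm = TRUE)) %>%
  ungroup()

print(temps)
```

After standardization both areas are centered on zero, so a tree 2 degrees above the Tacoma mean and one 3 degrees above the Portland mean can be compared directly.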

EHD and HOLC Data

Note: the QGIS methods below also join Washington Environmental Health Disparities (EHD) data and Home Owners' Loan Corporation (HOLC) data. However, these data were not analyzed in the present study.

Pre-GIS Data Wrangle Methods

City tree data were joined by attribute separately, so there are three tree datasets to work with or merge.

Also, because the UHI data were extracted with shapefiles, column names are limited to 10 characters. Therefore, the exported UHI data include only iNat ID numbers and UHI values. These were then re-merged (see below) with the iNat data to recover the remaining columns with their proper names.
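A toy sketch of that re-merge, assuming the only column shared by the QGIS export and the full iNat table is `id`. All values below are invented for illustration. A `left_join()` keyed on `id` keeps every row of the UHI export and brings back the full-length iNat columns.

```r
library(dplyr)

# Hypothetical iNat table with full-length column names
inat <- tibble(
  id = c(101, 102, 103),
  latitude = c(47.25, 47.26, 47.61),
  place_county_name = c("Pierce", "Pierce", "King")
)

# Hypothetical QGIS export: only the id plus sampled UHI values, under
# short field names that fit the shapefile 10-character limit
uhi <- tibble(id = c(101, 103), DN_AM1 = c(20.1, 21.4))

# Re-attach the full iNat columns to each UHI row by id
merged <- left_join(uhi, inat, by = "id")
print(merged)
```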

Import Data

inat.full <- read.csv("./WRDM-full-data-1.13.24.csv")
#tacoma.uhi <- read.csv("./WRC-TAC-UHI-Values.csv")

Prep for QGIS Data Extracts

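Select only the necessary columns

# Columns kept: [1] "id", [23] "latitude", [24] "longitude", [35] "place_town_name", [36] "place_county_name"

# Select by name so the step does not break if the column order of the export changes
inat.full.qgis <- inat.full %>% select(id, latitude, longitude, place_town_name, place_county_name)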

Split Data into “Areas”

kc.wrc.qgis <- inat.full.qgis %>% filter(place_county_name=="King")
port.wrc.qgis <- inat.full.qgis %>% filter(place_town_name=="Portland")
tac.wrc.qgis <- inat.full.qgis %>% filter(place_town_name=="Tacoma")
  • Note: an alternative is to export only the iNat observations in these locations, rather than exporting all observations and then filtering to these locations
    • Tacoma (query: quality_grade=any&identifications=any&place_id=186123&projects%5B%5D=western-redcedar-dieback-map)
    • Portland Metro Area (query: quality_grade=any&identifications=any&place_id=122420&projects%5B%5D=western-redcedar-dieback-map)
    • King County (query: quality_grade=any&identifications=any&place_id=1282&projects%5B%5D=western-redcedar-dieback-map)
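A small sketch of how the per-area queries above are assembled, using only the place IDs listed there. `build_query()` is a hypothetical helper, not part of any iNaturalist tool. It also shows that `%5B%5D` in these queries is just the URL encoding of `[]`.

```r
# Place IDs copied from the queries above; build_query() is a hypothetical helper
place.ids <- c(Tacoma = 186123L, Portland.Metro = 122420L, King.County = 1282L)

build_query <- function(place_id) {
  paste0("quality_grade=any&identifications=any&place_id=", place_id,
         "&projects%5B%5D=western-redcedar-dieback-map")
}

# One query string per area, named by area
queries <- vapply(place.ids, build_query, character(1))

# "%5B%5D" decodes to "[]", i.e. the parameter is projects[]
URLdecode(queries[["Tacoma"]])
```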

Export data for QGIS data joins

#write.csv(inat.full.qgis,file="./WRDM-full-data-1.13.24-qgis.csv")
#write.csv(kc.wrc.qgis,file="./WRDM-King-County-1.13.24-qgis.csv")
#write.csv(port.wrc.qgis,file="./WRDM-Portland-1.13.24-qgis.csv")
#write.csv(tac.wrc.qgis,file="./WRDM-Tacoma-1.13.24-qgis.csv")
  • Sample sizes (note: these numbers change substantially after removing Hoyt trees and dead trees; it is unclear what happens to all the King County trees)
    • Tacoma - 357 Trees
    • Portland - 465 Trees
    • King County - 516 Trees
  • After all filters
    • Tacoma - 343 Trees
    • Portland - 341 Trees
    • King County - 420 Trees

QGIS Methods

Extracting Ancillary Data for iNaturalist Observations

  • Import into QGIS
    • Data Source Manager > Delimited Text > browse to the R working directory

Extract Heat Data for Trees

  • QGIS - Extract heat data for each point
    • Add Heat Data
      • Tacoma
        • tac_pm.tif
        • tac_am.tif
        • tac_af.tif
      • King County
        • pm_t_f_ranger.tiff
        • af_t_f_ranger.tiff
        • am_t_f_ranger.tiff
      • Portland
        • 825a_2 (am)
        • 825b_2 (af)
        • 825c_2 (pm)
    • Open the Processing Toolbox panel and search for 'Sample raster values'
      • (View > Panels > Processing Toolbox)
      • Sample raster values for each temperature time period (am, af, pm) and each area

Extract HOLC Data

  • Add HOLC data to the tree layers for each city
    • Join attributes by location (Vector > Data Management Tools > Join Attributes by Location)
    • e.g. join features in Portland Trees that intersect, by comparing to Portland HOLC 1.13.24
    • Advanced settings: 'Do not filter' (set in both layers)
      • May get warning: No spatial index exists...
        • Right-click each layer and click 'Create Spatial Index' in the Source tab
    • Export temporary data for trees

Extract EHD Rank Data for Trees

  • Add EHD rank data downloaded on 6.17.23 from https://geo.wa.govsets/WADOH::full-environmental-health-disparities-version-2-extract/explore
  • Add EHD Data to trees layer
    • join attributes by location (Vector > Data Management Tools > Join Attributes By Location)
    • Advanced settings: 'Do not filter' (set in both layers)
      • May get warning: No spatial index exists..
        • Right click each layer and click ‘Create Spatial index’ in Source Tab
    • Export temp data for trees

Random Tree Selection - Community Hypothesis Test

  • Random Tree Selection for Portland Redhot Hypothesis Test.
    • Set the project CRS to WGS 84 (EPSG:4326)
    • Add Portland Urban Heat Data
      • GIS > Urban Heat GIS Data > Portland > a, b, c TIFFs
      • Convert each of the three UHI rasters to shapefiles
      • Raster > Conversion > Polygonize > default settings (DN_AM for morning temps (825a), DN_AF for afternoon, etc.)
      • Export the temporary 'vectorized' layers to shapefiles with the default CRS
      • Create a spatial index for each layer (avoids a warning in the joining steps below)
        • Right-click each layer and click 'Create Spatial Index' in the Source tab
      • Extract temp data for trees
    • Add Street Tree Data
    • Join attributes by location
      • Vector > Data Management Tools > Join Attributes by Location
      • Trees that intersect with vectorized rasters
      • Advanced settings: 'Do not filter' (set in both layers)
    • Limit trees to "DBH" > 30 and "DBH" < 40? (leaves 160 trees)
    • Randomly select 100 (drops 60 trees)

Export final .shp files as .csv
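The DBH filter and random draw above were done in QGIS. A rough R equivalent is sketched below, assuming the joined street-tree table has a numeric `DBH` column. The tree table is simulated and the seeds are arbitrary; fixing a seed is what makes a random selection reproducible.

```r
library(dplyr)

# Hypothetical stand-in for the joined Portland street-tree layer (DBH values simulated)
set.seed(1)
trees <- tibble(tree.id = 1:500, DBH = runif(500, min = 20, max = 50))

# Keep trees with 30 < DBH < 40, mirroring the QGIS filter
mid.size <- trees %>% filter(DBH > 30, DBH < 40)

# Draw 100 trees without replacement; the seed makes the draw repeatable
set.seed(2024)
selected <- mid.size %>% slice_sample(n = 100)

nrow(selected)
```

`slice_sample()` samples without replacement by default, so no tree can be selected twice.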

Extract Heat Data for Additional City Trees

  • Collecting City trees datasets
  • Extracting heat data for trees
    • City tree data and urban heat data (used in the previous QGIS data extraction methods) were imported into QGIS
      • Note: the Seattle and Portland datasets had different source CRSs
    • Trees were filtered to redcedar
      • Seattle combined trees - "Scientific Name" = 'Thuja plicata'
      • Portland street trees - "SPECIES" = 'Thuja plicata - western redcedar'
    • Air temperature data were extracted for each tree with the Sample raster values tool
      • Open the Processing Toolbox panel and search for 'Sample raster values' (View > Panels > Processing Toolbox)
    • (Optional) Convert °C to °F in a new column (Portland data only)
      • Open the attribute table, toggle editing, click New Field, and use type decimal with length 10 and precision 3.
        • Select the new AF_F field and open the expression editor
          • ("DN_AF1" * 1.8) + 32, then click Update All
      • Save edits and toggle editing off
    • Add lat/lon columns for R
      • Open the attribute table, toggle editing, click New Field, name it lat or lon, and use type decimal with length 10 and precision 10.
      • for lat: y(transform($geometry, layer_property(@layer, 'crs'), 'EPSG:4326'))
      • for lon: x(transform($geometry, layer_property(@layer, 'crs'), 'EPSG:4326'))
    • exported to csv
  • Total Trees
    • Seattle - 205,146
    • Portland Street - 243,283
    • Portland Park - 25,740
    • Portland Heritage - 324
  • Redcedar Trees
    • Seattle Combined - 2,690
    • Portland Street - 1,715
    • Portland Park - 964
    • Portland Heritage - 3

Post-GIS Data Wrangle Methods

Re-import Data after following QGIS Methods

tacoma.uhi.holc.ehd  <- read.csv("./WRC.Tacoma.UHI.HOLC.EHD-1.13.24.csv")
#king.county.uhi <- read.csv("./WRC-KC-UHI-Values.csv")
king.county.uhi.holc.ehd <- read.csv("./WRC.KingCounty.UHI.HOLC.EHD-1.13.24.csv")
portland.uhi.holc <- read.csv("./WRC.Portland.UHI.HOLC-1.13.24.csv") # no EHD data because Portland is outside WA

Join UHI data with iNat Data

kc.wrc <- left_join(king.county.uhi.holc.ehd,inat.full,by="id")
tac.wrc <- left_join(tacoma.uhi.holc.ehd,inat.full,by="id")
pl.wrc <- left_join(portland.uhi.holc,inat.full,by="id")

Mutate Data (Per Area)

Note: Tacoma and Portland temperatures were converted from °C to °F to match King County.

Daily Means

tac.wrc <- tac.wrc %>%
  mutate(DN_AM1 = (DN_AM1 * 1.8) + 32, DN_AF1 = (DN_AF1 * 1.8) + 32, DN_PM1 = (DN_PM1 * 1.8) + 32) %>% # C to F
  mutate(mean.temp.daily = (DN_AM1 + DN_AF1 + DN_PM1) / 3) %>%
  mutate(dist.from.mean.daily = mean.temp.daily - mean(mean.temp.daily, na.rm = TRUE)) %>%
  mutate(Area = "Tacoma")

kc.wrc <- kc.wrc %>% # King County temps are already in F
  mutate(mean.temp.daily = (DN_AM1 + DN_AF1 + DN_PM1) / 3) %>%
  mutate(dist.from.mean.daily = mean.temp.daily - mean(mean.temp.daily, na.rm = TRUE)) %>%
  mutate(Area = "King County")

pl.wrc <- pl.wrc %>%
  mutate(DN_AM1 = (DN_AM1 * 1.8) + 32, DN_AF1 = (DN_AF1 * 1.8) + 32, DN_PM1 = (DN_PM1 * 1.8) + 32) %>% # C to F
  mutate(mean.temp.daily = (DN_AM1 + DN_AF1 + DN_PM1) / 3) %>%
  mutate(dist.from.mean.daily = mean.temp.daily - mean(mean.temp.daily, na.rm = TRUE)) %>%
  mutate(Area = "Portland")

Morning Means

tac.wrc <- tac.wrc %>% mutate(dist.from.mean.am=DN_AM1-(mean(DN_AM1,na.rm=TRUE)))

kc.wrc <- kc.wrc %>% mutate(dist.from.mean.am=DN_AM1-(mean(DN_AM1,na.rm=TRUE)))

pl.wrc <- pl.wrc %>% mutate(dist.from.mean.am=DN_AM1-(mean(DN_AM1,na.rm=TRUE)))

Afternoon Means

tac.wrc <- tac.wrc %>% mutate(dist.from.mean.af=DN_AF1-(mean(DN_AF1,na.rm=TRUE)))

kc.wrc <- kc.wrc %>% mutate(dist.from.mean.af=DN_AF1-(mean(DN_AF1,na.rm=TRUE)))

pl.wrc <- pl.wrc %>% mutate(dist.from.mean.af=DN_AF1-(mean(DN_AF1,na.rm=TRUE)))

Evening Means

tac.wrc <- tac.wrc %>% mutate(dist.from.mean.pm=DN_PM1-(mean(DN_PM1,na.rm=TRUE)))

kc.wrc <- kc.wrc %>% mutate(dist.from.mean.pm=DN_PM1-(mean(DN_PM1,na.rm=TRUE)))

pl.wrc <- pl.wrc %>% mutate(dist.from.mean.pm=DN_PM1-(mean(DN_PM1,na.rm=TRUE)))

Merge Data (Merge Areas)

data <- bind_rows(pl.wrc,tac.wrc,kc.wrc)

Clean Data

Some of the iNat project questions have changed since the project was created, so some answers need to be adjusted to be consistent throughout the project.

Clean iNaturalist Fields

data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight. <- as.factor(data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.)
data$field.optional...what..other.factors..were.observed. <- as.factor(data$field.optional...what..other.factors..were.observed.)
data$field.tree.canopy.symptoms <- as.factor(data$field.tree.canopy.symptoms)
data$field.optional...slope.position <- as.factor(data$field.optional...slope.position)
data$field.optional...site.type <- as.factor(data$field.optional...site.type)
data$field.optional...site.location.description  <- as.factor(data$field.optional...site.location.description )
data$field.optional...tree.size <-as.factor(data$field.optional...tree.size)
data$field.other.factors...are.there.signs.or.symptoms.of.insect..diseases..or.other.damage.[data$field.other.factors...are.there.signs.or.symptoms.of.insect..diseases..or.other.damage.==""] <- "Not sure"
data$field.other.factors...are.there.signs.or.symptoms.of.insect..diseases..or.other.damage.[data$field.other.factors...are.there.signs.or.symptoms.of.insect..diseases..or.other.damage.=="Unsure"] <- "Not sure"
data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.[data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.=="4"] <- "4-6"
data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.[data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.=="5"] <- "4-6"
data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.[data$field.number.of.additional.unhealthy.trees..of.same.species..in.area..within.sight.=="2"] <- "2-3"
data$field.tree.canopy.symptoms[data$field.tree.canopy.symptoms=="Multiple Symptoms"] <-"Multiple Symptoms (please list in Notes)"
data$field.tree.canopy.symptoms[data$field.tree.canopy.symptoms=="multiple symptoms"] <-"Multiple Symptoms (please list in Notes)"
data$field.tree.canopy.symptoms[data$field.tree.canopy.symptoms=="thinning foliage"] <-"Thinning Canopy"
data$field.tree.canopy.symptoms[data$field.tree.canopy.symptoms=="healthy"] <-"Healthy"
data$field.tree.canopy.symptoms[data$field.tree.canopy.symptoms=="dead top"] <-"Old Dead Top (needles already gone)"
data$field.optional...what..other.factors..were.observed.[data$field.optional...what..other.factors..were.observed.=="Fungal Activitiy (mycelial fans, mushrooms at base, or conks on trunk)"] <-"Fungal Activitiy (mycelial fans, bleeding cankers, mushrooms at base, or conks on trunk)"
data$field.optional...what..other.factors..were.observed.[data$field.optional...what..other.factors..were.observed.=="Needle disease (dieback, checking, blight, etc.)"] <- "Needle or leaf disease (dieback, checking, blight, etc.)"
data$field.optional...slope.position[data$field.optional...slope.position=="Upper 1/3rd of a slope"] <-"Top of slope"
data$field.optional...site.type[data$field.optional...site.type=="Urban Natural"] <-"Urban"
data$field.optional...site.type[data$field.optional...site.type=="Urban Landscaped"] <-"Urban"
data$field.optional...site.type[data$field.optional...site.type=="Suburban Natural"] <-"Suburban"
data$field.optional...site.type[data$field.optional...site.type=="Suburban Lanscaped"] <-"Suburban"
data$field.optional...site.type[data$field.optional...site.type=="Natural Forest"] <-"Rural"
data$field.optional...tree.size[data$field.optional...tree.size=="Large"] <- "Large (too big to wrap arms around trunk)"
data$field.optional...tree.size[data$field.optional...tree.size=="Medium"] <- "Medium (can wrap arms around trunk)"
data$field.optional...tree.size[data$field.optional...tree.size=="Small"] <- "Small (can wrap hands around trunk)"
data$field.optional...tree.size[data$field.optional...tree.size=="Very Large"] <- "Very Large (would take many people to wrap arms around trunk)"
# Rename the level itself: "Very Small (...)" is not an existing level, so subset assignment would turn these trees into NA
levels(data$field.optional...tree.size)[levels(data$field.optional...tree.size)=="Very small (can wrap a single hand around stem)"] <- "Very Small (can wrap a single hand around stem)"
data$field.optional...site.location.description [data$field.optional...site.location.description =="Yard or open park grounds"] <- "Urban yard or open park grounds"
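Recoding a factor by subset assignment only works when the new string is already one of the factor's levels. Otherwise R stores NA and warns "invalid factor level, NA generated". The toy factor below (values hypothetical, labels borrowed from the tree-size question) contrasts subset assignment with renaming the level directly.

```r
# Hypothetical toy factor reproducing the tree-size recode
size <- factor(c("Small", "Very small (can wrap a single hand around stem)", "Large"))
old.label <- "Very small (can wrap a single hand around stem)"
new.label <- "Very Small (can wrap a single hand around stem)"

# Subset assignment: new.label is not a level, so R warns
# "invalid factor level, NA generated" and stores NA instead
bad <- size
suppressWarnings(bad[bad == old.label] <- new.label)

# Renaming the level keeps every observation
good <- size
levels(good)[levels(good) == old.label] <- new.label

print(bad)
print(good)
```

An alternative is to convert the column to character, recode, and convert back to a factor at the end.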

Reclassify co-factors

data$field.percent.canopy.affected.... <- as.factor(data$field.percent.canopy.affected....)
data$field.percent.canopy.affected....[data$field.percent.canopy.affected....=="1-25% of the crown is unhealthy"] <- "1-29% of the canopy is unhealthy"
data$field.percent.canopy.affected....[data$field.percent.canopy.affected....=="Healthy (0%)"] <- "Healthy, no dieback(0%)"
data$field.percent.canopy.affected....[data$field.percent.canopy.affected....=="Healthy (0% is unhealthy)"] <- "Healthy, no dieback(0%)"
data$field.percent.canopy.affected....[data$field.percent.canopy.affected....=="more than 75% of the crown is unhealthy"] <- "60-99% of the canopy is unhealthy"
data <- data %>% droplevels()

data$Percent.Dieback.Modified[data$field.percent.canopy.affected....=="Healthy, no dieback(0%)"] <- 0
data$Percent.Dieback.Modified[data$field.percent.canopy.affected....=="1-29% of the canopy is unhealthy"] <- 1
data$Percent.Dieback.Modified[data$field.percent.canopy.affected....=="30-59% of the canopy is unhealthy"] <- 30
data$Percent.Dieback.Modified[data$field.percent.canopy.affected....=="60-99% of the canopy is unhealthy"] <- 60
data$Percent.Dieback.Modified[data$field.percent.canopy.affected....=="tree is dead"] <- 100
#data$field.dieback.percent[is.na(data$field.dieback.percent)] <- data$Percent.Dieback.Modified

data$field.percent.canopy.affected....[data$Percent.Dieback.Modified==100] <- "tree is dead" #there were a couple healthy trees with 100 dieback somehow..

has.dieback.percent <- data %>% filter(!is.na(field.dieback.percent))
does.not.have.dieback.percent <- data %>% filter(is.na(field.dieback.percent)) 

has.dieback.percent$user.estimated.dieback <- "Yes"
does.not.have.dieback.percent$user.estimated.dieback <- "No"

does.not.have.dieback.percent$field.dieback.percent <- does.not.have.dieback.percent$Percent.Dieback.Modified

data <- rbind(has.dieback.percent,does.not.have.dieback.percent)

data$field.dieback.percent[data$field.dieback.percent<0] <- 0 # not sure why but there was one -2 percent dieback value
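The split, fill, and `rbind()` steps above can also be written as a single `mutate()`. The sketch below uses invented toy rows and assumes both percent columns are numeric. `coalesce()` takes the observer's estimate when present and otherwise falls back to the category-derived value. Unlike the split-and-rbind version, it keeps the original row order.

```r
library(dplyr)

# Hypothetical toy rows: some observers estimated dieback, others only picked a category
obs <- tibble(
  field.dieback.percent = c(15, NA, -2, NA),
  Percent.Dieback.Modified = c(1, 30, 0, 100)
)

obs <- obs %>%
  mutate(
    # Flag whether the observer supplied their own estimate
    user.estimated.dieback = if_else(is.na(field.dieback.percent), "No", "Yes"),
    # Fall back to the category-derived value when no estimate exists
    field.dieback.percent = coalesce(field.dieback.percent, Percent.Dieback.Modified),
    # Clamp negative entries to zero
    field.dieback.percent = pmax(field.dieback.percent, 0)
  )

print(obs)
```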
data <- data %>% mutate(tree.size.simplified=field.optional...tree.size) 
tree_size_level_key <- c("Very Small (can wrap a single hand around stem)" = "Small", "Small (can wrap hands around trunk)" = "Small", "Medium (can wrap arms around trunk)" = "Medium", "Large (too big to wrap arms around trunk)" = "Large", "Very Large (would take many people to wrap arms around trunk)" = "Large","Other"="Other","No selection"="No Selection")

data$tree.size.simplified <- recode_factor(data$tree.size.simplified, !!!tree_size_level_key)
data$tree.size.simplified <- as.factor(data$tree.size.simplified)

Reclassify response category variables

data <- data %>%
  filter(field.tree.canopy.symptoms != "Candelabra top or very old spike top (old growth)") %>%
  mutate(binary.tree.canopy.symptoms = field.tree.canopy.symptoms,
         ordinal.tree.canopy.symptoms = field.tree.canopy.symptoms,
         reclassified.tree.canopy.symptoms = field.tree.canopy.symptoms) %>%
  droplevels()
binary_level_key <- c("Healthy" = "Healthy", "Thinning Canopy" = "Unhealthy", "New Dead Top (red or brown needles still attached)" = "Unhealthy", "Old Dead Top (needles already gone)" = "Unhealthy", "Tree is dead" = "Unhealthy", "Multiple Symptoms (please list in Notes)" = "Unhealthy", "Extra Cone Crop" = "Unhealthy", "Browning Canopy" = "Unhealthy","Branch Dieback or 'Flagging'" = "Unhealthy", "Other (please describe in Notes)" = "Unhealthy", "Yellowing Canopy" = "Unhealthy")

data$binary.tree.canopy.symptoms <- recode_factor(data$binary.tree.canopy.symptoms, !!!binary_level_key)
#levels(binary$field.tree.canopy.symptoms)
data$binary.tree.canopy.symptoms <- as.factor(data$binary.tree.canopy.symptoms)
ordinal_level_key <- c("Healthy" = "Healthy", "Thinning Canopy" = "Unhealthy", "New Dead Top (red or brown needles still attached)" = "Unhealthy", "Old Dead Top (needles already gone)" = "Unhealthy", "Tree is dead" = "Dead", "Multiple Symptoms (please list in Notes)" = "Unhealthy", "Extra Cone Crop" = "Unhealthy", "Browning Canopy" = "Unhealthy","Branch Dieback or 'Flagging'" = "Unhealthy", "Other (please describe in Notes)" = "Unhealthy", "Yellowing Canopy" = "Unhealthy")

data$ordinal.tree.canopy.symptoms <- recode_factor(data$ordinal.tree.canopy.symptoms, !!!ordinal_level_key)
#levels(binary$field.tree.canopy.symptoms)
data$ordinal.tree.canopy.symptoms <- as.factor(data$ordinal.tree.canopy.symptoms)
reclassified_level_key <- c("Healthy" = "Healthy", "Thinning Canopy" = "Thinning Canopy", "New Dead Top (red or brown needles still attached)" = "Dead Top", "Old Dead Top (needles already gone)" = "Dead Top", "Tree is dead" = "Tree is Dead", "Multiple Symptoms (please list in Notes)" = "Other", "Extra Cone Crop" = "Other", "Browning Canopy" = "Other","Branch Dieback or 'Flagging'" = "Other", "Other (please describe in Notes)" = "Other", "Yellowing Canopy" = "Other")

data$reclassified.tree.canopy.symptoms <- recode_factor(data$reclassified.tree.canopy.symptoms, !!!reclassified_level_key)
#levels(binary$field.tree.canopy.symptoms)
data$reclassified.tree.canopy.symptoms <- as.factor(data$reclassified.tree.canopy.symptoms)
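A toy sketch of how the level keys above are applied. The names of the key are the old values and its values are the new labels. `!!!` splices the named vector into `old = new` argument pairs for `recode_factor()`, which orders the output levels by the order the replacements appear. Here that makes "Healthy" the first (reference) level. The symptom values and the shortened key are illustrative only.

```r
library(dplyr)

symptoms <- factor(c("Healthy", "Thinning Canopy", "Tree is dead", "Healthy"))

# Hypothetical subset of a level key: names are old values, values are new labels
key <- c("Healthy" = "Healthy", "Thinning Canopy" = "Unhealthy", "Tree is dead" = "Unhealthy")

# !!! splices the named vector into old = new pairs
binary <- recode_factor(symptoms, !!!key)

levels(binary)
```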

Create Binary response for dead top

data$top.dieback[data$reclassified.tree.canopy.symptoms=="Dead Top"] <- "Yes"
data$top.dieback[data$reclassified.tree.canopy.symptoms!="Dead Top"] <- "No"
data$top.dieback <- as.factor(data$top.dieback)
levels(data$top.dieback)
## [1] "No"  "Yes"
summary(data$top.dieback)
##   No  Yes 
## 1217  118

Create Binary response for thinning

data$thinning[data$reclassified.tree.canopy.symptoms=="Thinning Canopy"] <- "Yes"
data$thinning[data$reclassified.tree.canopy.symptoms!="Thinning Canopy"] <- "No"
data$thinning <- as.factor(data$thinning)
levels(data$thinning)
## [1] "No"  "Yes"
summary(data$thinning)
##   No  Yes 
## 1159  176

Create Binary response for dead

data$dead[data$reclassified.tree.canopy.symptoms=="Tree is Dead"] <- "Yes"
data$dead[data$reclassified.tree.canopy.symptoms!="Tree is Dead"] <- "No"
data$dead <- as.factor(data$dead)
levels(data$dead)
## [1] "No"  "Yes"
summary(data$dead)
##   No  Yes 
## 1280   55
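The three Yes/No responses above follow one pattern. The sketch below shows a compact version on invented toy values: `if_else()` builds the indicator, and explicit `levels` keep "No" as the reference level. As in the subset-assignment version, a missing symptom stays NA.

```r
library(dplyr)

# Hypothetical toy symptom classes, including one missing value
reclassified <- factor(c("Healthy", "Dead Top", "Thinning Canopy", NA, "Dead Top"))

# Yes/No indicator for one symptom; explicit levels keep "No" as the reference
top.dieback <- factor(if_else(reclassified == "Dead Top", "Yes", "No"),
                      levels = c("No", "Yes"))

table(top.dieback, useNA = "ifany")
```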

Mutate Merged Data

Convert Percent to Proportion

data <- data %>% mutate(field.dieback.prop = (field.dieback.percent/100))

Filter Merged Data

Filter trees without temperature data

daily.error <- data %>% filter(is.na(mean.temp.daily))

33 trees did not have temperature data and were filtered out.

Remove other species

A few observations were not identified to species or were determined to be other species by the iNat community.

levels(as.factor(data$scientific_name))
##  [1] ""                          "Callitropsis nootkatensis"
##  [3] "Chamaecyparis lawsoniana"  "Cupressaceae"             
##  [5] "Cupressoideae"             "Pinales"                  
##  [7] "Plantae"                   "Pseudotsuga menziesii"    
##  [9] "Sequoia sempervirens"      "Thuja plicata"            
## [11] "Tracheophyta"
other.species <- data %>% filter(scientific_name!="Thuja plicata") %>% droplevels()
data <- data %>% filter(scientific_name=="Thuja plicata") %>% droplevels()

41 observations were identified as other species by the iNaturalist community or were not identified to species.

Remove outliers

A handful of afternoon temperature observations were more than 8 degrees F below the mean.

ggplot(data,aes(dist.from.mean.af,field.dieback.percent))+geom_point()+theme_bw()
## Warning: Removed 33 rows containing missing values or values outside the scale range
## (`geom_point()`).

Three trees were removed because their afternoon temperatures were more than 8 degrees F below the mean. Perhaps their georeferences were inaccurate.

data <- data %>% filter(dist.from.mean.af>(-8))
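One side effect of this filter is worth knowing. `dplyr::filter()` keeps only rows where the condition is TRUE, so rows whose `dist.from.mean.af` is NA (trees without afternoon temperature data) are dropped along with the outliers. A toy illustration with invented values:

```r
library(dplyr)

# Hypothetical toy deviations, including one tree without temperature data
trees <- tibble(id = 1:5, dist.from.mean.af = c(1.2, -9.5, NA, -0.4, -12))

# Rows where the condition is FALSE or NA are both removed
kept <- trees %>% filter(dist.from.mean.af > -8)

kept$id
```

To keep trees with missing temperatures, the condition would need an explicit `is.na(dist.from.mean.af) |` clause.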

Export Modified Data

write.csv(data,file="./urban-data-modified.csv")

urban-data-modified.csv

Please use this dataset in all analyses. Make any changes or corrections to the data in this R Markdown document so that everyone is using the same dataset.