Introduction

Welcome to the fifth session of our workshop series. This page is a guide to the material we will cover during the session: we will recap methods to summarize and transform data using the dplyr package, and give a quick introduction to R Markdown.

library(tidyverse)
library(knitr) # provides kable() for printing tables

Cheatsheets

Remember that there is a handy cheatsheet for dplyr: dplyr cheatsheet (click to download).

More cheatsheets for other packages, as well as updated versions, can be found here: https://www.rstudio.com/resources/cheatsheets/.

Load our data

data <- read.csv('Silver Tree Study.csv')

Filter

data.filtered <- data %>% filter(Photosynthesis<300)
nrow(data)
## [1] 1722
nrow(data.filtered)
## [1] 1712

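Comparing the two row counts shows that 10 rows have been removed. Note that filter() keeps only the rows where the condition is TRUE, so any rows where Photosynthesis is NA are dropped along with those of 300 or more.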
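A small sketch, using a made-up tibble rather than the Silver Tree Study data, of how filter() treats missing values and how to keep them if you want to:

```r
library(dplyr)

# Made-up data standing in for the real file (not from the Silver Tree Study)
toy <- tibble(
  Sample         = 1:4,
  Photosynthesis = c(12.5, 350, NA, 8.1)
)

# filter() keeps only rows where the condition is TRUE:
# the value of 350 is removed, and so is the NA row
kept <- toy %>% filter(Photosynthesis < 300)
nrow(kept)
## [1] 2

# To keep missing values, test for them explicitly
kept.with.na <- toy %>% filter(is.na(Photosynthesis) | Photosynthesis < 300)
nrow(kept.with.na)
## [1] 3
```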

Mutate

data.filtered.new.column <- data.filtered %>% mutate(new.column = Photosynthesis/Conductance) # start from the filtered data, as the object name says
ncol(data.filtered)
## [1] 14
ncol(data.filtered.new.column)
## [1] 15

The column count has gone from 14 to 15: mutate() has added new.column, the ratio of Photosynthesis to Conductance.
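mutate() can create several columns in one call, and later arguments can use a column created earlier in the same call. Dividing by a Conductance of zero gives Inf, which is worth checking for in real data. A sketch with made-up values:

```r
library(dplyr)

# Made-up values for the two columns used above
toy <- tibble(
  Photosynthesis = c(4, 6, 3),
  Conductance    = c(2, 3, 0)
)

toy.ratio <- toy %>%
  mutate(
    ratio         = Photosynthesis / Conductance, # 4/2, 6/3 and 3/0 give 2, 2 and Inf
    ratio.doubled = ratio * 2                     # uses the column created just above
  )
toy.ratio
```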

Summarize

Treatment.mean <- data.filtered %>%
  group_by(Treatment, Species) %>%
  summarize(n = n(), mean = mean(Photosynthesis), sd = sd(Photosynthesis),
            se = sd / sqrt(n),                         # standard error of the mean
            lowse = (mean - se), highse = (mean + se)) # mean minus / plus one standard error
knitr::kable(Treatment.mean, align = "c") # knitr:: tells R to look for kable() in the knitr package; if you load knitr with library(knitr) at the start of your session, you can just write kable()
| Treatment | Species | n | mean | sd | se | lowse | highse |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Drought | Both Pathogens | 52 | 0.5825541 | 0.7350891 | 0.1019385 | 0.4806156 | 0.6844927 |
| Drought | Control | 246 | 1.5864346 | 1.7284649 | 0.1102029 | 1.4762317 | 1.6966375 |
| Drought | Exotic Pathogen | 249 | 1.0877087 | 2.3577691 | 0.1494175 | 0.9382912 | 1.2371263 |
| Drought | Indigenous Pathogen | 274 | 1.0465248 | 3.0850090 | 0.1863722 | 0.8601526 | 1.2328970 |
| Wet | Both Pathogens | 20 | 2.7250411 | 2.5390778 | 0.5677551 | 2.1572860 | 3.2927961 |
| Wet | Control | 317 | 6.7635461 | 3.0075976 | 0.1689235 | 6.5946226 | 6.9324696 |
| Wet | Exotic Pathogen | 259 | 4.3032425 | 4.1998195 | 0.2609641 | 4.0422784 | 4.5642066 |
| Wet | Indigenous Pathogen | 295 | 6.4291194 | 3.4054379 | 0.1982723 | 6.2308471 | 6.6273917 |
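To check how each summary column is calculated (se = sd / sqrt(n), lowse = mean - se, highse = mean + se), here is the same kind of summarize() on a tiny made-up dataset where the numbers are easy to verify by hand. It also passes `.groups = "drop"` to return an ungrouped table; without it, summarizing over two grouping columns as above leaves the result grouped by Treatment.

```r
library(dplyr)

# Made-up data: two treatments with two measurements each
toy <- tibble(
  Treatment      = c("Wet", "Wet", "Drought", "Drought"),
  Photosynthesis = c(2, 4, 1, 3)
)

toy.summary <- toy %>%
  group_by(Treatment) %>%
  summarize(
    n       = n(),
    mean    = mean(Photosynthesis),
    sd      = sd(Photosynthesis),  # sqrt(2) for both groups
    se      = sd / sqrt(n),        # sqrt(2) / sqrt(2) = 1
    lowse   = mean - se,
    highse  = mean + se,
    .groups = "drop"               # return an ungrouped tibble
  )
toy.summary
# Drought: mean 2, lowse 1, highse 3; Wet: mean 3, lowse 2, highse 4
```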

Merge datasets

Let's calculate the proportion of plants in each treatment on each measurement day.

First, we need to calculate the total number of plants measured on each day.

Total.plants.per.day <- data.filtered %>% group_by(Days.after.inoculation) %>% summarize(Total=n_distinct(Unique.Sample.Number))
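Why n_distinct() rather than n()? Assuming, as the Unique.Sample.Number column suggests, that the same plant can appear in more than one row on a given day, n() would count measurements rather than plants. A sketch with a small made-up tibble:

```r
library(dplyr)

# Made-up data: plant 1 appears twice on day 3
toy <- tibble(
  Days.after.inoculation = c(3, 3, 3, 5),
  Unique.Sample.Number   = c(1, 1, 2, 3)
)

counts <- toy %>%
  group_by(Days.after.inoculation) %>%
  summarize(
    Rows   = n(),                             # counts rows (measurements)
    Plants = n_distinct(Unique.Sample.Number) # counts distinct plants
  )
counts
# Day 3: Rows 3, Plants 2; day 5: Rows 1, Plants 1
```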

Now let's calculate the number of plants in each treatment per day.

Plants.per.treatment.per.day <- data.filtered %>% group_by(Treatment,Days.after.inoculation) %>% summarize(Number.of.Plants=n_distinct(Unique.Sample.Number))

Now we can merge the two summary tables, matching rows on Days.after.inoculation.

Plants.overall <- left_join(Plants.per.treatment.per.day, Total.plants.per.day, by = "Days.after.inoculation") # add the matching Total from Total.plants.per.day to each row of Plants.per.treatment.per.day
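left_join() keeps every row of the first (left) table and adds the matching columns from the second; rows with no match get NA, and rows that exist only in the second table are dropped. A sketch with made-up values:

```r
library(dplyr)

# Made-up summary tables with the same key column as above
per.treatment <- tibble(
  Treatment              = c("Drought", "Wet", "Wet"),
  Days.after.inoculation = c(3, 3, 36),
  Number.of.Plants       = c(1, 1, 1)
)
totals <- tibble(
  Days.after.inoculation = c(3, 5),
  Total                  = c(2, 6)
)

joined <- left_join(per.treatment, totals, by = "Days.after.inoculation")
joined
# Day 3 rows get Total = 2; day 36 has no match, so its Total is NA;
# day 5 appears only in totals, so it is not in the result
```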

Now we can calculate the proportions.

Plants.overall <- Plants.overall %>% mutate(Proportion=Number.of.Plants/Total)
kable(Plants.overall,align="c")
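| Treatment | Days.after.inoculation | Number.of.Plants | Total | Proportion |
|:---:|:---:|:---:|:---:|:---:|
| Drought | 3 | 1 | 2 | 0.5000000 |
| Drought | 5 | 1 | 6 | 0.1666667 |
| Drought | 6 | 7 | 15 | 0.4666667 |
| Drought | 9 | 15 | 27 | 0.5555556 |
| Drought | 13 | 5 | 12 | 0.4166667 |
| Drought | 17 | 6 | 13 | 0.4615385 |
| Drought | 22 | 14 | 29 | 0.4827586 |
| Drought | 35 | 14 | 26 | 0.5384615 |
| Drought | 37 | 15 | 30 | 0.5000000 |
| Wet | 3 | 1 | 2 | 0.5000000 |
| Wet | 5 | 5 | 6 | 0.8333333 |
| Wet | 6 | 8 | 15 | 0.5333333 |
| Wet | 9 | 12 | 27 | 0.4444444 |
| Wet | 13 | 7 | 12 | 0.5833333 |
| Wet | 17 | 7 | 13 | 0.5384615 |
| Wet | 22 | 15 | 29 | 0.5172414 |
| Wet | 35 | 12 | 26 | 0.4615385 |
| Wet | 36 | 1 | 1 | 1.0000000 |
| Wet | 37 | 15 | 30 | 0.5000000 |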

Hmm, let's round the proportion values to two decimal places by wrapping the calculation in round():

Plants.overall <- Plants.overall %>% mutate(Proportion=round(Number.of.Plants/Total,2))
kable(Plants.overall,align="c")
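| Treatment | Days.after.inoculation | Number.of.Plants | Total | Proportion |
|:---:|:---:|:---:|:---:|:---:|
| Drought | 3 | 1 | 2 | 0.50 |
| Drought | 5 | 1 | 6 | 0.17 |
| Drought | 6 | 7 | 15 | 0.47 |
| Drought | 9 | 15 | 27 | 0.56 |
| Drought | 13 | 5 | 12 | 0.42 |
| Drought | 17 | 6 | 13 | 0.46 |
| Drought | 22 | 14 | 29 | 0.48 |
| Drought | 35 | 14 | 26 | 0.54 |
| Drought | 37 | 15 | 30 | 0.50 |
| Wet | 3 | 1 | 2 | 0.50 |
| Wet | 5 | 5 | 6 | 0.83 |
| Wet | 6 | 8 | 15 | 0.53 |
| Wet | 9 | 12 | 27 | 0.44 |
| Wet | 13 | 7 | 12 | 0.58 |
| Wet | 17 | 7 | 13 | 0.54 |
| Wet | 22 | 15 | 29 | 0.52 |
| Wet | 35 | 12 | 26 | 0.46 |
| Wet | 36 | 1 | 1 | 1.00 |
| Wet | 37 | 15 | 30 | 0.50 |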
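As an alternative to building two summaries and joining them, the proportions can be computed in one pipeline with count() and a grouped mutate(). This is a sketch on made-up data and assumes each plant belongs to exactly one treatment, so that summing the per-treatment counts gives the same daily total as n_distinct() over all plants. It also keeps full precision in the data and rounds only when printing, via the digits argument of knitr::kable():

```r
library(dplyr)

# Made-up data; ASSUMES each Unique.Sample.Number belongs to one treatment only
toy <- tibble(
  Treatment              = c("Drought", "Drought", "Wet", "Drought", "Wet", "Wet"),
  Days.after.inoculation = c(3, 3, 3, 5, 5, 5),
  Unique.Sample.Number   = c(1, 1, 2, 1, 2, 3)
)

proportions <- toy %>%
  distinct(Treatment, Days.after.inoculation, Unique.Sample.Number) %>% # one row per plant per day
  count(Treatment, Days.after.inoculation, name = "Number.of.Plants") %>%
  group_by(Days.after.inoculation) %>%
  mutate(
    Total      = sum(Number.of.Plants),   # plants across all treatments that day
    Proportion = Number.of.Plants / Total
  ) %>%
  ungroup()

# Round only for display; the Proportion column keeps full precision
knitr::kable(proportions, align = "c", digits = 2)
```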

Other handy packages in the tidyverse

stringr

“The stringr package provides an easy to use toolkit for working with strings, i.e. character data, in R. This cheatsheet guides you through stringr’s functions for manipulating strings. The back page provides a concise reference to regular expressions, a mini-language for describing, finding, and matching patterns in strings.” - RStudio.

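data.with.new.column <- data.filtered %>% mutate(and.and.and = "one & two & three") # and.and.and is the name of the new column: the same string of text copied into every row
levels(data.with.new.column$and.and.and)
## NULL
data.new.rows <- separate_rows(data.with.new.column, and.and.and, sep = " & ") # separate_rows() (from tidyr) splits each row into one row per item; including the spaces in sep avoids leftover spaces around each item
levels(data.new.rows$and.and.and)
## NULL

levels() returns NULL in both cases because and.and.and is a character column, not a factor. separate_rows() has turned each original row into three rows, one each for "one", "two" and "three".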
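The example above uses separate_rows(), which comes from tidyr. For comparison, here is a short sketch of a few stringr functions (str_split(), str_trim(), str_detect() and str_replace_all()) applied to made-up strings:

```r
library(stringr)

x <- "one & two & three"

# Split into pieces; str_split() returns a list with one element per input string
pieces <- str_split(x, " & ")[[1]]
pieces                                      # "one" "two" "three"

# Remove leading/trailing whitespace, e.g. left over after splitting on "&" alone
trimmed <- str_trim(c("one ", " two ", " three"))

# Detect a regular-expression pattern: does the string start with "D"?
starts.with.d <- str_detect(c("Drought", "Wet"), "^D")  # TRUE FALSE

# Replace every match of a pattern
commas <- str_replace_all(x, " & ", ", ")   # "one, two, three"
```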

forcats

“Factors are R’s data structure for categorical data. The forcats package makes it easy to work with factors. This cheatsheet reminds you how to make factors, reorder their levels, recode their values, and more” - RStudio.
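A short sketch of common forcats operations on a made-up treatment factor. It also shows that levels() only returns something for factors, which is why it returned NULL for the character column in the stringr example:

```r
library(forcats)

# Made-up treatment labels converted to a factor
treatment <- factor(c("Wet", "Drought", "Wet"))
levels(treatment)                      # "Drought" "Wet" (alphabetical by default)

# Reorder levels so "Wet" comes first (changes table and plot order)
treatment.relevelled <- fct_relevel(treatment, "Wet")
levels(treatment.relevelled)           # "Wet" "Drought"

# Rename a level: new_name = "old_name"
treatment.recoded <- fct_recode(treatment, Dry = "Drought")
levels(treatment.recoded)              # "Dry" "Wet"

# Count how often each level occurs (returns a tibble with columns f and n)
level.counts <- fct_count(treatment)
level.counts
```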

R Markdown

“R Markdown is an authoring format that makes it easy to write reusable reports with R. You combine your R code with narration written in markdown (an easy-to-write plain text format) and then export the results as an html, pdf, or Word file. You can even use R Markdown to build interactive documents and slideshows.” - RStudio