Welcome to the fifth session of our workshop series. This page is a guide to the material we will cover during the session. We will recap methods for summarizing and transforming data with the dplyr package and give a quick introduction to R Markdown.
library(tidyverse)
Remember there is a nice cheatsheet available for dplyr (the dplyr cheatsheet).
Cheatsheets for other packages, and updated versions, can be found here: https://www.rstudio.com/resources/cheatsheets/.
data <- read.csv('Silver Tree Study.csv')
data.filtered <- data %>% filter(Photosynthesis<300)
nrow(data)
## [1] 1722
nrow(data.filtered)
## [1] 1712
We can see that 10 rows (1722 - 1712) have been filtered out.
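As a minimal sketch of the same filtering step on made-up data (the `toy` data frame and its values are hypothetical, not taken from the Silver Tree Study), the row counts before and after filtering can be compared like this:

```r
library(dplyr)

# Hypothetical toy data standing in for the workshop data set
toy <- data.frame(
  Plant = c("a", "b", "c", "d"),
  Photosynthesis = c(2.5, 350, 7.1, 299)
)

# Keep only the rows where Photosynthesis is below 300
toy.filtered <- toy %>% filter(Photosynthesis < 300)

# The difference in row counts is the number of rows that were removed
rows.removed <- nrow(toy) - nrow(toy.filtered)
rows.removed
```

Checking the row count before and after a filter is a quick way to confirm that the condition removed about as many rows as you expected.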
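data.filtered.new.column <- data.filtered %>% mutate(new.column = Photosynthesis / Conductance) # start from the filtered data and add the ratio of Photosynthesis to Conductance as a new column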
ncol(data.filtered)
## [1] 14
ncol(data.filtered.new.column)
## [1] 15
We can see that there is one additional column (15 instead of 14).
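The same idea can be sketched on a small made-up data frame. The `toy` data and the column name `ratio` are assumptions for illustration only:

```r
library(dplyr)

# Hypothetical toy data with two measured columns
toy <- data.frame(
  Photosynthesis = c(2, 6, 9),
  Conductance = c(0.5, 0.25, 2)
)

# mutate() keeps every existing column and appends the new one
toy.new <- toy %>% mutate(ratio = Photosynthesis / Conductance)

ncol(toy)      # number of columns before
ncol(toy.new)  # number of columns after
```

Because mutate() returns a new data frame, the original `toy` stays unchanged unless you assign the result back to the same name.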
Treatment.mean <- data.filtered %>%
  group_by(Treatment, Species) %>%
  summarize(n = n(), mean = mean(Photosynthesis), sd = sd(Photosynthesis),
            se = sd / sqrt(n), lowse = mean - se, highse = mean + se)
knitr::kable(Treatment.mean, align="c") ## note: knitr:: tells R to look for the kable() function in the knitr package. If you load the knitr package with library(knitr) at the beginning of your session, you can just write kable()
Treatment | Species | n | mean | sd | se | lowse | highse |
---|---|---|---|---|---|---|---|
Drought | Both Pathogens | 52 | 0.5825541 | 0.7350891 | 0.1019385 | 0.4806156 | 0.6844927 |
Drought | Control | 246 | 1.5864346 | 1.7284649 | 0.1102029 | 1.4762317 | 1.6966375 |
Drought | Exotic Pathogen | 249 | 1.0877087 | 2.3577691 | 0.1494175 | 0.9382912 | 1.2371263 |
Drought | Indigenous Pathogen | 274 | 1.0465248 | 3.0850090 | 0.1863722 | 0.8601526 | 1.2328970 |
Wet | Both Pathogens | 20 | 2.7250411 | 2.5390778 | 0.5677551 | 2.1572860 | 3.2927961 |
Wet | Control | 317 | 6.7635461 | 3.0075976 | 0.1689235 | 6.5946226 | 6.9324696 |
Wet | Exotic Pathogen | 259 | 4.3032425 | 4.1998195 | 0.2609641 | 4.0422784 | 4.5642066 |
Wet | Indigenous Pathogen | 295 | 6.4291194 | 3.4054379 | 0.1982723 | 6.2308471 | 6.6273917 |
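To see how the summary columns relate to each other, here is a minimal sketch on made-up numbers (the `toy` data frame is hypothetical). It uses the same formulas as above: the standard error is the standard deviation divided by the square root of the group size, and `lowse` and `highse` are the mean minus and plus one standard error.

```r
library(dplyr)

# Hypothetical toy data: four measurements per treatment
toy <- data.frame(
  Treatment = c("Wet", "Wet", "Wet", "Wet",
                "Drought", "Drought", "Drought", "Drought"),
  Photosynthesis = c(2, 4, 4, 6, 1, 1, 3, 3)
)

toy.summary <- toy %>%
  group_by(Treatment) %>%
  summarize(
    n = n(),                      # rows per group
    mean = mean(Photosynthesis),  # group mean
    sd = sd(Photosynthesis),      # group standard deviation
    se = sd / sqrt(n),            # standard error of the mean
    lowse = mean - se,            # mean minus one standard error
    highse = mean + se            # mean plus one standard error
  )

toy.summary
```

Inside summarize(), later columns can use columns created earlier in the same call, which is why `se` can refer to `sd` and `n`.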
Let's calculate the proportion of plants in each treatment on each day measured.
First we need to calculate the total number of plants on each day.
Total.plants.per.day <- data.filtered %>% group_by(Days.after.inoculation) %>% summarize(Total=n_distinct(Unique.Sample.Number))
Now let's calculate the number of plants in each treatment per day.
Plants.per.treatment.per.day <- data.filtered %>% group_by(Treatment,Days.after.inoculation) %>% summarize(Number.of.Plants=n_distinct(Unique.Sample.Number))
Now we can merge the summary tables based on days after inoculation.
Plants.overall <- left_join(Plants.per.treatment.per.day,Total.plants.per.day,by="Days.after.inoculation") # join matching values from Total.plants.per.day to Plants.per.treatment.per.day
Now we can calculate the proportions.
Plants.overall <- Plants.overall %>% mutate(Proportion=Number.of.Plants/Total)
knitr::kable(Plants.overall,align="c")
Treatment | Days.after.inoculation | Number.of.Plants | Total | Proportion |
---|---|---|---|---|
Drought | 3 | 1 | 2 | 0.5000000 |
Drought | 5 | 1 | 6 | 0.1666667 |
Drought | 6 | 7 | 15 | 0.4666667 |
Drought | 9 | 15 | 27 | 0.5555556 |
Drought | 13 | 5 | 12 | 0.4166667 |
Drought | 17 | 6 | 13 | 0.4615385 |
Drought | 22 | 14 | 29 | 0.4827586 |
Drought | 35 | 14 | 26 | 0.5384615 |
Drought | 37 | 15 | 30 | 0.5000000 |
Wet | 3 | 1 | 2 | 0.5000000 |
Wet | 5 | 5 | 6 | 0.8333333 |
Wet | 6 | 8 | 15 | 0.5333333 |
Wet | 9 | 12 | 27 | 0.4444444 |
Wet | 13 | 7 | 12 | 0.5833333 |
Wet | 17 | 7 | 13 | 0.5384615 |
Wet | 22 | 15 | 29 | 0.5172414 |
Wet | 35 | 12 | 26 | 0.4615385 |
Wet | 36 | 1 | 1 | 1.0000000 |
Wet | 37 | 15 | 30 | 0.5000000 |
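The join-then-divide steps can be sketched on two small summary tables. The counts below are copied from the rows for days 3 and 5 in the table above, but the object names (`per.treatment`, `per.day`, `joined`, `check`) and the shortened column name `Day` are assumptions made for this example:

```r
library(dplyr)

# Number of plants per treatment per day (days 3 and 5 only)
per.treatment <- data.frame(
  Treatment = c("Drought", "Wet", "Drought", "Wet"),
  Day = c(3, 3, 5, 5),
  Number.of.Plants = c(1, 1, 1, 5)
)

# Total number of plants per day
per.day <- data.frame(Day = c(3, 5), Total = c(2, 6))

# left_join() keeps every row of the first table and adds the matching Total
joined <- left_join(per.treatment, per.day, by = "Day") %>%
  mutate(Proportion = Number.of.Plants / Total)

# Sanity check: the proportions within each day should sum to 1
check <- joined %>% group_by(Day) %>% summarize(Sum = sum(Proportion))
check
```

A check like this catches days on which the per-treatment counts do not add up to the daily total.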
Hmm, let's round the proportion values by adding the round() function to the code above.
Plants.overall <- Plants.overall %>% mutate(Proportion=round(Number.of.Plants/Total,2))
knitr::kable(Plants.overall,align="c")
Treatment | Days.after.inoculation | Number.of.Plants | Total | Proportion |
---|---|---|---|---|
Drought | 3 | 1 | 2 | 0.50 |
Drought | 5 | 1 | 6 | 0.17 |
Drought | 6 | 7 | 15 | 0.47 |
Drought | 9 | 15 | 27 | 0.56 |
Drought | 13 | 5 | 12 | 0.42 |
Drought | 17 | 6 | 13 | 0.46 |
Drought | 22 | 14 | 29 | 0.48 |
Drought | 35 | 14 | 26 | 0.54 |
Drought | 37 | 15 | 30 | 0.50 |
Wet | 3 | 1 | 2 | 0.50 |
Wet | 5 | 5 | 6 | 0.83 |
Wet | 6 | 8 | 15 | 0.53 |
Wet | 9 | 12 | 27 | 0.44 |
Wet | 13 | 7 | 12 | 0.58 |
Wet | 17 | 7 | 13 | 0.54 |
Wet | 22 | 15 | 29 | 0.52 |
Wet | 35 | 12 | 26 | 0.46 |
Wet | 36 | 1 | 1 | 1.00 |
Wet | 37 | 15 | 30 | 0.50 |
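An alternative, sketched here on a hypothetical `toy` table and assuming the knitr package is installed, is to leave the stored values unrounded and round only when the table is printed, using the `digits` argument of kable():

```r
# Hypothetical toy table with unrounded proportions
toy <- data.frame(
  Treatment = c("Drought", "Drought"),
  Number.of.Plants = c(1, 1),
  Total = c(2, 6)
)
toy$Proportion <- toy$Number.of.Plants / toy$Total

# digits = 2 rounds the numeric columns for display only
tab <- knitr::kable(toy, align = "c", digits = 2)
tab

# The stored values keep their full precision
toy$Proportion
```

Rounding at display time can be useful when the proportions feed into later calculations, because rounding the stored column would carry the rounding error forward.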
“The stringr package provides an easy to use toolkit for working with strings, i.e. character data, in R. This cheatsheet guides you through stringr’s functions for manipulating strings. The back page provides a concise reference to regular expressions, a mini-language for describing, finding, and matching patterns in strings.” - RStudio.
data.with.new.column <- data.filtered %>% mutate(and.and.and = "one & two & three") #and.and.and is the name of the new column, which is just a 'string' of text copied in every row.
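levels(data.with.new.column$and.and.and) # the new column is a character vector, not a factor, so it has no levels
## NULL
data.new.rows <- separate_rows(data.with.new.column, and.and.and, sep = "&") # split each string at "&" so that every piece gets its own row
levels(data.new.rows$and.and.and) # still a character column, so levels() again returns NULL
## NULL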
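Since levels() returns NULL for character columns, a more informative way to inspect the result is to count rows and list the unique values. The sketch below uses a hypothetical two-row `toy` data frame and additionally trims the spaces left around each piece with str_trim() from stringr, which is an extra step not in the code above:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data: the same string copied into every row
toy <- data.frame(
  id = 1:2,
  and.and.and = "one & two & three",
  stringsAsFactors = FALSE
)

toy.rows <- toy %>%
  separate_rows(and.and.and, sep = "&") %>%          # one row per piece
  mutate(and.and.and = str_trim(and.and.and))        # drop surrounding spaces

nrow(toy)                      # rows before splitting
nrow(toy.rows)                 # rows after splitting
unique(toy.rows$and.and.and)   # the distinct pieces
```

Splitting on "&" alone leaves a space on either side of each piece, which is why the trimming step is included here.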
“Factors are R’s data structure for categorical data. The forcats package makes it easy to work with factors. This cheatsheet reminds you how to make factors, reorder their levels, recode their values, and more” - RStudio.
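As a small illustration of what that means in practice, the sketch below builds a factor from made-up treatment labels and uses two forcats functions, fct_relevel() and fct_recode(). The vector `treatment` and the label "Well watered" are hypothetical:

```r
library(forcats)

# A factor stores categorical data together with its set of levels
treatment <- factor(c("Wet", "Drought", "Wet", "Drought"))
levels(treatment)  # levels are sorted alphabetically by default

# Move "Wet" to the front, e.g. so it is plotted or modelled first
treatment.releveled <- fct_relevel(treatment, "Wet")
levels(treatment.releveled)

# Rename a level: the new name goes on the left, the old one on the right
treatment.recoded <- fct_recode(treatment, "Well watered" = "Wet")
levels(treatment.recoded)
```

Unlike the character column in the stringr example, a factor does return its categories when you call levels() on it.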
“R Markdown is an authoring format that makes it easy to write reusable reports with R. You combine your R code with narration written in markdown (an easy-to-write plain text format) and then export the results as an html, pdf, or Word file. You can even use R Markdown to build interactive documents and slideshows.” - RStudio
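To give a feel for what such a file looks like, the sketch below writes a minimal R Markdown document to a temporary file from within R. The title, the file name `example.Rmd` and the chunk contents are made up for illustration, and the rendering step is left commented out because it assumes the rmarkdown package and pandoc are installed:

```r
# Three backticks open and close a code chunk in an R Markdown file
fence <- strrep("`", 3)

# Build the text of a minimal, hypothetical R Markdown file line by line
rmd.lines <- c(
  "---",                                  # start of the header
  "title: \"Workshop session 5\"",
  "output: html_document",
  "---",                                  # end of the header
  "",
  "Some narration written in markdown.",
  "",
  paste0(fence, "{r}"),                   # start of an R code chunk
  "summary(cars)",                        # cars is a data set built into R
  fence                                   # end of the code chunk
)

rmd.file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd.lines, rmd.file)

# To turn the file into an html report (needs rmarkdown and pandoc):
# rmarkdown::render(rmd.file)
```

In RStudio you would normally create such a file from the menu and press the Knit button instead of writing it from code. The point here is only to show the three parts: header, narration, and code chunk.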