R for Biology Data Science - Session 5 - Introduction to dplyr part 2

Workshop Series

Introduction

Welcome to the fifth session of our workshop series. This page is a guide to the material we will cover during the session. We will recap methods for summarizing and transforming data with the dplyr package and give a quick introduction to R Markdown.

library(tidyverse)

Cheatsheets

Remember there is a nice cheatsheet for dplyr available here: dplyr cheatsheet - click to download.

More cheatsheets for other packages and updates can be found here: https://www.rstudio.com/resources/cheatsheets/.

Load our data

data <- read.csv('Silver Tree Study.csv')
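One detail worth knowing about read.csv() is that, by default, it rewrites column headers into syntactically valid R names, so spaces become dots. The sketch below uses a hypothetical two-column file written to a temporary location; whether the headers in 'Silver Tree Study.csv' originally contained spaces is an assumption, not something this page confirms.

```r
# Write a small, made-up CSV to a temporary file (values are invented)
csv.path <- tempfile(fileext = ".csv")
writeLines(c("Unique Sample Number,Photosynthesis",
             "1,2.5",
             "2,7.1"),
           csv.path)

# read.csv() applies check.names = TRUE by default,
# so "Unique Sample Number" becomes "Unique.Sample.Number"
toy <- read.csv(csv.path)
names(toy)
```

Passing check.names = FALSE keeps the headers exactly as written, at the cost of needing backticks around names that contain spaces.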

Filter

data.filtered <- data %>% filter(Photosynthesis<300)

nrow(data)

## [1] 1722

nrow(data.filtered)

## [1] 1712

Comparing the two row counts, we can see that 10 rows have been filtered out.
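As a minimal sketch of what filter() does, the example below runs the same condition on a hypothetical four-row data frame (the values are invented). It also shows that rows where the condition evaluates to NA are dropped along with the FALSE rows, which can account for some of the removed rows in real data.

```r
library(dplyr)

# Hypothetical toy data; not taken from the workshop dataset
toy <- data.frame(
  Plant = c("a", "b", "c", "d"),
  Photosynthesis = c(2.5, 310, 7.1, NA)
)

# filter() keeps only rows where the condition is TRUE
toy.filtered <- toy %>% filter(Photosynthesis < 300)

nrow(toy)          # 4
nrow(toy.filtered) # 2: the 310 row and the NA row are both removed
```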

Mutate

data.filtered.new.column <- data.filtered %>% mutate(new.column = Photosynthesis/Conductance)

ncol(data.filtered)

## [1] 14

ncol(data.filtered.new.column)

## [1] 15

Now we can see there is an additional column.
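To see what mutate() computes row by row, here is a small sketch on hypothetical values (the numbers are invented). It also shows that dividing by a zero value gives Inf rather than an error, which is worth checking for before summarizing a ratio column.

```r
library(dplyr)

# Hypothetical toy data; not taken from the workshop dataset
toy <- data.frame(
  Photosynthesis = c(4, 9, 6),
  Conductance = c(2, 3, 0)
)

# mutate() adds a column computed from existing columns, one value per row
toy.ratio <- toy %>% mutate(ratio = Photosynthesis / Conductance)

ncol(toy)       # 2
ncol(toy.ratio) # 3
toy.ratio$ratio # 2 3 Inf
```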

Summarize

Treatment.mean <- data.filtered %>%
  group_by(Treatment, Species) %>%
  summarize(n = n(),
            mean = mean(Photosynthesis),
            sd = sd(Photosynthesis),
            se = sd / sqrt(n),   # standard error of the mean
            lowse = mean - se,
            highse = mean + se)
# knitr:: tells R to look for the kable() command in the knitr package. If you load knitr with library(knitr) at the beginning of your session, you can just write kable().
knitr::kable(Treatment.mean, align = "c")

Treatment	Species	n	mean	sd	se	lowse	highse
Drought	Both Pathogens	52	0.5825541	0.7350891	0.1019385	0.4806156	0.6844927
Drought	Control	246	1.5864346	1.7284649	0.1102029	1.4762317	1.6966375
Drought	Exotic Pathogen	249	1.0877087	2.3577691	0.1494175	0.9382912	1.2371263
Drought	Indigenous Pathogen	274	1.0465248	3.0850090	0.1863722	0.8601526	1.2328970
Wet	Both Pathogens	20	2.7250411	2.5390778	0.5677551	2.1572860	3.2927961
Wet	Control	317	6.7635461	3.0075976	0.1689235	6.5946226	6.9324696
Wet	Exotic Pathogen	259	4.3032425	4.1998195	0.2609641	4.0422784	4.5642066
Wet	Indigenous Pathogen	295	6.4291194	3.4054379	0.1982723	6.2308471	6.6273917
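The summary columns build on each other: se uses the sd and n computed earlier in the same summarize() call. The sketch below repeats that pattern on a hypothetical eight-row data frame (values invented) where the arithmetic is easy to check by hand. The .groups = "drop" argument assumes dplyr 1.0.0 or later and simply returns an ungrouped table.

```r
library(dplyr)

# Hypothetical toy data; not taken from the workshop dataset
toy <- data.frame(
  Treatment = c("Wet", "Wet", "Wet", "Wet",
                "Drought", "Drought", "Drought", "Drought"),
  Photosynthesis = c(6, 8, 7, 7, 1, 3, 2, 2)
)

toy.summary <- toy %>%
  group_by(Treatment) %>%
  summarize(
    n = n(),                     # rows per group
    mean = mean(Photosynthesis),
    sd = sd(Photosynthesis),
    se = sd / sqrt(n),           # uses the sd and n defined just above
    .groups = "drop"
  )

toy.summary$mean # 2 7 (groups are sorted alphabetically: Drought, Wet)
```

Both groups have the same spread here, so sd is sqrt(2/3) and se is half of that in each row.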

Merge datasets

Let's calculate the proportion of plants in each treatment on each day measured.

First, we need to calculate the total number of plants on each day.

Total.plants.per.day <- data.filtered %>% group_by(Days.after.inoculation) %>% summarize(Total=n_distinct(Unique.Sample.Number))

Now let's calculate the number of plants in each treatment per day.

Plants.per.treatment.per.day <- data.filtered %>% group_by(Treatment,Days.after.inoculation) %>% summarize(Number.of.Plants=n_distinct(Unique.Sample.Number))

Now we can merge the summary tables based on days after inoculation.

Plants.overall <- left_join(Plants.per.treatment.per.day, Total.plants.per.day, by="Days.after.inoculation") # join matching values from Total.plants.per.day to Plants.per.treatment.per.day

Now we can calculate the proportions.

Plants.overall <- Plants.overall %>% mutate(Proportion=Number.of.Plants/Total)
knitr::kable(Plants.overall,align="c")

Treatment	Days.after.inoculation	Number.of.Plants	Total	Proportion
Drought	3	1	2	0.5000000
Drought	5	1	6	0.1666667
Drought	6	7	15	0.4666667
Drought	9	15	27	0.5555556
Drought	13	5	12	0.4166667
Drought	17	6	13	0.4615385
Drought	22	14	29	0.4827586
Drought	35	14	26	0.5384615
Drought	37	15	30	0.5000000
Wet	3	1	2	0.5000000
Wet	5	5	6	0.8333333
Wet	6	8	15	0.5333333
Wet	9	12	27	0.4444444
Wet	13	7	12	0.5833333
Wet	17	7	13	0.5384615
Wet	22	15	29	0.5172414
Wet	35	12	26	0.4615385
Wet	36	1	1	1.0000000
Wet	37	15	30	0.5000000

Hmm, let's round the proportion values by adding the round() command to the code above.

Plants.overall <- Plants.overall %>% mutate(Proportion=round(Number.of.Plants/Total,2))
knitr::kable(Plants.overall,align="c")

Treatment	Days.after.inoculation	Number.of.Plants	Total	Proportion
Drought	3	1	2	0.50
Drought	5	1	6	0.17
Drought	6	7	15	0.47
Drought	9	15	27	0.56
Drought	13	5	12	0.42
Drought	17	6	13	0.46
Drought	22	14	29	0.48
Drought	35	14	26	0.54
Drought	37	15	30	0.50
Wet	3	1	2	0.50
Wet	5	5	6	0.83
Wet	6	8	15	0.53
Wet	9	12	27	0.44
Wet	13	7	12	0.58
Wet	17	7	13	0.54
Wet	22	15	29	0.52
Wet	35	12	26	0.46
Wet	36	1	1	1.00
Wet	37	15	30	0.50
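The whole merge workflow (two summaries, a left_join(), then a rounded proportion) can be sketched on a small hypothetical data frame where the counts are easy to verify by hand. The column names Day and Plant.ID and all values are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data: one row per measurement
toy <- data.frame(
  Day = c(3, 3, 3, 5, 5),
  Treatment = c("Wet", "Wet", "Drought", "Wet", "Drought"),
  Plant.ID = c(1, 2, 3, 1, 3)
)

# Total number of distinct plants measured on each day
total.per.day <- toy %>%
  group_by(Day) %>%
  summarize(Total = n_distinct(Plant.ID))

# Number of distinct plants per treatment on each day
per.treatment <- toy %>%
  group_by(Treatment, Day) %>%
  summarize(Number.of.Plants = n_distinct(Plant.ID), .groups = "drop")

# left_join() keeps every row of per.treatment and looks up the matching Total by Day
toy.overall <- left_join(per.treatment, total.per.day, by = "Day") %>%
  mutate(Proportion = round(Number.of.Plants / Total, 2))

toy.overall$Proportion # 0.33 0.50 0.67 0.50
```

Because every day in per.treatment also appears in total.per.day, no Total is left as NA after the join; with real data it is worth checking for NA values in the joined column.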

Other handy packages in tidyverse

stringr

“The stringr package provides an easy to use toolkit for working with strings, i.e. character data, in R. This cheatsheet guides you through stringr’s functions for manipulating strings. The back page provides a concise reference to regular expressions, a mini-language for describing, finding, and matching patterns in strings.” - RStudio.

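# and.and.and is the name of the new column: the same string of text repeated in every row
data.with.new.column <- data.filtered %>% mutate(and.and.and = "one & two & three")
levels(data.with.new.column$and.and.and) # NULL because this is a character column, not a factor

## NULL

# separate_rows() comes from tidyr (also part of the tidyverse); it splits each string at "&" and puts every piece in its own row
data.new.rows <- separate_rows(data.with.new.column, and.and.and, sep = "&")
levels(data.new.rows$and.and.and) # still NULL; use unique() to list the values of a character column

## NULL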
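Since levels() only reports something for factors, a clearer way to inspect the result of a split is unique(). The sketch below is a hypothetical variation on the example above: it uses a one-row data frame (invented) and a regular expression separator that also removes the spaces around each "&", then shows two stringr functions doing the same job on a plain character vector.

```r
library(stringr)
library(tidyr)

# Hypothetical one-row data frame
toy <- data.frame(id = 1, label = "one & two & three")

# sep is a regular expression: "\\s*&\\s*" matches "&" plus any surrounding spaces
toy.rows <- separate_rows(toy, label, sep = "\\s*&\\s*")
nrow(toy.rows)         # 3
unique(toy.rows$label) # "one" "two" "three"

# The same split with stringr: str_split() returns a list, str_trim() strips whitespace
pieces <- str_trim(str_split("one & two & three", "&")[[1]])

# str_detect() tests each string against a pattern (here: starts with "t")
str_detect(pieces, "^t") # FALSE TRUE TRUE
```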

forcats

“Factors are R’s data structure for categorical data. The forcats package makes it easy to work with factors. This cheatsheet reminds you how to make factors, reorder their levels, recode their values, and more” - RStudio.
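A few of the operations the quote mentions can be sketched as follows. The factor below is hypothetical (the values are invented), and the functions shown are a small selection from forcats rather than a recommended workflow.

```r
library(forcats)

# Hypothetical factor; levels default to alphabetical order
treatment <- factor(c("Wet", "Drought", "Wet", "Wet"))
levels(treatment) # "Drought" "Wet"

# fct_relevel() moves the named level(s) to the front
treatment.reordered <- fct_relevel(treatment, "Wet")
levels(treatment.reordered) # "Wet" "Drought"

# fct_recode() renames levels using new = "old"
treatment.recoded <- fct_recode(treatment, Watered = "Wet")
levels(treatment.recoded) # "Drought" "Watered"

# fct_infreq() orders levels from most to least frequent
levels(fct_infreq(treatment)) # "Wet" "Drought"
```

Level order matters mostly for plotting and modelling, where it controls the order of axis categories and the reference group.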

R Markdown

“R Markdown is an authoring format that makes it easy to write reusable reports with R. You combine your R code with narration written in markdown (an easy-to-write plain text format) and then export the results as an html, pdf, or Word file. You can even use R Markdown to build interactive documents and slideshows.” - RStudio

Here are some key links:
- R Markdown cheatsheet
- R Markdown: The Definitive Guide