In this workshop we will focus primarily on two workflow tools. We will use:
RStudio Projects to create a project-oriented workflow in your R scripts, and
R Markdown for literate programming. We will review R Markdown documents, which you can use to write complete papers. We will also look at the new features in Visual R Markdown, such as citations and technical writing.
This is the third of four sessions in our series “Reproducible Research Practices: Make Your Research Life Easier.” The remaining session is “Sharing Your Data for Transparent and Reproducible Research.”
Teaching Slides are available, and mirror the content in this document.
In this workshop, you will learn:
Note: This is not a complete workshop on how to use the many options provided within R Markdown documents. We will cover some details, but not all.
The working directory is the place where R looks for your files. There is always one set, and you can check it with getwd().
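As a small illustration of how the working directory is used, the sketch below prints the current working directory and shows the full location that a relative path would point to. The file name data/empty.csv is borrowed from the penguins demo project and is only an example; the file does not need to exist for this code to run.

```r
# Ask R which directory it currently treats as the working directory
wd <- getwd()
print(wd)

# A relative path is interpreted starting from the working directory.
# "data/empty.csv" is an example name taken from the penguins demo project.
relative_path <- file.path("data", "empty.csv")
resolved_path <- file.path(wd, relative_path)
print(resolved_path)
```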
A lot of R scripts start off like this:
setwd("C:/Users/jah2ax.ESERVICES/Box Sync/_R/research/penguins")
It uses the setwd() function to set the working directory. The problem is that there is essentially a 0% chance that this path will work on someone else’s computer, and it will very likely break on your own computer if (and when) you move your directory around. setwd() is fragile: it breaks! If you want this script to work on someone else’s computer, such as an advisor’s or a collaborator’s, that line of code will not do its job (i.e., set the working directory to the proper folder) because the path will not exist on the other person’s computer. The same is true if you move your files around. This is annoying for everyone!
Similarly, you may have seen this line, often at the top of scripts: rm(list = ls()). That line removes objects from your environment, but it doesn’t give you a fresh R process. Specifically, rm(list = ls()) does not detach libraries. There is a better way!
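The point about rm(list = ls()) can be checked directly. The sketch below assumes it is run at the top level of a throwaway R session (it deletes every object in the environment it runs in). It attaches the tools package, which ships with base R, clears the environment, and then checks what is left.

```r
# Attach a package and create an object
library(tools)
x <- 1:10

# The common "clean-up" line: removes objects from the current environment
rm(list = ls())

# The objects are gone...
objects_left <- ls()
print(objects_left)

# ...but the package is still attached, so this is not a fresh R process
tools_still_attached <- "package:tools" %in% search()
print(tools_still_attached)
```

Restarting R (or opening the project in a fresh session) is what actually resets attached packages.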
References: (Bryan and Hester, n.d.a)(Bryan and Hester, n.d.b)(Wickham and Grolemund, n.d.)
Aka “little p projects.”
First, let’s make sure we set up our work into projects.
Step 1: Organize your work into “projects.” That means all the files in a project are in a single directory, aka folder. Check out UVA Library’s workshop Organize Your Files and Metadata for Transparent and Reproducible Research for a detailed discussion of how to set up “little p projects.”
Step 2: When working on a project, set your working directory to that project’s main directory. Be intentional about using that working directory. Use the tools that we will cover below, and avoid using setwd() and absolute paths.
Step 3: Be disciplined about using relative file paths. The working directory is the main directory, and all other paths are relative to that directory.
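A minimal sketch of Step 3, assuming the working directory is the main directory of the penguins demo project. The paths below are built relative to that directory with file.path(), so they contain no user name or drive letter. If the code is run somewhere else, the files are simply reported as not found.

```r
# Paths relative to the project's main directory (names from the penguins demo)
project_files <- c(
  data   = file.path("data", "empty.csv"),
  report = file.path("documents", "report.Rmd"),
  script = file.path("scripts", "01-penguins.R")
)

# Check which files are visible from the current working directory
found <- file.exists(project_files)
print(data.frame(path = project_files, found = found))
```

Because nothing in these paths is specific to one computer, the same script works for a collaborator who opens the project from a different location.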
Aka “capital P Projects.”
RStudio can help you solve all of these problems with RStudio Projects. When you launch an RStudio Project, RStudio will:
This solves the problems posed by both setwd() and rm(list = ls())!
Let’s say you are working on one project, and you want to switch to another project. When you want to open a new Project, RStudio will:
If you want to run multiple projects at the same time, you can do that!
As an example, we will imagine you are working on two projects: worms and penguins.
You can designate a new or existing folder as an RStudio Project. All that means is that RStudio leaves behind a helper file, e.g., worms.Rproj. This file stores specific settings for that project.
In RStudio, use File…New Project…New Directory or click on the R cube in the upper right to get started. Create a fresh project called worms.
Now click on the little down arrow next to the R cube in the upper right, click Open Project, and open the penguins project you downloaded.
Notice that you get a fresh environment when you switch back and forth.
Using Windows Explorer or Mac Finder, go to where you saved the worms project. Double click on the worms.Rproj file. RStudio will launch your project in its own instance, so you can run multiple instances of R at the same time. You should now have both the worms and penguins projects open at the same time, in two RStudio instances. They are running in completely separate environments.
You might be used to cutting tables or plots from your statistical package, and pasting them into a Word document. Now, imagine that your advisor suggests you make changes to your analysis. Did you remember to update the table correctly in Word? What code generated that table in the first place?
Manual copy-and-paste is a huge opportunity for errors and confusion. It’s super tedious and no one wants to spend their time doing it or figuring out what code made what table. R Markdown solves that problem for you. R Markdown allows you to embed R code into narrative text, and then format that text into PDF, HTML, Word, and so on. Now when you need to make changes, you can make changes directly to the code embedded in the R Markdown document. No more cutting and pasting! This is better for reproducibility and transparency, and more simply, makes your life easier.
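To make the idea of code embedded in narrative text concrete, here is a minimal sketch that writes a tiny R Markdown file to a temporary folder and, if possible, renders it. The file name, title, and chunk label are hypothetical, and the built-in cars dataset stands in for real research data. Rendering is skipped when the rmarkdown package or pandoc is not available.

```r
# Three backticks mark the start and end of a code chunk in R Markdown
fence <- strrep("`", 3)

# A minimal document: YAML header, a sentence with inline R, and one code chunk
rmd_lines <- c(
  "---",
  "title: \"Minimal report\"",
  "output: html_document",
  "---",
  "",
  "The built-in cars data has `r nrow(cars)` rows.",
  "",
  paste0(fence, "{r speed-summary}"),
  "summary(cars$speed)",
  fence
)

# Write the document to a temporary location (hypothetical file name)
rmd_file <- file.path(tempdir(), "minimal_report.Rmd")
writeLines(rmd_lines, rmd_file)

# Render only if the rmarkdown package and pandoc are both available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  output_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(output_file)
}
```

If the analysis changes, only the code in the chunk needs editing. Re-rendering regenerates the summary and the row count in the text.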
As noted in R Markdown: The Definitive Guide, with R Markdown, you can:
Visual markdown editing is available in RStudio v1.4 or higher. It brings some of the features and ease of use that GUIs provide, while the .Rmd file (R Markdown file) always remains the underlying source.
To switch to visual editing, click on the compass in the upper right of the editor pane.
One of the highlights of the new Visual Editor is how easy it is to insert citations with Zotero.
References: (Grolemund, n.d.)(RStudio, n.d.a)(RStudio, n.d.b)(RStudio, n.d.c)
The here package enables easy file referencing in project-oriented workflows. Paths are created relative to the project’s top-level directory. It is most useful when you work with R Markdown documents a lot, as it smooths out some of the idiosyncrasies of how R Markdown looks for files.
The here package always points to the root directory, which is typically where we set the working directory with our RStudio Project.
here::i_am("repro_analysis_R_RStudio.Rmd")
## here() starts at C:/Users/jah2ax.ESERVICES/Box Sync/_R/workshops/workshops_teaching/repro_analysis_documentation_R/website
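Once the root is established, here::here() builds paths from it. The sketch below assumes the here package is installed and uses data/empty.csv from the penguins demo project as an example file name. The file.path() fallback is only there so the snippet still runs without the package; it is not something here itself provides.

```r
# Build a path to data/empty.csv starting from the project root
use_here <- requireNamespace("here", quietly = TRUE)

data_path <- if (use_here) {
  here::here("data", "empty.csv")   # root directory + data/empty.csv
} else {
  file.path("data", "empty.csv")    # illustration-only fallback: plain relative path
}

print(data_path)
```

Because the path is anchored at the project root, the same call works from a script in scripts/ and from an R Markdown document in documents/.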
References: (Barrett 2018)(Richmond, n.d.)(Müller and Bryan, n.d.a)(Müller and Bryan, n.d.b)
We are going to use the penguins demo project to explore R Markdown documents. penguins has the following structure:
## C:/Users/jah2ax.ESERVICES/Documents/GitHub/penguins
## +-- data
## |   \-- empty.csv
## +-- documents
## |   +-- apa.csl
## |   +-- references.bib
## |   \-- report.Rmd
## +-- penguins.Rproj
## +-- README.md
## \-- scripts
##     \-- 01-penguins.R
Note that this implements the essential “little p” project structure, by separating content into clear subdirectories.
Let’s take a look at the penguins project now. The instructions for downloading the project are in the “Get Ready” section of this page. (Some of the penguins demo project is taken from the here() project demo files, specifically the file structure and the sections related to here.)
Reach out to our statistical consultants! statlab@virginia.edu
General Reproducible Research in R and RStudio:
RStudio Projects:
R Markdown:
R Markdown Cheat Sheets:
Bonus: here package: