In this workshop we will focus primarily on two workflow tools. The first is using RStudio Projects and the ‘here’ package to create a project-oriented workflow in your R scripts. The second is using R Markdown for literate programming. We will briefly review R Markdown documents, which you can use to write complete papers. We will take a look at the new features in Visual R Markdown, such as citations and technical writing.
This is the third of four sessions in our series “Reproducible Research for Early Graduate Students.” The remaining session is “Sharing Your Data for Transparent and Reproducible Research.”
Teaching Slides are available, and mirror the content in this document.
In this workshop, you will learn:
This is not a complete workshop on how to use the many options provided within R Markdown documents.
here package. Installing and updating instructions are here. When reviewing the instructions, substitute the here package whenever you see the
The working directory is the place where R looks for your files. One is always set. You can check it with getwd().
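As a quick illustration, the sketch below uses only base R to show the current working directory and what R can see inside it. The variable name wd is just for this example.

```r
# Print the current working directory; R resolves relative paths against it.
wd <- getwd()
print(wd)

# List the files and folders R can see directly in that directory.
print(list.files(wd))
```

The output depends entirely on where R was started, which is exactly why scripts that assume a particular location tend to break.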
A lot of R scripts start off like this:
It uses the setwd() function to set the working directory. The problem is that there is essentially a 0% chance that this will work on someone else’s computer, and it will very likely break on your own computer if (and when) you move your directory around.
setwd() is fragile: it breaks! If you actually want this to work on someone else’s computer, like an advisor’s or a collaborator’s, then that line of code is not going to do its job (i.e., set the working directory to the proper folder) because the path won’t exist on the other person’s computer. The same is true if you move your files around. This is annoying for everyone!
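To make the fragility concrete, here is a minimal sketch, assuming a made-up absolute path that stands in for "a folder on someone else's machine." The path and the variable names are hypothetical. The call is wrapped in tryCatch() so the failure is reported instead of stopping the script.

```r
# Hypothetical absolute path copied from another person's machine.
someone_elses_path <- "C:/Users/some_other_user/Documents/my_project"

# setwd() throws an error when the directory does not exist, so we catch it
# here to show the failure without stopping the script.
result <- tryCatch(
  {
    setwd(someone_elses_path)
    "working directory changed"
  },
  error = function(e) paste("setwd() failed:", conditionMessage(e))
)
print(result)
```

Unless that exact folder happens to exist on your computer, the result reports a failure, which is what a collaborator would see when running your script.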
Similarly, you may have seen this line, often at the top of scripts: rm(list = ls()). That line is good at removing objects from your environment, but it doesn’t give you a fresh R process. Specifically, rm(list = ls()) does not detach packages that you attached with library(). There is a better way!
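The sketch below illustrates that point with the tools package, which ships with R but is not attached by default. It runs the cleanup inside local() only so that the demonstration does not wipe your own workspace; the object x and the result names are made up for this example.

```r
# Attach a package that ships with R but is not attached by default.
library(tools)

result <- local({
  # Create an object, then clear this environment the "classic" way.
  x <- 1:10
  rm(list = ls())

  c(
    # The object is gone ...
    object_removed = !exists("x", inherits = FALSE),
    # ... but the package is still on the search path.
    package_still_attached = "package:tools" %in% search()
  )
})
print(result)
```

Both values come back TRUE: the object was removed, yet the attached package survived, so the session is not actually fresh.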
Aka “little p projects.”
First, let’s make sure we set up our work into projects.
Step 1: Organize your work into “projects.” That means all the files in a project are in a single directory, aka folder. Check out UVA Library’s workshop Organize Your Files and Metadata for Transparent and Reproducible Research for more details.
Step 2: When working on a project, set your working directory to that project’s main directory. Be intentional about using that working directory. Use the tools that we will cover below, and avoid using setwd() and absolute paths.
Step 3: Be disciplined about using your file paths. The working directory is the main directory, and all other paths are relative to that directory.
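As a small sketch of Step 3, the code below builds a path relative to the project's main directory with base R's file.path(). The folder name data and the file name penguins_raw.csv are hypothetical, chosen only for illustration.

```r
# Build a path relative to the project's main directory.
# "data" and "penguins_raw.csv" are hypothetical names.
relative_path <- file.path("data", "penguins_raw.csv")
print(relative_path)

# R resolves a relative path against the current working directory,
# so the same line works wherever the project folder lives.
resolved_path <- file.path(getwd(), relative_path)
print(resolved_path)
```

Because the script only ever mentions the relative part, moving the whole project folder, or handing it to a collaborator, does not require editing any paths.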
Aka “capital P Projects.”
RStudio can help you solve all of these problems with RStudio Projects. When you launch an RStudio Project, RStudio will:
This solves the problems posed by both setwd() and rm(list = ls())!
Let’s say you are working on one project, and you want to switch to another project. When you want to open a new Project, RStudio will:
If you want to run multiple projects at the same time, you can do that!
As an example, we will imagine you are working on two projects:
You can designate a new or existing folder as an RStudio Project. All that means is that RStudio leaves behind a file, e.g., worms.Rproj. This file stores specific settings for that project.
In RStudio, use File…New Project or click on the R cube in the upper right to get started. Create a fresh project called worms.
Once you do that, close RStudio. Now go to where you saved the worms project folder using Windows Explorer or Mac Finder. Double click on your worms.Rproj file.
Now, click on the little down arrow next to the R cube in the upper right, and open the penguins project you downloaded.
Notice that you get a fresh environment when you switch back and forth.
Using Windows Explorer or Mac Finder, go to where you saved your new worms project and double click on the worms.Rproj file. It will launch your project, and you can run multiple instances of R at the same time. You should have both the worms and penguins projects open at the same time. They are running in completely separate environments.
You might be used to cutting tables or plots from your statistical package, and pasting them into a Word document. Now, imagine that your advisor suggests you make changes to your analysis. Did you remember to update the table correctly in Word? What code generated that table in the first place?
Manual copy-and-paste is a huge opportunity for errors and confusion. It’s super tedious and no one wants to spend their time doing it or figuring out what code made what table. R Markdown solves that problem for you. R Markdown allows you to embed R code into narrative text, and then format that text into PDF, HTML, Word, and so on. Now when you need to make changes, you can make changes directly to the code embedded in the R Markdown document. No more cutting and pasting! This is better for reproducibility and transparency, and more simply, makes your life easier.
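To show what "R code embedded in narrative text" looks like, here is a minimal sketch that writes a tiny R Markdown file from R. The title, the chunk label speed-summary, and the temporary file location are assumptions made for this example; the built-in cars data set is used so nothing needs to be downloaded.

```r
# Build the code-fence marker programmatically so this example can itself
# sit inside a fenced block.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Minimal report\"",
  "output: html_document",
  "---",
  "",
  # Inline R code: the number is computed when the document is knit.
  "The built-in cars data set has `r nrow(cars)` rows.",
  "",
  # A code chunk: the code and its output both appear in the report.
  paste0(fence, "{r speed-summary}"),
  "summary(cars$speed)",
  fence
)

# Write the document to a temporary file (hypothetical location).
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)
print(rmd_file)

# rmarkdown::render(rmd_file)  # uncomment to knit; needs rmarkdown and pandoc
```

If the data or the analysis changes, re-knitting regenerates the row count and the summary, so there is nothing to paste by hand.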
As noted in R Markdown: The Definitive Guide, with R Markdown, you can:
Visual markdown editing is available in RStudio v1.4 or higher. This brings some of the features and ease-of-use that GUIs provide, while still always maintaining the source code that is the .Rmd file (R Markdown file).
To switch to visual editing, click on the compass in the upper right of the editor pane.
One of the highlights of the new Visual Editor is how easy it is to insert citations with Zotero.
The here package enables easy file referencing in project-oriented workflows. Paths are created relative to the top-level directory. It is most useful when you work with R Markdown documents a lot, as it smooths out some of the idiosyncrasies of how R Markdown looks for files.
The here package always points to the root directory, which is typically where we set the working directory with our RStudio Project.
## here() starts at C:/Users/jah2ax.ESERVICES/Box Sync/_R/workshops/workshops_teaching/repro_analysis_documentation_R/website
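A minimal sketch of typical usage follows, assuming the here package is installed; the call is guarded so the script still runs if it is not. The folder name data and the file name penguins_raw.csv are hypothetical.

```r
# Guarded so the script still runs if the here package is not installed.
if (requireNamespace("here", quietly = TRUE)) {
  # Project root that here has detected (for example, the folder holding
  # the .Rproj file).
  project_root <- here::here()

  # Build a path from the root; "data" and "penguins_raw.csv" are
  # hypothetical names used only for illustration.
  data_path <- here::here("data", "penguins_raw.csv")

  print(project_root)
  print(data_path)
}
```

The path is built from the project root rather than from the folder the script or .Rmd file happens to sit in, which is what makes it handy inside R Markdown documents stored in subdirectories.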
We are going to use the penguins demo project to explore the here package and R Markdown documents simultaneously.
penguins has the following structure:
## C:/Users/jah2ax.ESERVICES/Box Sync/_R/workshops/workshops_teaching/penguins
## +-- analysis
## |   +-- apa.csl
## |   +-- references.bib
## |   +-- report-fail.Rmd
## |   \-- report-success.Rmd
## +-- data
## +-- penguins.Rproj
## +-- prepare
## |   \-- penguins.R
## \-- README.md
Note that this implements the essential “little p” project structure, by separating content into clear subdirectories.
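If you want to set up a similar skeleton for your own work, one possible sketch in base R is shown below. It mirrors the subdirectory names from the penguins structure, but creates them inside a temporary directory (an assumption for this example) so nothing is written into your real projects.

```r
# Create the skeleton inside a temporary directory (hypothetical location),
# so nothing is written into your real projects.
project_dir <- file.path(tempdir(), "penguins")

# One subdirectory per kind of content, as in the penguins structure.
for (subdir in c("analysis", "data", "prepare")) {
  dir.create(file.path(project_dir, subdir),
             recursive = TRUE, showWarnings = FALSE)
}

# Empty placeholder for the README at the top level.
file.create(file.path(project_dir, "README.md"))

print(list.files(project_dir))
```

In practice you would replace tempdir() with the folder where you keep your projects, and let RStudio add the .Rproj file when you designate the folder as a Project.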
Let’s take a look at the penguins project now. The instructions for downloading the project are in the “Get Ready” section of this page. (Some of the penguins demo project is taken from the here() project demo files, specifically the file structure, and the sections related to
Reach out! firstname.lastname@example.org