These are the goals for today’s workshop:
This workshop is intended for beginners with no previous experience using R.
We will focus on using R for data analysis throughout this series.
R is the underlying statistical computing environment. You can think of this like the engine of a car. That makes RStudio like the dashboard.[1]
RStudio makes R easier to use because it gives you a convenient place to write and run code. After you install R and RStudio, you only need to open RStudio.
This is what RStudio looks like when you first open it:
RStudio shows four panes by default. The two most important are the Console (bottom left) and the Script Editor (top left).
Console: type a command after the > prompt and press Enter to execute it. Type ? before a function name in the console to get info in the Help section.
Script Editor: use scripts to save your work for future analysis. Scripts are an essential part of reproducibility, whether for collaborators or for your future self. You should rely on them rather than clicking through a graphical user interface. R script files end with “.R”.
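As a rough illustration of the points above, here is a minimal sketch of what a first script might contain. The file name and the object name total are made up for this example; each line can also be typed directly at the console prompt.

```r
# first_script.R (hypothetical file name; R scripts end with ".R")

# Store the result of a calculation in an object, then print it.
total <- 2 + 3
total      # 5

# Call a built-in function.
sqrt(16)   # 4

# In an interactive session, put ? before a function name to open its help page:
# ?sqrt
```

Saving these lines in a script, rather than only typing them in the console, means you can rerun the same steps later.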
We will start by looking at two ways to set your working directory: with the setwd() function and with RStudio Projects.
R always has a working directory set. The working directory is where R looks for your files, that is, where it looks for your scripts and data. R will look for other files and directories by starting at the root of your working directory. The working directory can be any directory (aka folder) – it doesn’t have to be the same folder where you installed R.
Set your working directory to wherever you saved the R script and the dataset for this workshop. You can do this via point-and-click: Session…Set Working Directory…Choose Directory.
You can also set the working directory within the script or in the console. Use RStudio to copy the file path that points to where you saved the workshop files: Go to the Files pane (lower right box in RStudio) and navigate to the desired directory (you might have to click on the little square with the “…” on the upper right corner of that pane for Go To Directory). Click More (gear icon)…Copy Folder Path to Clipboard. That always produces a path with forward slashes.
This is an example of how I set my working directory using the console: setwd("C:/Users/jah2ax.ESERVICES/Box Sync/_R/workshops/intro_R")
Ideally your scripts will not include setwd() because it will not work on someone else’s computer. Also, if I ever move the folder around, it won’t work on my computer either!
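To see how the working directory behaves, the following sketch checks it, changes it, and changes it back. It uses tempdir() as a stand-in for your workshop folder, which is an assumption made only so the example runs on any computer; the object names are also made up.

```r
# Remember the current working directory so it can be restored afterwards.
original_dir <- getwd()

# tempdir() is only a placeholder here; in practice you would use the folder
# that holds your workshop files.
workshop_dir <- tempdir()
setwd(workshop_dir)

# Confirm the change and list the files R can now see without a full path.
getwd()
list.files()

# Go back to where we started.
setwd(original_dir)
```

Calling getwd() before and after is a quick way to confirm that R is looking where you expect.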
There is a better way! RStudio includes RStudio Projects, which set the working directory, provide you with a command history, and give you a fresh environment (which means clearing out any objects and libraries you may have recently loaded). You can create an RStudio Project that points to the directory (aka folder) that has all your scripts and data.
Let’s open RStudio, then create an RStudio Project that points to the folder where you have the workshop files saved.
Recall that functions are the “verbs” that allow us to manipulate data. Packages contain functions, and all functions belong to packages. For example, read.csv belongs to the utils package.
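One way to check which package a function belongs to is sketched below. The small CSV text and the object names are made up for illustration.

```r
# Ask R which package's namespace a function lives in.
environmentName(environment(read.csv))   # "utils"

# The package::function form names the package explicitly.
csv_text <- "id,price\n1,100\n2,250"     # made-up data for this example
homes <- utils::read.csv(text = csv_text)
homes
```

Because utils is attached whenever R starts, read.csv() also works without the utils:: prefix.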
R comes with about 30 packages (“base R”). There are over 10,000 user-contributed packages; you can discover these packages online on the Comprehensive R Archive Network (CRAN). You can find more in active development on GitHub and elsewhere. CRAN packages are validated to a certain degree; it is “buyer beware” with packages on GitHub.
A popular collection of packages is the tidyverse. “The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.”[2] We will take a look at that package in our script.
In order to use a package, you must install it first. There are two ways to do this:
Point-and-click: Tools…Install Packages…, type tidyverse (or a different package name), then click on Install.
Console: install.packages("tidyverse")
Once you have installed the package, you must load the package in any new R session when you want to use that package: type library(tidyverse) to load the tidyverse package.
A package is like a lightbulb: you install it once, but you need to turn it on every time you want to use it!
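The install-once, load-every-session pattern can be sketched as a small helper. The function name load_package is hypothetical (it is not part of R or the tidyverse), and the example loads the tools package, which ships with R, so that it runs without an internet connection.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# "tools" ships with R, so this call loads it without installing anything.
load_package("tools")

# In the workshop you would instead run:
# load_package("tidyverse")
```

Typing install.packages("tidyverse") once and library(tidyverse) at the top of each script does the same job; the helper only saves you from reinstalling by accident.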
We will be working with Albemarle County Office of Geographic Services real estate data.
You can import almost any kind of data into R. Your best bet for figuring out how to import a dataset is to google “how to import [file type] into R.” You will likely have to install a package in order to do it. For example, haven is a popular package for importing Stata, SPSS, and SAS files. Remember you can find the official documentation of packages online at CRAN: https://cran.r-project.org/web/packages/haven/index.html. The documentation will help you figure out how to use the package.
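CSV files are a common case that needs no extra package. The sketch below writes a tiny made-up table to a temporary file and reads it back with read.csv(); the column names, values, and file name are all hypothetical and are not taken from the workshop dataset.

```r
# Made-up rows standing in for a real dataset (hypothetical column names).
example_data <- data.frame(
  parcel_id   = c("A1", "B2", "C3"),
  total_value = c(250000, 410000, 175000)
)

# Write the file to a temporary folder so the example is self-contained.
csv_path <- file.path(tempdir(), "example_homes.csv")
write.csv(example_data, csv_path, row.names = FALSE)

# Import the CSV and take a first look at its structure and first rows.
homes <- read.csv(csv_path)
str(homes)
head(homes)
```

With a real file, you would replace csv_path with the file's name, given relative to your working directory.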
Jump back to our R script “Clear our workspace” (~line 237).
Since most people in our workshop have freshly installed R and RStudio, we hopefully didn’t encounter any issues with old versions today. But eventually, you will have to update R.
Remember that at the top of the Console, you will see session info, e.g. R version 4.1.2 (2021-11-01) – “Bird Hippie”
This tells us which version of R RStudio is using.
You can also check the version with the version command:
version
## _
## platform x86_64-w64-mingw32
## arch x86_64
## os mingw32
## system x86_64, mingw32
## status
## major 4
## minor 1.2
## year 2021
## month 11
## day 01
## svn rev 81115
## language R
## version.string R version 4.1.2 (2021-11-01)
## nickname Bird Hippie
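If you only need the version number, for example to compare it against a minimum, a shorter check is possible. In this sketch the threshold "4.1.0" and the object names are arbitrary choices made for illustration.

```r
# One-line summary of the running R version.
R.version.string

# getRversion() returns the version in a form that can be compared.
running <- getRversion()
running

# Hypothetical threshold, chosen only for this example.
minimum <- "4.1.0"
if (running < minimum) {
  message("R ", running, " is older than ", minimum, "; consider updating.")
} else {
  message("R ", running, " meets the minimum of ", minimum, ".")
}
```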
After you load packages, sometimes you will see a Warning Message in red text below any conflicts:
Warning message: package 'tidyverse' was built under R version [x.y.z]
If you see a message like that, it is time to update R. Updating R means that you have to download and install R again. By default, your computer will keep your old version of R, and you can decide if you want to delete it or not. RStudio will automatically recognize the new version of R. When you install a new version of R, you have to re-install your packages. Windows users can try the installr package.
It is a good idea to occasionally check for package updates (e.g., tidyverse): Tools…Check for Package Updates.
It is a good idea to occasionally check for RStudio updates: Help…Check for Updates.
For those of you looking to build community or just wanting one-on-one support, we are lucky to have plenty of in-person/local opportunities:
The great thing about R is that you can very often find an answer to your question online.
Don’t forget the “official” help resources from R/RStudio.
Type ? before a function or package
[1] Credit to Modern Dive for the R and RStudio analogies, and to Marieke Jones and David Martin’s HSL Intro to R Workshop.
[2] Credit to https://www.tidyverse.org/.