Zehui Yin
First-year PhD Student in Geography at SEES

  • Email: yinz39@mcmaster.ca
    • Please use your McMaster email address.
    • Please put the course code ENVSOCTY 4GA3 in the subject line.
    • Please include your name and student number in the body of the email.
    • I will try to reply within 24 hours (please expect longer delays during weekends/holidays).
  • Personal Website: zehuiyin.github.io
  • Research Interests: Spatial Analysis, Transportation, Travel Behaviour, Public Transit, Shared Mobility

Agenda for today

  • Introduction to basic concepts for coding in R
  • Getting a flavor of R syntax and style
  • Setting up R and a reproducible environment on your personal computer

R and RStudio

   

  • R is a free and open-source programming language for statistical computing and graphics.
  • RStudio is an integrated development environment (IDE) for coding in R.
    • An IDE is a set of tools that helps you code.
  • We use RStudio to write our R code.

R packages

  • R packages are the fundamental units of reproducible R code.
  • They can include functions, data, or both, along with documentation.
  • Think of them as plug-ins that enhance the functionality of existing software.
  • For example, web browser extensions like ad blockers add additional features that the original browser doesn’t have.
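
As a small illustration of the bullets above, the sketch below uses two packages that ship with R (stats and tools), so nothing needs to be installed. The numbers and the file name are made up for the example. Packages that do not ship with R are installed once with install.packages() before they can be used.

```r
# Option 1: call a function with the package::function syntax,
# without attaching the whole package.
heights <- c(150, 160, 170, 180)  # made-up example values
middle <- stats::median(heights)
middle
# [1] 165

# Option 2: attach a package with library(), then use its functions by name.
library(tools)
ext <- file_ext("lab01.Rmd")  # hypothetical file name
ext
# [1] "Rmd"
```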

Reproducible environment

  • An environment is the system in which a program runs: the hardware and the software, including the operating system, system dependencies, programming language, packages, and their configurations and versions.
  • Just as running 1000 meters affects individuals differently, running code on different computers or with different package versions can produce varied results.
  • A reproducible environment ensures that everyone gets the same result by keeping the environment consistent.

renv package

  • renv is an R package that helps create reproducible environments for R projects.
  • It records the R version and all R packages along with their versions in a lockfile.
    • A lockfile is a text file that stores all the environment information.
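
To get a feel for the kind of information a lockfile records, the sketch below looks up the R version and one package version by hand using base R. This is only an illustration under that assumption: renv collects this for every package automatically and stores more detail than shown here.

```r
# The R version currently running.
r_version <- R.version.string
r_version

# The installed version of one package ("stats" ships with R).
stats_version <- as.character(packageVersion("stats"))
stats_version

# A lockfile stores this kind of name/version pair for every package in use.
record <- data.frame(package = "stats", version = stats_version)
record
```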

Code hosting and GitHub

  • Code hosting involves storing code online to facilitate sharing, management, and collaboration with others.
  • One of the most popular code hosting platforms is GitHub (owned by Microsoft).
  • Both the textbook and the companion R package used in this course are hosted on GitHub.
  • GitHub is like a cloud drive (similar to OneDrive or Dropbox) but specialized for storing code, including R scripts.

R Markdown vs. R

  • R Markdown is a file format that combines R code, its results, and accompanying text.
  • It uses the file extension .Rmd and is essentially a plain text file that combines markdown and R.
  • You can start by creating an Rmd file on the lab computer.

R Markdown syntax

---
title: "Untitled"
author: "Zehui Yin"
output: html_document
---

# Heading level 1
## Heading level 2
text...text...text
**bold**   __bold__
*italic*   _italic_

```{r}
print("Hello world!")
```

  • YAML header (between the --- lines): stores settings or meta information.
  • Markdown text (headings, plain text, bold, italic): contains plain text in markdown format.
  • R code chunk (the fenced {r} block): contains R code to be executed.
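
The three parts can also be put together from R itself. The following is a minimal sketch, assuming a hypothetical file name (example.Rmd) in a temporary folder; it renders the file to HTML only if the rmarkdown package and pandoc happen to be available. In RStudio, the Knit button does the rendering for you.

```r
# Build the three parts of an R Markdown file as lines of text.
fence <- strrep("`", 3)  # three backticks open and close a code chunk
rmd_lines <- c(
  "---",                       # YAML header
  'title: "Untitled"',
  "output: html_document",
  "---",
  "",
  "# Heading level 1",         # markdown text
  "",
  "Some **bold** and *italic* text.",
  "",
  paste0(fence, "{r}"),        # R code chunk
  'print("Hello world!")',
  fence
)

# Write the file to a temporary folder (hypothetical file name).
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)

# Render to HTML only if rmarkdown and pandoc are available.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_file, quiet = TRUE)
}
```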

R basics: arithmetic operations

You can start by trying out R on the lab computer. Later, we’ll set it up on your personal computer.

R can be used as a calculator, using intuitive symbols for these operations:

1 + 5
[1] 6
8 - 3
[1] 5
3 * 4
[1] 12
9 / 3
[1] 3
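
Beyond the four operations above, a few more operators are worth trying. This is a short sketch with arbitrary numbers; the results are shown as comments.

```r
2^3          # exponentiation
# [1] 8
7 %% 3       # remainder after division (modulo)
# [1] 1
7 %/% 3      # integer division
# [1] 2
(1 + 5) * 2  # parentheses control the order of operations
# [1] 12
```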

R basics: assigning values

One of the cornerstones of programming languages is assignment. You can assign a value/object to a name using <- (suggested R style) or = (“Python” style).

a <- 1
b <- 3
a + b
[1] 4
c = 7
d = 5
c * d
[1] 35
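
Two details of assignment that often surprise beginners are sketched below with made-up names: a name can be reassigned using its own old value, and names are case-sensitive.

```r
count <- 10
count <- count + 5  # the right-hand side is evaluated first, then stored
count
# [1] 15

# Names are case-sensitive: Count and count are different objects.
Count <- 100
count + Count
# [1] 115
```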

R basics: built-in functions

R comes with many built-in functions. The calling syntax is function(parameter1, parameter2, ...). Additionally, with extra R packages, there are even more functions you can use.

values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
sum(values)
[1] 55
mean(values)
[1] 5.5
library(MASS)
# integrate the sin function from 0 to pi.
area(sin, 0, pi)
[1] 2
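
Many functions take optional named arguments in addition to their main input. The sketch below uses made-up scores to show one common case, na.rm in mean(), and that arguments can be matched by name. Typing ?mean in the console opens the help page that lists a function's arguments.

```r
scores <- c(4, 8, NA, 10)  # NA marks a missing value

# Without extra arguments, a missing value makes the mean NA.
mean(scores)
# [1] NA

# The named argument na.rm = TRUE drops missing values first.
avg <- mean(scores, na.rm = TRUE)
avg
# [1] 7.333333

# Arguments can be matched by name, in any order.
rounded <- round(digits = 1, x = avg)
rounded
# [1] 7.3
```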

R basics: indexing

Indexing is the process of selecting specific values from an object based on their index location. Whenever you see [] or $ in R, some form of indexing is happening.

v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
v
 [1]  1  2  3  4  5  6  7  8  9 10
v[2]
[1] 2
v[2:4]
[1] 2 3 4
v[c(TRUE, T, T, T, T, TRUE,
    FALSE, F, FALSE, F)]
[1] 1 2 3 4 5 6
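
Three more indexing patterns are sketched below on the same kind of vector: selecting by a condition, dropping elements with a negative index, and replacing a value. Note that positions in R start at 1.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# A comparison returns a logical vector, which can be used as an index.
big <- v[v > 7]
big
# [1]  8  9 10

# A negative index drops elements instead of selecting them.
rest <- v[-(1:8)]
rest
# [1]  9 10

# Indexing on the left of <- replaces the selected values.
v[1] <- 100
v[1:3]
# [1] 100   2   3
```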

R basics: indexing

df <- data.frame(col1 = c(1, 2, 3),
                 col2 = c(4, 5, 6))
df
  col1 col2
1    1    4
2    2    5
3    3    6
df[1, 2]
[1] 4
df[, "col2"]
[1] 4 5 6
df$col1
[1] 1 2 3
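
The same [row, column] and $ notation can filter rows and create columns. The sketch below rebuilds the small data frame and adds a column named total, a name chosen only for this example.

```r
df <- data.frame(col1 = c(1, 2, 3),
                 col2 = c(4, 5, 6))

# Keep only the rows where col1 is greater than 1 (all columns).
subset_rows <- df[df$col1 > 1, ]
subset_rows
#   col1 col2
# 2    2    5
# 3    3    6

# Create a new column from existing ones with $.
df$total <- df$col1 + df$col2
df$total
# [1] 5 7 9
```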

R basics: flow control

Flow control is an important component of any programming language. In R, the if-else statement and loops work as follows:

x <- 6
if (x > 5) { 
  print("Greater than 5") 
} else {
  print("Less than or equal to 5")
}
[1] "Greater than 5"
for (i in 1:3) {
  print(i)
}
[1] 1
[1] 2
[1] 3
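
An if statement can sit inside a loop, and a while loop repeats until its condition becomes FALSE. Both are sketched below with made-up values.

```r
values <- c(3, 8, 5, 10)
count_large <- 0

# Count how many values are greater than 5.
for (value in values) {
  if (value > 5) {
    count_large <- count_large + 1
  }
}
count_large
# [1] 2

# A while loop repeats as long as its condition is TRUE.
n <- 1
while (n < 20) {
  n <- n * 2
}
n
# [1] 32
```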

R basics: custom functions

To define your own function in R, you can use the following syntax. Note that R automatically returns the value of the last expression evaluated in the function body, though a “Python” style return statement is also valid in R.

add <- function(a, b) { 
  a + b
}

add(1, 4)
[1] 5
add <- function(a, b) { 
  return(a + b)
}

add(1, 4)
[1] 5
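
Function arguments can have default values, and most arithmetic in a function works on whole vectors at once. The sketch below shows both; to_fahrenheit is a hypothetical helper written for this example.

```r
# b has a default value, so it can be omitted when calling the function.
add <- function(a, b = 1) {
  a + b
}
add(4)
# [1] 5
add(4, b = 10)
# [1] 14

# The same function body works element by element on a vector.
to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}
to_fahrenheit(c(0, 100))
# [1]  32 212
```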

Download R version 4.4.2

mirror.csclub.uwaterloo.ca/CRAN

Download RStudio

posit.co/download/rstudio-desktop

Restoring the environment

Download the Applied-Spatial-Statistics zip file from Avenue and unzip it. You should then have a folder with the following structure:

Applied-Spatial-Statistics/
├── renv/
├── .gitignore
├── .Rprofile
├── Applied-Spatial-Statistics.Rproj
├── README.md
├── README.Rmd
└── renv.lock
  • Double-click the Applied-Spatial-Statistics.Rproj file to open the R project.

Rtools 4.4 for Windows users

mirror.csclub.uwaterloo.ca/CRAN/bin/windows/Rtools

  • Ensure you download the Rtools version that matches your installed R version.
  • Rtools is a set of programs required on Windows to build R packages from source.
  • Note: If you are using Mac or Linux, Rtools is not required.

Xcode and GNU Fortran for Mac users

mac.r-project.org/tools

  • To build R packages from source on macOS, you will need both Xcode and a GNU Fortran compiler.

Homebrew and GDAL for Mac users

Next, use the macOS Terminal to install Homebrew, and then use Homebrew to install GDAL.

brew.sh

formulae.brew.sh/formula/gdal

Restoring the environment

  • Navigate to the bottom right panel.
  • Click the Packages tab, then the renv button, and finally select the Restore Library option.
  • Click the Restore button in the pop-up panel.
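
The same restore can also be started from the R console. The following is a sketch, assuming the project is open so that renv.lock sits in the working directory and that the renv package is available; prompt = FALSE skips the confirmation question.

```r
lockfile <- "renv.lock"  # file in the project root, as in the folder listing

if (file.exists(lockfile) && requireNamespace("renv", quietly = TRUE)) {
  # Install the package versions recorded in the lockfile.
  renv::restore(lockfile = lockfile, prompt = FALSE)
} else {
  message("renv.lock or the renv package was not found in ", getwd(),
          "; open the .Rproj file first.")
}
```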

Install \(\LaTeX\)

\(\LaTeX\) is a high-quality typesetting system. While it may look like a programming language, you don’t need to understand it for our purposes. We will use it to export Rmd files with results into PDF files.

If you already use \(\LaTeX\) and have it installed through MiKTeX or TeX Live, you can skip this step.

If you are unfamiliar with \(\LaTeX\) and don’t have it installed yet, simply run the following R code to install it:

tinytex::install_tinytex()
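
If you are not sure whether \(\LaTeX\) is already installed, one rough check is to ask R whether it can find the pdflatex program. This sketch assumes the installation added pdflatex to the system PATH, which may not hold for every setup.

```r
# Sys.which() returns the path to a program, or "" if it is not found.
latex_path <- Sys.which("pdflatex")
has_latex <- nzchar(latex_path)[[1]]

if (has_latex) {
  message("LaTeX found at: ", latex_path)
} else {
  message("No LaTeX found on the PATH; consider tinytex::install_tinytex().")
}
```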
