R Package sifr

R package

Spatial analysis

I developed an R Package sifr for conducting basic spatial interpolation and calculating land-use entropy. You can download sifr from GitHub and install it in your R environment.

Author

Zehui Yin

Published

December 2, 2023

I created an R package called sifr that allows users to perform basic spatial interpolation of sociodemographic and built-environment variables. These variables are useful for studying the characteristics and patterns of urban spaces, especially subway stations. The package contains functions that can take point data, such as the coordinates of subway stations, and interpolate the values of variables based on the surrounding areas such as calculating population density fall inside the subway station geometry buffers. These interpolated variables can then be used as inputs for cluster analysis or econometric analysis to explore the relationships between subway stations and their context such as built environments. The package is mainly designed for point data, but it may also work with other types of spatial data (which have not been thoroughly tested). You can find the source code of the package here. Below is a short example that demonstrates some of the core functionality of the package.

Intro

if(!require(devtools)){
  install.packages("devtools")
  library(devtools)
}

Loading required package: devtools

Loading required package: usethis

if(!require(sifr)){
  devtools::install_github("zehuiyin/sifr")
  library(sifr)
}

Loading required package: sifr

library(sf)

Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE

library(mapview)

In this illustration, I provided some example usage for the sifr package. Some sociodemographic and built environment variables are calculated for a small random sample of schools in Toronto. Throughout this example, I used projection NAD83 / UTM Zone 18N (EPSG:26918).

Load data

All types of School Locations in Toronto

# school locations in Toronto
schools <- st_read("https://github.com/zehuiyin/sifr/raw/main/extdata/toronto_schools.gpkg")

Reading layer `School locations-all types data - 4326' from data source 
  `https://github.com/zehuiyin/sifr/raw/main/extdata/toronto_schools.gpkg' 
  using driver `GPKG'
Simple feature collection with 1194 features and 24 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: -79.61918 ymin: 43.59022 xmax: -79.13372 ymax: 43.83037
Geodetic CRS:  WGS 84

# force the geometry column to name geometry
st_geometry(schools) <- "geometry"

# for reproductivity
set.seed(123)

# take a random 30 sample to simplify calculation as this is just an example
schools <- schools[sample(1:nrow(schools), 30),]

mapview(schools)

Some variables of interest

# road intersections in Toronto
intersections <- st_read("https://github.com/zehuiyin/sifr/raw/main/extdata/centreline_intersection.gpkg")

Reading layer `Centreline Intersection - 4326' from data source 
  `https://github.com/zehuiyin/sifr/raw/main/extdata/centreline_intersection.gpkg' 
  using driver `GPKG'
Simple feature collection with 55149 features and 20 fields
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: -79.63926 ymin: 43.581 xmax: -79.11545 ymax: 43.85545
Geodetic CRS:  WGS 84

# force the geometry column to name geometry
st_geometry(intersections) <- "geometry"

# road centrelines in Toronto
roads <- st_read("https://github.com/zehuiyin/sifr/raw/main/extdata/centreline.gpkg")

Reading layer `Centreline - Version 2 - 4326' from data source 
  `https://github.com/zehuiyin/sifr/raw/main/extdata/centreline.gpkg' 
  using driver `GPKG'
Simple feature collection with 71297 features and 40 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: -79.63926 ymin: 43.5809 xmax: -79.11545 ymax: 43.85545
Geodetic CRS:  WGS 84

# force the geometry column to name geometry
st_geometry(roads) <- "geometry"

# 2021 Canadian census data
census_data <- readRDS(url("https://github.com/zehuiyin/sifr/raw/main/data/census_data.rds", "rb"))

# land use data in Toronto
landuse <- st_read("/vsicurl/https://github.com/zehuiyin/sifr/raw/main/extdata/landuse.shp")

Reading layer `landuse' from data source 
  `/vsicurl/https://github.com/zehuiyin/sifr/raw/main/extdata/landuse.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 12723 features and 33 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 7201961 ymin: 918078.1 xmax: 7243302 ymax: 954703.3
Projected CRS: PCS_Lambert_Conformal_Conic

Calculate Number of Intersections within 300-meter buffers of the schools

# generate what intersection falls within 300 meter buffers
school_int_sec <- what_within_each_stops(schools, intersections, 26918, 300)

# create holder to record how many intersections falls within
school_int_sec$input_rows_within_count <- 0

for (i in 1:nrow(school_int_sec)) { # loop over stop rows
  if(!is.na(str_to_num(i,"input_rows_within",school_int_sec))[1]){ # check whether there is intersection falls within
    school_int_sec$input_rows_within_count[i] <- length(str_to_num(i,"input_rows_within",school_int_sec)) # record the number of intersections
  }
}

schools$Rd_conn <- school_int_sec$input_rows_within_count
mapview(schools, cex = "Rd_conn")

Calculate Length of Roads within 300-meter buffers of the schools

# generate what intersection falls within 300 meter buffers
road_length <- length_in_buffer(schools, roads, 26918, 300)

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

schools$road_len <- road_length$total_length_within_buffer

mapview(schools, cex = "road_len")

Calculate Average Population Density within 300-meter buffers of the schools

# get rid of the long name
census_data$pop_den <- census_data$`v_CA21_6: Population density per square kilometre`

pop_den <- average_value_in_buffer(schools, census_data, 26918, 300, 
                                   "pop_den", FALSE)

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

schools$pop_den <- pop_den$pop_den

mapview(schools, cex = "pop_den")

Assign the nearest Census Tract’s Population Density to the schools

pop_den_median <- nearest_median_value(schools, census_data, "pop_den", 26918)

Warning: st_centroid assumes attributes are constant over geometries

schools$pop_den_median <- pop_den$pop_den

mapview(schools, cex = "pop_den_median")

Calculate land use mix (entropy) within 3000-meter buffers of the schools

\[Entropy = -\sum_{k=1}^nP_k\times\frac{ln(P_k)}{ln(n)}\]

Filter to only 4 types of land use types (reduce computing time for illustration purposes).

landuse <- landuse[which(landuse$Class_name %in% c("ApartmentNeighbourhoods",
                                                   "CoreEmploymentAreas",
                                                   "Natural areas",
                                                   "Institutional")),]
summary(as.factor(landuse$Class_name))

ApartmentNeighbourhoods     CoreEmploymentAreas           Institutional 
                    221                     168                      59 
          Natural areas 
                    174

entropy <- calculate_entropy(schools, landuse, 26918, 3000, "Class_name", FALSE)

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

Warning: attribute variables are assumed to be spatially constant throughout
all geometries

schools$PCT_resi <- entropy$Class_name_ApartmentNeighbourhoods
schools$PCT_comm <- entropy$Class_name_CoreEmploymentAreas
schools$entropy <- entropy$entropy

mapview(schools, zcol = "entropy")