Open Source Approach: R

Assuming you have GDAL and GEOS available on your system, we can start geospatial data analysis in R! In the R spatial package universe, sp sits at the top of the hierarchy. The sp package provides classes for describing vector/geometry and raster data, plus basic methods for working with them. As I understand it, sp defined how geospatial data are modeled in R, and other R packages follow those models. However, sp alone does not provide all the methods/tools we will need, so we augment it with additional R spatial packages. First, we need rgdal to read/write geospatial data in various formats and to reproject data (i.e. spatial reference transformation). There are other options for reading/writing data, but I think rgdal is the best. The rgeos package is also useful, supplying vector analysis functions. See the tutorial below for more on the spatial packages and their functions.
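
If you want to confirm which GDAL/GEOS builds R is actually linking against, both rgdal and rgeos report the linked library versions when loaded (once they are installed, as shown further below). Here is a quick, optional check:

library(rgdal)            # prints the linked GDAL/PROJ.4 versions on load
getGDALVersionInfo()      # the GDAL runtime version string
library(rgeos)            # prints the linked GEOS version on load
version_GEOS0()           # the GEOS runtime version string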

Note - this tutorial is also posted on RPubs (http://rpubs.com/cmaene/opensourceapproach_r); that version includes the outputs/graphics, so it may be easier to follow than this blog post. I am also making an accompanying dataset, data.zip, available - please download it from my homepage as shown in the code below.

The first part starts with simple operations - reading and plotting geospatial data.

download.file("http://home.uchicago.edu/~cmaene/data.zip", destfile = "data.zip")
unzip("data.zip")

setwd("data")
install.packages(c("sp","rgdal")) # skip this if you installed these previously
library(sp)

# create spatial point data from XY (GPS-like) table
GPSdata <- read.csv("GPS.csv")
class(GPSdata)              # data frame
GPSpt1 <- SpatialPoints(GPSdata[,c("x","y")])
class(GPSpt1)               # SpatialPoints class - no attributes
summary(GPSpt1)
GPSpt2 <- SpatialPointsDataFrame(GPSdata[,c("x","y")], GPSdata, match.ID=FALSE)
summary(GPSpt2)             # SpatialPointsDataFrame - with attributes
proj4string(GPSpt2) <- CRS("+proj=longlat +datum=NAD83")
summary(GPSpt2)             # SpatialPointsDataFrame - with attributes & SRS

# read shapefiles with GDAL's OGR
library(rgdal)            # load rgdal!
roads <- readOGR(dsn=".", layer="roads", verbose=TRUE)
class(roads)              # see what the roads object consists of
summary(roads)            # OGR reads .prj/projection info

# read grid/raster data with GDAL
raster <- readGDAL("elevation.asc")
class(raster)             # SpatialGridDataFrame
#plot(raster)             # not what I expected
image(raster)             # better..

# add GPS points and road lines to the raster plot
plot(roads, col="green", add=T)
plot(GPSpt2,  col="blue", add=T)
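
I mentioned above that rgdal also writes data; if you want to keep the GPS points as a shapefile, writeOGR is the counterpart of readOGR. A minimal sketch (the layer name "GPSpoints" is just an example):

writeOGR(GPSpt2, dsn=".", layer="GPSpoints", driver="ESRI Shapefile")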

Let's try classic GIS functions - buffer & overlay. In this scenario, we will identify apartments within a half mile of the preferred train stations.

install.packages("rgeos", repos="http://cran.us.r-project.org") # skip if installed previously
library(rgeos)
setwd("data")
sts <- readOGR(dsn=".", layer="CTAbrown", verbose=FALSE)
# check the spatial/coordinate reference system (SRS/CRS) of "sts",
# because it determines the unit used for the buffer width below!
proj4string(sts)           # unit is US feet (us-ft); hard to tell from the string, but this is Illinois State Plane, East Zone
# create buffer - define half-mile (2640 ft) from the train stations
stsBuffer <- gBuffer(sts, width=2640)
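
Note that gBuffer dissolves the individual buffers into a single geometry by default, which is exactly what we want for the subsetting below. If you ever need one buffer polygon per station instead, byid=TRUE keeps them separate. A quick sketch:

stsBufferEach <- gBuffer(sts, width=2640, byid=TRUE)   # one (undissolved) buffer per station
plot(stsBufferEach)                                    # overlapping half-mile circles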

# overlay to find how many apartments fall in the train station buffer zones
apts <- readOGR(dsn=".", layer="apartments", verbose=FALSE)
proj4string(apts)           # in WGS84
# reproject "apts" in the same CRS as "sts"
apts <-spTransform(apts, CRS(proj4string(sts)))

# 1: using normal subset with sp objects! kinda cool, I found..
aptsSubset <- apts[stsBuffer,]
class(aptsSubset)           # SpatialPointsDataFrame
head(aptsSubset@data)
plot(stsBuffer)
plot(aptsSubset, col="red", pch=17, add=TRUE)

# 2: using sp's over
aptsOverIndex <-over(apts,stsBuffer)
aptsOver      <-apts[!is.na(aptsOverIndex),]
class(aptsOver)             # SpatialPointsDataFrame

# 3: using rgeos'gIntersection
aptsInside  <- gIntersection(apts, stsBuffer)
aptsOutside <- gDifference(apts, stsBuffer)
# rgeos' gIntersection returns SpatialPoints objects (without a data frame/attributes)
class(aptsInside)

# 4: using rgeos' gIntersects - it returns a logical (T/F) matrix
#    which we can use then to create a new SpatialPointsDataFrame (SPDF) object
aptsInsideTF<-gIntersects(apts, stsBuffer, byid=TRUE)
aptsInside  <-apts[as.vector(aptsInsideTF),]  # note: turn matrix to vector for subsetting
class(aptsInside)            # now aptsInside is SPDF

# plot to see if worked
plot(stsBuffer, axes=TRUE)
plot(sts, pch=19, cex=0.5, add=TRUE)
plot(aptsInside, col="red", pch=8, add=TRUE)
plot(aptsOutside, col="blue", pch=3, add=TRUE)
legend("topright", legend=c("CTA stations","Inside","Outside"), cex=0.8, bty="n", lwd=2, col=c("black","red","blue"), pch=c(19,8,3), lty=c(NA,NA,NA))
# add a label - Apartment IDs
labelxy <- coordinates(aptsInside)
text(labelxy,labels=aptsInside@data$APTID, cex=0.8)
# add a title
title(main="Apartments Selected \nwithin 1/2 mile from CTA Brown")

Since we have appropriate datasets, let's calculate distances with rgeos - for each of the selected apartments, we want to know the closest CTA Brown Line station and the distance to it.

# rgeos' gDistance calculates the Cartesian/Euclidean minimum distance.
# for geodesic distance, consider SDMTools (Vincenty method) or other tools.
proj4string(sts)
proj4string(aptsInside)
# calculate distance to the nearest CTA Brown stations
dist <- gDistance(aptsInside, sts, byid=TRUE)
# get the nearest station ID and the distance value (in US-feet)
distMinNearSts <- apply(dist, 2, function(x) which.min(x))
distMinValue   <- apply(dist, 2, function(x) min(x))
distancedata   <- cbind(aptsInside@data, distMinNearSts, distMinValue)
distancedata  # check the result
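
As the comment above says, gDistance gives planar distances in the projection's unit (US feet here). If you want geodesic distances without adding another package, sp's spDists computes great-circle distances on longitude/latitude coordinates. Here is a rough sketch of that approach - note the results come back in kilometers, not feet:

# transform back to longitude/latitude, since spDists expects lon/lat for great-circle distances
aptsLL <- spTransform(aptsInside, CRS("+proj=longlat +datum=WGS84"))
stsLL  <- spTransform(sts, CRS("+proj=longlat +datum=WGS84"))
distKm <- spDists(aptsLL, stsLL, longlat=TRUE)   # apartments (rows) x stations (columns), in km
apply(distKm, 1, min)                            # nearest-station distance per apartment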

Next, let's create some maps - choropleth & dot density maps. In addition to the previous two libraries, sp and rgdal, I am adding a color palette library, RColorBrewer - a must-have for pretty cartographic work. We will also load classInt to divide values into classes and maptools to create the dot density map. maptools is a Swiss Army knife-like GIS toolset - it often comes in handy for familiar GIS operations.

install.packages(c("RColorBrewer","classInt","maptools"), repos="http://cran.us.r-project.org") # skip if installed previously
library(classInt)
library(maptools)
library(RColorBrewer)
# here are the available color palettes
par(mar = c(1, 3, 1, 1))
display.brewer.all()

setwd("data")
# Chicago census tracts
tracts <-readOGR(dsn=".", layer="tracts", verbose=TRUE)
# define equal-frequency (quantile) class intervals
age_under5 <- tracts@data$AGEU5
nclr <- 5
colors <- brewer.pal(nclr, "YlOrBr")
class <- classIntervals(age_under5, nclr, style="quantile")
colorcode <- findColours(class, colors)

# plot a choropleth map
plot(tracts, col = colorcode)
title(main = "Age Under 5: Quantile (Equal-Frequency)")
legend("topright", legend=names(attr(colorcode, "table")), fill=attr(colorcode, "palette"), cex=0.6, bty="n")

# prepare for a dot density map
hispanic <- tracts@data$HISPANIC
dotper <- hispanic/500          # one dot per 500 persons
dothispanic <- dotsInPolys(tracts, as.integer(dotper), f="random")
plot(tracts)
plot(dothispanic, col="brown", pch=19, cex=0.2, add=TRUE)
title(main="Dot Density Map: dot=500 persons")

# add dots for black & a legend
black <- tracts@data$BLK
dotper2 <- black/500
dotblack <- dotsInPolys(tracts, as.integer(dotper2), f="random")
plot(dotblack, col="blue", pch=19, cex=0.2, add=T)
legend("bottomleft", legend = c("Hispanic", "Black"), fill=c("brown", "blue"), bty="n")

Raster analysis examples - based on weather station records, we will (1) interpolate to create a precipitation surface map and then (2) summarize the values by state to find the wettest and driest states. We will add the raster and gstat packages.

install.packages(c("raster","gstat"), repos="http://cran.us.r-project.org") # skip if installed previously
library(raster)
library(gstat)
setwd("data")
# Part 1: interpolate to create precipitation surface map
weathersts <-readOGR(dsn=".", layer="weatherst", verbose=FALSE)
# create a regular interval (1-degree) grid
ext <- bbox(weathersts)
grid <- expand.grid(x=seq(ext[1,1],ext[1,2],by=1), y=seq(ext[2,1],ext[2,2],by=1))
coordinates(grid) <- c("x","y")   # promote grid to a spatial points object
proj4string(grid) <- proj4string(weathersts)
# idw predicts a value at every grid point using inverse-distance weighting (power = 2)
idw <- idw(PYR ~ 1, weathersts, grid, idp=2)
# turn the result into a raster grid
idwgrid <- rasterFromXYZ(as.data.frame(cbind(grid$x, grid$y, idw@data$var1.pred)))
# add state lines
states <- readOGR(dsn=".", layer="states48", verbose=FALSE)
plot(idwgrid)
plot(states, add=T)
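
If you want to keep the interpolated surface (e.g. to open it later in a desktop GIS), the raster package can write it to disk. Note that rasterFromXYZ does not carry over a coordinate system, so assign one first. A quick sketch, with an example filename:

projection(idwgrid) <- proj4string(weathersts)   # rasterFromXYZ leaves the CRS unset
writeRaster(idwgrid, filename="precip_idw.tif", format="GTiff", overwrite=TRUE)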

# Part 2: summarize the interpolated values by state to find the driest and wettest states
meanPYR <- extract(idwgrid, states, fun=mean)
states@data<-cbind(states@data,meanPYR)
nclr<-11
color<-brewer.pal(nclr, "RdYlGn")
class <- classIntervals(states@data$meanPYR, nclr, style="quantile")
colorcode<-findColours(class, color)
plot(states, col=colorcode, axes=TRUE)
# top 5 driest states
sorted <- states@data[order(states@data$meanPYR),]
head(sorted, 5)
# top 5 wettest states
sorted <- states@data[order(states@data$meanPYR, decreasing=TRUE),]
head(sorted, 5)
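
As with the tract choropleth earlier, a legend and a title make the state map easier to read. A quick sketch, added to the plot drawn above:

legend("bottomleft", legend=names(attr(colorcode, "table")), fill=attr(colorcode, "palette"), cex=0.6, bty="n")
title(main="Interpolated Precipitation (PYR): State Means")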

That's all for today!
