Assuming you have GDAL and GEOS available on your system, we can start geospatial data analysis in R! In the R spatial packages universe, sitting at the top of the hierarchy is sp. The sp package provides classes to describe vector/geometry and raster data, and also includes basic methods. It is my understanding that sp decided how we model geospatial data in R and other R packages follow the models. However, sp does not provide versatile methods/tools we will need, so we augment with additional R spataial packages. First, we should obtain rgdal to read/write geosptail data in various formats and to reproject data (i.e. spatial reference transformation). There are other options for data reading/writing but I think rgdal is the best. Rgeos is also useful in supplying vector anlaysis functions. See the tutorial below for more on spatial packages and their functions.
Note - this tutorial is also posted on RPubs (http://rpubs.com/cmaene/opensourceapproach_r) - in there, outputs/graphics are also included, so it might be easier to follow than this blog post. Also, I am making an accompanying dataset, data.zip, available - please download it from my homepage as noted below..
The first part starts with simple operations - reading and plotting geospatial data.
download.file("http://home.uchicago.edu/~cmaene/data.zip", destfile = "data.zip") unzip("data.zip") setwd("data") install.packages(c("sp","rgdal")) # skip this if you installed these previously library(sp) # create spatial point data from XY (GPS-like) table GPSdata <- read.csv("GPS.csv") class(GPSdata) # data frame GPSpt1 <- SpatialPoints(GPSdata[,c("x","y")]) class(GPSpt1) # SpatialPoints class - no attributes summary(GPSpt1) GPSpt2 <- SpatialPointsDataFrame(GPSdata[,c("x","y")], GPSdata, match.ID=FALSE) summary(GPSpt2) # SpatialPointsDataFrame - with attributes proj4string(GPSpt2) <- CRS("+proj=longlat +datum=NAD83") summary(GPSpt2) # SpatialPointsDataFrame - with attributes & SRS # read shapefiles with GDAL's OGR library(rgdal) # upload rgdal! roads <- readOGR(dsn=".", layer="roads", verbose=TRUE) class(roads) # see what consists of the roads object summary(roads) # OGR reads .prj/projection info # read grid/raster data with GDAL raster <- readGDAL("elevation.asc") class(raster) # SpatialGridDataFrame #plot(raster) # not what I expected image(raster) # better.. # add GPS points and road lines to the raster plot plot(roads, col="green", add=T) plot(GPSpt2, col="blue", add=T)
Let's try classic GIS functions - buffer & overlay. In this scenario, we will identify apartments within a half-mile distance from the prefered train stations.
install.packages("rgeos", repos="http://cran.us.r-project.org") # skip if installed previously library(rgeos) setwd("data") sts <- readOGR(dsn=".", layer="CTAbrown", verbose=FALSE) # check spatial/coodinate reference system (srs/crs) information of "sts" # because the subsequent buffer unit will be determined by it! proj4string(sts) # unit=US feet (us-ft). Hard to tell but it's Illinois State Plane CS, Zone East. # create buffer - define half-mile (2640 ft) from the train stations stsBuffer <- gBuffer(sts, width=2640) # overlay to find how many apartments fall in the train station buffer zones apts <- readOGR(dsn=".", layer="apartments", verbose=FALSE) proj4string(apts) # in WGS84 # reproject "apts" in the same CRS as "sts" apts <-spTransform(apts, CRS(proj4string(sts))) # 1: using normal subset with sp objects! kinda cool, I found.. aptsSubset <- apts[stsBuffer,] class(aptsSubset) # SpatialPointsDataFrame head(aptsSubset@data) plot(stsBuffer) plot(aptsSubset, col="red", pch=17, add=TRUE) # 2: using sp's over aptsOverIndex <-over(apts,stsBuffer) aptsOver <-apts[!is.na(aptsOverIndex),] class(aptsOver) # SpatialPointsDataFrame # 3: using rgeos'gIntersection aptsInside <-gIntersection(apts, stsBuffer) aptsOutside <-gDifference(apts, stsBuffer) # rgeos' gInteresection returns SpatialPOints objects (without data frame/attributes) class(aptsInside) # 4: using rgeos' gIntersects - it returns a logical (T/F) matrix # which we can use then to create a new SpatialPointsDataFrame (SPDF) object aptsInsideTF<-gIntersects(apts, stsBuffer, byid=TRUE) aptsInside <-apts[as.vector(aptsInsideTF),] # note: turn matrix to vector for subsetting class(aptsInside) # now aptsInside is SPDF # plot to see if worked plot(stsBuffer, axes=TRUE) plot(sts, pch=19, cex=0.5, add=TRUE) plot(aptsInside, col="red", pch=8, add=TRUE) plot(aptsOutside, col="blue", pch=3, add=TRUE) legend("topright", legend=c("CTA stations","Inside","Outside"), cex=0.8, bty="n", lwd=2, col=c("black","red","blue"), pch=c(19,8,3), lty=c(NA,NA,NA)) # add a label - Apartment IDs labelxy <- coordinates(aptsInside) text(labelxy,labels=aptsInside@data$APTID, cex=0.8) # add a title title(main="Apartments Selected \nwithin 1/2 mile from CTA Brown")
Since we have an appropriate datasets, let's calculate distance with rGEOS - for each of the selected apartment, we want to know the closest CTA Brown station and the distance to it.
# rgeos' gDistance calculates cartesian/Euclidean minimum distance. # for geodesic distance - consider SDMTools (vincenty method) or other tools. proj4string(sts) proj4string(aptsInside) # calculate distance to the nearest CTA Brown stations dist <- gDistance(aptsInside, sts, byid=TRUE) # get the nearest station ID and the distance value (in US-feet) distMinNearSts <- apply(dist, 2, function(x) which.min(x)) distMinValue <- apply(dist, 2, function(x) min(x)) distancedata <- cbind(aptsInside@data, distMinNearSts, distMinValue) distancedata # check the result
Next - let's create some maps - choropleth & dot density maps. In addition to the previous two libraries, sp and rgdal, I am adding a color palette library, RColorBrewer - a must have library for pretty cartographic works. We will also load classInt to divide values into classes and maptools to create a dot density map. maptools is a Swiss Army knife like GIS toolset - it often comes in handy to do familiar GIS operations.
install.packages(c("RColorBrewer","classInt","maptools"), repos="http://cran.us.r-project.org") # skip if installed previously library(classInt) library(maptools) library(RColorBrewer) # here are the available color palette par(mar = c(1, 3, 1, 1)) display.brewer.all() setwd("data") # Chicago census tracts tracts <-readOGR(dsn=".", layer="tracts", verbose=TRUE) # define equal-frequency interval class age_under5 <- tracts@data$AGEU5 nclr <- 5 colors <- brewer.pal(nclr, "YlOrBr") class <- classIntervals(age_under5, nclr, style="quantile") colorcode <- findColours(class, colors) # plot a chropleth map plot(tracts, col = colorcode) title(main = "Age Under 5: Quantile (Equal-Frequency)") legend("topright", legend=names(attr(colorcode, "table")), fill=attr(colorcode, "palette"), cex=0.6, bty="n") # prepare for a dot density map hispanic <- tracts@data$HISPANIC dotper <- hispanic/500 dothispanic <- dotsInPolys(tracts, as.integer(dotper), f="random") plot(tracts) plot(dothispanic, col="brown", pch=19, cex=0.2, add=TRUE) title(main="Dot Density Map: dot=500 persons") # add dots for black & a legend black <- tracts@data$BLK dotper2 <- black/500 dotblack <- dotsInPolys(tracts, as.integer(dotper2), f="random") plot(dotblack, col="blue", pch=19, cex=0.2, add=T) legend("bottomleft", legend = c("Hispanic", "Black"), fill=c("brown", "blue"), bty="n")
Raster analysis examples - based on weather station records, we (1) interpolate to create precipitation surface map and then (2) summarize the value by state to calculate the wettest and driest states. We will add raster and gstat packages.
install.packages(c("raster","gstat"), repos="http://cran.us.r-project.org") # skip if installed previously library(raster) library(gstat) setwd("data") # Part 1: interpolate to create precipitation surface map weathersts <-readOGR(dsn=".", layer="weatherst", verbose=FALSE) # create a regular interval (1-degree) grid ext <-bbox(weathersts) grid <- expand.grid(x=seq(ext[1,1],ext[1,2],by=1),y=seq(ext[2,1],ext[2,2],by=1)) coordinates(grid)<-c("x","y") # turn grid1 to a spatial point object proj4string(grid) <- proj4string(weathersts) # idw predicts values for all regularly spaced grid by power 2 interpolation idw <- idw(PYR ~ 1, weathersts, grid, idp=2) # turn the result into raster grid idwgrid <- rasterFromXYZ(as.data.frame(cbind(grid$x, grid$y, idw@data$var1.pred))) # add state lines states <- readOGR(dsn=".", layer="states48", verbose=FALSE) plot(idwgrid) plot(states, add=T) # Part 2: summarize the interpolated values by state to find the driest and wettest states meanPYR <- extract(idwgrid, states, fun=mean) states@data<-cbind(states@data,meanPYR) nclr<-11 color<-brewer.pal(nclr, "RdYlGn") class <- classIntervals(states@data$meanPYR, nclr, style="quantile") colorcode<-findColours(class, color) plot(states, col=colorcode, axes=TRUE) # top 5 driest states sort<-states@data[order(states@data$meanPYR),] head(sort) # top 5 wettest states sort<-states@data[order(states@data$meanPYR, decreasing=TRUE),] head(sort)
That's all for today!