Open Source Approach: GDAL & GEOS

If you decide to try spatial analysis outside proprietary GIS software such as ArcGIS, the first thing to do is install GDAL and GEOS (both come with QGIS!). These are must-have geospatial libraries for anyone planning to analyze spatial data. Many of the geospatial packages we use in R and Python actually rely on these two libraries, meaning we pretty much need the libraries installed before we can use those packages!

GDAL, the Geospatial Data Abstraction Library, is first and foremost the library we want, in my opinion. GDAL was initially developed by Frank Warmerdam (now with Google, I believe) as a suite of tools to handle and analyze a wide range of spatial data file formats via a collection of format drivers. It comes, a tiny bit confusingly, in two parts: GDAL for raster data and OGR for vector data. It also comes with a spatial reference class that links to a projection library for defining and transforming between coordinate systems, and with handy command-line utilities for manipulating geospatial data. Below are just a few examples that showcase GDAL/OGR's capabilities.

# check version
gdalinfo --version

# list GDAL supported formats available to you
gdalinfo --formats

# learn about your data - raster example
gdalinfo elevation.asc

# convert an Arc/Info ASCII Grid raster to GeoTIFF
gdal_translate raster.txt raster.tif

# check version (it will be the same as GDAL's)
ogrinfo --version

# list OGR supported formats available to you
ogrinfo --formats

# learn about your data - shapefile features summary example
ogrinfo -so Counties.shp -sql "SELECT * FROM Counties"

# translate/convert an Arc/Info Generate (ASCII) vector file to a Shapefile
ogr2ogr -overwrite -f "ESRI Shapefile" lines.shp lines.txt

# translate/convert Shapefiles to Google KML
ogr2ogr -skipfailures -overwrite -f "KML" lines1.kml lines.shp

# translate/convert OpenStreetMap OSM format to a SQLite/SpatiaLite DB
# note: this works only if GDAL was built with SpatiaLite support
# you also need OSM data downloaded from OpenStreetMap
# + change projection/spatial reference system
# + keep foreign character encoding
ogr2ogr --config OSM_USE_CUSTOM_INDEXING NO -skipfailures -f "SQLite" -dsco SPATIALITE=YES overpass.db overpass.osm -overwrite -lco ENCODING=UTF-8 -s_srs "EPSG:4326" -t_srs "EPSG:3095"

# spatial analysis with OGR!
# e.g. overlay/intersect analysis - GPS.shp (points) over Counties.shp (polygons)
#      assign census boundary IDs to all GPS points.
#      results.shp will be a point shapefile with attributes from both inputs
# note: ogr2ogr takes the destination first, then the source;
#       ". ." uses the current directory for both
ogr2ogr -dialect SQLITE -sql "SELECT ST_Intersection(A.geometry, B.geometry) AS geometry, A.*, B.* FROM Counties A, GPS B WHERE ST_Intersects(A.geometry, B.geometry)" . . -nln results

# get an average apartment lease fee for each census tract (if the tract contains any apartments)
ogr2ogr -dialect SQLITE -sql "SELECT A.geometry AS geometry, A.*, SUM(B.PRICE)/COUNT(B.PRICE) AS meanprice FROM tracts A, apartmentsNEW B WHERE ST_Intersects(A.geometry, B.geometry) GROUP BY A.CTIDFP00" . .  -nln results -overwrite

# calculate the shortest distance to the nearest CTA Brown Line station
ogr2ogr -dialect SQLITE -sql "SELECT A.geometry AS geometry, A.*, MIN(Distance(A.geometry, B.geometry)) AS distance FROM apartmentsNEW A, CTAbrown B GROUP BY A.APTID" . .  -nln results -overwrite

# buffer (1) no merging/dissolving of overlapping buffers; keep all individual buffers
#            (2640 is in layer units, e.g. half a mile if the layer is in feet)
ogr2ogr -dialect SQLITE -sql "SELECT Buffer(A.geometry,2640) AS geometry, A.* FROM CTAbrown A GROUP BY A.STATION_ID" . .  -nln results -overwrite

# buffer (2) dissolve overlapping buffers
ogr2ogr -dialect SQLITE -sql "SELECT ST_Union(Buffer(A.geometry,2640)) AS geometry FROM CTAbrown A" . .  -nln results -overwrite
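
One detail in the mean-price query above is worth checking against your own data. SQLite uses integer division when both operands are integers, so SUM(PRICE)/COUNT(PRICE) can silently truncate if PRICE is stored as an integer, whereas AVG(PRICE) always returns a floating-point value. Below is a minimal sketch using Python's built-in sqlite3 module and a made-up, non-spatial table. The table name, tract IDs, and prices are hypothetical, and the integer PRICE column is my assumption; if your PRICE field is stored as a real number, both forms agree.

```python
import sqlite3


def mean_price_by_tract(rows):
    """Return {tract_id: (sum_over_count, avg)} for (tract_id, price) rows."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table: PRICE is assumed to be an integer column.
    con.execute("CREATE TABLE apartments (CTIDFP00 TEXT, PRICE INTEGER)")
    con.executemany("INSERT INTO apartments VALUES (?, ?)", rows)
    cur = con.execute(
        "SELECT CTIDFP00, SUM(PRICE)/COUNT(PRICE), AVG(PRICE) "
        "FROM apartments GROUP BY CTIDFP00 ORDER BY CTIDFP00"
    )
    result = {tract: (ratio, avg) for tract, ratio, avg in cur}
    con.close()
    return result


# Made-up sample data: two apartments in tract T1, one in T2.
sample = [("T1", 1000), ("T1", 1001), ("T2", 900)]
for tract, (ratio, avg) in mean_price_by_tract(sample).items():
    print(tract, ratio, avg)
# T1 1000 1000.5
# T2 900 900.0
```

If the truncation matters for your analysis, replacing SUM(B.PRICE)/COUNT(B.PRICE) with AVG(B.PRICE) in the ogr2ogr query is one way to avoid it.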


GEOS, Geometry Engine - Open Source, handles complex vector/geometric objects and adds advanced GIS functionality (topological operations) not available in GDAL, and thus nicely complements GDAL. It is the force behind the major spatial operations - buffer, overlay, distance calculation between spatial objects, etc. - that make "spatial" analysis truly "special". Unlike GDAL, GEOS doesn't include standalone utilities (aside from geos-config). Instead, its functionality is accessed through a variety of applications, including R packages (e.g. rgeos) and Python packages (e.g. Shapely).

Installing these libraries can be tricky because there are so many ways to do it, from compiling and installing from source (most flexible) to downloading and installing binaries or packages from repositories. The easiest way I have found is to simply install QGIS Desktop, the most popular open source desktop GIS, which automatically installs the GDAL and GEOS libraries under the QGIS installation path. To see whether GDAL and GEOS are properly installed, try the following commands in R or Python. If you don't get an error message, you are good to go!

  • R: rgdal & rgeos are the R bindings for the GDAL & GEOS libraries
    library(rgdal); library(rgeos)
  • Python (version 2.7 is the current version in the QGIS Python console):
    from osgeo import gdal, ogr
    # (or simply: import gdal, ogr)
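
If you would rather get a quick report than wait for an import error, the check can be scripted. This is only a rough sketch using the Python standard library, and it is written for Python 3, not the 2.7 console mentioned above. The function name check_install and the list of tools are my own choices for illustration. It only tests whether the command-line utilities are on PATH and whether the osgeo package can be located; it does not prove that they actually work, and geos-config may not be present on every platform.

```python
import importlib.util
import shutil

# Command-line utilities mentioned in this post; geos-config ships with GEOS
# on many (not necessarily all) platforms.
TOOLS = ("gdalinfo", "ogrinfo", "ogr2ogr", "gdal_translate", "geos-config")


def check_install(tools=TOOLS, module="osgeo"):
    """Report which tools are on PATH and whether `module` can be located."""
    report = {tool: shutil.which(tool) is not None for tool in tools}
    # find_spec returns None when a top-level module cannot be found.
    report["python:" + module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name:16s} {'found' if found else 'MISSING'}")
```

Anything reported as MISSING usually points to a PATH or PYTHONPATH problem, not necessarily a failed installation.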

Disclaimer: please install the libraries and applications at your own risk, as individual situations can vary widely depending on operating system, versions, etc. Luckily, there is a lot of helpful information on the web on this subject, but I am also adding a few personal notes on installation issues below.

  • Windows users: after installing QGIS, if the commands above cause errors in R or Python, check (and/or set) the PATH (C:\Program Files\QGIS xxx\bin) and PYTHONPATH (C:\Program Files\QGIS xxx\apps\Python27\Lib\site-packages) environment variables. Note: setting PYTHONPATH may affect ArcGIS tools if ArcGIS is installed on the same Windows machine (remove or modify the PYTHONPATH variable accordingly if you need to use ArcGIS tools).
  • When installing on Windows, keep in mind whether your libraries and applications (R/Python) are 32-bit or 64-bit. For example, we can't use 64-bit libraries with 32-bit applications.
  • Linux users may want to compile and install GDAL from source to maximize its capabilities, such as adding more format drivers or choosing a version. I installed from source to read/write File Geodatabases (FileGDB), SpatiaLite/SQLite, and advanced Google KML files (LIBKML).
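
Two of the notes above (the PATH entry and the 32-bit vs. 64-bit question) are easy to check from Python itself. The sketch below uses only the standard library and is written for Python 3; the helper names python_bits and dir_on_path are hypothetical, and the QGIS path keeps the xxx placeholder from above, so replace it with your actual installation folder.

```python
import os
import struct


def python_bits():
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


def dir_on_path(directory, path=None):
    """Return True if `directory` is one of the entries in PATH."""
    if path is None:
        path = os.environ.get("PATH", "")
    # Normalize separators, trailing slashes and (on Windows) letter case.
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = [
        os.path.normcase(os.path.normpath(p))
        for p in path.split(os.pathsep)
        if p
    ]
    return wanted in entries


if __name__ == "__main__":
    print("Python is", python_bits(), "bit")
    # Placeholder path from the note above; xxx stands for your QGIS version.
    qgis_bin = r"C:\Program Files\QGIS xxx\bin"
    print("QGIS bin on PATH:", dir_on_path(qgis_bin))
```

If Python reports 32 bit but you installed the 64-bit QGIS (or the other way around), the bindings will most likely fail to load no matter how PATH is set.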

I use 64-bit Windows 7 and Ubuntu/Linux. Let me know if you encounter problems on these two systems.
