Patrick Gray (patrick.c.gray at duke) - https://github.com/patrickcgray

Chapter 4: Importing and using vector data -- the OGR library

Introduction

The OGR library is a companion library to GDAL that handles vector data capabilities, including information queries, file conversions, rasterization of polygon features, polygonization of raster features, and much more. It handles popular formats including the ESRI Shapefile, Keyhole Markup Language (KML), PostGIS, and SpatiaLite. For more information on how OGR came about and how it relates to GDAL, see here: http://trac.osgeo.org/gdal/wiki/FAQGeneral#WhatisthisOGRstuff.

In this tutorial we'll be working with:

Let's explore shapely a bit by creating some shapes:
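A sketch of what such a code cell might look like (the variable names are illustrative, not the notebook's own): two LineStrings that cross each other.

```python
from shapely.geometry import LineString

# Two simple lines that cross at (1, 1)
line_a = LineString([(0, 0), (2, 2)])
line_b = LineString([(0, 2), (2, 0)])

print(line_a.length)           # 2 * sqrt(2), about 2.83
print(line_a.crosses(line_b))  # True
```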

With these two lines created we can run some geospatial operations on them like an intersection:
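For example, intersecting the two crossing lines above yields a single Point (again a sketch with illustrative names):

```python
from shapely.geometry import LineString

line_a = LineString([(0, 0), (2, 2)])
line_b = LineString([(0, 2), (2, 0)])

# The intersection of two crossing lines is a single point
crossing = line_a.intersection(line_b)
print(crossing)  # POINT (1 1)
```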

We can buffer shapes too.
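For instance, buffering a Point produces a polygon that approximates a disk (a sketch; the default buffer resolution means the area is slightly under pi):

```python
from shapely.geometry import Point

# Buffering a point by 1 yields a polygon approximating the unit circle
disk = Point(0, 0).buffer(1.0)
print(disk.geom_type)  # 'Polygon'
print(disk.area)       # close to pi
```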

We can do intersections:

Or we can do a union instead of an intersection.
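A sketch combining the two operations on a pair of overlapping buffered points; the areas obey inclusion-exclusion:

```python
from shapely.geometry import Point

a = Point(0, 0).buffer(1.5)
b = Point(1, 0).buffer(1.5)

inter = a.intersection(b)
union = a.union(b)

# Inclusion-exclusion: area(A ∪ B) = area(A) + area(B) - area(A ∩ B)
print(abs(union.area - (a.area + b.area - inter.area)) < 1e-6)  # True
```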

So far we've just been using the default plotting in Jupyter notebooks. Let's shift now to plotting with matplotlib.

Now let's do something a little more complicated. We're going to plot a line and the buffer around that line in Subplot 1; then in Subplot 2 we'll plot that buffer, erode it, and draw the eroded shape on top of the full buffer.

Here we'll use descartes to plot these polygon patches in matplotlib:
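One caveat: descartes' PolygonPatch no longer works with Shapely 2.x, so here is an equivalent sketch that fills the polygons directly from their exterior coordinates with plain matplotlib (names and geometry values are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from shapely.geometry import LineString

line = LineString([(0, 0), (1, 1), (2, 0.5), (3, 1)])
dilated = line.buffer(0.5)     # the buffer around the line
eroded = dilated.buffer(-0.3)  # shrink the buffer back inward

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Subplot 1: the line and its buffer
ax1.plot(*line.xy, color="gray")
ax1.fill(*dilated.exterior.xy, alpha=0.3)
ax1.set_title("line and buffer")

# Subplot 2: the full buffer with the eroded shape on top
ax2.fill(*dilated.exterior.xy, alpha=0.3)
ax2.fill(*eroded.exterior.xy, alpha=0.7)
ax2.set_title("buffer and erosion")

fig.savefig("buffer_erosion.png")
```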

This is of course just a taste. You can do all sorts of cool geometric operations with shapely.

We'll now use an ESRI Shapefile that contains training data I collected for the example image we've been working on.

Opening an ESRI Shapefile

Just as rasterio wraps GDAL for rasters, fiona wraps OGR for vectors, abstracting the file formats so that we can use the same code for any of them. It employs the same concept of a dataset object from which we can gather information:
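Since the tutorial's training-data shapefile isn't reproduced here, this sketch first writes a tiny GeoJSON file so it is self-contained; the same fiona calls work unchanged on a Shapefile, GeoPackage, or any other OGR-supported format.

```python
import json
import fiona

# Build a tiny GeoJSON file so the example is self-contained
# (in the tutorial this would be the training-data shapefile)
collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"id": 1, "class": "water"},
         "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
    ],
}
with open("demo.geojson", "w") as f:
    json.dump(collection, f)

# The dataset object reports driver, CRS, feature count, and schema
with fiona.open("demo.geojson") as src:
    print(src.driver)              # 'GeoJSON'
    print(src.crs)                 # coordinate reference system
    print(len(src))                # number of features
    print(src.schema["geometry"])  # 'Point'
```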

Using fiona to import shapefiles

With our Shapefile read in, we can look at some of its properties:

The shapefile is a list of features, which can be accessed like any Python list.

As with all dictionaries in Python, there are keys and values.

If you want to transform this geometry into a shapely geometry, use the shape function that we imported earlier:
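A sketch of the conversion, using a stand-in GeoJSON-style feature dict like the ones fiona yields:

```python
from shapely.geometry import shape, Point

# A GeoJSON-style feature dict, like the ones fiona yields
feature = {
    "type": "Feature",
    "properties": {"class": "water"},
    "geometry": {"type": "Point", "coordinates": [3.0, 4.0]},
}

# shape() turns the geometry mapping into a full shapely object
geom = shape(feature["geometry"])
print(geom)                        # POINT (3 4)
print(geom.distance(Point(0, 0))if True else None)  # 5.0 (a 3-4-5 triangle)
```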

We'll come back to this set of training features later!

Bringing in the real power tools: geopandas

Geopandas takes the tools we have seen so far to the next level.

The goal of GeoPandas is to make working with geospatial data in python easier. It combines the capabilities of pandas and shapely, providing geospatial operations in pandas and a high-level interface to multiple geometries to shapely. GeoPandas enables you to easily do operations in python that would otherwise require a spatial database such as PostGIS.

From the docs:

GeoPandas implements two main data structures, a GeoSeries and a GeoDataFrame. These are subclasses of pandas Series and DataFrame, respectively.

A GeoSeries is essentially a vector where each entry in the vector is a set of shapes corresponding to one observation.

A GeoDataFrame is a tabular data structure that contains a GeoSeries.

The most important property of a GeoDataFrame is that it always has one GeoSeries column that holds a special status. This GeoSeries is referred to as the GeoDataFrame's "geometry". When a spatial method is applied to a GeoDataFrame (or a spatial attribute like area is called), these commands will always act on the "geometry" column.

Let's show a simple example.
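A sketch of such an example: a GeoSeries of Points, with vectorized geometric attributes and methods (names are illustrative).

```python
import geopandas as gpd
from shapely.geometry import Point

gs = gpd.GeoSeries([Point(0, 0), Point(1, 1), Point(2, 2)])

print(gs.area)                  # all zeros: points have no area
print(gs.total_bounds)          # [0. 0. 2. 2.]
print(gs.distance(Point(0, 0))) # elementwise distance to the origin
```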

Okay, so that is a GeoSeries. Now let's look at a GeoDataFrame, using one of the datasets packaged with geopandas: the New York City boroughs.

Now let's plot that GeoDataFrame

Pretty cool! A plot of the NYC Boroughs just like that!

We can do all the same cool geometric operations to these GeoDataFrames as we could in shapely. Here we'll apply convex hull and color each borough differently.
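A self-contained sketch of the same idea on a toy GeoDataFrame (standing in for the boroughs data, since the packaged samples are not bundled with newer geopandas releases): take the convex hull of each row and let a colormap assign each shape a different color.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import geopandas as gpd
from shapely.geometry import Polygon

# Two concave toy polygons standing in for the borough shapes
gdf = gpd.GeoDataFrame({
    "name": ["a", "b"],
    "geometry": [
        Polygon([(0, 0), (1, 0), (1, 1), (0.5, 0.3), (0, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2.5, 0.4), (2, 1)]),
    ],
})

hulls = gdf.convex_hull        # one convex hull per row
ax = hulls.plot(cmap="tab10")  # a different color for each shape
ax.figure.savefig("hulls.png")
```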

Let's look at a dataset with some more attributes. GeoPandas comes pre-packaged with a world dataset that'll do.

With geopandas you can do filtering just like in any pandas DataFrame.

We can filter out all latitudes greater than 0, leaving only the southern hemisphere.

We can do more advanced filtering like combining the countries from each continent and then sorting continents by population.
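A sketch of both kinds of filtering plus the grouping step, on a toy table whose columns (continent, pop_est) mirror the packaged world dataset; the values are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

world = gpd.GeoDataFrame({
    "continent": ["Africa", "Africa", "Asia", "Asia"],
    "pop_est": [10, 20, 50, 5],
    "geometry": [Point(0, 5), Point(5, -10), Point(80, 20), Point(100, 10)],
})

# Attribute filtering works exactly like pandas
africa = world[world["continent"] == "Africa"]

# Spatial filtering: keep only the southern hemisphere
southern = world[world.geometry.y < 0]

# dissolve() merges geometries per group and aggregates the attributes
continents = world.dissolve(by="continent", aggfunc="sum").sort_values(
    "pop_est", ascending=False)
print(continents["pop_est"])
```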

It is also really simple to create choropleth maps (maps where the color of each shape is based on the value of an associated variable).

Simply use the plot command with the column argument set to the column whose values you want used to assign colors.

Let's calculate and plot by GDP per capita
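A self-contained sketch of the calculation and plot, on a toy stand-in for the world dataset (the values are purely illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for the world GeoDataFrame
world = gpd.GeoDataFrame({
    "pop_est": [1000.0, 2000.0, 500.0],
    "gdp_md_est": [50.0, 20.0, 40.0],
    "geometry": [box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
})

# Derive the column we want to map
world["gdp_per_cap"] = world["gdp_md_est"] / world["pop_est"]

# column= drives the fill color; legend=True adds a colorbar
ax = world.plot(column="gdp_per_cap", legend=True)
ax.figure.savefig("choropleth.png")
```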

Let's take a look at some ocean and glacier data

Let's overlay the oceans, countries, and glaciers

Let's look at Anchorage, Alaska, at 61.2181° N, 149.9003° W.

Note that there are some mismatches between the glacier and land datasets because of their different resolutions.

Now let's combine this new vector knowledge with our raster dataset

For now, let's visualize our raster and shapefile datasets together

Read our shapefile straight into a geopandas GeoDataFrame

We could filter the DF down to the columns we care about if we wanted to:

Okay now let's try to find the bounding box of the raster and visualize it

These look like lat/lon coordinates, but let's make sure:

Not very human readable... let's use pyproj to make sure:
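A sketch of the check with pyproj's CRS class: a projected CRS like UTM works in meters, while a geographic CRS like WGS 84 works in degrees of lat/lon.

```python
from pyproj import CRS

utm18n = CRS.from_epsg(32618)  # UTM zone 18N
wgs84 = CRS.from_epsg(4326)    # WGS 84 lat/lon

print(utm18n.is_projected)            # True -> units are meters
print(wgs84.is_geographic)            # True -> units are degrees
print(utm18n.axis_info[0].unit_name)  # 'metre'
```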

Now let's check our shapefile crs:

They are different!

We could project the shapefile to lat/lon, or we could reproject the raster to EPSG:32618, which is UTM zone 18N. For now, since we are only worried about the raster bounds, we'll convert those four corner points to UTM 18N.

Let's find all four corner points by using dataset.transform * (col, row), which takes a pixel location and returns a coordinate pair in the dataset's CRS:

Great now we've got our raster bounds and shapefile in matching coordinate reference systems!

Let's overlay our raster bounds and shapefile features.

Hmm, it's challenging to see; let's zoom in a bit.

Still not extremely useful; let's make it interactive and add some class labels.

Again, you may have installed these two packages in Chapter 2, but just in case, let's run it here:

Now that is actually useful for exploring the dataset!

Wrapup

Now that we have all these tools in our toolkit, we can proceed to use them for pairing our labeled polygons with the matching pixels in our Planet image to train a classifier for image classification. We continue this step in the next chapter (link to webpage or Notebook).

For more visualization goodies I strongly suggest checking out some of the following:


BONUS KNOWLEDGE!!!

Read from OGC WFS GeoJSON response into a GeoDataFrame

Don't worry too much about the specifics here; this is just an example to show the power of these common formats for sharing data and getting really informative datasets from all sorts of databases.

Use an Open Geospatial Consortium (OGC) Web Feature Service (WFS) request to obtain geospatial data from a remote source. OGC WFS is an open geospatial standard.

We won’t go into all details about what’s going on. Suffice it to say that we issue an OGC WFS request for all features from the layer named “oa:goainv” found in a GeoServer instance from NANOOS, requesting the response in GeoJSON format. Then we use the geojson package to “load” the raw response (a GeoJSON string) into a geojson feature object (a dictionary-like object).

The “oa:goainv” layer is a global dataset of monitoring sites and cruises where data relevant to ocean acidification are collected. It’s a work in progress from the Global Ocean Acidification Observation Network (GOA-ON); for additional information see the GOA-ON Data Portal.

Let’s examine the general characteristics of this GeoJSON object, including its __geo_interface__, which we discussed earlier.

Now use the from_features constructor method to create a GeoDataFrame directly from the geojson.feature.FeatureCollection object.
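A self-contained sketch of the constructor, using a minimal GeoJSON-style FeatureCollection standing in for the WFS response (the site names and coordinates are invented for illustration):

```python
import geopandas as gpd

# A minimal FeatureCollection standing in for the parsed WFS response
collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": "site-1"},
         "geometry": {"type": "Point", "coordinates": [-124.0, 44.6]}},
        {"type": "Feature",
         "properties": {"name": "site-2"},
         "geometry": {"type": "Point", "coordinates": [-123.5, 46.2]}},
    ],
}

# from_features builds a GeoDataFrame straight from feature mappings:
# geometries go to the 'geometry' column, properties become columns
gdf = gpd.GeoDataFrame.from_features(collection["features"])
print(gdf)
```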

Finally, let’s visualize the data set as a simple map overlay plot; and as an example, display the values for the last feature.

What kind of data is contained in each of these points? Well, let's take a glimpse: