Patrick Gray (patrick.c.gray at duke) - https://github.com/patrickcgray

Chapter 3: Plotting and visualizing your data with matplotlib

Introduction

matplotlib is a very powerful plotting library for making amazing visualizations for publications, personal use, or even web and desktop applications. matplotlib can create almost any two dimensional visualization you can think of, including histograms, scatter plots, bivariate plots, and image displays. For some inspiration, check out the matplotlib example gallery which includes the source code required to generate each example.

matplotlib API - state-machine versus object-oriented

One part of matplotlib that may be initially confusing is that matplotlib contains two main methods of making plots - the object-oriented method, and the state-machine method.

A very good overview of the difference between the two usages is provided by Jake Vanderplas. Specifically,

If you need a primer on matplotlib beyond what is here I suggest: Python Like you Mean It or the matplotlib users guide.

In general, I think you should use the object-oriented API. While more complicated, is a much more powerful way of creating plots and should be used when developing more complicated visualizations. I always recommend the OO API.

Image display

We will begin by reading our example image into a NumPy memory array as shown in Chapter 2

Let's check out some of this datasets characteristics

Note that if you have the full dataset read in with you can do:

This pulls out the band at index 2 which is the 3rd band because python indexing starts at 0

Which is equal to simply doing:

If you have a large dataset the second dataset.read(3) may be preferable because you save on memory.

Basic plotting

First thing to do is to import matplotlib into our namespace. I will be using a special feature of the IPython utility which allows me to "inline" matplotlib figures by entering the %matplotlib inline command.

With matplotlib imported, we can summon up a figure and make our first plot:

As I mentioned earlier, we'll be using Matplotib’s object oriented API, instead of its function-based API. Let’s briefly draw a distinction between these two APIs:

Although the code that invokes the functional API is simpler, it is far less powerful and flexible than the object-oriented API, which produces figure (fig) and axes (ax) objects that we can leverage to customize our plot. You will likely see tutorials utilize the functional API in their examples, so it is useful to understand the distinction here. Shortly, you will learn how to leverage Matplotlib’s object-oriented API in powerful ways.

Plotting 2D arrays

One typical thing that we might want to do would be to plot one band against another. In order to do this, we will need to transform, or flatten, our 2 dimensional arrays of each band's values into 1 dimensional arrays:

We have retained the number of entries in each of these raster bands, but we have flattened them from 2 dimensions into 1.

Now we can plot them. Since we just want points, we can use scatter for a scatterplot. Since there are no lines in a scatterplot, it has a slightly different syntax.

If we wanted the two axes to have the same limits, we can calculate the limits and apply them

What if we want to view average intensities of each band?

It is common to have this high reflectance in the NIR band (band 4 here)

Plotting 2D arrays - images

With so much data available to look at, it can be hard to understand what is going on with the mess of points shown above. Luckily our datasets aren't just a mess of points - they have a spatial structure.

To show the spatial structure of our images, we could make an image plot of one of our bands using imshow to display an image on the axes:

Well, it looks like there is something going on - maybe a river in the center and some bright vegetation to the bottom left of the image. What's lacking is any knowledge of what the colors mean.

Luckily, matplotlib can provide us a colorbar.

If we want a greyscale image, we can manually specify a colormap:

We'll also begin showing some more advanced plotting capabilities in matplotlib such as putting multiple plots in a single output. Here we'll compare the NIR to Red in greyscale.

Plotting 3D arrays - multispectral images

Greyscale images are nice, but the most information we can receive comes from looking at the interplay among different bands. To accomplish this, we can map different spectral bands to the Red, Green, and Blue channels on our monitors.

Before we can do this, the matplotlib imshow help tells us that we need to normalize our bands into a 0 - 1 range. To do so, we will perform a simple linear scale fitting 0 reflectance to 0 and 80% reflectance to 1, clipping anything larger or smaller.

Remember:

If we are going from a Int16 datatype (e.g., reflectance scaled by 10,000x) to a decimal between 0 and 1, we will need to use a Float!

We've got a correctly shaped and normalized image now.

Let's calculate NDVI on this dataset to compare it to the color image.

Now let's plot it all:

Pretty informative now!

There is a slighly easier way to do this

rasterio has its own show function that is built to handle rasters. We saw this in chapter 1 briefly.

And if we want to show a color image we can use rasterio's built in adjust_band function

displaying all three bands side by side

If we want to see a histogram of the data we use the show_hist function

Let's look at an overlapping histogram, maybe that'll be more informative:

Let's look at another raster

Check out more documentation for the rasterio plotting functions here: https://rasterio.readthedocs.io/en/latest/api/rasterio.plot.html#rasterio.plot.show

Wrapup

We've seen how matplotlib can be combined with numpy and rasterio to easily visualize and explore our remote sensing data. In the next chapter (link to webpage or Notebook) we will cover how to open and read vector data.