How to calculate the area of a certain class in a GeoTIFF file using NumPy

Dr. Huidae Cho
Institute for Environmental and Spatial Analysis...University of North Georgia

1   Introduction

Try these exercises first:

In this exercise, we want to calculate the area of a certain class in a GeoTIFF file. Let’s calculate the area of waterbodies in an NLCD dataset. Use nlcd2016_clipped.tif.

waterbodies.png

2   Python code

from osgeo import gdal
import numpy as np
import matplotlib.pyplot as plt

def read_geotiff(filename):
    ds = gdal.Open(filename)
    band = ds.GetRasterBand(1)
    arr = band.ReadAsArray()
    return arr, ds

def write_geotiff(filename, arr, in_ds):
    if arr.dtype == np.float32:
        arr_type = gdal.GDT_Float32
    else:
        arr_type = gdal.GDT_Int32

    driver = gdal.GetDriverByName("GTiff")
    out_ds = driver.Create(filename, arr.shape[1], arr.shape[0], 1, arr_type)
    out_ds.SetProjection(in_ds.GetProjection())
    out_ds.SetGeoTransform(in_ds.GetGeoTransform())
    band = out_ds.GetRasterBand(1)
    band.WriteArray(arr)
    band.FlushCache()
    band.ComputeStatistics(False)

nlcd16_arr, nlcd16_ds = read_geotiff("nlcd2016_clipped.tif")

nlcd16_water_arr = np.where(nlcd16_arr == 11, 1, 0)
nlcd16_water_ncells = np.sum(nlcd16_arr == 11)

# https://gdal.org/tutorials/geotransforms_tut.html
dx = nlcd16_ds.GetGeoTransform()[1]
dy = -nlcd16_ds.GetGeoTransform()[5]

print("Total area of waterbodies is", nlcd16_water_ncells * dx * dy)

write_geotiff("nlcd16_water.tif", nlcd16_water_arr, nlcd16_ds)

plt.subplot(211)
plt.imshow(nlcd16_arr)

plt.subplot(212)
plt.imshow(nlcd16_water_arr)

plt.show()

3   Explanation

Import necessary modules and define the read and write functions.

from osgeo import gdal
import numpy as np
import matplotlib.pyplot as plt

def read_geotiff(filename):
    ds = gdal.Open(filename)
    band = ds.GetRasterBand(1)
    arr = band.ReadAsArray()
    return arr, ds

def write_geotiff(filename, arr, in_ds):
    if arr.dtype == np.float32:
        arr_type = gdal.GDT_Float32
    else:
        arr_type = gdal.GDT_Int32

    driver = gdal.GetDriverByName("GTiff")
    out_ds = driver.Create(filename, arr.shape[1], arr.shape[0], 1, arr_type)
    out_ds.SetProjection(in_ds.GetProjection())
    out_ds.SetGeoTransform(in_ds.GetGeoTransform())
    band = out_ds.GetRasterBand(1)
    band.WriteArray(arr)
    band.FlushCache()
    band.ComputeStatistics(False)

Read the data into a NumPy array and get the GDAL dataset of the file. Remember, read_geotiff() returns two items.

nlcd16_arr, nlcd16_ds = read_geotiff("nlcd2016_clipped.tif")

Based on the NLCD classification table, we extract open water cells only using the nlcd16_arr == 11 condition where 11 indicates the class of open water.

nlcd16_water_arr = np.where(nlcd16_arr == 11, 1, 0)

Using the same condition nlcd16_arr == 11, count the number of open water cells. If a cell value is equal to 11, its value in the boolean array nlcd16_arr == 11 is True. If not, its value is False. The shape of this boolean array is the same as that of the original nlcd16_arr array. Now, using np.sum(), we add all the cell values in the boolean array. For this summation, True is equivalent to 1 and False to 0. In other words, we add 1 as many times as there are True cells and 0 as many times as there are False cells. However, since 0 does not really contribute to the summation result and 1 only contributes to it once, the result of this summation is the number of True cells. Which cells are True again? Yes, open water cells. Finally, we have the number of open water cells in the variable nlcd16_water_ncells.

nlcd16_water_ncells = np.sum(nlcd16_arr == 11)

By referring to the geotransform tutorial, we extract the resolution of the NLCD dataset using the GetGeoTransform() function in the input dataset (nlcd16_ds). At index 1, we can find the x-resolution. The y-resolution is at index 5, but it is negative for a north-up image and we need to invert that (-nlcd16_ds.GetGeoTransform()[5] instead of just nlcd16_ds.GetGeoTransform()[5]).

# https://gdal.org/tutorials/geotransforms_tut.html
dx = nlcd16_ds.GetGeoTransform()[1]
dy = -nlcd16_ds.GetGeoTransform()[5]

The area of each cell can be calculated by multiplying the x- and y-resolutions. Then, the total area of waterbodies is calculated by multiplying the number of waterbody cells and the cell area.

print("Total area of waterbodies is", nlcd16_water_ncells * dx * dy)

Save the waterbodies array.

write_geotiff("nlcd16_water.tif", nlcd16_water_arr, nlcd16_ds)

Finally, plot all the arrays.

plt.subplot(211)
plt.imshow(nlcd16_arr)

plt.subplot(212)
plt.imshow(nlcd16_water_arr)

plt.show()