How to read a GeoTIFF file as a NumPy array using GDAL

Dr. Huidae Cho
Institute for Environmental and Spatial Analysis...University of North Georgia

1   Introduction

In this exercise, we’ll learn how to read a GeoTIFF file as a NumPy array using the GDAL module. NumPy arrays are a very useful data format for raster maps because NumPy offers powerful syntax for raster operations.

2   Python code

# we'll use the gdal module
from osgeo import gdal

# read in elevation.tif as a GDAL dataset object; of course, use your path to
# elevation.tif; it'll lock elevation.tif
ds = gdal.Open(r"P:\tmp\elevation.tif")
type(ds)

# see the list of attributes
dir(ds)

# let's see how many bands we have in elevation.tif
ds.RasterCount

# get the first and only band; note here that band numbering is 1-based
band = ds.GetRasterBand(1)
type(band)

# read the band as a matrix
arr = band.ReadAsArray()
type(arr)

# now, release elevation.tif; no more access to ds and band
del ds, band

# what is its dimension?
arr.shape

3   Explanation

In Python, modules are containers of similar functionalities including classes, functions, and constants. For example, if we need math functions and constants, we need to import math. In the above example, since the default Python interpreter doesn’t know how to read GeoTIFF files, we’ll use the GDAL module. That’s why we need to import gdal.

# we'll use the gdal module
from osgeo import gdal

In programming, there is a concept called namespace. Let’s say different modules happened to have functions with the same name and we need to import both modules. Then, there will be no way of telling which function belongs to which module because it has the same name in both modules. So to avoid this naming conflict, we use namespaces. In Python, we use the module name as a namespace. The GDAL module has a function called Open() that can be used to open a GeoTIFF file. Since it belongs to the gdal namespace (module), we call it gdal.Open(), not just Open(). In fact, Open() won’t work because the plain Python interpreter doesn’t implement this function. Pass the file path to this function to open elevation.tif and grab its output. Why do we grab its output? Without storing whatever is returned from gdal.Open(), we won’t be able to access the opened file. Think of this as the key or door handle to the file. We often call it a file handle because we use it to access the file like opening the door to it. Where did we store the file handle to elevation.tif? Yes, ds (dataset).

For file paths, we can use three different formats: r"P:\tmp\elevation.tif", "P:\\tmp\\elevation.tif", or "P:/tmp/elevation.tif". Windows uses backslashes (\) to separate path components while other operating systems use forward slashes (/), but Python supports forward slashes as well on Windows. Do you remember escape sequences? Since backslashes have special meanings depending on the character that follows it, we need to suppress them to treat characters after a backslash as is. There are two ways of achieving that: raw strings or double backslashes. Raw strings don’t recognize escape sequences and return all characters as is. By simply prefixing r before the starting quote, we can create a raw string. For example, r"P:\tmp\elevation.tif". Double backslashes in a non-raw string mean a single backslash, so we can also use "P:\\tmp\\elevation.tif". Last, we can use forward slashes like in "P:/tmp/elevation.tif".

# read in elevation.tif as a GDAL dataset object; of course, use your path to
# elevation.tif; it'll lock elevation.tif
ds = gdal.Open(r"P:\tmp\elevation.tif")

We can check the data type of ds by typing type(ds).

type(ds)

We can inspect what attributes are inside ds by typing dir(ds). Any Python objects (class instances, functions, variables, etc.) have attributes including attribute functions and variables. We can check what functions and variables are available inside the ds object by typing dir(ds).

# see the list of attributes
dir(ds)

One of them is RasterCount. Did you know that raster datasets can have multiple bands such as red, green, blue, and near infrared? This attribute variable tells us how many bands are available in elevation.tif.

# let's see how many bands we have in elevation.tif
ds.RasterCount

Let’s take the first band from elevation.tif using ds.GetRasterBand(1) and assign this band to band.

# get the first and only band; note here that band numbering is 1-based
band = ds.GetRasterBand(1)

Check the data type of band using type(band).

type(band)

This band object is not directly manipulable, so let’s extract the elevation data as a matrix form using band.ReadAsArray() attribute function and assign this matrix to arr.

# read the band as a matrix
arr = band.ReadAsArray()

Check the data type of arr using type(arr).

type(arr)

Once we read the data as a matrix, we can destroy the ds and band objects using del ds, band. That’s deleting ds and band from the memory.

# now, release elevation.tif; no more access to ds and band
del ds, band

Since we didn’t destroy the arr object, it’s still alive. Let’s check its dimension using arr.shape.

# what is its dimension?
arr.shape