$\newcommand\si{\mathrm{#1}} \newcommand\SI{#1\,\si{#2}} \newcommand\matr{\mathbf{#1}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min}$

# How to study deforestation in GRASS GIS

Institute for Environmental and Spatial Analysis...University of North Georgia

## 1   Introduction

We want to study deforestation in the state of Georgia. We will use the National Land Cover Database (NLCD) provided by the Multi-Resolution Land Characteristics (MRLC) Consortium. Our objective is to

1. Calculate the average annual deforestation rate overall (%/year) and per capita (mi2/year/capita) for all the counties in Georgia from 2001 and 2016.
2. Find which county has lost the most forest overall (%/year) and per capita (mi2/year/capita).
3. Find which county has lost the least or gained the most forest overall (%/year) and per capita (mi2/year/capita).
4. Calculate the statewide annual average deforestation rate in percentage (%/year) and per capita (mi2/year/capita).
5. Finally, report how many and percentage of counties are worse than the two statewide average rates.

We can download the following land cover data sets and Georgia counties shapefile:

## 2   Creating an XY location

Remember that locations in GRASS can only have one projection. However, it is very rare that different data products we download from different providers are all in the same projection. What it means is that we most likely need to create multiple locations even for the same study. It can be sometimes very challenging to deal with different projections in GRASS because we have to know the projection of each data product we try to import. Without this information, we just cannot create a new location for the data. We can use some external utilities to figure out data projections, but GRASS also provides useful modules. To use these GRASS modules, we have to execute GRASS first obviously, but in what projection? The most basic and simplest projection in GRASS is the XY projection, which is not really any projection at all. This projection does not perform any coordinate transforms when importing/exporting data. However, we can use a location in the XY projection to access GRASS modules.

Now, start GRASS in the new xy location.

## 3   Figuring out the projection of the NLCD data sets

Any data providers should provide metadata about their data. NLCD data sets also come with XML metadata files. For example, for the 2001 NLCD data, we can find NLCD_2001_Land_Cover_L48.xml. The <spref> tag in this metadata file is what we need. NLCD uses its own custom projection. This search will give a list of spatial references that mention NLCD. All these spatial references use the Geodetic Reference System 1980 (GRS80) ellipsoid, but I found that NLCD has recently (not sure when) changed the ellipsoid from GRS80 to the World Geodetic System 1984 (WGS84). These two ellipsoids are almost identical, but they are different. To import NLCD data, we need to create a new location in the Albers Conical Equal Area projection with the WGS84 ellipsoid. There are some more details in the metadata file. We have two options: (1) manually construct projection parameters for GRASS or (2) use a GRASS module to create a new location in the NLCD projection. Which method do I prefer? Well... option (2) of course unless I know I already have a location in the same projection.

There are three important modules in GRASS that can import raster data:

• r.in.gdal: Imports raster data into the current or a new location
• r.import: Imports and reprojects raster data on the fly into the current location
• r.external: Links external raster data sources as a pseudo raster map

Since the NLCD data sets we downloaded are huge (40 GB uncompressed), we just want to create links to these files using r.external and extract the Georgia area later for our study. Both r.in.gdal and r.external provide the -j flag that checks the projection and exits. However, only r.in.gdal can create a new location using the -c flag and the projection information. We’ll check the projection first.

r.in.gdal -j input=NLCD_2001_Land_Cover_L48_20190424.img
WARNING: Projection of dataset does not appear to match current location.

Dataset PROJ_INFO is:
name: Albers_Conical_Equal_Area
datum: wgs84
ellps: wgs84
proj: aea
lat_1: 29.5
lat_2: 45.5
lat_0: 23
lon_0: -96
x_0: 0
y_0: 0
towgs84: 0,0,0,-0,-0,-0,0
no_defs: defined

ERROR: proj

## 4   Creating a new location for the NLCD data sets

We can create a new location using r.in.gdal. Since creating a new location won’t import any data, the output parameter’s value doesn’t matter.

r.in.gdal -c input=NLCD_2001_Land_Cover_L48_20190424.img output=dummy location=aea-nlcd-wgs84

Note that we try to use an informative location name “aea-nlcd-wgs84” in the command above because some old NLCD data sets may be in GRS80.

## 5   Importing the Georgia counties shapefile

Similarly, GRASS has three vector modules that we can use to import vector data:

• v.in.ogr
• v.import
• v.external

Let’s check the projection of Counties_Georgia.shp.

v.in.ogr -j Counties_Georgia.shp
WARNING: Projection of dataset does not appear to match current location.

Dataset PROJ_INFO is:
name: WGS 84
datum: wgs84
ellps: wgs84
proj: ll
no_defs: defined

ERROR: proj

Based on the proj parameter, we know that this projection is unprojected and is in lat/long in the WGS84 datum and ellipsoid. This projection is EPSG:4326, so we’ll create a new location in EPSG:4326 and import the shapefile into the new location.

v.in.ogr input=Counties_Georgia.shp output=ga_counties location=epsg4326-latlong

Since v.in.ogr uses the input name for output by default, we don’t need to specify the output parameter. Again, we used an informative location name “epsg4326-latlong” instead of just “latlong” or “epsg4326.”

## 6   Linking the NLCD data sets

Now, we need to link the NLCD data sets as pseudo GRASS raster maps. For this, we need to exit the xy location and start GRASS again in the aea-nlcd-wgs84 location. Once we’re in this location, we can link the external data sources.

r.external -e input=NLCD_2001_Land_Cover_L48_20190424.img output=nlcd2001
r.external -e input=NLCD_2016_Land_Cover_L48_20190424.img output=nlcd2016

We used the -e flag to extend the extent of the location to each data set.

## 7   Importing the Georgia vector map

We need to import the Georgia counties map into the aea-nlcd-wgs84 location where we’ll perform most of our analysis.

v.proj location=epsg4326-latlong input=ga_counties

## 8   Setting the computational region

In GRASS, there is a concept called computational region. A computational region is a region where any raster computations occur. It can save a lot of computational time especially when the full extent of data set is larger than the area of interest. We’ll set the computational region to the extent of Georgia.

g.region -ap vector=ga_counties res=30

In this command, we use the ga_counties vector map to calculate the extent and res=30 to set the correct resolution. Without the -a flag, the extent will be set first and the resolution will be adjusted slightly. With this flag, the resolution will be forced first and the extent will be adjusted later. Since we know the resolution of NLCD data set, we want to force the correct resolution of 30 meters. The -p flag will print the final settings.

## 9   Extracting the NLCD data sets for the state of Georgia

The CONUS NLCD data sets cover the entire Continental United States (CONUS), but our area of interest is only Georgia. We want to extract only this area to make further analysis more efficient. We can use r.mask and r.mapcalc.

r.mask vector=ga_counties
r.mapcalc expression="ga_nlcd2001=nlcd2001"
r.mapcalc expression="ga_nlcd2016=nlcd2016"
r.mask -r

## 10   Creating a user mapset

We have imported all the raster and vector maps that we need for our study into the same location that can be used for any other projects as well. Now, we want to isolate these “clean” maps from other maps that we’ll create later for our analysis. This way of map organization is just my personal preference to separate out “project-specific” data from “global” data. You can name this new mapset in any way you want. By default, GRASS will suggest your user name. You may use the study name “ga-deforestation.” Create your own mapset and restart GRASS in that mapset.

Make sure to set the computational region for the new mapset.

g.region -p raster=ga_nlcd2001

## 11   Reclassifying the NLCD data sets

NLCD data sets have many classes (2001 and 2016), but we’re only interested in forest area. There are three forest classes including 41, 42, and 43, but, in general, we can say any 40s classes are forest. We can either extract these classes as 1 and the other classes as 0 using r.mapcalc, or reclassify them using r.reclass. In most cases, reclassifying will be much faster than performing map algebra, so we want to use r.reclass. First, let’s reclassify the 2001 land cover map. This module expects a rule file, but we’ll just type in the console (a special filename -). The “*” symbol means the rest of cell values, so we don’t want to use it in the first line.

r.reclass input=ga_nlcd2001 output=ga_forest2001 rules=-
Enter rule(s), "end" when done, "help" if you need it
CELL: Data range is 0 to 95
> 40 thru 49 = 1 forest
> * = 0 non-forest
> end

Then, repeat it for the 2016 map.

r.reclass input=ga_nlcd2016 output=ga_forest2016 rules=-
Enter rule(s), "end" when done, "help" if you need it
CELL: Data range is 0 to 95
> 40 thru 49 = 1 forest
> * = 0 non-forest
> end

## 12   Displaying the maps together

Using the highlighted icon below, we can add multiple maps at once. Zoom to the study area by selecting ga_counties and clicking the zoom to layer button. Go to the properties window of ga_counties, and change the feature and area fill colors to red and transparent, respectively. ## 13   Calculating the average annual deforestation rate in percentage by county

### 13.1   Raster approach

#### 13.1.1   Counting the number of forest cells by county

We will use r.stats.zonal to calculate forest statistics per county. To use this module, we need to create a base raster map that contains a unique cell value per county. To create this map, we need to examine the attribute table of the ga_counties vector map first. Right click on the ga_counties layer and show the attribute data.

In the attribute table, we can see the COUNTYFP10 (county FIPS code) and totpop10 (total population in 2010) columns. For now, we’ll use the COUNTYFP10 column to create a base raster map.

v.to.rast input=ga_counties output=ga_counties use=attr attrcol=COUNTYFP10

Did it work? The module complains that the COUNTYFP10 column is not of a numeric type. County FIPS codes are actually text and raster maps cannot contain text values.

We need to go back to the PERMANENT mapset in the aea-nlcd-wgs84 location to add a new integer column to ga_counties. Leave the current mapset open and start a new GRASS session in the PERMANENT mapset.

v.db.addcolumn map=ga_counties columns="cofips_int int"
v.db.update map=ga_counties column=cofips_int query_column=COUNTYFP10
v.db.select map=ga_counties columns=COUNTYFP10,cofips_int

Close the PERMANENT mapset and go back to your user mapset.

We can now create a new county base map using the cofips_int column.

v.to.rast input=ga_counties output=ga_counties use=attr attrcol=cofips_int

Using the ga_counties raster map, calculate zonal statistics using r.stats.zonal. We use the sum method because forest cells are 1 and non-forest cells are 0 in ga_forest2001 and ga_forest2016 and, by adding those cell values within each county, we can count the number of forest cells per county.

r.stats.zonal base=ga_counties cover=ga_forest2001 method=sum output=ga_forest2001_cosum
r.stats.zonal base=ga_counties cover=ga_forest2016 method=sum output=ga_forest2016_cosum

#### 13.1.2   Calculating the forest area in square miles by county

We use r.mapcalc to calculate the forest area in square miles. Each raster cell is 900 square meters, which is 0.000347491943 square miles.

r.mapcalc expression="ga_forest2001_cosqmi=0.000347491943*ga_forest2001_cosum"
r.mapcalc expression="ga_forest2016_cosqmi=0.000347491943*ga_forest2016_cosum"

#### 13.1.3   Calculating the annual average deforestation rate by county

Use r.mapcalc to calculate the annual average deforestation rate in percentage by county. We’ll assign a positive rate to deforestation and a negative rate to reforestation.

r.mapcalc expression="ga_deforate0116=(ga_forest2001_cosqmi-ga_forest2016_cosqmi)/ga_forest2001_cosqmi/15*100"

#### 13.1.4   Transferring the results to the Georgia vector map

For further analysis, we want to use a vector approach because it will be more efficient than raster computation. We’ll use country centroids to read raster maps and transfer cell values to ga_counties.

v.to.points input=ga_counties output=ga_counties_ct
v.what.rast map=ga_counties_ct raster=ga_forest2001_cosqmi column=forest2001_sqmi
v.what.rast map=ga_counties_ct raster=ga_forest2016_cosqmi column=forest2016_sqmi
v.what.rast map=ga_counties_ct raster=ga_deforate0116 column=deforate0116

Now, we can join these statistics columns to ga_counties. We know that both ga_counties and ga_counties_ct share many common columns, but we’ll use cofips_int, which is unique per county.

v.db.join map=ga_counties column=cofips_int othertable=ga_counties_ct othercolumn=cofips_int

Oops! It doesn’t work? The error message says “Table <ga_counties_ct> not found in database.” Really? Let’s check.

v.info map=ga_counties_ct
+----------------------------------------------------------------------------+
| Name:            ga_counties_ct                                            |
| Mapset:          hcho                                                      |
| Location:        aea-nlcd-wgs84                                            |
| Database:        P:\grassdata                                              |
| Title:                                                                     |
| Map scale:       1:1                                                       |
| Name of creator: hcho                                                      |
| Organization:                                                              |
| Source date:     Mon Nov 18 15:54:14 2019                                  |
| Timestamp (first layer): none                                              |
|----------------------------------------------------------------------------|
| Map format:      native                                                    |
|----------------------------------------------------------------------------|
|   Type of map: vector (level: 2)                                           |
|                                                                            |
|   Number of points:       165             Number of centroids:  0          |
|   Number of lines:        0               Number of boundaries: 0          |
|   Number of areas:        0               Number of islands:    0          |
|                                                                            |
|   Map is 3D:              No                                               |
|   Number of dblinks:      2                                                |
|                                                                            |
|   Projection: Albers_Conical_Equal_Area                                    |
|                                                                            |
|               N:  1389404.23521533    S:   920914.13351305                 |
|               E:  1395209.47978692    W:   950312.65155499                 |
|                                                                            |
|   Digitization threshold: 0                                                |
|   Comment:                                                                 |
|                                                                            |
+----------------------------------------------------------------------------+

OK, there are two linked tables (number of dblinks = 2). How do we their names?

db.tables -p

This command prints ga_counties_ct_1 and ga_counties_ct_2. Which table contains those new columns?

db.describe table=ga_counties_ct_1
db.describe table=ga_counties_ct_2

We know it’s ga_counties_ct_1, so let’s join this table.

v.db.join map=ga_counties column=cofips_int othertable=ga_counties_ct_1 othercolumn=cofips_int

### 13.2   Vector approach

The raster approach can be very slow because of a high number of cells (about 271 million cells). In fact, our analysis lumps statistics per county and all those cells in each county will have the same statistics value. We can do the same statistic calculations more efficiently in the vector format.

#### 13.2.1   Counting the number of forest cells by county

First, we need to copy the ga_counties vector map from the PERMANENT mapset to your user mapset because the PERMANENT mapset is read-only and we cannot add new columns to the map.

g.copy vector=ga_counties@PERMANENT,ga_counties

In this command, the first ga_counties refers to the existing map in PERMANENT and the latter to a new map that will be created in the current mapset.

v.rast.stats map=ga_counties raster=ga_forest2001 colpre=forest2001 method=sum
v.rast.stats map=ga_counties raster=ga_forest2016 colpre=forest2016 method=sum

#### 13.2.2   Calculating the forest area in square miles by county

Add new columns to store the forest area and calculate these columns.

v.db.addcolumn map=ga_counties columns="forest2001_sqmi double, forest2016_sqmi double"
v.db.update map=ga_counties column=forest2001_sqmi query_column="0.000347491943*forest2001_sum"
v.db.update map=ga_counties column=forest2016_sqmi query_column="0.000347491943*forest2016_sum"

#### 13.2.3   Calculating the annual average deforestation rate by county

Add a new column and calculate the deforestation rate per year. We’ll assign a positive rate to deforestation and a negative rate to reforestation.

v.db.addcolumn map=ga_counties columns="deforate0116 double"
v.db.update map=ga_counties column=deforate0116 query_column="(forest2001_sqmi-forest2016_sqmi)/forest2001_sqmi/15*100"

## 14   Calculating the average annual deforestation rate in square miles per capita by county

For this calculation, let’s add a new column again and calculate the per-capita deforestation rate in square miles per year per person.

v.db.addcolumn map=ga_counties columns="deforatecap0116 double"
v.db.update map=ga_counties column=deforatecap0116 query_column="(forest2001_sqmi-forest2016_sqmi)/totpop10/15"

## 15   Calculating the statewide deforestation statistics

We can sum the forest2001_sqmi, forest2016_sqmi, and totpop10 columns.

v.db.univar map=ga_counties column=forest2001_sqmi
Data element 'vector/ga_counties' was found in more mapsets (also found in
<PERMANENT>)
Using <ga_counties@hcho>...
Number of values: 159
Minimum: 38.819714920302
Maximum: 358.915740626125
Range: 320.096025705823
Mean: 166.786571147858
Arithmetic mean of absolute values: 166.786571147858
Variance: 5859.22424062829
Standard deviation: 76.5455696995475
Coefficient of variation: 0.458943242089252
Sum: 26519.0648125094
v.db.univar map=ga_counties column=forest2016_sqmi
Data element 'vector/ga_counties' was found in more mapsets (also found in
<PERMANENT>)
Using <ga_counties@hcho>...
Number of values: 159
Minimum: 35.553638148045
Maximum: 360.937796242442
Range: 325.384158094397
Mean: 161.264425883254
Arithmetic mean of absolute values: 161.264425883254
Variance: 5863.93401199839
Standard deviation: 76.5763280132861
Coefficient of variation: 0.474849475288014
Sum: 25641.0437154375
v.db.univar map=ga_counties column=totpop10
Data element 'vector/ga_counties' was found in more mapsets (also found in
<PERMANENT>)
Using <ga_counties@hcho>...
Number of values: 159
Minimum: 1717
Maximum: 920581
Range: 918864
Mean: 60928.6352201258
Arithmetic mean of absolute values: 60928.6352201258
Variance: 16051276953.0493
Standard deviation: 126693.63422465
Coefficient of variation: 2.07937751710546
Sum: 9687653

From the above outputs, we found that the total forest areas in 2001 and 2016 in the state are 26,519 and 25,641 square miles, respectively, and the total population in 2010 is 9,687,653 people. It was not really intended, but we’re using the total population in 2010, which is somewhere inbetween 2001 and 2016. That’s not too bad! What is the annual average statewide deforestation rate in percentage? That is, \begin{equation} \frac{\SI{26,519}{mi^2}-\SI{25,641}{mi^2}}{\SI{26,519}{mi^2}}\times\frac{1}{\SI{15}{years}}\times 100\%=\SI{0.22}{\%/year}. \label{eq:deforestation_rate} \end{equation} What is the annual average statewide deforestation rate in square miles per capita? That is, \begin{equation} \frac{\SI{26,519}{mi^2}-\SI{25,641}{mi^2}}{\SI{9,687,653}{people}}\times\frac{1}{\SI{15}{years}}=\SI{6.04\times 10^{-6}}{mi^2/year/person}. \label{eq:deforestation_rate_per_capita} \end{equation}

## 16   Answering the final questions

### 16.1   Using the attribute table

Now, we have two statewide deforestation statistics from Eqs. \eqref{eq:deforestation_rate} and \eqref{eq:deforestation_rate_per_capita}, and can compare individual counties to these statistics in the attribute table of ga_counties. Right click on the ga_counties layer and open the attribute table. Sort the table by the deforate0116 or deforatecap0116 column, and compare the results.

### 16.2   Using v.db.select

We can use v.db.select.

v.db.select map=ga_counties columns="NAME10,max(deforate0116)"

This command will give you the worst county in terms of the deforestation rate, which is Brantley County.

v.db.select map=ga_counties columns="NAME10,min(deforate0116)"

This command will give you the best county in terms of the deforestation rate, which is Irwin County.

Similarly, the following two commands will give you the worst and best counties in terms of deforestation per capita, which are Clinch and Taliaferro Counties, respectively.

v.db.select map=ga_counties columns="NAME10,max(deforatecap0116)"
v.db.select map=ga_counties columns="NAME10,min(deforatecap0116)"

How many counties are worse than the statewide average in percentage?

v.db.select map=ga_counties column="count(*)" where="deforate0116 > 0.22"
Data element 'vector/ga_counties' was found in more mapsets (also found in
<PERMANENT>)
Using <ga_counties@hcho>...
count(*)
73

How many are better than the statewide average in percentage?

v.db.select map=ga_counties column="count(*)" where="deforate0116 < 0.22"
Data element 'vector/ga_counties' was found in more mapsets (also found in
<PERMANENT>)
Using <ga_counties@hcho>...
count(*)
86

There are total 159 counties in Georgia, so these two numbers add up correctly.

Similarly, how many counties are worse or better than the statewide average in square miles per capita?

v.db.select map=ga_counties where="deforatecap0116 > 6.04e-6" column="count(*)"
Data element 'vector/ga_counties' was found in more mapsets (also found in
<PERMANENT>)
Using <ga_counties@hcho>...
count(*)
74
v.db.select map=ga_counties where="deforatecap0116 < 6.04e-6" column="count(*)"
Data element 'vector/ga_counties' was found in more mapsets (also found in
<PERMANENT>)
Using <ga_counties@hcho>...
count(*)
85

## 17   Summary

To summarize our study,

1. Done in raster and vector maps
2. Brantley County has lost the most forest in percentage (2.07%) and Clinch County in square miles per capita (0.00051 mi2/person).
3. Irwin County has gained the most forest in percentage (0.48%) and Taliaferro County in square miles per capita (0.00024 mi2/person).
4. The statewide annual average deforestation rate in percentage and square miles per capita were $\SI{0.22}{\%/year}$ and $\SI{6.04\times 10^{-6}}{mi^2/year/person}$, respectively.
5. Total 73 counties out of 159 (46%) were worse than the statewide average in percentage and total 74 out of 159 (47%) were worse than the statewide average in square miles per capita.

This concludes the study of deforestation in Georgia. Super easier than ArcGIS Pro, right? ;-)