GeoComputation 99

 

A comparison of supervised imagery classification using analyst-chosen and geostatistically-chosen training sets

James A. Shine and Gery I. Wakefield
U.S. Army Topographic Engineering Center, ATTN: CETEC-TR-G, 7701 Telegraph Road, Alexandria VA 22315-3864 U.S.A.
E-Mail: jshine@tec.army.mil

Abstract

A continuing challenge in image processing is the classification of spatial imagery into categories. Examples of these categories are roads, urban areas, evergreen trees, deciduous trees, water, and grasslands. The accurate classification of images has a wide range of applications, including reconnaissance, assessment of environmental damage, land use monitoring, urban planning, and growth regulation. One classification approach is supervised classification. The imagery is divided into training data and test data. The correct categories are known for the training data, and a classifier is specified based on these data. The classifier is then used to classify the test data. Some approaches include classification trees, minimum distance statistical approaches, and neural networks. The choice of a good training set can have significant influence on the success of a classification approach. A common technique in imagery classification is selection of good training data points by an experienced analyst. An image or set of registered images is viewed in image processing software such as ERDAS Imagine, and some pixels that are unambiguous in each of the desired classification categories are selected and used for training data. The entire image is then classified based on the training data metrics. In cases where ground truth is available, classification accuracy can be assessed by use of an error matrix. This method can be time consuming and requires expertise on the part of the analyst, something not available in all classification settings. A recent approach uses spatial variation scales from geostatistical analysis to choose the training points rather than an analyst's choices. A semivariogram is computed on the pixel values of an image, and a spatial variation scale is determined from this semivariogram. A grid of points with spacing chosen from this scale (usually 50 percent of the scale) is then selected as the training data and used for the classification.
This approach does not require an experienced analyst. Experiments comparing these two approaches have been conducted using several registered images of Fort A.P. Hill, Virginia, which have accompanying accurate ground truth. The experiments were performed in ERDAS Imagine 8.3 using a maximum distance supervised classifier. The results of the error matrices for the two approaches were not statistically different. Geostatistically chosen training data has the potential to reduce the need for experienced image analysts to perform imagery classification. Further developments may make further automation of the imagery classification process possible.

1. Introduction

A continuing challenge in image processing is the classification of spatial imagery into categories. Categories of interest can vary depending on the application; some common categories include urban areas, roads, bodies of water, grasslands, scrub brush, and various tree species (deciduous hardwoods, evergreen pines, mixed). The data products resulting from a category classification of imagery can be used for a variety of applications. Some of these applications include reconnaissance, assessment of environmental damage, land use monitoring, radiation monitoring (Badr 1993, Cressie 1991), urban planning, growth regulation, soil evaluation, and crop yield assessment (Oliver and Webster 1989).

Geocomputational software packages such as ERDAS Imagine include various classification algorithms as part of their basic menu of functions and tools, or as attachable modules. One such classification approach involves selecting a subset of the data set for which the correct classification categories are already known. The classification algorithm uses these values to "train" its parameters on this subset in an optimal manner, usually by minimizing some error metric. The trained algorithm is then used to classify unknown data values ("test" values) into these same categories. If the correct categories are also known for these test values, the accuracy of the classification can be computed. Designed experiments on the same data can be used to compare classification accuracy for different approaches.
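The train-then-test workflow described above can be sketched in a few lines of Python. This is only an illustration using a minimum-distance-to-means classifier, one of the simplest supervised classifiers; it is not the ERDAS Imagine implementation, and the two-band pixel values, category names, and function names are all invented for the example.

```python
import math

def train_class_means(training_pixels):
    """Return the mean spectral vector of each category.

    training_pixels maps a category name to a list of pixel vectors
    (one value per band) whose correct category is already known.
    """
    means = {}
    for category, pixels in training_pixels.items():
        n_bands = len(pixels[0])
        means[category] = [
            sum(p[b] for p in pixels) / len(pixels) for b in range(n_bands)
        ]
    return means

def classify(pixel, means):
    """Assign the pixel to the category whose mean is nearest (Euclidean)."""
    return min(means, key=lambda category: math.dist(pixel, means[category]))

def accuracy(test_pixels, test_labels, means):
    """Fraction of test pixels whose predicted category matches the truth."""
    correct = sum(
        classify(p, means) == label for p, label in zip(test_pixels, test_labels)
    )
    return correct / len(test_labels)

# Hypothetical two-band training pixels for two categories.
training = {
    "water": [(10, 5), (12, 6), (11, 4)],
    "grass": [(40, 80), (42, 78), (38, 82)],
}
means = train_class_means(training)

# Hypothetical test pixels with known categories.
test_pixels = [(11, 6), (41, 79), (20, 30)]
test_labels = ["water", "grass", "water"]
print(accuracy(test_pixels, test_labels, means))
```

The same structure applies to other classifiers: only `train_class_means` and `classify` would change, while the comparison against known test categories stays the same.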

Selection of training points has traditionally been performed by an analyst who visually categorizes data points and chooses a sufficient sample of points from each category to train the algorithm. This has two drawbacks: it is generally time-consuming, and it requires an analyst capable of performing such a visual classification, an expertise that not all image analysts have in abundance. It would thus be desirable to find any approaches which would minimize or eliminate the necessity of the human selection of training points, without sacrificing classification accuracy on the test data. A sampling strategy from geostatistical analysis was considered a promising effort in this direction, and the rest of this paper describes some initial efforts to evaluate the potential of this idea.

2. Geostatistics

Geostatistics is a term commonly used to describe a set of techniques that model spatial variation in data and use these models to estimate or classify other data. Geostatistics grew out of empirical approaches developed by South African mining engineers in the 1950s and 1960s (Krige 1989), which were given theoretical validity by the development of random function theory in the 1960s (Matheron 1970).

The basic concept of geostatistics is that of scales of spatial variation. Data which are spatially independent show the same variability regardless of the location of the data points. However, spatial data in most cases are not spatially independent. Data values which are close together spatially show less variability than data values which are farther away from each other. The exact nature of this pattern varies from data set to data set; each set of data has its own unique function relating variability to the distance between data points. This variability is generally computed as a function called semi-variance, which can be described by

$$\gamma(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ z(x_i) - z(x_i + h) \right]^2$$

where z denotes a data value at a particular location, x_i is the location of the first point of the ith pair, h is the distance between data values, and n(h) is the number of pairs of data values a distance of h apart.
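As a minimal sketch of how the semi-variance can be computed, the following Python code evaluates it along a single evenly spaced transect of pixel values: for each lag h (in pixels), it takes half the mean squared difference between all pairs of values h apart. The transect values and the function names are hypothetical; a real analysis would use all pixel pairs in the image, usually in several directions.

```python
def semivariance(values, lag):
    """Semi-variance at an integer lag for an evenly spaced transect.

    Sums the squared differences of all pairs `lag` positions apart and
    divides by twice the number of pairs, n(h).
    """
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    if not pairs:
        raise ValueError("lag is too large for this transect")
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

def semivariogram(values, max_lag):
    """Semi-variance for every lag from 1 to max_lag, keyed by lag."""
    return {lag: semivariance(values, lag) for lag in range(1, max_lag + 1)}

# Hypothetical pixel values along one image row.
transect = [1, 2, 4, 7, 8, 8, 7, 9, 8, 7]
for lag, gamma in semivariogram(transect, 4).items():
    print(lag, round(gamma, 3))
```

Plotting the returned values against the lag gives the sample variogram discussed in the text.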

A plot of the semi-variance versus distance between data values is known as a semi-variogram, or simply as a variogram; we will use the latter term in this paper. The variogram is the central analytical tool used in geostatistics. A sample variogram is given in Figure 1:

Figure 1 shows that at short distances, the variation is small, and the variation increases with distance until it stabilizes at a certain distance. This distance is a scale of spatial variation which can be used for several purposes. In standard geostatistics, various models are fitted to the variogram and the best model is chosen to estimate data values at unknown locations, a process known as kriging. There are various forms of kriging estimation, but since kriging is not part of the work reported in this paper, they will not be discussed. The main use of geostatistics in the work reported here is the acquisition of a primary scale of variation to sample points for supervised classification.
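One crude way to read a scale of spatial variation off a sample variogram is sketched below. It is a heuristic assumed here for illustration, not the model-fitting procedure used in standard geostatistics: the level at which the variogram stabilizes is approximated by the mean of the last few semi-variances, and the scale is taken as the first lag that reaches a chosen fraction of that level. The lag and semi-variance values are invented and are not the Fort A.P. Hill variogram.

```python
def estimate_range(lags, gammas, sill_fraction=0.95, tail=3):
    """Return the first lag whose semi-variance reaches sill_fraction of the sill.

    The sill (stabilized level) is approximated by the mean of the last
    `tail` semi-variance values. Both parameters are illustrative choices.
    """
    tail_values = gammas[-tail:]
    sill = sum(tail_values) / len(tail_values)
    for lag, gamma in zip(lags, gammas):
        if gamma >= sill_fraction * sill:
            return lag
    return None

# Invented variogram: lags in meters and their semi-variances.
lags = [40, 80, 120, 160, 200, 240, 280, 320, 360, 400]
gammas = [10, 22, 35, 47, 58, 68, 76, 81, 82, 81]

scale = estimate_range(lags, gammas)
print(scale)        # lag at which this invented variogram levels off
print(scale / 2)    # half the scale, the sampling interval discussed in the text
```

A variogram with no stabilization, or with several stabilization points, would need a more careful treatment than this single-threshold rule.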

(It should also be noted that the variogram example shown here is a very simple case; some variograms show no relationship between semi-variance and distance, and others show more than one stabilization point, indicating multiple scales of variation. However, again this does not apply to the work reported here.)

The hypothesis which is being examined here is that since the variogram gives the scale or scales of the data’s spatial variation, these scales can be used to determine sampling strategies for various applications. In this case, the hypothesis is that if the variogram of the data indicates a specific spatial variation scale, then sampling at an interval smaller than this scale (typically half the distance) will detect all of the variation. This would mean that training points could be chosen on a grid based on this scale rather than by the more traditional, time-consuming and expertise-requiring approach of hand-picking data points for each category from the image.

3. Study area: Fort A.P. Hill

The area chosen for this paper is Fort A.P. Hill, a U.S. military reservation in central Virginia. The U.S. Army Topographic Engineering Center has several sources of imagery for this area, including 20-meter resolution multispectral SPOT imagery, and 1-meter resolution Computerized Airborne Multicamera Imaging System (CAMIS) imagery. (Resolution is the distance represented by one pixel of the imagery.) There is also accurate ground truth information available for significant segments of the Fort A.P. Hill area, which permits both training for supervised classification, and accuracy assessment of the classification testing.

Fort A.P. Hill’s geographic location is shown in Figure 2A; a mosaic photo of Fort A.P. Hill is shown in Figure 2B.

FIGURE 2A

FIGURE 2B

4. Method

Previous geostatistical analysis of Fort A.P. Hill has revealed a scale of spatial variation at approximately 320 meters (Oliver and Webster 1998). This distance was halved, as is the current wisdom in geostatistical analysis, to create a grid of 99 points 160 meters apart. These points were then used to train a maximum distance supervised classifier; the classifier was then tested on 256 points outside of the training region. For the analyst-chosen approach, four or five points representing each of the categories to be classified were chosen by analysts and used to train the same classifier, which was then tested on the same 256 points.
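The grid construction can be sketched as follows. The 320-meter scale and the 160-meter spacing come from the text; the extent of the training region is not given in the paper, so the 1600 m by 1280 m rectangle below is an assumption chosen only because it yields an 11 by 9 grid of 99 points. The function names and the pixel conversion are likewise illustrative.

```python
def training_grid(width_m, height_m, scale_m, fraction=0.5):
    """Grid of (x, y) points in meters, spaced at fraction * scale_m.

    Points start at the origin of the training region and include both
    edges when the extent is an exact multiple of the spacing.
    """
    spacing = scale_m * fraction
    nx = int(width_m // spacing) + 1
    ny = int(height_m // spacing) + 1
    return [(i * spacing, j * spacing) for j in range(ny) for i in range(nx)]

def to_pixel(point_m, resolution_m):
    """Convert a point in meters to (column, row) pixel indices."""
    x, y = point_m
    return (round(x / resolution_m), round(y / resolution_m))

# Assumed 1600 m x 1280 m training region; 320 m scale halved to 160 m.
grid = training_grid(width_m=1600, height_m=1280, scale_m=320)
print(len(grid))

# Pixel position of one grid point in 20-meter imagery.
print(to_pixel(grid[1], resolution_m=20))
```

Each grid point would then be labeled from the ground truth and passed to the classifier as a training sample.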

The metrics used for comparison of approaches were the error matrix and the kappa coefficient. The error matrix (also called the confusion matrix) is a k x k matrix where k is the number of classification categories. The error matrix gives the counts of how each of the test points was classified. The rows represent the actual classified data by category, and the columns represent the reference data by category. Correct classifications are recorded on the matrix diagonal, while incorrect classifications go in off-diagonal positions. The error matrix allows measurement of overall accuracy, category accuracy, producer’s accuracy (percentage correct in the columns) and user’s accuracy (percentage correct in the rows) (Congalton 1991). Error matrices for all experiments are given in the next section.
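The accuracy measures named above follow directly from the counts in an error matrix. The sketch below uses the Test 1 geostatistically-chosen error matrix reported in Table 1, with rows as classified categories and columns as reference categories; the function names are illustrative.

```python
LABELS = ["URBAN", "WATER", "FIELD", "EVERGRN", "HARDWD"]

# Test 1, geostatistically-chosen error matrix (Table 1).
# Rows: classified category; columns: reference category.
MATRIX = [
    [2, 2, 3, 5, 0],
    [0, 0, 0, 1, 0],
    [0, 3, 17, 9, 10],
    [1, 4, 4, 85, 27],
    [0, 0, 0, 18, 65],
]

def overall_accuracy(m):
    """Sum of the diagonal divided by the total count."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total

def users_accuracy(m):
    """Diagonal count divided by the row total, per category."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]

def producers_accuracy(m):
    """Diagonal count divided by the column total, per category."""
    return [m[i][i] / sum(row[i] for row in m) for i in range(len(m))]

print("overall:", round(overall_accuracy(MATRIX), 3))
for label, user, producer in zip(
    LABELS, users_accuracy(MATRIX), producers_accuracy(MATRIX)
):
    print(label, "user:", round(user, 3), "producer:", round(producer, 3))
```

A category with an empty row or column would cause a division by zero here, which is one practical reason for collapsing categories that receive no classifications.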

For this work, the kappa coefficient is of more interest. The kappa coefficient is a measure of association between two categorical variables. It is widely used in remote sensing classification to assess the degree of success of a classification approach. In more general categorical data analysis, the kappa coefficient is used to measure the agreement between two observers on the same data; for remote sensing, it is used to measure the agreement between the classification approach and the actual answers.

A value of 0 indicates no agreement between the two observers except that expected by chance; a value of 1 indicates perfect agreement, with all the values falling on the diagonals (Agresti 1990).

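If n(i,j) represents the error matrix count in the ith row and jth column, n(i,+) represents the sum of the ith row, n(+,j) represents the sum of the jth column, and n represents the total count in all cells of the error matrix, the estimate for kappa is

$$\hat{\kappa} = \frac{P_o - P_e}{1 - P_e}$$

where

$$P_o = \frac{1}{n} \sum_{i=1}^{k} n(i,i)$$

and

$$P_e = \frac{1}{n^2} \sum_{i=1}^{k} n(i,+)\, n(+,i)$$

Here P_o is the observed proportion of agreement (the diagonal of the error matrix) and P_e is the proportion of agreement expected by chance.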
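The kappa estimate can be checked numerically. The sketch below computes kappa as (P_o - P_e) / (1 - P_e), where P_o is the diagonal sum divided by n and P_e is the sum of the products of matching row and column totals divided by n squared, and applies it to the two Test 1 error matrices from Table 1. The function and variable names are illustrative.

```python
def kappa(m):
    """Estimated kappa coefficient for a square error matrix."""
    k = len(m)
    n = sum(sum(row) for row in m)
    row_sums = [sum(row) for row in m]
    col_sums = [sum(row[j] for row in m) for j in range(k)]
    p_observed = sum(m[i][i] for i in range(k)) / n
    p_chance = sum(row_sums[i] * col_sums[i] for i in range(k)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Test 1 error matrices from Table 1 (rows: classified, columns: reference).
GEOSTAT_TEST1 = [
    [2, 2, 3, 5, 0],
    [0, 0, 0, 1, 0],
    [0, 3, 17, 9, 10],
    [1, 4, 4, 85, 27],
    [0, 0, 0, 18, 65],
]
ANALYST_TEST1 = [
    [2, 0, 10, 16, 4],
    [0, 3, 0, 5, 0],
    [1, 2, 13, 5, 9],
    [0, 2, 1, 66, 14],
    [0, 2, 0, 26, 75],
]

print(round(kappa(GEOSTAT_TEST1), 3))  # Table 3 reports 0.467
print(round(kappa(ANALYST_TEST1), 3))  # Table 3 reports 0.44
```

Both values agree with the kappas listed for Test 1 in Table 3.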

Since we are interested here more in comparison between two different kappas, we also need an estimated variance for kappa; this is a long formula which may be found in Congalton and Green (1999), p. 50. The derivation of the variance formula can be found in Fleiss, Cohen and Everitt (1969). If k1 is the estimated kappa for one approach, k1var is its estimated variance, k2 is the estimated kappa for the second approach, and k2var is its estimated variance, then

$$Z = \frac{\mathrm{k1} - \mathrm{k2}}{\sqrt{\mathrm{k1var} + \mathrm{k2var}}}$$

will be approximately a standard normal variable when the two kappas are equal. We can therefore test the hypothesis that the two kappas are equal versus the alternative that they are not by comparing Z against the standard normal distribution and rejecting if |Z| is greater than a critical value (1.96 for a 95% test).
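The comparison of two kappas amounts to dividing their difference by the square root of the sum of their variances and comparing the magnitude of the result with the critical value. A short sketch using the Test 1 kappas and variances reported in Table 3 (function names are illustrative):

```python
import math

def kappa_z(k1, k1var, k2, k2var):
    """Z statistic for the difference between two independent kappas."""
    return (k1 - k2) / math.sqrt(k1var + k2var)

def significantly_different(z, critical=1.96):
    """Two-sided test: reject equality when |Z| exceeds the critical value."""
    return abs(z) > critical

# Test 1 values from Table 3: analyst first, geostatistical second.
z = kappa_z(0.44, 0.001681, 0.467, 0.001936)
print(round(z, 4), significantly_different(z))  # Table 3 reports -0.4489, NO
```

The sign of Z only reflects which approach is entered first; the test itself depends on |Z|.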

Initial kappa coefficients were computed in ERDAS Imagine; the error matrices were re-entered into SAS statistical software to obtain the variance estimates necessary to compare the different tests.
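For readers without SAS, the variance can also be computed directly from an error matrix. The sketch below assumes the large-sample (delta method) estimator in the form commonly given in the remote sensing accuracy assessment literature; the paper only cites the formula, so treating this as the formula the authors used is an assumption. The names t1 through t4 stand for the four intermediate sums of that estimator and are illustrative.

```python
import math

def kappa_variance(m):
    """Large-sample (delta method) variance estimate of kappa.

    Assumed form: the four-term expression built from t1..t4 below,
    with rows as classified and columns as reference categories.
    """
    k = len(m)
    n = sum(sum(row) for row in m)
    rows = [sum(row) for row in m]
    cols = [sum(row[j] for row in m) for j in range(k)]
    t1 = sum(m[i][i] for i in range(k)) / n
    t2 = sum(rows[i] * cols[i] for i in range(k)) / n ** 2
    t3 = sum(m[i][i] * (rows[i] + cols[i]) for i in range(k)) / n ** 2
    t4 = sum(
        m[i][j] * (rows[j] + cols[i]) ** 2 for i in range(k) for j in range(k)
    ) / n ** 3
    return (
        t1 * (1 - t1) / (1 - t2) ** 2
        + 2 * (1 - t1) * (2 * t1 * t2 - t3) / (1 - t2) ** 3
        + (1 - t1) ** 2 * (t4 - 4 * t2 ** 2) / (1 - t2) ** 4
    ) / n

# Test 1, geostatistically-chosen error matrix (Table 1).
GEOSTAT_TEST1_MATRIX = [
    [2, 2, 3, 5, 0],
    [0, 0, 0, 1, 0],
    [0, 3, 17, 9, 10],
    [1, 4, 4, 85, 27],
    [0, 0, 0, 18, 65],
]

variance = kappa_variance(GEOSTAT_TEST1_MATRIX)
print(round(math.sqrt(variance), 3))  # Table 3 reports a standard error of 0.044
```

Under this assumed estimator the standard error for the Test 1 geostatistical matrix rounds to the 0.044 listed in Table 3.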

5. Results

Two independent experiments were conducted by the authors. Seven categories were tested: urban, water, grass, evergreen, hardwood, scrub and road. Because none of the results achieved a classification in scrub and road, it was necessary to collapse the categories; scrub and grass were combined into field, and road was combined into urban.

The error matrices for the two experiments are shown in Tables 1 and 2.

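TABLE 1:

TEST 1, GEOSTATISTICALLY-CHOSEN ERROR MATRIX

|         | URBAN | WATER | FIELD | EVERGRN | HARDWD | ROWSUM |
|---------|-------|-------|-------|---------|--------|--------|
| URBAN   | 2     | 2     | 3     | 5       | 0      | 12     |
| WATER   | 0     | 0     | 0     | 1       | 0      | 1      |
| FIELD   | 0     | 3     | 17    | 9       | 10     | 39     |
| EVERGRN | 1     | 4     | 4     | 85      | 27     | 121    |
| HARDWD  | 0     | 0     | 0     | 18      | 65     | 83     |
| COLSUM  | 3     | 9     | 24    | 118     | 102    | 256    |

TEST 1, ANALYST-CHOSEN ERROR MATRIX

|         | URBAN | WATER | FIELD | EVERGRN | HARDWD | ROWSUM |
|---------|-------|-------|-------|---------|--------|--------|
| URBAN   | 2     | 0     | 10    | 16      | 4      | 32     |
| WATER   | 0     | 3     | 0     | 5       | 0      | 8      |
| FIELD   | 1     | 2     | 13    | 5       | 9      | 30     |
| EVERGRN | 0     | 2     | 1     | 66      | 14     | 83     |
| HARDWD  | 0     | 2     | 0     | 26      | 75     | 103    |
| COLSUM  | 3     | 9     | 24    | 118     | 102    | 256    |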

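TABLE 2:

TEST 2, ANALYST-CHOSEN ERROR MATRIX

|         | URBAN | WATER | FIELD | EVERGRN | HARDWD | ROWSUM |
|---------|-------|-------|-------|---------|--------|--------|
| URBAN   | 3     | 1     | 12    | 25      | 5      | 46     |
| WATER   | 0     | 3     | 0     | 3       | 0      | 6      |
| FIELD   | 0     | 1     | 10    | 1       | 6      | 18     |
| EVERGRN | 0     | 4     | 2     | 68      | 24     | 98     |
| HARDWD  | 0     | 0     | 0     | 21      | 67     | 88     |
| COLSUM  | 3     | 9     | 24    | 118     | 102    | 256    |

TEST 2, GEOSTATISTICALLY-CHOSEN ERROR MATRIX

|         | URBAN | WATER | FIELD | EVERGRN | HARDWD | ROWSUM |
|---------|-------|-------|-------|---------|--------|--------|
| URBAN   | 3     | 0     | 6     | 7       | 0      | 16     |
| WATER   | 0     | 3     | 1     | 18      | 0      | 22     |
| FIELD   | 0     | 1     | 14    | 5       | 8      | 28     |
| EVERGRN | 0     | 4     | 3     | 65      | 21     | 93     |
| HARDWD  | 0     | 1     | 0     | 23      | 73     | 97     |
| COLSUM  | 3     | 9     | 24    | 118     | 102    | 256    |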

Results of comparing kappas are shown in Table 3. It can be seen that the geostatistically-chosen method is not significantly different from the analyst-chosen method; indeed it had higher kappas in both cases, although not at a statistically significant level.

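TABLE 3:

| TEST 1          | ANALYST        | GEOSTAT        |
|-----------------|----------------|----------------|
| KAPPA           | 0.44           | 0.467          |
| KAPPA VARIANCE  | 0.001681       | 0.001936       |
| KAPPA STD ERROR | 0.041          | 0.044          |
| 95% CONF INT    | (0.359, 0.521) | (0.380, 0.554) |

Z VALUE: -0.4489
SIGNIFICANTLY DIFFERENT? NO

| TEST 2          | ANALYST        | GEOSTAT        |
|-----------------|----------------|----------------|
| KAPPA           | 0.394          | 0.427          |
| KAPPA VARIANCE  | 0.001764       | 0.001748       |
| KAPPA STD ERROR | 0.042          | 0.043          |
| 95% CONF INT    | (0.312, 0.477) | (0.342, 0.511) |

Z VALUE: -0.549
SIGNIFICANTLY DIFFERENT? NO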

6. Conclusions and future work directions

The results in this paper show preliminary evidence that choosing training points in supervised classification on a regular grid, with spacing based on a spatial variation scale determined from a data variogram, produces results which are comparable to those produced by analyst-chosen training points. Advantages of the geostatistically-chosen approach are that image interpretation experience is not necessary to choose training points, and the results reported in this paper indicate a small savings in total processing time. A disadvantage is that sparser categories may not fall on the grid points, and some collapsing of categories may be necessary, as was the case in these experiments. However, this may actually turn out to be an improvement since, depending on the sample size, extra categories may not be statistically justified.

Some directions for future work are: to run more tests on different areas of the A.P. Hill imagery; to run tests using different analysts, expert and otherwise; to run the comparison tests with other imagery; and to compare these two techniques on other supervised classification methods. The issue of the optimum number of categories, and which categories are closest to which other categories, also bears further consideration; a clear delineation between the categories would allow the use of a weighted kappa coefficient, where mistakes close to the "correct" answer are counted less than ones far away. A weighted kappa might reveal more information as well.

References

Agresti, "Categorical Data Analysis", John Wiley & Sons, 1990.

Badr, Oliver, Hendry and Durrani, "Spatial Variation in Soil Radon", Radiation Protection Dosimetry, Vol. 49, No. 4, 1993, pp. 433-442.

Congalton, "A Review of Assessing the Accuracy of Classification of Remotely Sensed Data", Remote Sensing of Environment, Vol. 37, pp. 35-46, 1991.

Congalton and Green, "Assessing the Accuracy of Remotely Sensed Data: Principles and Practices", Lewis Publishers, 1999.

Cressie, "Statistics for Spatial Data", John Wiley & Sons, 1991.

Fleiss, Cohen and Everitt, "Large-sample standard errors of kappa and weighted kappa", Psychological Bulletin, Vol. 72, pp. 323-327, 1969.

Krige, Guarascio and Camisani-Calzolari, "Early South African Geostatistical Techniques in Today’s Perspective", in "Geostatistics", Kluwer, pp. 1-19, 1989.

Matheron, "The Theory of Regionalized Variables and Its Applications", Fontainebleau, 1970.

Oliver and Webster, "A Geostatistical Basis for Spatial Weighting in Multivariate Classification", Mathematical Geology, Vol. 21, No. 1, 1989.

Oliver and Webster, "Report of the Geostatistical Analysis of High Resolution Multispectral Imagery", Report for the U.S. Army European Research Office, July 1998.

Stokes, Davis and Koch, "Categorical Data Analysis Using the SAS System", SAS Institute Inc, 1995.