OB2 Final Report - Section IIA

After the calculated bands were created and principal components analysis was run, CMATRIX, SIGDIST, DIVERGE, and SIGMAN were used to evaluate the merits of each 'natural' and calculated band, the three major classification algorithms, and the veracity of signature selection. ELLIPSE scatterplots showed that some calculated bands had unusual digital number distributions that were strongly linear or banded rather than scattered. This probably resulted from striping in the original imagery (Table IIA.5) and the limited range of original digital numbers.
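
The report does not give DIVERGE's formula. A widely used separability measure for Gaussian class signatures is transformed divergence, sketched below under that assumption; the function names are hypothetical and this is not ERDAS's code. Divergence combines a term for differences in class spread (covariances) with a term for the separation of class means, and the transformed version rescales it to 0..2000, where values approaching 2000 suggest well-separated signatures.

```python
import numpy as np

def divergence(m1, c1, m2, c2):
    """Divergence between two Gaussian class signatures.

    m1, m2: mean vectors (bands,); c1, c2: covariance matrices (bands, bands).
    """
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    c1, c2 = np.asarray(c1, dtype=float), np.asarray(c2, dtype=float)
    i1, i2 = np.linalg.inv(c1), np.linalg.inv(c2)
    dm = (m1 - m2).reshape(-1, 1)                      # column vector of mean differences
    cov_term = 0.5 * np.trace((c1 - c2) @ (i2 - i1))   # difference in class spread
    mean_term = 0.5 * np.trace((i1 + i2) @ dm @ dm.T)  # separation of class means
    return float(cov_term + mean_term)

def transformed_divergence(m1, c1, m2, c2):
    """Divergence rescaled to 0..2000; values near 2000 suggest good separability."""
    return 2000.0 * (1.0 - np.exp(-divergence(m1, c1, m2, c2) / 8.0))
```

For two classes with identity covariances whose means differ by 2 DN in one band, `divergence([0, 0], np.eye(2), [2, 0], np.eye(2))` is 4.0, so the transformed divergence is only about 787: such signatures would overlap considerably.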

Haze removal (TC4) and histogram-based haze removal (Figure IIA.4) were employed to remove the additive effect of haze, so that the atmospheric effect would not be compounded by the arithmetic of the other band calculations. New 16-band May and July images were created that combined the original satellite data with the specialized VI, NDVI, 3/7, 3/4, TC1, TC2, TC3, TC4, and 5/2 bands (Table IIA.6). Unfortunately, the July TC4 transformation was very strongly banded, banding was also noted in part of the May image, and both images were very noisy.
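
Histogram-based haze removal (often called dark-object subtraction) finds the lowest digital number that occurs in meaningful quantity in a band and subtracts it from every pixel. In Figure IIA.4 the haze factor was picked visually; the sketch below replaces that judgment with an assumed rule (the lowest DN whose count reaches `min_fraction` of all pixels). Both the rule and the default `min_fraction` are hypothetical, not the procedure used in this project.

```python
import numpy as np

def haze_factor(band, min_fraction=0.001):
    """Lowest DN whose pixel count is at least min_fraction of all pixels.

    Assumption: this 'first significant bin' rule stands in for picking the
    haze factor by eye on the histogram.
    """
    values, counts = np.unique(band, return_counts=True)  # values come back sorted
    significant = values[counts >= min_fraction * band.size]
    return int(significant.min())

def remove_haze(band, min_fraction=0.001):
    """Subtract the haze factor from every pixel (an additive correction), clipping at 0."""
    h = haze_factor(band, min_fraction)
    corrected = band.astype(np.int32) - h        # widen first so 8-bit values cannot wrap
    return np.clip(corrected, 0, None).astype(band.dtype), h
```

With a band whose lowest bins hold only 1 pixel each (like DN 52 to 54 in Figure IIA.4), a nonzero `min_fraction` skips those isolated dark pixels, so they do not set the haze factor.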

Principal components bands 1, 2, and 3 were created from the reflective May and July imagery and recombined with calculated bands into 12-band May and July images. SIGMAN (signature variance/covariance) and DIVERGE (run with various class weights, numbers of bands per comparison, and algorithms) were used to evaluate signature discrimination and to choose the most appropriate band combinations for the desired classes.
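
Principal components can be sketched as an eigen-decomposition of the band covariance matrix, projecting each pixel onto the leading eigenvectors. The fraction of variance carried by the first three components is the kind of figure reported in Table IIA.6 (97.8% for May, 97.6% for July). This is a generic covariance-based sketch with a hypothetical function name; ERDAS's implementation may differ in details such as scaling or use of a correlation matrix.

```python
import numpy as np

def principal_components(image, n_components=3):
    """PCA of a (bands, rows, cols) image using the band covariance matrix.

    Returns the first n_components PC images and the fraction of total
    variance they explain.
    """
    bands, rows, cols = image.shape
    X = image.reshape(bands, -1).astype(np.float64)   # one row per band
    X -= X.mean(axis=1, keepdims=True)                # center each band
    cov = np.cov(X)                                   # (bands, bands) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]                 # re-sort largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pcs = eigvecs[:, :n_components].T @ X             # project pixels onto leading eigenvectors
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return pcs.reshape(n_components, rows, cols), explained
```

Because reflective TM bands are highly correlated, most of their variance typically concentrates in the first few components, which is why three PC bands could stand in for six reflective bands here.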

Based on the best separability listings, a new image file was created with the apparently ideal bands: the six May reflective bands, six July reflective bands, July NDVI, July TC1, July TC2, and May TC1. May and July reflective bands were also corrected for sun elevation, and a new image file combining these normalized bands with July TC1, July TC2, and May greenness (TC2) was classified with a series of different a priori values.
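
The sun elevation correction in Table IIA.6 divides each band's digital numbers by the sine of the sun elevation angle (59 degrees for May, 57 degrees for July), which scales both dates to a common, overhead sun. A minimal sketch, with a hypothetical function name and illustrative input DNs:

```python
import math
import numpy as np

def normalize_sun_elevation(dn, sun_elevation_deg):
    """Divide DNs by sin(sun elevation), simulating a sun at 90 degrees elevation.

    Returns floats: results can exceed 255, so writing them back to an 8-bit
    file would need clipping or rescaling (not shown).
    """
    return np.asarray(dn, dtype=np.float64) / math.sin(math.radians(sun_elevation_deg))

# Sun elevations from Table IIA.6: 59 degrees (May) and 57 degrees (July).
may_band1 = normalize_sun_elevation([52, 60, 85], 59.0)
july_band1 = normalize_sun_elevation([52, 60, 85], 57.0)
```

Since sin 57° is smaller than sin 59°, identical July DNs are boosted slightly more than May DNs. The correction is a single multiplicative factor per date, so it cannot remove additive effects such as haze.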

Paul Hopkins (Syracuse University, personal communication, March 13, 1996) is classifying the same scenes used for this project for the New York State portion of the Northern Forest Lands initiative. He indicated that striping in the imagery is unavoidable and that he treats it as part of the sensor noise/variance during processing. His analysis indicates that the absolute magnitude of striping over uniform land cover is small, perhaps 2-3 digital numbers, but that its dynamic range can be large. He found the swaths to be approximately 16-18 lines wide and the striping to be exacerbated where the sensor passed over bright clouds; the detectors were probably saturated and needed time to recover. A more serious problem is pixel dropouts, which occur in blocks. Steve DeGloria (CLEARS, Cornell University; personal communication, March 14, 1996) observed that image geometry in the precision-corrected scenes is not always good. This project confirmed both observations. Indeed, several of the band manipulations appear to exaggerate the striping, because the calculations tend to reduce the range of digital numbers and highlight pixel patterns. Many band calculations produced very gray (low tonal variation) and sometimes noisy and/or banded files.
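
One rough way to quantify the striping magnitude described above is to select a window of uniform land cover and compare the mean of each image line with the window mean; stripes then show up as a step or periodic pattern in the line means. The window selection and the peak-to-peak measure are assumptions for illustration, not Hopkins's method, and the function names are hypothetical.

```python
import numpy as np

def stripe_profile(window):
    """Per-line mean minus the overall window mean, for a (rows, cols) array of uniform cover."""
    w = np.asarray(window, dtype=np.float64)
    return w.mean(axis=1) - w.mean()

def stripe_amplitude(window):
    """Peak-to-peak spread of the line means, in DN."""
    p = stripe_profile(window)
    return float(p.max() - p.min())
```

For a synthetic 32-line window at 60 DN whose second 16-line swath is offset by +2 DN, `stripe_amplitude` returns 2.0, the same order as the 2-3 DN reported for uniform cover. Over a real scene, genuine land cover variation also moves the line means, so a truly uniform window matters.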

The primary evaluation technique for classified images was a screen display of the image (READ) with individual classes shown with OVERLAY. Visual checks were also made against maps and aerial photographs.

THRESH was used on many classifications to help evaluate signature "purity" (through histogram analysis) and to identify which classes were confused with one another (through on-screen analysis of the image, with photos and maps as ancillary data). Classes were displayed as an on-screen overlay on the image, and THRESH was used to mask out incorrectly classified pixels by selecting a cutoff on the histogram of distance from the signature mean (Figure IIA.5).
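
The thresholding idea can be sketched as: compute each class pixel's distance from the signature mean, then drop the pixels in the far tail of that distance histogram. The sketch assumes squared Mahalanobis distance and a percentile cutoff chosen as "percent of tail removed"; THRESH's actual distance measure and interface may differ, and the function names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(pixels, mean, cov):
    """Squared Mahalanobis distance of each pixel (n, bands) from a class signature."""
    diff = pixels - mean
    inv = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", diff, inv, diff)  # row-wise diff @ inv @ diff

def threshold_class(pixels, mean, cov, tail_percent):
    """Keep pixels whose distance is at or below the (100 - tail_percent) percentile.

    Returns a boolean mask (True = keep) and the distance cutoff used.
    """
    d2 = mahalanobis_sq(pixels, mean, cov)
    cutoff = np.percentile(d2, 100.0 - tail_percent)
    return d2 <= cutoff, cutoff
```

Cutting a 5% tail from 100 class pixels keeps 95 of them; in an interactive tool, the masked pixels would be the ones redrawn in a different color on the image.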

Table IIA.6. Band Manipulations - Old Forge Test Area.

Algorithm Name	Algorithm (integers refer to band numbers)	Purpose	Reference
	3/4	highlights vegetative matter	Jensen, 1996
	3/7	roads and cultural features in light tones	Lillesand and Kiefer, 1994
	5/2	enhances different types of vegetation	Avery and Berlin, 1992
Vegetation Index (VI)	4/3	increased digital number indicates increased vegetation; ratioing also decreases differences in brightness due to topographic slope and aspect	Avery and Berlin, 1992
Normalized Difference Vegetation Index (NDVI)	(4-3)/(4+3)	increased brightness indicates increased photosynthetic vegetation	Avery and Berlin, 1992; Jensen, 1996
Transformed Vegetation Index (TVI)	[(4-3)/(4+3) + 0.5]^(1/2) * 100	proportional to green biomass	Lillesand and Kiefer, 1994
Kauth-Thomas Tasseled Cap Transformation - TC1	use ERDAS algorithm	brightness	ERDAS, 1991 (equation); Jensen, 1996
- TC2	use ERDAS algorithm	greenness	ERDAS, 1991 (equation); Jensen, 1996
- TC3	use ERDAS algorithm	other (moisture)	ERDAS, 1991 (equation); Jensen, 1996
- TC4	0.8461(1) - 0.7031(2) - 0.4640(3) - 0.0032(4) - 0.0492(5) - 0.0119(7) + 0.7879	haze removal	Lavereau, 1991
Haze	histogram-based per band	removes additive effect of haze; use before other band calculations	Jensen, 1996
Principal Components Analysis - PC1	use ERDAS algorithm	reduces the dimensionality of the image by compressing most of the image information into a few bands; the first three principal component bands should contain most of the image information (together they account for 97.8% of the May scene variance and 97.6% of the July scene variance)	Avery and Berlin, 1992; ERDAS, 1991 (equations); Jensen, 1996; Lillesand and Kiefer, 1994
- PC2	use ERDAS algorithm
- PC3	use ERDAS algorithm
Sun Angle Normalization	digital number of each band / sin 59° (May); digital number of each band / sin 57° (July)	normalizes the sun elevation angle to 90°; used when combining imagery from different dates	Avery and Berlin, 1992; Lillesand and Kiefer, 1994
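
The calculated bands in Table IIA.6 that have explicit formulas can be sketched with array arithmetic. Below, band ratios (VI = 4/3, and likewise 3/4, 3/7, 5/2), NDVI, TVI, and TC4 are computed; the TC4 coefficients are the table's values read as coefficient/band pairs for bands 1, 2, 3, 4, 5, and 7. TC1 to TC3 are omitted because the report defers to ERDAS's coefficients, which it does not list. The NaN handling for zero denominators and for NDVI below -0.5 is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np

# TC4 (haze) coefficients from Table IIA.6 (Lavereau, 1991):
# multipliers for bands 1, 2, 3, 4, 5, 7, plus an additive constant.
TC4_COEFFS = np.array([0.8461, -0.7031, -0.4640, -0.0032, -0.0492, -0.0119])
TC4_CONST = 0.7879

def _f(x):
    return np.asarray(x, dtype=np.float64)

def ratio(a, b):
    """Band ratio a/b (e.g. VI = 4/3); pixels with a zero denominator become NaN."""
    a, b = _f(a), _f(b)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(b != 0, a / b, np.nan)

def ndvi(b4, b3):
    """(4-3)/(4+3); pixels where 4+3 == 0 become NaN."""
    b4, b3 = _f(b4), _f(b3)
    return ratio(b4 - b3, b4 + b3)

def tvi(b4, b3):
    """[(4-3)/(4+3) + 0.5]^(1/2) * 100; pixels with NDVI < -0.5 become NaN."""
    x = ndvi(b4, b3) + 0.5
    with np.errstate(invalid="ignore"):
        return np.where(x >= 0, np.sqrt(x), np.nan) * 100.0

def tc4(bands):
    """bands: array of shape (6, ...) holding bands 1, 2, 3, 4, 5, 7 in that order."""
    return np.tensordot(TC4_COEFFS, _f(bands), axes=1) + TC4_CONST
```

For example, a pixel with band 4 = 50 and band 3 = 30 gives NDVI = 20/80 = 0.25 and TVI = 100 * sqrt(0.75), about 86.6. Results are floating point; storing them in an 8-bit file would need rescaling, which can compress their range as the text describes.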

Figure IIA.4. Histogram-based haze removal. A partial histogram is shown for band 1 of the May 1992 southern scene. The haze factor is shown at the arrow. This method, described by Avery and Berlin (1992), produced results similar to deep-water-habitat-based haze removal for both the May and July southern scenes.

Header listing for image file: OLDFORGE.LAN
Date statistics printed:
Date statistics created:
This file has 635 rows, and 489 columns
There are 14 bands in this data set
This image is geo referenced to a Transverse Mercator coordinate system
The upper left corner has coordinate: 498850, 4844800
The cell size is (X,Y): 25, 25
Upper left corner data file coordinate (X,Y) is: 1275, 1169
This file contains 8-bit data
This is for band number 1
May Band 1 - Blue green
Minimum data value is 52
Maximum data value is 255
Mean value = 62.15878
Standard deviation = 7.048968
Median = 61
Mode = 60
Data Value	POINTS	%

52	1.	0.01%	I
53	1.	0.01%	I
54	1.	0.01%	I
55	27.	0.22%	I
56	87.	0.70%	IX
57	270.	2.17%	IXXXX
58	558.	4.47%	IXXXXXXXX
59	1237.	9.94%	IXXXXXXXXXXXXXXXXXXX
60	2544.	20.44%	IXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
61	2140.	17.19%	IXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
62	1049.	14.86%	IXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
63	1664.	13.37%	IXXXXXXXXXXXXXXXXXXXXXXXXXX
64	849.	6.82%	IXXXXXXXXXXXXX
65	407.	3.27%	IXXXXXX
66	126.	1.01%	IX
67	138.	1.11%	IXX
68	77.	0.62%	IX
69	51.	0.41%	I
70	45.	0.36%	I
71	39.	0.31%	I
72	35.	0.28%	I
73	27.	0.22%	I
74	24.	0.19%	I
75	21.	0.17%	I
76	22.	0.18%	I
77	11.	0.09%	I
78	8.	0.06%	I
79	6.	0.05%	I
80	8.	0.06%	I
81	12.	0.10%	I
82	8.	0.05%	I
83	6.	0.05%	I
84	13.	0.10%	I
85	10.	0.08%	I

Figure IIA.5. THRESH histogram used for redefining class boundaries. The X axis represents distance from the signature mean, and the Y axis indicates the number of pixels. The class histograms and the pixels assigned to each class may be overlaid on the imagery. As the tail of the histogram is cut off by interactive selection, the percentage of class pixels being removed is displayed, and the eliminated pixels are shown in a different color on the imagery.

Several classes proved to be problematic. Discussions with APA personnel indicated that urban categories were important land cover types. Urban Mixed was virtually impossible to distinguish with the processing techniques employed. Increasing the number of training samples, ensuring reasonable within-class variance, haze removal, principal components analysis, THRESH histogram manipulations, and band recombinations all failed to separate this category. Even with exaggerated a priori settings, significant areas of Urban Mixed were being assigned to remote forest, where no change could have occurred between the photo and imagery dates. A Kauth-Thomas Tasseled Cap analysis (Brightness, Greenness, Third) plus TC4 (haze) was conducted using all but the thermal band. Jensen (1996) noted that the first three Tasseled Cap parameters enhance the separability of urban, water, and wetland classes. DIVERGE was used to determine whether any bands were best for picking out Urban Mixed or Urban Open. Nothing stood out except the thermal bands, and they were not usable. Finally, it was concluded that Urban Mixed could not be classified reliably, and it was removed as a class.

Although forested classes were certainly more prevalent than all other classes, and Open more prevalent than Urban Open, a priori values did not substantially affect the classification even when extreme values (maximum 10, minimum 1) were used. Urban Open was often assigned inappropriately and, like Urban Mixed, did not appear to be separable; it was also removed as a class.
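
One possible reason even extreme a priori values had little effect is visible in the standard Gaussian maximum likelihood discriminant. We assume MAXCLAS uses a rule of this form, which the report does not state. The prior enters only as an additive log term, while class separation enters through the squared Mahalanobis distance D² = (x - μ_i)ᵀ Σ_i⁻¹ (x - μ_i):

```latex
g_i(\mathbf{x}) = \ln p(\omega_i) - \tfrac{1}{2}\ln\lvert\Sigma_i\rvert
  - \tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{\mathsf{T}}\Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)

\frac{p(\omega_a)}{p(\omega_b)} = \frac{10}{1}
\;\Longrightarrow\;
\Delta g = \ln 10 \approx 2.30,
\qquad
\Delta D^2 = 2\ln 10 \approx 4.61
```

For a pixel that truly belongs to a Gaussian class in an n-band image, D² has an expected value of n (it follows a chi-square distribution with n degrees of freedom). In 12- to 16-band images, the D² differences between competing classes can therefore easily exceed 4.6, so a 10:1 prior would change only pixels already near a decision boundary.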

The other major problem class was Deciduous. Principal components analysis helped demonstrate the need for a Deciduous/Open class, which showed up as a distinctly different signature. These areas were routinely misclassified with all imagery and appeared to be well-lit deciduous stands or open-canopied deciduous forest, often with exposed rock. Image ratioing was attempted to remove shadow effects and thereby better discriminate Deciduous from deciduous stands misclassified as Open because of sun angle. All attempts (band ratios, calculated bands, and principal components) showed these areas to be spectrally distinct. Therefore, based on classification results and verification with photos, training samples of Deciduous/Open were included in the analysis.
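
Ratioing was expected to help because a band ratio cancels any factor that multiplies both bands equally, such as the illumination change from slope, aspect, or shading. It does not cancel an additive term such as haze, which is why Table IIA.6 recommends haze removal before other band calculations. The pixel values below are hypothetical. Real shadows also receive diffuse skylight, which acts partly like an additive term, so ratioing only partially removes them.

```python
import numpy as np

def vi(b4, b3):
    """Vegetation index as the simple ratio band 4 / band 3."""
    return np.asarray(b4, dtype=np.float64) / np.asarray(b3, dtype=np.float64)

sunlit = vi(80.0, 40.0)               # hypothetical sunlit deciduous pixel
shaded = vi(0.6 * 80.0, 0.6 * 40.0)   # same surface, illumination scaled by 0.6
hazy = vi(80.0 + 10.0, 40.0 + 10.0)   # same surface plus 10 DN of additive haze
```

`sunlit` and `shaded` both equal 2.0, while `hazy` drops to 1.8. The multiplicative illumination change cancels in the ratio; the additive haze does not.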

With all Old Forge quadrangle classification attempts, Urban Mixed tended to be too extensive no matter what the manipulation, and Barren was often too widespread. Distinguishing Barren from cloud was frequently problematic because clouds often have a highly variable density and therefore a variable signature. Urban Open often appeared as a ring around clouds, demonstrating the need for a proximity search around cloud and cloud shadow. When May and July brightness (TC1) were combined with the reflective bands (plus July NDVI and TC2), far too many errant clouds resulted. Conifer was frequently used as a visual gauge of classification accuracy because there were several well-defined small to moderate-sized clumps. Conifer was most often misclassified as Mixed or Cloud Shadow. Surprisingly, several of the July images produced better Conifer renditions than did the comparable May images.

Visual comparisons and CMATRIX showed little difference between the classification results of the reflective-only bands and those of the reflective bands plus band ratios. Neither haze removal nor sun elevation normalization significantly improved discrimination between classes. Neither Urban Open nor Urban Mixed is easy to distinguish with image processing.
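
A contingency (confusion) matrix of the kind CMATRIX reports can be sketched as a count of training pixels by reference class (rows) and assigned class (columns). We assume CMATRIX tabulates training-sample pixels, as matrices of this kind typically do; the function names are hypothetical. High diagonal percentages then show that signatures reproduce their own training areas, not that the full-scene map is accurate, which may explain why CMATRIX results and whole-image results diverged in this project.

```python
import numpy as np

def contingency_matrix(reference, classified, n_classes):
    """Rows: training (reference) class; columns: class assigned by the classifier."""
    m = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(m, (np.asarray(reference), np.asarray(classified)), 1)  # count each pair
    return m

def percent_correct(m):
    """Per-class percentage of training pixels assigned back to their own class.

    A class with no training pixels would divide by zero (NaN result).
    """
    return 100.0 * np.diag(m) / m.sum(axis=1)
```

For six training pixels with reference classes `[0, 0, 1, 1, 1, 2]` classified as `[0, 1, 1, 1, 2, 2]`, the per-class percentages are 50%, about 66.7%, and 100%.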

CMATRIX was used to compare each class in eight image files under the three major classification algorithms: the Maximum Likelihood, Minimum Distance, and Mahalanobis Distance classifiers. The Mahalanobis classifier appeared to be good for Urban Open and better, although still poor, for Urban Mixed. However, a classification of the entire southern image using the Mahalanobis classifier was extremely poor for both Urban categories. Even histogram-based class thresholding could not bring these categories within reasonable limits. When the thermal band was included, CMATRIX showed improved percentages for many classifications, but the actual classifications were poor. Therefore, although CMATRIX may be helpful for detecting gross differences when evaluating signatures, it is not a reliable way to assess an entire classification.
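
The three classifiers differ in how much of each signature they use: Minimum Distance uses only class means, Mahalanobis adds each class's covariance, and Maximum Likelihood also adds the covariance determinant and prior probabilities. The sketch below uses the textbook forms of these rules, which we assume approximate the ERDAS programs. The function names and the two-class example are hypothetical.

```python
import numpy as np

def min_distance(x, means):
    """Assign x to the class whose mean is nearest in Euclidean distance."""
    d = [np.sum((x - m) ** 2) for m in means]
    return int(np.argmin(d))

def mahalanobis(x, means, covs):
    """Assign x to the class with the smallest Mahalanobis distance (class covariances)."""
    d = [(x - m) @ np.linalg.inv(c) @ (x - m) for m, c in zip(means, covs)]
    return int(np.argmin(d))

def max_likelihood(x, means, covs, priors=None):
    """Gaussian maximum likelihood: maximize ln p - 0.5 ln|C| - 0.5 D^2."""
    k = len(means)
    priors = np.full(k, 1.0 / k) if priors is None else np.asarray(priors, dtype=float)
    g = []
    for m, c, p in zip(means, covs, priors):
        diff = x - m
        _, logdet = np.linalg.slogdet(c)  # log-determinant, stable for near-singular C
        g.append(np.log(p) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(c, diff))
    return int(np.argmax(g))

# Hypothetical: a tight class 0 and a broad class 1 in two bands.
x = np.array([2.5, 0.0])
means = [np.array([0.0, 0.0]), np.array([6.0, 0.0])]
covs = [np.eye(2), 25.0 * np.eye(2)]
```

Here Minimum Distance and Maximum Likelihood assign `x` to class 0, while Mahalanobis assigns it to class 1 because the broad class absorbs distant pixels without the determinant penalty. This illustrates how a classifier can favor a highly variable class such as Urban Open. Setting `priors=[0.1, 0.9]` flips the Maximum Likelihood decision, but only because `x` lies near the boundary.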

An unsupervised classification was attempted with the 12-band southern image file (reflective bands), with wetlands masked. ISODATA, the ERDAS clustering program, was used to create a maximum of 50 signature clusters with pixel skips of 12 (X) and 14 (Y), a 95% convergence threshold, and no more than 10 iterations. Unfortunately, a histogram buffer overflow occurred and the histograms could not be saved, but the sampled polygons were retained. MAXCLAS used the signature clusters with all 12 bands and a Maximum Likelihood classification (no processing of 0's, no first-pass parallelepiped classification, and no a priori values); it took about 140 hours to process. The 50 classes were regrouped into Deciduous, Conifer, Mixed, Deciduous/Open, Open, Open-Vegetated, May Clouds, May Cloud Shadow, July Clouds, and July Cloud Shadow. After the classification was examined visually with DISPLAY and CLASOVR, the signature clusters were regrouped. If an unsupervised classification were to be used, Clouds and Cloud Shadow would have to be separated with a buffer zone, since their variable nature makes reliable classification around the edges of these features problematic. While the resulting classification was satisfactory, the supervised classification produced better results both visually and statistically (i.e., in the classification accuracy assessment).
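
The clustering parameters above (maximum clusters, X/Y pixel skips, a convergence threshold on the share of pixels whose cluster is unchanged, and an iteration cap) can be illustrated with a simplified iterative clusterer. This is a k-means-style sketch that omits the cluster split/merge steps of full ISODATA, so it is not the ERDAS algorithm. The function name, random initialization, and seed are assumptions.

```python
import numpy as np

def isodata_like(image, n_clusters=50, skip_x=12, skip_y=14,
                 convergence=0.95, max_iter=10, seed=0):
    """Cluster a (bands, rows, cols) image on a sparse pixel sample.

    Simplified: repeated minimum-distance reassignment and mean update
    (k-means style); the split/merge steps of full ISODATA are omitted.
    Stops when the fraction of sampled pixels keeping their cluster is at
    least `convergence`, or after `max_iter` iterations.
    """
    # Sample every skip_y-th line and skip_x-th column, one row per pixel.
    sample = image[:, ::skip_y, ::skip_x].reshape(image.shape[0], -1).T.astype(np.float64)
    rng = np.random.default_rng(seed)
    k = min(n_clusters, len(sample))
    means = sample[rng.choice(len(sample), size=k, replace=False)]  # initial cluster means
    labels = np.full(len(sample), -1)
    for iteration in range(1, max_iter + 1):
        d2 = ((sample[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)                  # nearest cluster mean
        unchanged = np.mean(new_labels == labels)       # share of pixels that kept their cluster
        labels = new_labels
        for j in range(k):
            members = sample[labels == j]
            if len(members):                            # empty clusters keep their old mean
                means[j] = members.mean(axis=0)
        if unchanged >= convergence:
            break
    return means, labels, iteration
```

The resulting cluster means would then serve as signatures for a maximum likelihood pass, as MAXCLAS did here. Working on a skipped sample keeps clustering cheap, but classifying every pixel with 50 signatures and 12 bands remains the expensive step.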