7. NBI-algorithm as a tool for pattern recognition


The NBI-algorithm can be efficiently applied to sequence recognition, including recognition of texts, complex fingerprints, patterns and spectra, "yes-no" sequences, and others. The examples below are based on a climatic data set (comparative climatic data available from the National Climatic Data Center: http://lwf.ncdc.noaa.gov).

Fig. 17 illustrates how the NBI software performs automated sorting of patterns. The patterns for 34 U.S. cities reflect relative humidity (%), given as morning and afternoon values based on multi-year records for each month of the year (24 variables), and relative cloudiness (%), given as the multi-year average percentage of clear, partly cloudy and cloudy days per month (36 variables). As seen in Fig. 17, not only are the patterns of geographically close cities appropriately aligned together, but the NBI approach to pattern sorting also follows a distinctive logic of its own: the alignment reflects gradual, monotonic changes in the pattern shapes.

It appears that the sorting by pattern shapes performed by the NBI-algorithm, which is based on interactive perception of the entire body of information, represents a far more complex approach than mechanical sorting. Fig. 18 and 19 illustrate the NBI clustering of 85 objects representing 85 U.S. cities in various states, defined by the same 60 parameters as in the previous example, reflecting monthly averages of humidity and cloudiness. Fig. 18A is the NBI-produced dendrogram resulting from transformation of the dissimilarity matrix computed from Euclidean distances over the 60 variables. Fig. 18B shows the sequence of the cities sorted by arithmetic means based on the same dissimilarity matrix without the NBI transformation. Color pre-tagging of the objects is the same in both illustrations. As one can see, the sequences produced by the matrix sorting and by the NBI dendrogram differ dramatically. If, based on the dendrogram results, we apply the same colors to the U.S. map (Fig. 19), a remarkable correlation emerges between the clustering results and the geographic location of the states where the analyzed cities are located. Group 1 of the dendrogram corresponds to the Western, Northwestern and Southwestern states, and group 2 includes the rest of the states. The four subgroups within group 2 also display geographic consistency.
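The non-NBI baseline of Fig. 18B can be illustrated with a minimal sketch. It assumes one plausible reading of "sorted by arithmetic means": each object is ranked by the mean of its Euclidean distances to all other objects. The function names and the toy values are hypothetical and are not taken from the NBI software or from the NCDC data.

```python
import math

def euclidean_dissimilarity(rows):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(rows[i], rows[j])  # requires Python 3.8+
            matrix[i][j] = matrix[j][i] = d
    return matrix

def sort_by_mean_dissimilarity(names, matrix):
    """Order objects by the arithmetic mean of their distances to the others."""
    n = len(names)
    means = [sum(row) / (n - 1) for row in matrix]  # diagonal is zero
    order = sorted(range(n), key=lambda i: means[i])
    return [names[i] for i in order]

# Toy data (assumption): three hypothetical cities, four variables each.
names = ["A", "B", "C"]
rows = [
    [70.0, 50.0, 30.0, 40.0],
    [72.0, 52.0, 31.0, 41.0],
    [40.0, 20.0, 60.0, 10.0],
]
matrix = euclidean_dissimilarity(rows)
ordered = sort_by_mean_dissimilarity(names, matrix)
print(ordered)
```

Such a one-dimensional ranking collapses each row of the matrix into a single number, which may be why it is described in the text as mechanical sorting.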

To fully appreciate the efficiency of the NBI in this data analysis, one should realize that humidity and cloudiness are extremely volatile meteorological parameters and are local rather than global characteristics. This example is a compelling demonstration of an implemented system for intelligent data understanding, as the results are derived from a set of numbers that display practically no visible correlations.

Fig. 20 demonstrates a more complex example: the grouping of 235 U.S. cities based on 108 climatic parameters. Here, in addition to the 60 parameters specified above, 48 more parameters are used, representing normal daily mean, minimum, and maximum temperatures in degrees Fahrenheit, as well as normal monthly precipitation in inches (all based on normals for 1971-2000). Also, the 36 cloudiness variables are expressed as mean numbers of clear, partly cloudy and cloudy days per month, and not as relative percentages as in the previous example. Thus, the variables used in this example reflect parameters measured in different units: percentages, days, temperatures (including negative values), and inches.
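The text does not say how the NBI software reconciles variables measured in different units. Purely as an assumption, one common preprocessing option is z-score standardization, which rescales every column to zero mean and unit standard deviation so that no unit dominates a distance computation. The function name and the sample rows below are hypothetical.

```python
import math

def standardize_columns(rows):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    n = len(rows)
    scaled_columns = []
    for col in zip(*rows):
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        # A constant column carries no information; map it to zeros.
        scaled_columns.append([(x - mean) / std if std > 0 else 0.0 for x in col])
    # Transpose back to one row per object.
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical rows: humidity (%), clear days, temperature (F), precipitation (in).
rows = [
    [80.0, 5.0, -4.0, 1.2],
    [60.0, 12.0, 35.0, 3.4],
    [40.0, 19.0, 74.0, 0.5],
]
scaled = standardize_columns(rows)
```

After this step a difference of one standard deviation in temperature weighs the same as one standard deviation in precipitation, whatever the original units were.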

While a detailed discussion of the obtained result is beyond the scope of this brief presentation, the clustering decision can be considered logical if the cities of one state are attributed to the same or proximate clusters (in this clustering analysis, the number of nodes was four). As seen in Fig. 20, more than 34% of all the cities belong to states whose cities all appear in the same cluster (such cities are marked with red tags). In the dendrogram, violet tags highlight the instances in which 50% or more of the cities of one state are attributed to the same cluster. Magenta tags correspond to cases where more than 75% but less than 100% of the cities of one state are attributed to the same cluster (21% of all the cities). Yellow tags indicate cases in which a state is represented in the dataset by only one city. As seen in Fig. 20, fewer than 17% of all the cities under study remain without colored tags. By contrast, sorting the same matrix without the NBI transformation produced a result in which only 4% of the cities of one state fell into the same cluster, which could be attributed to chance.
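The state-consistency check behind the colored tags can be sketched as follows. The thresholds are an interpretation of the description above (red: all cities of a state in one cluster; magenta: more than 75% but less than 100%; violet: at least 50%; yellow: a single-city state), and the function name, city labels, and cluster assignments are hypothetical.

```python
from collections import Counter, defaultdict

def tag_states(city_state, city_cluster):
    """Tag each state by the share of its cities in its most common cluster."""
    clusters_by_state = defaultdict(list)
    for city, state in city_state.items():
        clusters_by_state[state].append(city_cluster[city])
    tags = {}
    for state, clusters in clusters_by_state.items():
        total = len(clusters)
        top = Counter(clusters).most_common(1)[0][1]  # size of the largest group
        if total == 1:
            tags[state] = "yellow"    # state represented by one city
        elif top == total:
            tags[state] = "red"       # all cities in the same cluster
        elif 4 * top > 3 * total:
            tags[state] = "magenta"   # more than 75%, less than 100%
        elif 2 * top >= total:
            tags[state] = "violet"    # at least 50%
        else:
            tags[state] = "none"
    return tags

# Hypothetical assignments of 14 cities to 4 clusters.
city_state = {"c1": "TX", "c2": "TX", "c3": "TX",
              "c4": "CA", "c5": "CA", "c6": "CA", "c7": "CA", "c8": "CA",
              "c9": "NY", "c10": "NY", "c11": "VT",
              "c12": "FL", "c13": "FL", "c14": "FL"}
city_cluster = {"c1": 1, "c2": 1, "c3": 1,
                "c4": 2, "c5": 2, "c6": 2, "c7": 2, "c8": 3,
                "c9": 1, "c10": 2, "c11": 4,
                "c12": 1, "c13": 2, "c14": 3}
tags = tag_states(city_state, city_cluster)
# Share of cities whose state received no tag.
untagged = sum(1 for s in city_state.values() if tags[s] == "none") / len(city_state)
```

Integer comparisons are used for the thresholds so that boundary cases such as exactly 75% are not affected by floating-point rounding.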

The above examples demonstrate the NBI capability for intelligent data understanding through discovery of hidden correlations between the analyzed objects, which no other currently available high-dimensional clustering method offers.

A unique advantage of the NBI-algorithm lies in its ability to find analogies by purposefully looking for them, rather than through the mechanical exhaustive checking typically applied in computer search. In traditional multi-dimensional clustering methods, as the number of parameters increases, a search for similarities often leaves dissimilarities untouched, which makes the analysis resemble fingerprint identification. When parameters are grouped according to the aspects they reflect, as is done by the NBI-algorithm, it is easier to thoroughly compare the respective groups of objects in the subsequent decision making.

Since the NBI-based software core is no larger than several hundred kilobytes, it is well suited for embedded applications as an auxiliary tool for certain intermediary functions, e.g. data sorting by blocks of parameters. The resulting combination of software will thus represent a powerful instrument for intelligent data understanding based on a natural combination of inductive and deductive logic.

 

Copyright © 2000-2006 Equicom, Inc.