8. NBI in statistics


The NBI algorithm is a highly sensitive method for statistical data processing. A major advantage of the NBI technology is its holistic treatment of a logically interconnected data set; its numerous statistical tools and techniques can easily be modified or adjusted to meet a user's needs in any specific area. The relatively simple example below illustrates the potential of the NBI algorithm. Fig. 21 is a table of alcohol consumption, in liters per capita (age 15+), for selected years in the member countries of the Organization for Economic Co-operation and Development (OECD) (source: OECD Health Data 2001, Table 20: Alcohol consumption, liters per capita (15+), http://www.oecd.org/xls/M00019000/M00019682.xls). As the table shows, certain tendencies can be traced for individual countries, but it is hard to say whether there is an overall trend common to the group as a whole. The NBI system, however, takes only seconds to determine whether such an overall trend exists. The tree shown in Fig. 22 demonstrates that a visible trend does indeed exist and displays a certain anomaly in 1997-98.
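
The text does not disclose how the NBI algorithm builds a tree such as the one in Fig. 22. Purely to illustrate the general idea, the sketch below treats each year as an object described by its column of per-country values, measures year-to-year similarity with the Pearson correlation coefficient, and builds the tree with standard average-linkage agglomerative clustering. Both choices are assumptions standing in for the unpublished NBI procedure, the function names are hypothetical, and the toy table uses invented values, not the OECD figures.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def column_similarity(table, labels):
    """Pairwise similarity of the columns (e.g. years) of a rows x columns table."""
    cols = {lab: [row[j] for row in table] for j, lab in enumerate(labels)}
    return {(p, q): pearson(cols[p], cols[q]) for p in labels for q in labels}


def average_linkage_tree(labels, sim):
    """Repeatedly merge the two most similar clusters until one remains.

    Cluster-to-cluster similarity is the mean pairwise similarity of their
    members (average linkage). The tree is returned as nested tuples.
    """
    clusters = [((lab,), lab) for lab in labels]  # (member labels, subtree)

    def linkage(pair):
        (members_a, _), (members_b, _) = pair
        return mean(sim[(p, q)] for p in members_a for q in members_b)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append((a[0] + b[0], (a[1], b[1])))
    return clusters[0][1]


# Toy table: rows are countries, columns are years (invented values).
years = [1980, 1985, 1990, 1995]
toy = [[10, 9, 8, 8],
       [12, 11, 9, 9],
       [7, 8, 9, 9]]
tree = average_linkage_tree(years, column_similarity(toy, years))
print(tree)  # ((1990, 1995), (1980, 1985))
```

Average linkage is only one common convention; single or complete linkage would be equally plausible stand-ins and can give different trees on real data.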

Fig. 23 shows the result of another NBI-based technique, which corroborates the previous one. Here the processing uses reduced similarity coefficients, obtained by partial transformation of the similarity matrix computed for the data in Fig. 21. The partially transformed coefficients of similarity with the 1985 and 1995 values are plotted along the abscissa and the ordinate, respectively. The plot shows that from 1985 to 1995 the alcohol consumption tendency changed regularly in one direction, and that this direction reversed starting in 1996. Together, Fig. 22 and 23 unequivocally point to regular dynamics, with a deviation in 1997-98, in the overall alcohol consumption trend across the group of 30 countries. Based on this finding, a researcher can direct the investigation of its causes accordingly. For instance, 1996 is known to have been a year of extremely low solar activity and the start of the 23rd solar cycle; other factors may also explain the discovered trend. The NBI algorithm provides a number of further techniques for integrated analysis of problems of this kind.
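
The text does not describe how the NBI's partial transformation of the similarity matrix is carried out. As a purely hypothetical stand-in, the sketch below uses a well-known iterated-correlation scheme (CONCOR, from social-network analysis): each step replaces the similarity matrix by the Pearson correlations between its rows, and stopping after a small, assumed number of steps yields "partially transformed" coefficients. A plot in the style of Fig. 23 then takes each object's transformed similarity with two reference objects (for example, two years) as its abscissa and ordinate. All function names are illustrative.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def transform_step(sim):
    """One step: correlate every row of the similarity matrix with every row."""
    return [[pearson(r1, r2) for r2 in sim] for r1 in sim]


def partially_transform(sim, steps):
    """Apply a fixed, small number of steps instead of iterating to convergence."""
    for _ in range(steps):
        sim = transform_step(sim)
    return sim


def plot_coordinates(sim, ref_x, ref_y):
    """(x, y) for every object: its coefficients with two reference objects."""
    return [(row[ref_x], row[ref_y]) for row in sim]


# Toy similarity matrix with two loose groups: objects {0, 1} and {2, 3}.
s0 = [[1.0, 0.8, 0.1, 0.0],
      [0.8, 1.0, 0.2, 0.1],
      [0.1, 0.2, 1.0, 0.7],
      [0.0, 0.1, 0.7, 1.0]]
s2 = partially_transform(s0, steps=2)
points = plot_coordinates(s2, ref_x=0, ref_y=2)
```

Run for many steps, this particular scheme drives coefficients toward +1 within groups and -1 between them, which is one way to watch clusters emerge during a transformation; whether the NBI transformation behaves the same way is not stated in the source.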

The next example, based on economic data, is more complex than the previous one. Fig. 24 is a table of comparative price levels in the OECD member countries (OECD free publication: Purchasing Power Parities. Comparative Price Levels: http://www.oecd.org/pdf/M00009000/M00009349.pdf). The figures in the table reflect the number of specified currency units needed in each listed country to buy the same representative basket of consumer goods and services. These numbers are higher-level statistical information: they condense the economic peculiarities of each country and the economic relationships between the listed countries, including trade, movement of people, and the import and export of services. To extract this kind of hidden information, the NBI employs various techniques; we briefly discuss three of them.

The first technique is a hierarchical tree (Fig. 25) derived from the data in Fig. 24. The left portion of the tree includes the countries with roughly the same relative cost of living; the closer two countries sit in the hierarchy, the more similar their costs of living. For instance, one node includes 10 countries of Central Europe and Scandinavia (shown in a separate window in Fig. 25) whose comparative price levels are close to 100. The right portion of the tree, by contrast, corresponds to countries whose comparative price levels set them apart from the rest.

The second technique, so-called homological analysis, detects relationships of a finer structure than a hierarchical tree can reflect. Fig. 26 shows the relations between partially transformed similarity coefficients (PTSC). Homological analysis typically reveals three types of equations describing the positions of points in two-dimensional PTSC plots: (A) the product of the PTSC values equals a constant k; (B) the ratio of the PTSC values equals k; and (C) the ratio of the PTSC values equals 1/k. In Fig. 26, the partially transformed coefficients of similarity with Finland are plotted against those of similarity with Denmark, Italy, Norway, and Portugal. In the first of the four cases, Finland and Denmark, which sit at the same node of the hierarchical tree (Fig. 25), show very high similarity: the constant k is close to 1. Sections described by dependencies of types A, B, and C are most clearly seen on the Finland – Italy plot. Of the four pairs, Finland – Portugal shows the most noticeable difference. Homological analysis is thus a sensitive method for discovering the fine structure of relationships based on similarities and dissimilarities between the objects of a statistical study.
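
The text names the three relation types but not how a section of points is assigned to one of them. A minimal sketch, assuming a section is simply a list of (x, y) PTSC points and that "equals a constant" means "nearly constant" within a hypothetical relative tolerance `tol`: it checks whether the product (type A) or the ratio (types B and C) is closer to constant. The source does not say how B is told apart from C, so the sketch reports the ratio y/x and leaves that reading open.

```python
from statistics import mean, pstdev


def relative_spread(values):
    """Coefficient of variation: 0 for a constant list, larger when it varies."""
    m = mean(values)
    return float("inf") if m == 0 else pstdev(values) / abs(m)


def classify_section(points, tol=0.05):
    """Test whether a run of (x, y) PTSC points follows x*y = k or y/x = const.

    Returns ("A", k) for a near-constant product, ("B/C", ratio) for a
    near-constant ratio, or None if neither is constant within `tol`.
    Points with x == 0 are not handled.
    """
    products = [x * y for x, y in points]
    ratios = [y / x for x, y in points]
    candidates = [("A", relative_spread(products), mean(products)),
                  ("B/C", relative_spread(ratios), mean(ratios))]
    kind, spread, k = min(candidates, key=lambda c: c[1])
    return (kind, k) if spread <= tol else None


print(classify_section([(1.0, 2.0), (2.0, 1.0), (4.0, 0.5)]))  # ('A', 2.0)
```

A product-type section traces a hyperbola in the PTSC plot, while a ratio-type section lies on a straight line through the origin, which is why the two cases are easy to tell apart numerically.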

The third technique assesses the stability of a data continuum, that is, how much the obtained statistical picture depends on each contributing object. To show how it works, we increase the value in the Canada – Japan cell of the comparative price level table by a factor of 1.5 and run a new analysis. The right branch of the new hierarchical tree, where Japan is located, has the same configuration as in the tree derived from the original data. The left portion of the new tree, however, changes significantly, as a comparison of the left portions of the trees in Fig. 25 and 27 shows. A similar effect was obtained by changing the value for the Canada – Czech Republic pair. A 1.5-fold change of the value for the Canada – Mexico pair does not modify the hierarchical tree at all, whereas the same change for the Canada – United States pair significantly affects both the right and left portions of the tree. Indexes of the significance of individual objects within a given community of objects, as well as various other indicators, can be determined with other special techniques provided by the NBI. The focus of this type of analysis depends on the particular investigative goal.
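
A minimal sketch of this perturbation test, under stated assumptions: each row of the table is treated as an object described by its row of values, similarity is the Pearson coefficient, the tree is built by standard average-linkage clustering (a stand-in, since the text does not specify the NBI procedure), and two trees are compared by their clades, the member sets formed at each merge, so that branch lengths are ignored. The helper names and the toy table are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def clades(profiles):
    """Average-linkage clustering of row profiles, returned as its clades.

    A clade is the member set created at one merge; the set of clades
    describes the tree's branching configuration, ignoring branch lengths.
    """
    names = list(profiles)
    sim = {(p, q): pearson(profiles[p], profiles[q]) for p in names for q in names}
    clusters = [frozenset([n]) for n in names]
    found = set()
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: mean(sim[(p, q)] for p in pair[0] for q in pair[1]))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        found.add(a | b)
    return found


def perturb(profiles, row, col, factor=1.5):
    """Copy of the table with the cell (row, col) multiplied by `factor`."""
    changed = {name: list(values) for name, values in profiles.items()}
    changed[row][col] *= factor
    return changed


def changed_clades(profiles, row, col, factor=1.5):
    """Clades present in only one of the original and perturbed trees."""
    return clades(profiles) ^ clades(perturb(profiles, row, col, factor))


toy = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 5],
       "C": [4, 3, 2, 1], "D": [5, 3, 2, 1]}
print(changed_clades(toy, "A", 0, factor=1.0))  # set()
```

With the toy table, a strong change such as `changed_clades(toy, "A", 0, factor=10)` reports the clades {A, B} and {A, C, D}: object A leaves its original partner and joins the other group. Repeating the call with the 1.5-fold factor used in the text, cell by cell, shows which cells the tree is sensitive to.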

The NBI approach to statistical analysis is fundamentally different from traditional statistical methods, above all in its resolution and in the exhaustiveness of information discovery. There is, however, no contradiction between the two approaches, and they can effectively complement each other.

Even when used simply to visualize discovered knowledge, ENBI gives new insight into the underlying data and is an extremely helpful analytical tool in medical and biological studies, pharmaceutical trials, biometrics, and similar fields. Fast visual evaluation of similarities and dissimilarities among many dozens of objects and variables is technically unattainable with most other methods. The following examples demonstrate the ENBI technique for visualizing discovered knowledge, using cancer statistics data.

Fig. 28 shows a table of the age distribution (%) of cancer incidence cases by site (organ) in 1995-99 for eight age groups, all races, both sexes (data source: National Cancer Institute, http://seer.cancer.gov/csr/1973_1999/sections.html). To make the data more comparable, breast and genital cancer cases were excluded (the same applies to the next example, shown in Fig. 30 and 31). In the scatter plot in Fig. 29, the X-axis reflects the similarity between the distribution of cancer at each site and that of esophageal cancer, and the Y-axis shows the correlation between cancer cases at each site and colon cancer. Dark-green dots correspond to the age interval from 20 to 54 years, and light-green squares to the interval from 55 to 85 and older. The correlations between the distributions of various forms of cancer are remarkably different in the two groups. The difference is distinctly visible during matrix transformation, as the course of 10 transformations in Fig. 29 shows: both the dynamics of the changes in similarity and the clustering processes unfold differently in the two matrices. The same tendency naturally appears when any of the age groups in Fig. 28 are compared. Similarities in the distribution of different cancer cases are much less scattered in the older age groups than in the younger ones.
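
The text does not state how the scatter in Fig. 29 is measured; the following is a hypothetical sketch of one way to quantify it. Each site gets a point whose coordinates are its Pearson similarity with two reference sites (esophagus and colon), computed only over the columns of one age group, and the scatter of a group is taken, as an assumption, to be the mean distance of its points from their centroid. The matrix transformation itself is not modeled here, and the function names are illustrative.

```python
from math import dist, sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def reference_points(table, cols, ref_x, ref_y):
    """One (x, y) point per site, using only the columns of one age group.

    x = similarity with site ref_x, y = similarity with site ref_y.
    """
    sub = {site: [row[c] for c in cols] for site, row in table.items()}
    return {site: (pearson(v, sub[ref_x]), pearson(v, sub[ref_y]))
            for site, v in sub.items()}


def scatter(points):
    """Mean Euclidean distance of the points from their centroid."""
    points = list(points)
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    return mean(dist(p, (cx, cy)) for p in points)
```

Calling `scatter(reference_points(table, cols, "esophagus", "colon").values())` once with the younger age-group columns and once with the older ones gives two numbers to compare; the text reports the older groups as the less scattered ones. `math.dist` requires Python 3.8 or later.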

The same technique is demonstrated on another set of cancer statistics, Estimated New Cancer Cases and Deaths for 2002, available from the National Cancer Institute, http://seer.cancer.gov/csr/1973_1999/sections.html (Fig. 30). We analyzed how the occurrence statistics for cancer at various sites correlate with the incidence statistics for esophageal and colon cancer. Male and female groups were analyzed based on two parameters: estimated new cases and estimated deaths. Fig. 31 shows how dramatically the distribution of similarities differs between the male and female groups, as do the clustering processes during the matrix transformation in each case.

The last two examples (Fig. 28-31) show the anatomy of a data diagnostic technique based on the ENBI method, which is extremely effective for fast and accurate assessment of similarities and dissimilarities between complex patterns. Including the dynamics of the matrix transformation in the similarity criteria enhances the accuracy of the assessment.

A matrix transformation (and, accordingly, tree derivation) is itself based on statistical processing of the input data and yields a statistically generalized picture. The resulting output gives a much sharper pictorial view than the input data, which aids information retention and decision-making. The following example, illustrated by Fig. 32, is based on selected results from The Los Angeles Times National Poll shown in Table 1 below. (The selected data from Study # 443, July 31, 2000, available at http://www.latimes.com, are reproduced by kind permission of Ms. Susan Pinkus, Director of The Los Angeles Times Poll.)

Fig. 33 shows a tree derived from the data in Fig. 32. The tree gives a clear visualization of how priorities are distributed among the different political strata of American society. It offers a snapshot that can be grasped instantly and stays in memory, whereas the table data are hard to memorize, and a mere overview of them is open to subjective interpretation.

The proposed method for statistical data processing is remarkable not only for its objectivity and visual clarity, but also because it extracts information at a depth beyond the reach of other statistical methods. The data table in Fig. 34 reflects the drinking status of population groups, including data on parallel groups for each gender. The NBI processing results in Fig. 35 through 48 speak for themselves, to the point that the original table seems to carry scant information before NBI analysis. Even a detailed statistical analysis using advanced modern mathematical approaches cannot reveal the correlations demonstrated below. Treated as graphs, the tree nodes corresponding to the male and female groups are perfectly symmetrical in most of the fourteen correlations. Since the input numbers for the male and female groups differ by a factor of up to 1.5 to 2, the symmetry of the respective nodes underscores the reliability of the conclusions drawn by NBI analysis. The area and race correlation in Fig. 35 demonstrates this convincingly: the tree has different node lengths, but it can be treated as perfectly symmetrical as far as the rendered meaning is concerned, because its 'male' and 'female' halves have coincident configurations.
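
Once a tree is written down, this kind of symmetry can be checked mechanically. A minimal sketch, assuming each half of a tree is given as nested tuples of leaf labels (branch lengths dropped, since the text treats trees with different node lengths as symmetrical) and assuming a hypothetical labeling scheme `M:<group>` / `F:<group>` for male and female leaves: both halves are reduced to an order-independent canonical form with the gender prefix removed and then compared.

```python
def canonical(tree, relabel=lambda leaf: leaf):
    """Order-independent form of a tree given as nested tuples of leaf labels."""
    if not isinstance(tree, tuple):
        return relabel(tree)
    # Sort children by their repr so that child order does not matter.
    return tuple(sorted((canonical(child, relabel) for child in tree), key=repr))


def strip_gender(leaf):
    """Hypothetical label scheme: 'M:<group>' or 'F:<group>' -> '<group>'."""
    return leaf.split(":", 1)[1]


def halves_symmetric(male_half, female_half):
    """True if both halves have the same configuration once genders are removed."""
    return canonical(male_half, strip_gender) == canonical(female_half, strip_gender)


male = (("M:urban", "M:white"), "M:rural")
female = ("F:rural", ("F:white", "F:urban"))
print(halves_symmetric(male, female))  # True
```

Comparing topology only, and not node lengths, matches the reading given above for Fig. 35.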



Figures 36 through 48 show unsystematized examples of other correlations that support, or at least do not contradict, generally known U.S. statistics. For instance, family incomes of $35,000 - 50,000 and of $50,000 and over correlate with more than 12 years of education; family incomes of $20,000 - 25,000 and $25,000 - 35,000 correlate with 12 years of education; and a family income of less than $20,000 correlates with less than 12 years of education. Current employment correlates with 12 or more years of education, with incomes of $20,000 - 25,000, $25,000 - 35,000, and $35,000 - 50,000, and with an age of 30 - 44 years; it does not correlate with less than 12 years of education, with incomes under $20,000 or of $50,000 and over, or with ages of 45 - 64 years and 65 years and older (Fig. 42, 43, and 48). This case study demonstrates the internal consistency of reliable social information, as well as the method's unparalleled potential for discovering hidden information, which is particularly valuable in the analysis of public opinion poll data.

 

Copyright © 2000-2009 Equicom, Inc.