1. Introduction


Clustering problems arise in various contexts and are inseparable from the problems of knowledge acquisition. Knowledge discovery and data mining cannot start from data alone. This apparent axiom carries particular weight when applied to present-day databases, which tend to expand indefinitely both in the number of objects (cases, observations) and in the number of variables (features, dimensions, attributes). To initiate data mining, there has to be some connection between observations: either a task-oriented direction based on a pre-existing model that serves as a global description of the data, or one provided by unsupervised techniques capable of autonomous extraction, abstraction and visualization of the informational content of databases.

As databases grow in volume and complexity, they require more complex and costly software products. State-of-the-art computer programs for statistical data processing offer a wide and constantly growing range of options to a data investigator. This tendency, although positive in general, has a negative aspect: the user has to choose an optimal technique and hence assumes responsibility for the results obtained. To do so, an effective user must be familiar with hundreds of terms from mathematical statistics, as well as many equations and techniques, and must be trained in the use of the respective software products and attend workshops and seminars. This means that keeping qualified specialists on staff to handle important but often quite utilitarian tasks requires an investment that may be unaffordable even for a medium-size company. There is only one way to cope with the flood and devaluation of data that are so easily obtainable nowadays: the development of methods for unsupervised intelligent data understanding that would gradually become less complex to use and less costly.

From the beginnings of computer science to the present day, the "training" of computer systems has focused on making them capable of logical deduction by equipping them with human knowledge, man-made rules, criteria, principles, assumptions and ideas considered valuable from the viewpoint of human cognition and experience, as the only basis for their operation. The proposed intelligent data understanding system (U.S. Patent No. 6,640,227 issued October 28, 2003) represents the non-biological intelligence (NBI) model and software based on a special algorithm whose "reasoning" displays inductive logic and, in principle, does not require task-specific pre-set operational instructions.

The E-NBI software is intended to provide a fully automated search for regularities in the dataset being processed, followed by a menu of options that includes, among others, analysis report generation; graphical visualization (plots, trees, dendrograms, and informograms); and assessment of the contributions of particular objects and variables to a system of data, as well as of their compatibility. NBI-produced decisions may be evaluated by the user in manual mode and, upon user approval, submitted for final report generation. In essence, the E-NBI task is to conduct fully autonomous knowledge discovery without pre-set instructions, based on the system’s own ideology. While the NBI logic may differ from a user’s logic, it acts as an independent expert in uncertainty elimination, with its own “objective” perspective on the input data.

This presentation provides a concise overview of some features of the NBI technology, which rests on a set of performance criteria that ensure feedback-based self-correction of the system’s reasoning. In other words, the choice of appropriate similarity/dissimilarity criteria, the method of clustering, and the screening and elimination of irrelevant parameters and outlier objects are all performed on the basis of criteria self-generated by the NBI algorithm. Interactivity and non-linearity of evaluation ensure an output whose quality is comparable to that of decisions made by a professional expert. The system also resembles the human mind in the way it perceives information, processing the input data concurrently as a whole.

 

Copyright © 2000-2009 Equicom, Inc.