The dataset used in this paper is taken from a paper by Cendrowska (1988) on the inductive analysis of a set of ophthalmic data. The problem is to infer the type of lens to be prescribed by determining relevant attributes of the client. The example, shown in Figure 4, is simple and the results are well-defined, which makes it a good test case for the basic techniques. However, the requirements and results are sufficiently complex for the test case to be non-trivial and to demonstrate many major features of the knowledge acquisition tools and their application.
Figure 4 Contact lens dataset
Cendrowska's paper argues against the utility of decision trees in developing expert systems, and for the use of modular rules. Her main arguments are that trees produced by algorithms such as ID3 and its extensions do not yield effective rule sets for expert systems because:
An ID3 decision tree analysis of the contact lens dataset in Figure 4 is shown in Figure 5.
Figure 5 ID3 analysis of dataset
The ID3 tree may be converted to a set of rules as shown in Figure 6 by tracing the paths to each decision. Note that every one of these rules tests for tear production. Cendrowska notes that this is an expensive and time-consuming test, and that it is unnecessary in a number of significant cases. That is, if the decision rules of Figure 6 were transferred to the knowledge base of an expert system, the system would request data about some patients that was, in practice, unnecessary for it to reach correct conclusions.
Figure 6 Rules from ID3 analysis
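The path-tracing conversion described above can be sketched as follows. This is a minimal illustration, not the procedure used to produce Figure 6: the tree `toy_tree`, its attribute names and its decisions are invented for the example, and the function name `tree_to_rules` is hypothetical.

```python
# Hypothetical toy tree, NOT the tree of Figure 5.
# An internal node is (attribute, {value: subtree}); a leaf is a decision string.
toy_tree = ("tear production", {
    "reduced": "none",
    "normal": ("astigmatism", {
        "no": "soft",
        "yes": "hard",
    }),
})


def tree_to_rules(node, conditions=()):
    """Return one (conditions, decision) rule per root-to-leaf path."""
    if isinstance(node, str):
        # A leaf: the conditions collected on the way down form the rule.
        return [(list(conditions), node)]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules


rules = tree_to_rules(toy_tree)
for conditions, decision in rules:
    tests = " and ".join(f"{a} = {v}" for a, v in conditions)
    print(f"if {tests} then {decision}")
```

Because every path starts at the root, every rule produced this way tests the root attribute, which is the property of tree-derived rules that Cendrowska criticizes.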
Cendrowska proposes a new algorithm for rule induction called Prism which generates rules directly without going through decision trees. She uses the ophthalmic data to show that Prism solves this problem correctly, as shown in Figure 7, and satisfies her expressed requirements. That is, Figure 7 gives an alternative set of rules that are completely equivalent to those of Figure 6 but where the last three rules do not test for tear production. It should be noted that Quinlan (1987), the founder of the ID3 line of inductive techniques, has shown that the problems of redundant tests in rule sets may be overcome by post-processing decision trees. In particular, his techniques cope with noisy data whereas Prism can be used only with error-free data.
Figure 7 Prism analysis of dataset
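Direct rule generation of the kind Prism performs can be sketched as a covering loop: for each decision, repeatedly grow a rule by adding the attribute-value test that gives the highest proportion of that decision among the cases still covered, then remove the cases the finished rule covers. The sketch below is an assumption-laden illustration rather than Cendrowska's published procedure: the four-row table `toy_rows` is invented (it is not the data of Figure 4), ties are broken by coverage and then by evaluation order, and the data are assumed to be error-free.

```python
def covers(conditions, row):
    """True if the row satisfies every attribute-value test in the rule."""
    return all(row[attr] == value for attr, value in conditions.items())


def prism(instances, attributes, target):
    """Covering-style rule induction sketch; assumes error-free data."""
    rules = []
    for cls in sorted({row[target] for row in instances}):
        remaining = list(instances)
        while any(row[target] == cls for row in remaining):
            conditions, covered = {}, remaining
            # Specialize until the rule covers only cases of this class.
            while (any(row[target] != cls for row in covered)
                   and len(conditions) < len(attributes)):
                best = None
                for attr in attributes:
                    if attr in conditions:
                        continue
                    for value in sorted({row[attr] for row in covered}):
                        subset = [r for r in covered if r[attr] == value]
                        hits = sum(r[target] == cls for r in subset)
                        score = (hits / len(subset), hits)  # precision, then coverage
                        if best is None or score > best[0]:
                            best = (score, attr, value, subset)
                _, attr, value, covered = best
                conditions[attr] = value
            rules.append((conditions, cls))
            remaining = [r for r in remaining if not covers(conditions, r)]
    return rules


# Hypothetical toy data, invented for illustration only.
toy_rows = [
    {"tear": "reduced", "astig": "no", "lens": "none"},
    {"tear": "reduced", "astig": "yes", "lens": "none"},
    {"tear": "normal", "astig": "no", "lens": "soft"},
    {"tear": "normal", "astig": "yes", "lens": "hard"},
]
rules = prism(toy_rows, ["tear", "astig"], "lens")
```

Each rule is built independently of any tree, so a rule contains only the tests needed to isolate its own decision.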
The Induct algorithm used in KSS0 is based on one by Gaines (1989a) which generates rules directly and filters them statistically to cope with data that contains errors. It is able to reproduce Cendrowska's analysis of the ophthalmic data exactly. In addition, it is able to produce the same analysis with extremely noisy data. It also deals effectively with data in which many features are unknown, and this may be used to allow experts to enter rules directly as generalized cases. That is, a rule may be treated as a case in which some of the values of the attributes are unknown because they could have any value.
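The idea of a rule as a generalized case can be illustrated with a small sketch. It shows only the matching step, under the assumption that an unknown value is represented by a marker (`ANY` here) that is satisfied by every value; how Induct itself handles unknowns is not described in this section, and the attribute names and values are invented.

```python
ANY = None  # hypothetical marker: value unknown because it could be anything


def case_covers(generalized_case, instance):
    """True if every known attribute value of the case agrees with the instance."""
    return all(
        value is ANY or instance[attr] == value
        for attr, value in generalized_case["attributes"].items()
    )


# An expert-entered rule written as a case with two unknown attributes.
expert_rule = {
    "attributes": {"tear": "reduced", "astig": ANY, "age": ANY},
    "decision": "none",
}

patient_a = {"tear": "reduced", "astig": "yes", "age": "young"}
patient_b = {"tear": "normal", "astig": "yes", "age": "young"}

print(case_covers(expert_rule, patient_a))  # the rule applies
print(case_covers(expert_rule, patient_b))  # the rule does not apply
```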
It is not trivial to transfer the rules which Cendrowska takes as a solution to the ophthalmic problem to an expert system in such a way that it has the behavior she requires, that is, it should not ask irrelevant questions. This means in practice that the last three rules in Figure 7 should be tested first. The appropriate control strategies to ensure this have to be transferred to the expert system also, and the Export tool in KSS0 does this in a generally applicable way using appropriate control mechanisms in the shell.
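One simple way to picture such a control strategy is to test the rules in a fixed order and to ask for an attribute only when a rule under test actually needs it. The sketch below is an illustration under stated assumptions, not the mechanism of the Export tool or the shell: the rules, the attribute names `cheap_sign` and `expensive_test`, and the function `consult` are all hypothetical.

```python
def consult(ordered_rules, ask):
    """Fire the first rule that matches, asking for values lazily."""
    known = {}

    def value_of(attr):
        # Ask each question at most once, and only when a rule needs it.
        if attr not in known:
            known[attr] = ask(attr)
        return known[attr]

    for conditions, decision in ordered_rules:
        # all() stops at the first failed test, so later attributes
        # of a failing rule are never requested.
        if all(value_of(attr) == value for attr, value in conditions):
            return decision, list(known)
    return None, list(known)


# Hypothetical rules: the one that avoids the expensive test comes first.
ordered_rules = [
    ([("cheap_sign", "present")], "A"),
    ([("cheap_sign", "absent"), ("expensive_test", "low")], "A"),
    ([("cheap_sign", "absent"), ("expensive_test", "high")], "B"),
]

patient = {"cheap_sign": "present", "expensive_test": "high"}
decision, questions = consult(ordered_rules, lambda attr: patient[attr])
print(decision, questions)
```

For this patient the first rule fires after a single question, so the expensive test is never requested. If the same rules were tried in the reverse order, the expensive test would be requested before the decision was reached.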
It is also non-trivial to transfer the rules in such a way that they are applicable to a wide range of problem types that may not be defined at the time the knowledge acquisition takes place. One wishes to transfer the knowledge (which includes some control structures) to the expert system in such a way that it forms a re-usable module, rather than generating application-specific code. The Export tool does this using appropriate classes, objects and pattern-matching rules in the shell.
gaines@cpsc.ucalgary.ca 19-Sep-95