NMEEF-SD: Non-dominated Multiobjective Evolutionary Algorithm for Extracting Fuzzy Rules in Subgroup Discovery
This paper describes and analyzes NMEEF-SD, a non-dominated multiobjective evolutionary algorithm for extracting fuzzy rules in subgroup discovery. The algorithm hybridizes fuzzy logic and genetic algorithms to address subgroup-discovery problems, with the goal of extracting novel, interpretable fuzzy rules of interest. NMEEF-SD is an evolutionary fuzzy system based on the well-known Nondominated Sorting Genetic Algorithm II (NSGA-II) model, but it is oriented toward the subgroup-discovery task and uses specific operators to promote the extraction of interpretable, high-quality subgroup-discovery rules. The proposal includes different mechanisms to improve diversity in the population and permits the use of different combinations of quality measures in the evolutionary process. An extensive experimental study, supported by nonparametric tests, was performed to verify the validity of the proposal and to compare it with other subgroup-discovery methods. The results show that NMEEF-SD obtains the best results among the algorithms studied.
IV. Experimental Study
In this experimental study, the aim was to analyze which combinations of quality measures used in the evolutionary process of NMEEF-SD offer better results and to compare the performance of the algorithm with other SD algorithms (both evolutionary and non-evolutionary). Therefore, we first studied the behavior of the NMEEF-SD algorithm with respect to the use of different combinations of quality measures within the evolutionary process.
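As a rough illustration of what a "combination of quality measures" might look like in code, the sketch below computes three measures commonly used in subgroup discovery (support, confidence, and unusualness, also known as weighted relative accuracy) from plain example counts, and packs a chosen combination into an objective vector. This is a simplified sketch under stated assumptions: this section does not list the measures that were studied, NMEEF-SD works with fuzzy rules whereas the code uses crisp counts, and every name here (`RuleCounts`, `MEASURES`, `objective_vector`) is hypothetical rather than taken from the algorithm's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class RuleCounts:
    """Example counts for one crisp rule 'Cond -> Class' (illustrative only)."""
    n: int        # total number of examples
    n_cond: int   # examples covered by the antecedent
    n_class: int  # examples belonging to the target class
    n_both: int   # examples covered by the antecedent AND in the target class


def support(c: RuleCounts) -> float:
    # Fraction of all examples that are correctly covered by the rule.
    return c.n_both / c.n


def confidence(c: RuleCounts) -> float:
    # Fraction of covered examples that belong to the target class.
    return c.n_both / c.n_cond if c.n_cond else 0.0


def unusualness(c: RuleCounts) -> float:
    # Weighted relative accuracy: coverage * (confidence - class prior).
    if c.n_cond == 0:
        return 0.0
    return (c.n_cond / c.n) * (c.n_both / c.n_cond - c.n_class / c.n)


# Hypothetical registry of measures that can be combined as objectives.
MEASURES: Dict[str, Callable[[RuleCounts], float]] = {
    "support": support,
    "confidence": confidence,
    "unusualness": unusualness,
}


def objective_vector(c: RuleCounts, names: Sequence[str]) -> Tuple[float, ...]:
    """Evaluate the selected combination of measures for one rule."""
    return tuple(MEASURES[name](c) for name in names)


if __name__ == "__main__":
    counts = RuleCounts(n=100, n_cond=20, n_class=40, n_both=15)
    for combo in (("support", "confidence"), ("unusualness", "confidence")):
        print(combo, objective_vector(counts, combo))
```

Keeping the measures in a registry means a different combination can be tried by changing only the list of names, which mirrors the kind of comparison described above without committing to any particular set of measures.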
The best combination was then compared with other evolutionary and classical SD algorithms. The experimentation was undertaken with real datasets from the UCI repository. The properties of these datasets are presented in Table II: number of variables (nv), number of discrete variables (nvD), number of continuous variables (nvC), number of classes (nc), and number of examples (ns).
Table II. Properties of the datasets used from the UCI repository
Name | nv | nvD | nvC | nc | ns |
Appendicitis | 7 | 0 | 7 | 2 | 106 |
Australian | 14 | 8 | 6 | 2 | 690 |
Balance | 4 | 0 | 4 | 3 | 625 |
Breast-w | 9 | 9 | 0 | 2 | 699 |
Bridges | 7 | 4 | 3 | 2 | 102 |
Bupa | 6 | 0 | 6 | 2 | 345 |
Car | 6 | 6 | 0 | 4 | 1728 |
Chess | 36 | 36 | 0 | 2 | 3196 |
Cleveland | 13 | 0 | 13 | 5 | 303 |
Dermatology | 33 | 33 | 0 | 6 | 366 |
Diabetes | 8 | 0 | 8 | 2 | 768 |
Echo | 6 | 1 | 5 | 2 | 131 |
German | 20 | 13 | 7 | 2 | 1000 |
Glass | 9 | 0 | 9 | 6 | 214 |
Haberman | 3 | 0 | 3 | 2 | 306 |
Hayesroth | 4 | 4 | 0 | 3 | 132 |
Heart | 13 | 6 | 7 | 2 | 270 |
Hepatitis | 19 | 13 | 6 | 2 | 155 |
Hypothyroid | 25 | 18 | 7 | 2 | 3163 |
Ionosphere | 34 | 0 | 34 | 2 | 351 |
Iris | 4 | 0 | 4 | 3 | 150 |
Led | 7 | 0 | 7 | 10 | 500 |
Lymp | 18 | 18 | 0 | 4 | 148 |
Marketing | 13 | 13 | 0 | 10 | 8993 |
Mushrooms | 22 | 22 | 0 | 2 | 8124 |
Nursery | 8 | 8 | 0 | 5 | 12960 |
Tic-tac-toe | 9 | 9 | 0 | 2 | 958 |
Vehicle | 18 | 0 | 18 | 4 | 846 |
Vote | 16 | 16 | 0 | 2 | 435 |
Wine | 13 | 0 | 13 | 3 | 178 |
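The column definitions imply that each dataset's number of variables should equal the sum of its discrete and continuous variables (nv = nvD + nvC). The short sketch below parses a few rows copied from the table (the datasets with both variable types) and checks that relation. The helper names `parse_rows` and `inconsistent_rows` are illustrative assumptions, not part of any published tooling.

```python
# A few rows copied from the table above, in the same pipe-delimited layout:
# Name | nv | nvD | nvC | nc | ns
TABLE_ROWS = """\
Australian | 14 | 8 | 6 | 2 | 690 |
Bridges | 7 | 4 | 3 | 2 | 102 |
Echo | 6 | 1 | 5 | 2 | 131 |
German | 20 | 13 | 7 | 2 | 1000 |
Heart | 13 | 6 | 7 | 2 | 270 |
Hepatitis | 19 | 13 | 6 | 2 | 155 |
Hypothyroid | 25 | 18 | 7 | 2 | 3163 |
"""


def parse_rows(text):
    """Turn pipe-delimited lines into dictionaries keyed by column name."""
    rows = []
    for line in text.strip().splitlines():
        # Drop the empty cell produced by the trailing pipe.
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        name, *numbers = cells
        nv, nvd, nvc, nc, ns = (int(x) for x in numbers)
        rows.append(
            {"name": name, "nv": nv, "nvD": nvd, "nvC": nvc, "nc": nc, "ns": ns}
        )
    return rows


def inconsistent_rows(rows):
    """Return the names of datasets where nv differs from nvD + nvC."""
    return [r["name"] for r in rows if r["nv"] != r["nvD"] + r["nvC"]]


if __name__ == "__main__":
    rows = parse_rows(TABLE_ROWS)
    print("rows parsed:", len(rows))
    print("inconsistent:", inconsistent_rows(rows))
```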
IV.B. Quality measures analysis
The complete results table can be found below:
IV.C. Comparison of the existing evolutionary algorithms for subgroup discovery
The complete results table can be found below:
IV.D. Comparison of NMEEF-SD and the classical subgroup discovery algorithms
The complete results table can be found below:
Comparison of the results obtained with and without the re-initialisation based on coverage operator
The complete results table can be found below: