Complementary information for the paper published in IEEE Trans. Fuzzy Systems

NMEEF-SD: Non-dominated Multiobjective Evolutionary Algorithm for Extracting Fuzzy Rules in Subgroup Discovery


This paper describes and analyzes NMEEF-SD, a non-dominated multiobjective evolutionary algorithm for extracting fuzzy rules in subgroup discovery. The algorithm hybridizes fuzzy logic and genetic algorithms to address subgroup-discovery problems and extract novel, interpretable fuzzy rules of interest. NMEEF-SD is an evolutionary fuzzy system based on the well-known Nondominated Sorting Genetic Algorithm II (NSGA-II) model, but it is oriented toward the subgroup-discovery task and uses specific operators to promote the extraction of interpretable, high-quality subgroup-discovery rules. The proposal includes different mechanisms to improve diversity in the population and permits the use of different combinations of quality measures in the evolutionary process. An elaborate experimental study, reinforced by nonparametric tests, was performed to verify the validity of the proposal and to compare it with other subgroup-discovery methods. The results show that NMEEF-SD obtains the best results among the algorithms studied.
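As a rough illustration of the non-dominated sorting idea that NSGA-II relies on, the sketch below splits candidate rules into successive Pareto fronts. It is a simplified quadratic version for illustration, not the NMEEF-SD implementation; the function names, the two-objective tuples, and the assumption that every objective is maximized are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives assumed to be maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_fronts(points: List[Sequence[float]]) -> List[List[int]]:
    """Split point indices into successive Pareto fronts.

    Front 0 holds the points that no other point dominates; front 1 holds the
    points that become non-dominated once front 0 is removed, and so on.
    """
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical rules scored on two quality measures (values are made up).
rules = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
fronts = non_dominated_fronts(rules)
print(fronts)
```

In this toy example the first three rules are mutually non-dominated and form the first front, while the remaining two fall into later fronts.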

 

 

IV. Experimental Study

In this experimental study, the aim was to analyze which combinations of quality measures used in the evolutionary process of NMEEF-SD offer better results and to compare the performance of the algorithm with other subgroup discovery (SD) algorithms, both evolutionary and non-evolutionary. We therefore first studied the behavior of NMEEF-SD under different combinations of quality measures within the evolutionary process.
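To make the notion of a quality measure concrete, here is a minimal sketch of three measures that are common in the subgroup discovery literature: support, confidence, and unusualness (weighted relative accuracy). This page does not list the measures actually combined in NMEEF-SD, so the choice of measures is an assumption. The sketch also uses crisp example counts rather than fuzzy membership degrees, and the function name and the counts in the example are hypothetical.

```python
from typing import Dict


def rule_quality(n: int, n_cond: int, n_class: int, n_both: int) -> Dict[str, float]:
    """Quality measures for one crisp rule 'Cond -> Class'.

    n       : total number of examples
    n_cond  : examples covered by the rule antecedent
    n_class : examples belonging to the target class
    n_both  : examples covered by the antecedent AND in the target class
    """
    if n <= 0 or n_cond <= 0:
        raise ValueError("n and n_cond must be positive")
    support = n_both / n
    confidence = n_both / n_cond
    # Unusualness (WRAcc): rule coverage times the gain in class probability
    # inside the subgroup relative to the whole dataset.
    unusualness = (n_cond / n) * (confidence - n_class / n)
    return {"support": support, "confidence": confidence, "unusualness": unusualness}


# Hypothetical counts: 100 examples, 20 covered, 40 in the class, 15 in both.
q = rule_quality(n=100, n_cond=20, n_class=40, n_both=15)
print(q)
```

A multiobjective search would treat two or three such values as separate objectives instead of merging them into a single weighted score.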

The best combination was then compared with other evolutionary and classical SD algorithms. The experimentation was carried out with real datasets from the UCI repository. The properties of these datasets are presented in Table II: number of variables (nv), number of discrete variables (nvD), number of continuous variables (nvC), number of classes (nc), and number of examples (ns).

 

Table II. Properties of the datasets used from the UCI repository
Name nv nvD nvC nc ns
Appendicitis 7 0 7 2 106
Australian 14 8 6 2 690
Balance 4 0 4 3 625
Breast-w 9 9 0 2 699
Bridges 7 4 3 2 102
Bupa 6 0 6 2 345
Car 6 6 0 4 1728
Chess 36 36 0 2 3196
Cleveland 13 0 13 5 303
Dermatology 33 33 0 6 366
Diabetes 8 0 8 2 768
Echo 6 1 5 2 131
German 20 13 7 2 1000
Glass 9 0 9 6 214
Haberman 3 0 3 2 306
Hayesroth 4 4 0 3 132
Heart 13 6 7 2 270
Hepatitis 19 13 6 2 155
Hypothyroid 25 18 7 2 3163
Ionosphere 34 0 34 2 351
Iris 4 0 4 3 150
Led 7 0 7 10 500
Lymp 18 18 0 4 148
Marketing 13 13 0 10 8993
Mushrooms 22 22 0 2 8124
Nursery 8 8 0 5 12960
Tic-tac-toe 9 9 0 2 958
Vehicle 18 0 18 4 846
Vote 16 16 0 2 435
Wine 13 0 13 3 178
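The five columns of the table can be derived mechanically from a dataset. The sketch below shows one way to do so, assuming the class attribute is not counted in nv and that the caller states which variables are discrete. The function name, the toy records, and their attribute names are hypothetical; the three rows used in the consistency check are copied from the table above.

```python
from typing import Dict, List, Mapping, Set


def dataset_properties(rows: List[Mapping[str, object]],
                       discrete: Set[str],
                       class_column: str) -> Dict[str, int]:
    """Compute nv, nvD, nvC, nc and ns for a list of records."""
    # Input variables are all attributes except the class attribute.
    variables = [name for name in rows[0] if name != class_column]
    nv_d = sum(1 for name in variables if name in discrete)
    return {
        "nv": len(variables),
        "nvD": nv_d,
        "nvC": len(variables) - nv_d,
        "nc": len({row[class_column] for row in rows}),
        "ns": len(rows),
    }


# Toy records (made up for illustration only).
toy_rows = [
    {"age": 34.0, "sex": "m", "bmi": 22.1, "class": "pos"},
    {"age": 51.0, "sex": "f", "bmi": 27.4, "class": "neg"},
    {"age": 46.0, "sex": "f", "bmi": 30.2, "class": "neg"},
]
props = dataset_properties(toy_rows, discrete={"sex"}, class_column="class")
print(props)

# Consistency check on rows copied from the table: nv should equal nvD + nvC.
table_rows = [("Australian", 14, 8, 6), ("German", 20, 13, 7), ("Iris", 4, 0, 4)]
consistent = all(nv == nv_d + nv_c for _, nv, nv_d, nv_c in table_rows)
print(consistent)
```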

 

 

IV.B. Quality measures analysis

The complete results table can be found below:

 

 

IV.C. Comparison of the existing evolutionary algorithms for subgroup discovery

The complete results table can be found below:

 

 

IV.D. Comparison of NMEEF-SD and the classical subgroup discovery algorithms

The complete results table can be found below:

 

 

Comparison of the results obtained with and without the Re-initialisation based on coverage operator

The complete results table can be found below: