
The summary of the datasets used in this study is presented in Table 3.1, which shows each dataset together with its number of attributes, its number of classes, and the percentage of the minority class.

Table 3.1 Summary of datasets

Dataset        Attributes    Number of classes    % Minority class
DM             19            3                    2
SSS Result     8             4                    4
TB             13            4                    0.79
CM             20            5                    7

3.6.1.2 RIPPER

This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by Cohen (1995) as an optimized version of Incremental Reduced Error Pruning (IREP). The number of folds used was 3.

The minimum total weight of the instances in a rule was 2, the number of optimization runs was 2, the seed used for randomizing the data was set to 1, and the rule set was left unpruned.
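As an illustration, a minimal sketch of these settings is given below, assuming the rule learner is the WEKA JRip implementation of RIPPER; the helper class name is hypothetical.

import weka.classifiers.rules.JRip;

public class RipperSetup {                  // hypothetical helper class
    public static JRip build() {
        JRip ripper = new JRip();
        ripper.setFolds(3);                 // number of folds (one fold is used as pruning data)
        ripper.setMinNo(2.0);               // minimum total weight of instances covered by a rule
        ripper.setOptimizations(2);         // number of optimization runs
        ripper.setSeed(1);                  // seed for randomizing the data
        ripper.setUsePruning(false);        // rule set left unpruned, as described above
        return ripper;
    }
}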

3.6.1.3 Decision Tree

This was used for generating a pruned or unpruned C4.5 decision tree with a confidence factor of 0.25. One reason to avoid pruning is that most pruning schemes attempt to minimize the overall error rate. Such schemes can be detrimental to the minority class, since reducing the error rate in the majority class, which accounts for most of the examples, has a greater impact on the overall error rate (Batista et al., 2004; Zadrozny and Elkan, 2001; Chawla, 2003). The minimum number of instances per leaf was 2 and the number of folds was 3. The seed used for randomizing the data when reduced-error pruning was used was 1. The tree was not pruned, and Laplace smoothing and MDL correction were used. Hence, for this configuration, the decision tree used was C4.4, a variant of C4.5.
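A sketch of this C4.4-style configuration follows, assuming the WEKA J48 implementation of C4.5; the MDL-correction setter is available in newer WEKA releases, and the helper class name is hypothetical.

import weka.classifiers.trees.J48;

public class C44Setup {                        // hypothetical helper class
    public static J48 build() {
        J48 tree = new J48();
        tree.setUnpruned(true);                // no pruning, so the tree behaves like C4.4
        tree.setConfidenceFactor(0.25f);       // confidence factor (only relevant when pruning)
        tree.setMinNumObj(2);                  // minimum number of instances per leaf
        tree.setNumFolds(3);                   // folds for reduced-error pruning, if enabled
        tree.setSeed(1);                       // seed for randomizing data under reduced-error pruning
        tree.setUseLaplace(true);              // Laplace smoothing of leaf probability estimates
        tree.setUseMDLcorrection(true);        // MDL correction for numeric-attribute splits
        return tree;
    }
}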

3.6.1.4 K-Nearest Neighbours classifier (IB3)

This is a k-nearest neighbours classifier where k = 3 is the number of nearest neighbours (Aha et al., 1991). Hold-one-out cross-validation was used to select this k value. The nearest neighbour search algorithm used was neighboursearch.LinearNNSearch, and distance weighting was applied.
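A configuration sketch is shown below, assuming the WEKA IBk implementation; the inverse-distance weighting option and the helper class name are assumptions.

import weka.classifiers.lazy.IBk;
import weka.core.SelectedTag;
import weka.core.neighboursearch.LinearNNSearch;

public class Ib3Setup {                                            // hypothetical helper class
    public static IBk build() {
        IBk knn = new IBk();
        knn.setKNN(3);                                             // k = 3 nearest neighbours
        knn.setCrossValidate(true);                                // hold-one-out CV to select the best k
        knn.setNearestNeighbourSearchAlgorithm(new LinearNNSearch()); // linear neighbour search
        knn.setDistanceWeighting(
            new SelectedTag(IBk.WEIGHT_INVERSE, IBk.TAGS_WEIGHTING)); // distance weighting (inverse assumed)
        return knn;
    }
}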

3.6.1.5 REPTree

This is a fast decision tree learner that builds a decision/regression tree using information gain/variance and prunes it using reduced-error pruning (with backfitting). Missing values are dealt with by splitting the corresponding instances into pieces. The maximum tree depth was set to -1 for no restriction, the minimum total weight of the instances in a leaf was 2 with no pruning, and the number of folds was set to 3, while the seed used for randomizing the data was set to 1.
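A minimal sketch of these settings, assuming the WEKA REPTree learner (the helper class name is hypothetical):

import weka.classifiers.trees.REPTree;

public class RepTreeSetup {               // hypothetical helper class
    public static REPTree build() {
        REPTree tree = new REPTree();
        tree.setMaxDepth(-1);             // -1 = no restriction on tree depth
        tree.setMinNum(2.0);              // minimum total weight of instances in a leaf
        tree.setNoPruning(true);          // pruning disabled, as described above
        tree.setNumFolds(3);              // folds reserved for pruning data when pruning is used
        tree.setSeed(1);                  // seed for randomizing the data
        return tree;
    }
}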

3.6.1.6 Support Vector Machine (SVM)

This implementation globally replaced all missing values and transformed nominal attributes into binary ones (Platt, 1998). It also normalized all attributes by default. (The coefficients in the output are based on the normalized data, not the raw data; this is important for interpreting the classifier.) Multi-class problems are solved using pairwise classification (one-vs-one). The complexity parameter was set to 1. The epsilon for round-off error was set to 1.0E-12. The kernel used was PolyKernel, and the number of folds for the cross-validation used to generate the training data for the logistic models was set to 1.
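A sketch of the main settings, assuming Platt's sequential minimal optimization as implemented in WEKA's SMO class (the helper class name is hypothetical); the folds setting for the logistic-model cross-validation is omitted from the sketch.

import weka.classifiers.functions.SMO;
import weka.classifiers.functions.supportVector.PolyKernel;

public class SvmSetup {                   // hypothetical helper class
    public static SMO build() {
        SMO svm = new SMO();
        svm.setC(1.0);                    // complexity parameter
        svm.setEpsilon(1.0e-12);          // epsilon for round-off error
        svm.setKernel(new PolyKernel());  // polynomial kernel
        // missing-value replacement, nominal-to-binary conversion and
        // normalization are the defaults of this implementation
        return svm;
    }
}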

3.6.1.7 MultiLayerPerceptron (MLP)

This classifier used back-propagation to classify instances. The nodes in this network are all sigmoid. Learning-rate decay was used, whereby the starting learning rate is divided by the epoch number to determine the current learning rate. The number of hidden layers of the neural network used here was 1. The learning rate was set to 0.3 while the momentum was set to 0.2. The seed used to initialize the random number generator was set to 0. The training time, which is the number of epochs to train through, was set to 500, the percentage size of the validation set was set to 10, and the validation threshold used to terminate validation testing was set to 20.
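A sketch of this configuration, assuming the WEKA MultilayerPerceptron; the hidden-layer specification string and the helper class name are assumptions.

import weka.classifiers.functions.MultilayerPerceptron;

public class MlpSetup {                       // hypothetical helper class
    public static MultilayerPerceptron build() {
        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setLearningRate(0.3);             // starting learning rate
        mlp.setMomentum(0.2);                 // momentum applied to weight updates
        mlp.setDecay(true);                   // divide the starting learning rate by the epoch number
        mlp.setHiddenLayers("a");             // one hidden layer ("a" layer spec assumed)
        mlp.setTrainingTime(500);             // number of epochs to train through
        mlp.setSeed(0);                       // seed for the random number generator
        mlp.setValidationSetSize(10);         // percentage size of the validation set
        mlp.setValidationThreshold(20);       // threshold used to terminate validation testing
        return mlp;
    }
}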

3.6.1.8 Multiple Class Classifier

This is a meta classifier for handling multi-class datasets with two-class classifiers. It is also capable of applying error-correcting output codes for increased accuracy.

The random number seed used was 1. The base classifier used was an unpruned decision tree with Laplace smoothing and Minimum Description Length (MDL) correction. The decomposition method used for transforming the multi-class problem into several two-class problems was one-against-all (OVA).
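A sketch of this setup, assuming the WEKA MultiClassClassifier wrapper around the unpruned tree described above (the helper class name is hypothetical):

import weka.classifiers.meta.MultiClassClassifier;
import weka.classifiers.trees.J48;
import weka.core.SelectedTag;

public class OvaSetup {                                        // hypothetical helper class
    public static MultiClassClassifier build() {
        J48 base = new J48();
        base.setUnpruned(true);                                // unpruned C4.4-style base tree
        base.setUseLaplace(true);                              // Laplace smoothing at the leaves

        MultiClassClassifier ova = new MultiClassClassifier();
        ova.setClassifier(base);
        ova.setSeed(1);                                        // random number seed
        ova.setMethod(new SelectedTag(
            MultiClassClassifier.METHOD_1_AGAINST_ALL,         // one-against-all decomposition
            MultiClassClassifier.TAGS_METHOD));
        return ova;
    }
}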

3.6.1.9 RandomCommittee

This is an ensemble of randomizable base classifiers (Random Tree). Each base classifier was built using a different random number seed (but based on the same data). The final prediction was a straight average of the predictions generated by the individual base classifiers. The base classifier used was the random tree, the number of iterations was 10, and the random number seed was 1.
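A minimal sketch, assuming the WEKA RandomCommittee and RandomTree classes (the helper class name is hypothetical):

import weka.classifiers.meta.RandomCommittee;
import weka.classifiers.trees.RandomTree;

public class CommitteeSetup {                      // hypothetical helper class
    public static RandomCommittee build() {
        RandomCommittee committee = new RandomCommittee();
        committee.setClassifier(new RandomTree()); // randomizable base classifier
        committee.setNumIterations(10);            // ten committee members
        committee.setSeed(1);                      // random number seed
        return committee;
    }
}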

3.6.1.10 Random Forest

This is an ensemble of random trees that constructs a forest of trees using bootstrap samples of the training data. The algorithm was used with its default values: the maximum depth of the trees was set to 0 (unlimited), the number of trees to be generated was set to 10, and the random number seed was set to 1.
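A sketch of these defaults, assuming the WEKA RandomForest implementation; note that the tree-count setter differs between WEKA releases, and the helper class name is hypothetical.

import weka.classifiers.trees.RandomForest;

public class ForestSetup {                 // hypothetical helper class
    public static RandomForest build() {
        RandomForest forest = new RandomForest();
        forest.setNumTrees(10);            // number of trees (setNumIterations(10) in WEKA 3.8+)
        forest.setMaxDepth(0);             // 0 = unlimited depth
        forest.setSeed(1);                 // random number seed
        return forest;
    }
}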

3.6.1.11 Random Subspace (Decision Forest)

This ensemble method constructs a decision-tree-based classifier that maintains the highest accuracy on the training data and improves generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudo-randomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The base classifier used was the decision tree (C4.4). The number of iterations performed was 10 and the random number seed used was 1. The size of each subspace was 0.5.
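A sketch of this setup, assuming the WEKA RandomSubSpace meta classifier with a C4.4-style J48 base tree (the helper class name is hypothetical):

import weka.classifiers.meta.RandomSubSpace;
import weka.classifiers.trees.J48;

public class SubspaceSetup {                   // hypothetical helper class
    public static RandomSubSpace build() {
        J48 base = new J48();
        base.setUnpruned(true);                // C4.4-style base tree
        base.setUseLaplace(true);              // Laplace smoothing at the leaves

        RandomSubSpace subspace = new RandomSubSpace();
        subspace.setClassifier(base);
        subspace.setSubSpaceSize(0.5);         // each subspace uses half of the attributes
        subspace.setNumIterations(10);         // ten trees, one per random subspace
        subspace.setSeed(1);                   // random number seed
        return subspace;
    }
}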

3.6.1.12 Stacking

This is an ensemble that combines five different classifiers using the stacking method. The base classifiers were arranged to form a heterogeneous ensemble in this order:

RIPPER, Decision Tree, IB3, Support Vector Machine and MultilayerPerceptron. The meta classifier used was the decision tree, unpruned and with Laplace smoothing, with a seed of 1 and 10 folds for cross-validation.
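A sketch of this stacking arrangement, assuming the WEKA Stacking meta classifier; the base classifiers are shown with default settings for brevity, and the helper class name is hypothetical.

import weka.classifiers.Classifier;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.classifiers.functions.SMO;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.Stacking;
import weka.classifiers.rules.JRip;
import weka.classifiers.trees.J48;

public class StackingSetup {                        // hypothetical helper class
    public static Stacking build() {
        J48 meta = new J48();
        meta.setUnpruned(true);                     // unpruned meta-level tree
        meta.setUseLaplace(true);                   // with Laplace smoothing

        Stacking stack = new Stacking();
        stack.setClassifiers(new Classifier[] {     // heterogeneous base level, in the stated order
            new JRip(), new J48(), new IBk(), new SMO(), new MultilayerPerceptron()});
        stack.setMetaClassifier(meta);
        stack.setNumFolds(10);                      // folds for generating the meta-level data
        stack.setSeed(1);                           // random number seed
        return stack;
    }
}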

3.6.1.13 Bagging

This is an ensemble method for bagging a classifier to reduce variance. The size of each bag (as a percentage of the training set size) was set to 100 and the base classifier was the decision tree (C4.4). Ten iterations were performed on the dataset with a random number seed of 1.
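A minimal sketch, assuming the WEKA Bagging meta classifier with a C4.4-style J48 base tree (the helper class name is hypothetical):

import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.J48;

public class BaggingSetup {                 // hypothetical helper class
    public static Bagging build() {
        J48 base = new J48();
        base.setUnpruned(true);             // C4.4-style base tree
        base.setUseLaplace(true);           // Laplace smoothing at the leaves

        Bagging bagger = new Bagging();
        bagger.setClassifier(base);
        bagger.setBagSizePercent(100);      // each bag is 100% of the training set size
        bagger.setNumIterations(10);        // ten bootstrap replicates
        bagger.setSeed(1);                  // random number seed
        return bagger;
    }
}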

3.6.1.14 Boosting (AdaBoostM1)

This is an ensemble method for boosting a nominal class classifier using the AdaBoostM1 method. Only nominal class problems can be tackled. Boosting often dramatically improves performance, but it sometimes overfits. Decision tree (C4.4) was used as the base classifier. Ten iterations were performed with a random number seed of 1, and the weight threshold for weight pruning was set to 100.
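A sketch of this boosting configuration, assuming the WEKA AdaBoostM1 meta classifier (the helper class name is hypothetical):

import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.J48;

public class BoostingSetup {                // hypothetical helper class
    public static AdaBoostM1 build() {
        J48 base = new J48();
        base.setUnpruned(true);             // C4.4-style base tree
        base.setUseLaplace(true);           // Laplace smoothing at the leaves

        AdaBoostM1 booster = new AdaBoostM1();
        booster.setClassifier(base);
        booster.setNumIterations(10);       // ten boosting iterations
        booster.setSeed(1);                 // random number seed
        booster.setWeightThreshold(100);    // weight threshold for weight pruning
        return booster;
    }
}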

3.6.2 Ten-fold Cross-Validation

Ten-fold cross-validation was used for training on the datasets. Each dataset was divided randomly into ten parts in which each class is represented in approximately the same proportion as in the full dataset. Each part is held out in turn and the learning scheme is trained on the remaining nine-tenths; its error rate is then calculated on the holdout set.

Thus, the learning procedure is executed a total of 10 times on different training sets (each set has a lot in common with the others). Finally, the ten error estimates are averaged to yield an overall error estimate.
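A minimal sketch of this procedure, assuming WEKA's Evaluation class; the data file name is a placeholder, J48 stands in for any of the classifiers described above, and the class attribute is assumed to be the last attribute.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationSketch {                      // hypothetical driver class
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("dataset.arff"); // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);     // class assumed to be the last attribute
        Evaluation eval = new Evaluation(data);
        // stratified ten-fold cross-validation; error estimates from the ten folds are averaged internally
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}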