The text below gives a brief synopsis of the history of linear oracles for classification and describes the Principal Direction Linear Oracle (PDLO) and the Random Spherical Linear Oracle (RSLO), which we introduced for ensemble classification of DNA microarray-based gene expression data.
Historically, linear oracles for classification originated during the development of decision tree classifiers. Hyperplane splits first appeared in oblique decision trees, in the form of linear-combination (non-axis-parallel) splits used in CART-LC (Breiman et al., 1984) and later in OC1 (Murthy et al., 1994). Random linear oracles were recently applied by Kuncheva and Rodriguez to 35 UCI data sets using a variety of ensemble methods, including AdaBoost, bagging, MultiBoost, random subspace, and random forests (Kuncheva and Rodriguez, 2007). In their study of classifier ensembles, each random hyperplane was generated by randomly selecting 2 points and taking the hyperplane that passes through their midpoint and is perpendicular to the line segment joining them. Ensembles using the random linear oracle gave superior results compared with the routine use of various bagging and boosting forms of decision tree methods.
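The two-point hyperplane construction described above can be sketched as follows. This is a minimal illustration rather than the published implementation: the function name `random_linear_oracle`, the toy data, and the use of NumPy are all assumptions made for the example.

```python
import numpy as np

def random_linear_oracle(X, rng):
    """Split the rows of X with a random hyperplane (illustrative sketch).

    Two distinct samples are drawn at random; the hyperplane passes through
    their midpoint and is perpendicular to the segment joining them.
    """
    i, j = rng.choice(X.shape[0], size=2, replace=False)
    a, b = X[i], X[j]
    normal = b - a               # normal vector of the hyperplane
    midpoint = (a + b) / 2.0     # a point lying on the hyperplane
    side = (X - midpoint) @ normal   # signed position relative to the hyperplane
    return side > 0, (i, j)

rng = np.random.default_rng(0)
X = rng.random((20, 5))          # toy data, already in [0, 1]
mask, (i, j) = random_linear_oracle(X, rng)
# Samples with mask == True go to one miniclassifier, the rest to the other.
```

By construction the two selected points always fall on opposite sides of the hyperplane, so neither miniclassifier receives an empty set.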
In the Kuncheva and Rodriguez study of random linear oracles, feature values were normalized to the range [0,1] before a hyperplane split was used to assign training and test samples to the 2 "miniclassifiers." For DNA microarrays, there are two reasons why normalization to the range [0,1] would be problematic. First, it is well known that feature values for microarrays are often strongly log-normally distributed. Normalizing log-normally distributed features into the range [0,1] is therefore not advantageous, since it produces low- and high-density regions in sample space that cause problems for hyperplane cuts. Second, if there are indeed clusters of samples in the underlying feature space, then the regions between the clusters introduce sparse areas in a feature space normalized into the range [0,1]. Hyperplane splits used for oracle assignment of samples to the miniclassifiers would then include cuts that distribute samples non-uniformly between the miniclassifiers.
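The first point can be illustrated with a small simulation. The values below are synthetic draws from a log-normal distribution and are not microarray data; the parameters (`sigma=1.5`, 1000 samples) are assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for one strongly log-normal feature (not real data).
x = rng.lognormal(mean=0.0, sigma=1.5, size=1000)

# Min-max normalization into the range [0, 1].
scaled = (x - x.min()) / (x.max() - x.min())

# Fraction of samples that land in the lowest tenth of the normalized range.
frac_low = np.mean(scaled < 0.1)
print(f"fraction of samples below 0.1 after scaling: {frac_low:.2f}")
```

Most samples are compressed toward 0 while a few large values span the rest of the range, which is the kind of uneven density the paragraph above refers to.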
In light of the above drawbacks of feature normalization to the range [0,1], we developed and introduced the principal direction linear oracle (PDLO) as a way of first standardizing features, using principal components analysis on the correlation matrix R, and normalizing the principal component scores (by dividing each eigenvector element by its eigenvalue before taking the sum product of input z-scores and eigenvector elements) in order to obtain a more spherically shaped feature space prior to hyperplane cuts. Another main advantage of PDLO is that once the normalized PC scores have been determined for, say, the three largest eigenvalues of the standardized feature space, one can simply select one of the X-, Y-, or Z-axes and perform a 3D rotation of the PC scores about that axis. The axis of rotation can be selected from the 3 possible axes (X,Y,Z) in a fixed manner or randomly by using 2*U(0,1)+1. The angle of rotation (theta) about the axis can also be selected randomly using the relationship 2*pi*U(0,1). Last, one of the most important capabilities of PDLO is that after a rotation of the PC scores about an axis, the hyperplane split is based on the sign of the newly transformed PC score values. Thus samples with positive values can be sent to one subclassifier, and samples with negative (or zero) values to the other, for training and testing.
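The PDLO steps described above can be sketched roughly as follows. This is an illustrative reading of the text, not the authors' code. The function names, the toy data, the uniform integer draw used for the random axis, and the choice of which rotated coordinate defines the split are all assumptions; the text does not specify the latter.

```python
import numpy as np

def normalized_pc_scores(X, n_components=3):
    """Z-score the features, run PCA on the correlation matrix R, and scale
    each eigenvector by its eigenvalue before forming the scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest eigenvalues
    lam, E = eigvals[order], eigvecs[:, order]
    return Z @ (E / lam)                              # sum product of z-scores and scaled eigenvectors

def rotation_matrix(axis, theta):
    """3D rotation by angle theta about axis 0 (X), 1 (Y), or 2 (Z)."""
    c, s = np.cos(theta), np.sin(theta)
    if axis == 0:
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == 1:
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def pdlo_split(scores, rng):
    axis = int(rng.integers(0, 3))        # assumption: uniform random choice of axis
    theta = 2 * np.pi * rng.random()      # theta = 2*pi*U(0,1)
    rotated = scores @ rotation_matrix(axis, theta).T
    # Assumption: split on a coordinate that the rotation actually changes.
    split_coord = (axis + 1) % 3
    return rotated[:, split_coord] > 0, axis, theta

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6))          # toy data: 40 samples, 6 features
scores = normalized_pc_scores(X)
mask, axis, theta = pdlo_split(scores, rng)
# mask == True: positive scores, sent to one subclassifier; otherwise the other.
```

Because the scores are centered at zero, a sign-based split after rotation sends samples to both subclassifiers.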
The random spherical linear oracle (RSLO) is essentially a variant of PDLO, except that it employs a random hyperplane split based on the dot product of all the (unrotated) PC scores and the vector of direction cosines (theta_X, theta_Y, theta_Z) for one randomly selected sample.
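A rough sketch of the RSLO split is given below, under the assumption that the direction cosines of a sample are the components of its PC score vector divided by the vector's length, and that the split is made on the sign of the dot product. The function name `rslo_split` and the toy scores, which stand in for normalized PC scores, are hypothetical.

```python
import numpy as np

def rslo_split(scores, rng):
    """Split samples by the sign of their projection onto the direction
    of one randomly selected sample (illustrative sketch)."""
    k = int(rng.integers(0, scores.shape[0]))   # randomly selected sample
    v = scores[k]
    cosines = v / np.linalg.norm(v)             # direction cosines of that sample
    projection = scores @ cosines               # dot product for every sample
    return projection > 0, k

rng = np.random.default_rng(0)
scores = rng.standard_normal((30, 3))           # toy stand-in for normalized PC scores
scores -= scores.mean(axis=0)                   # center, as PC scores would be
mask, k = rslo_split(scores, rng)
# mask == True: positive projection, sent to one subclassifier; otherwise the other.
```

The selected sample always has a positive projection onto its own direction, so it lands on the positive side of the split.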
Thus far, we have studied the effects of feature standardization, feature fuzzification, rotation, iterations, number of cross-validation folds, and different base classifiers on PDLO and RSLO performance, a topic that until now had eluded systematic investigation.
L. Breiman, J. Friedman, R. Olshen, C. Stone. Classification and Regression Trees. Boca Raton (FL), Chapman & Hall/CRC, 1984.
S.K. Murthy, S. Kasif, and S. Salzberg. A system for induction of oblique decision trees. Journal of Artificial Intelligence Research. 2, 1-33, 1994.
L.I. Kuncheva, J.J. Rodriguez. Classifier Ensembles with a Random Linear Oracle. IEEE Transactions on Knowledge and Data Engineering. 19(4), 500-508, 2007.
J.J. Rodriguez, L.I. Kuncheva. Naive Bayes ensembles with a random oracle. Lecture Notes in Computer Science. 4472, 450-458, 2007.
L.I. Kuncheva, C.J. Whitaker. Measures of diversity in classifier ensembles. Machine Learning. 51, 181-207, 2003.
L.I. Kuncheva. That Elusive Diversity in Classifier Ensembles. Lecture Notes in Computer Science. 2652, 1126-1138, 2003.
M. Skurichina, L.I. Kuncheva, R.P.W. Duin. Bagging and boosting for the nearest mean classifier: effects of sample size on diversity and accuracy. Lecture Notes in Computer Science. 2364, 62-71, 2002.