2.3. Feature selection with embedded approaches
In this work we evaluate two embedded feature ranking strategies based on Random Forests and SVM, respectively.
SVM with Recursive Feature Elimination (SVM-RFE).
[...] contribution to the precision of the model while it is being created. This improves running time compared with the wrapper techniques.
Random Forests (RF)
Every node of the multiple decision trees that make up the RF is a condition over a single feature. Taking into account the performance of the nodes, a ranking of features can be easily created.
2.4. Stability of feature selectors
[...] component xij represents the value of a given feature fj for that [...]
Consider now a feature ranking algorithm that leads to a ranking vector r with components [...]
Consider also a top-k list as the outcome of a feature selection [...] (2)
where 1 indicates the presence of a feature and 0 the absence and [...]
Feature selection techniques usually generate a full ranking of features. These rankings, however, can be converted into top-k lists that contain the most important k features. Converting a ranking output into a feature subset is easily conducted according to [...]
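The conversion of a full ranking into a top-k list can be sketched as follows. This is only a minimal illustration under stated assumptions: ranks are 1-based, rank 1 denotes the most important feature, and a feature is kept when its rank is at most k. The function name `ranking_to_topk` is hypothetical and does not come from the paper.

```python
def ranking_to_topk(ranks, k):
    """Return a 0/1 indicator list: 1 if the feature is among the k best-ranked.

    Assumption: ranks are 1-based and rank 1 is the most important feature.
    """
    if not 0 <= k <= len(ranks):
        raise ValueError("k must be between 0 and the number of features")
    return [1 if r <= k else 0 for r in ranks]


# Five features: feature 2 is ranked first, feature 0 second, and so on.
ranks = [2, 4, 1, 5, 3]
subset = ranking_to_topk(ranks, k=2)
print(subset)  # [1, 0, 1, 0, 0]
```

The indicator vector has exactly k ones when the ranking has no ties.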
In the context of classification, feature selection or ranking techniques can be organized into three broad categories [6,17]: filter, wrapper and embedded approaches. Filter methods rely on general characteristics of the training data to rank the features according to some metric, without involving any learning algorithm. Wrapper approaches incorporate the interaction between the feature selection process and the classification model in order to determine the value of a given feature subset. Finally, in embedded techniques, the feature search mechanism is built into the classifier model and is therefore specific to a given learning algorithm. The ranking methods studied in this work are briefly described next.
2.1. Feature selection with filters
Within this category, we consider the well-known Relief algorithm and the simple Pearson correlation coefficient, which has proven to be very effective even though it does not remove feature redundancy.
The basic idea of the Relief algorithm is to reweigh features according to their ability to distinguish examples of the same and different classes that are near to each other.
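The weighting idea behind Relief can be illustrated with a short sketch. It is not the implementation used in this work. It assumes numeric features scaled to [0, 1] and a Manhattan distance, and it visits every example once instead of drawing a random sample, so that the result is deterministic. The name `relief_weights` is hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief weights for numeric features scaled to [0, 1].

    Simplification (assumption): every example is visited once instead of
    drawing a random sample, so the result is deterministic.
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p

    def distance(a, b):
        return sum(abs(u - v) for u, v in zip(a, b))

    for i in range(n):
        # Candidates for the nearest same-class (hit) and other-class (miss) example.
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(p):
            # Reward features that differ on the miss, penalise those that differ on the hit.
            weights[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
y = [0, 0, 1, 1]
w = relief_weights(X, y)
ranking = sorted(range(len(w)), key=lambda f: -w[f])
print(ranking)  # [0, 1]
```

Sorting the features by decreasing weight gives the ranking. In the toy data the discriminative feature receives a positive weight and the noisy one a negative weight.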
Pearson correlation coefficient
This method looks at how well correlated each feature is with the class target. If a feature is highly correlated with one of the classes, then we can assume that it is useful for classification purposes.
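A correlation-based filter of this kind can be sketched in a few lines. The sketch assumes a numerically encoded class label and ranks features by the absolute value of the coefficient. Both choices, as well as the helper names `pearson` and `rank_by_correlation`, are assumptions made for illustration.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0  # Convention chosen here for constant inputs (assumption).
    return cov / (su * sv)


def rank_by_correlation(X, y):
    """Return feature indices sorted by decreasing |correlation| with y."""
    p = len(X[0])
    scores = [abs(pearson([row[f] for row in X], y)) for f in range(p)]
    return sorted(range(p), key=lambda f: -scores[f]), scores


X = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]]
y = [0, 0, 1, 1]
order, scores = rank_by_correlation(X, y)
print(order)  # [0, 1]
```

Each feature is scored independently of the others, which is why this filter cannot detect redundancy between features.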
2.2. Feature selection with wrapper approaches
Wrapper methods use the performance of a learning algorithm to assess the usefulness of a feature set. Either they iteratively discard features with the least discriminant power or they add the best features according to model performance. However, wrapper approaches are more computationally intensive than filter methods.
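The backward variant of this scheme can be sketched as a greedy loop around a scoring callback. This is an illustrative sketch, not the procedure used in this work: `score(subset)` is a hypothetical callback standing for training and evaluating the learning algorithm on a feature subset, and the toy scoring function below is invented for the example.

```python
def backward_elimination(n_features, score):
    """Greedy backward elimination driven by a model-performance callback.

    `score(subset)` is a hypothetical callback that trains and evaluates the
    learning algorithm on the given tuple of feature indices.
    Returns the features ordered from most to least important.
    """
    remaining = list(range(n_features))
    eliminated = []
    while len(remaining) > 1:
        # Drop the feature whose removal leaves the best-scoring subset.
        worst = max(remaining,
                    key=lambda f: score(tuple(g for g in remaining if g != f)))
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining + eliminated[::-1]


# Toy stand-in for model accuracy: each feature adds a fixed gain.
gains = {0: 0.30, 1: 0.05, 2: 0.15}
toy_score = lambda subset: 0.5 + sum(gains[f] for f in subset)
print(backward_elimination(3, toy_score))  # [0, 2, 1]
```

The loop calls `score` a number of times that grows quadratically with the number of features, and each call implies training a model, which illustrates why wrappers cost more than filters.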
In this work, we evaluate two wrapper approaches that measure the importance of a feature set based on the performance of a Support Vector Machine and a Neural Network with a MultiLayer [...]
An important property of a feature selection method is its stability [18,26,34]. If the outcome of the feature selection technique (either a full ranked list or a top-k list) varies under small variations in the supplied data, the conclusions derived from it are unreliable.
Suppose we run a feature ranking algorithm K times. The results can be gathered in a matrix A with elements rij, with i = 1, . . . , p and j = 1, . . . , K, where rij is the rank assigned to feature i in run j. The same applies to a feature selector.
In general, stability is quantified as follows: given a set of rankings (subsets), pairwise similarities are computed and then reduced to a single metric by averaging. These scalar metrics can be seen as projections onto a one-dimensional space, and their use only shows where the feature selector stands in relation to the stable and the random ranking algorithms. In this paper, we also want to illustrate and motivate the use of graphical methods as a simple alternative approach to evaluate the stability of feature ranking algorithms. We will show how the projection onto two dimensions allows the evaluation of the similarity between feature ranking algorithms as well as their stability.
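The scalar approach described above (pairwise similarities reduced by averaging) can be sketched as follows. The similarity measures are assumptions chosen for illustration, since the text does not fix them at this point: Spearman's rank correlation for full rankings without ties and the Jaccard index for top-k indicator vectors. The rank values in the example are hypothetical.

```python
from itertools import combinations


def spearman(r1, r2):
    """Spearman rank correlation for two rankings without ties."""
    p = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1.0 - 6.0 * d2 / (p * (p * p - 1))


def jaccard(s1, s2):
    """Jaccard index between two 0/1 feature-subset indicator vectors."""
    both = sum(1 for a, b in zip(s1, s2) if a and b)
    either = sum(1 for a, b in zip(s1, s2) if a or b)
    return both / either if either else 1.0


def average_pairwise(outputs, similarity):
    """Average the similarity over all pairs of runs (needs at least two runs)."""
    pairs = list(combinations(outputs, 2))
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Columns of A: ranks of p = 4 features in K = 3 runs (hypothetical values).
runs = [[1, 2, 3, 4], [1, 2, 3, 4], [2, 1, 3, 4]]
stability = average_pairwise(runs, spearman)
print(stability)
```

With Spearman's correlation, a value of 1 corresponds to identical rankings in every run, while values near 0 are what unrelated rankings would give on average. The same `average_pairwise` helper works for feature subsets when called with `jaccard`.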