Types of Feature Selection Methods
There are many feature selection methods, but they are commonly grouped into three categories [1][2]:
- Filter methods: These techniques assign scores or rankings to the available features independently of the classification method that will be used, relying on the intrinsic properties of the data to evaluate each feature. In many instances, relevance scores are calculated and features with low scores are removed [5]. Filter methods have the advantage of scaling easily to high-dimensional datasets and needing to be performed only once [5]. On the downside, the selected features may not be optimal for the classifier used later, and many filter methods are univariate, meaning they ignore feature dependencies [5]. A minimal univariate filter sketch appears after this list.
- Wrapper methods: The quality of a feature subset is evaluated by an induction algorithm; for classification tasks this could be a classifier such as a decision tree. A feature subset is generated, fed into the classifier, and metrics (e.g. accuracy) are recorded; then another subset is generated and evaluated the same way. The feature subset with the highest performance is considered the optimal subset, and the resulting model is considered the final model [3].
Wrapper methods couple feature selection with the predefined classification model and therefore the selected subset of features is inevitably biased to the predefined classifier [3] .
Wrapper methods usually achieve better metrics (e.g. accuracy) than filter methods [1][3] for the chosen classifier, since the search explicitly selects the subset that maximises performance for that classifier. This strength is also a drawback: wrapper methods carry a higher risk of overfitting than filter techniques [5].
Moreover, the selected subset may not be optimal when used with a different classifier. The number of possible subsets grows exponentially with the number of initial features (2^n - 1 non-empty subsets for n features), which makes wrapper methods computationally expensive and time consuming; the situation gets worse if building the classifier itself has a high computational cost [5]. A greedy forward-selection sketch appears after this list.
- Hybrid methods: These approaches combine filter and wrapper methods to decrease the computational complexity of feature selection. Usually feature subsets are generated with a filter method, and the subset quality is then assessed with a classifier. Rather than running the classifier on every possible subset, a strategy is applied to decide which subsets to test. Liu et al. [4] suggest running a filter method over the subsets of each given cardinality and feeding only the highest-ranked subset of each size into the classifier. The classifier then runs at most as many times as there are available features, since that is the highest cardinality a feature subset can have. A sketch of this scheme appears after this list.
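The following is a minimal sketch of a univariate filter. It uses scikit-learn (an assumption; none of the cited papers prescribe a library), with mutual information as the relevance score; the synthetic dataset and k=10 are placeholders.

```python
# Minimal univariate filter sketch: score every feature once,
# independently of any downstream classifier, and keep the top k.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Placeholder dataset; swap in your own X, y.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

# Compute a relevance score per feature and drop the low-scoring ones.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, y)

print("relevance scores:", np.round(selector.scores_, 3))
print("kept feature indices:", selector.get_support(indices=True))
```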
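Next, a minimal wrapper sketch under the same assumptions. Since exhaustive search over all 2^n - 1 subsets is usually infeasible, it uses greedy forward selection, one common search strategy, with a decision tree as the induction algorithm and cross-validated accuracy as the recorded metric.

```python
# Minimal wrapper sketch: greedy forward selection with a decision tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset; swap in your own X, y.
X, y = make_classification(n_samples=200, n_features=15, n_informative=4,
                           random_state=0)

clf = DecisionTreeClassifier(random_state=0)
selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0

# Repeatedly add the single feature that most improves cross-validated
# accuracy; stop when no remaining candidate helps.
while remaining:
    scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f, score = max(scores.items(), key=lambda kv: kv[1])
    if score <= best_score:
        break
    selected.append(f)
    remaining.remove(f)
    best_score = score

print("selected features:", selected, "cv accuracy:", round(best_score, 3))
```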
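Finally, a minimal sketch of the hybrid scheme described above, as I read it: the filter ranks all features once, and the classifier evaluates only the top-k subset for each cardinality k, so it runs at most n times rather than once per possible subset.

```python
# Minimal hybrid sketch: filter-rank once, then evaluate one subset
# per cardinality with a classifier (at most n classifier runs).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset; swap in your own X, y.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# Filter step: rank all features once by mutual information with the label.
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

# Wrapper step: for each cardinality k, test only the top-k ranked subset.
clf = DecisionTreeClassifier(random_state=0)
best_k, best_score = 0, 0.0
for k in range(1, X.shape[1] + 1):
    score = cross_val_score(clf, X[:, ranking[:k]], y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print("best subset:", sorted(ranking[:best_k].tolist()),
      "cv accuracy:", round(best_score, 3))
```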
References:
[1] Ron Kohavi and George H. John, "Wrappers for feature subset selection," Artificial Intelligence 97 (1997), 273–324.
[2] Avrim L. Blum and Pat Langley, "Selection of relevant features and examples in machine learning," Artificial Intelligence 97 (1997), 245–271.
[3] Jiliang Tang, Salem Alelyani, and Huan Liu, "Feature Selection for Classification: A Review."
[4] Huan Liu and Lei Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering."
[5] Yvan Saeys, Iñaki Inza, and Pedro Larrañaga, "A review of feature selection techniques in bioinformatics," Bioinformatics 23(19) (2007), 2507–2517.