Feature Engineering vs Feature Selection

Friday. December 20, 2019 - 2 mins

Feature Engineering vs Feature Selection

Feature Engineering / Feature Extraction

Feature engineering involves possible changes in the available features. For example, one could want to create non linear features from a feature f available in the dataset. Other features such as : sin(f), cos(f), tan(f), log(f), e^f etc. could be generated from it.

Also multiple features can be combined to form new features. For instance :
f = f₁ + f₂ + f₃
f = f₁ ∗ f₂

Feature extraction approaches project features into a new feature space and the newly constructed features are usually combinations of original features. Examples of feature extraction techniques include Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and Canonical Correlation Analysis (CCA) ^[1].
Feature (subset) Selection Feature selection refers to finding a subset of the original features of a dataset, such that an induction algorithm that is run on data containing only these features generates a classifier with the highest possible accuracy ^[2] .

The selection criterion may not always be based on maximizing accuracy, as is noted by Liu et al. ^[3] Their definition involves evaluating and comparing candidate subsets according to a certain evaluation criterion, till a given stopping criterion is reached. Then the selected best subset usually needs to be validated by prior knowledge or different tests via synthetic and/or real-world data sets ^[3] . Feature selection approaches aim to select a small subset of features that minimize redundancy and maximize relevance to the target such as the class labels in classification. Representative feature selection techniques include Information Gain, Relief, Fisher Score and Lasso ^[1] . In feature selection, we do not combine features, nor do we make new features from an existing set of available features.

As noted by Aleyani et al. ^[4] , both approaches are capable of improving learning performance, lowering computational complexity, building better generalizable models and decreasing required storage. However, feature selection is superior in terms of better readability and interpretability since it maintains the original feature values in the reduced space while feature extraction transforms the data from the original space into a new space, which cannot be linked to the features in the original space.Therefore, further analysis of the new space is problematic since there is no physical meaning for the transformed features obtained from feature extraction technique ^[4] .

References :

[1] Jiliang Tang, Salem Alelyani and Huan Liu Feature Selection for Classification: A Review

[2] Ron Kohavi, George H. John Wrappers for feature subset selection, Artificial Intelligence 97 ( 1997) 273-324

[3] Huan Liu and Lei Yu Toward Integrating Feature Selection Algorithms for Classification and Clustering

[4] Salem Alelyani, Jiliang Tang and Huan Liu Feature Selection for Clustering: A Review