
The steps that may be taken to solve a feature selection problem: a feature selection checklist

Date: 2015-08-12 21:48:59

Tags: feature selection, feature selection steps

Reference: the JMLR paper "An Introduction to Variable and Feature Selection" (Guyon and Elisseeff, 2003).


We summarize the steps that may be taken to solve a feature selection problem in a checklist:


1. Do you have domain knowledge? If yes, construct a better set of “ad hoc” features.


2. Are your features commensurate (i.e., measurable on the same scale or in the same units)? If no, consider normalizing them.
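A common way to make features commensurate is standardization (z-scoring). The following is a minimal numpy sketch, not the paper's prescription; the function name and data are illustrative.

```python
import numpy as np

def standardize(X):
    """Scale each feature (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant features unchanged
    return (X - mu) / sigma

# Two features on very different scales (e.g. metres vs. kilograms)
X = np.array([[1.70, 65.0],
              [1.80, 80.0],
              [1.60, 50.0]])
Xs = standardize(X)  # each column now has mean 0 and std 1
```

After this transformation, distance-based methods and regularized predictors treat each variable on an equal footing.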


3. Do you suspect interdependence of features? If yes, expand your feature set by constructing conjunctive features or products of features (i.e., treating several variables jointly as a single feature, or forming higher-order product terms), as much as your computer resources allow (see example of use in Section 4.4).
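One simple way to capture pairwise interdependence is to append all products x_i * x_j as new features. A minimal numpy sketch (the helper name is illustrative):

```python
import numpy as np
from itertools import combinations

def add_products(X):
    """Append all pairwise products x_i * x_j (i < j) as new features."""
    products = [X[:, i] * X[:, j]
                for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + products)

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
Xe = add_products(X)  # 3 original features + 3 product features
```

Note that the expanded set grows quadratically in the number of variables, which is why the checklist warns about computational resources.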


4. Do you need to prune the input variables (e.g. for cost, speed or data understanding reasons)? If no, construct disjunctive features or weighted sums of features (i.e., combining several variables into a single constructed feature), e.g. by clustering or matrix factorization (see Section 5).
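Weighted sums of features can be built, for instance, by a matrix factorization such as a truncated SVD. This is a minimal numpy sketch under that assumption, not the only construction the paper discusses:

```python
import numpy as np

def reduce_features(X, k):
    """Replace d features by k weighted sums of features via truncated SVD
    (a simple matrix-factorization construction)."""
    Xc = X - X.mean(axis=0)            # centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T               # project onto the top-k components

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))
Z = reduce_features(X, 2)              # 10 samples, 2 constructed features
```

Each new column is a fixed weighted sum of the original variables, with components ordered by decreasing explained variance.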


5. Do you need to assess features individually (e.g. to understand their influence on the system, or because their number is so large that you need to do a first filtering)? If yes, use a variable ranking method (Section 2 and Section 7.2); else, do it anyway to get baseline results.
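A standard variable ranking criterion is the absolute Pearson correlation with the target. A minimal numpy sketch (function name and toy data are illustrative):

```python
import numpy as np

def rank_by_correlation(X, y):
    """Rank features by |Pearson correlation| with the target, descending."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.argsort(-np.abs(r)), r

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)  # feature 1 is informative
order, r = rank_by_correlation(X, y)            # order[0] == 1
```

This kind of ranking is cheap (one pass over the data) and, as the checklist suggests, worth computing as a baseline even when a more sophisticated method is planned.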


6. Do you need a predictor? If no, stop.


7. Do you suspect your data is “dirty” (has a few meaningless input patterns and/or noisy outputs or wrong class labels)? If yes, detect the outlier examples using the top-ranking variables obtained in step 5 as representation; check and/or discard them (note: “them” refers to the examples, not the features).
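One simple outlier check in the spirit of this step: restrict the data to the top-ranked variables and flag examples with extreme z-scores. This is only a sketch of one possible detector; the threshold and helper name are assumptions.

```python
import numpy as np

def flag_outliers(X, top, threshold=3.0):
    """Flag examples whose z-score exceeds `threshold` on any of the
    top-ranked variables (column indices listed in `top`)."""
    Xt = X[:, top]
    z = (Xt - Xt.mean(axis=0)) / Xt.std(axis=0)
    return np.abs(z).max(axis=1) > threshold

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X[0, 0] = 50.0                        # plant one corrupted example
mask = flag_outliers(X, top=[0, 1])   # mask[0] is True
```

Flagged examples should be inspected before being discarded, since some "outliers" are legitimate rare cases rather than label or input noise.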


8. Do you know what to try first? If no, use a linear predictor. Use a forward selection method (Section 4.2) with the “probe” method as a stopping criterion (Section 6), or use the L0-norm embedded method (Section 4.3). For comparison, following the ranking of step 5, construct a sequence of predictors of the same nature using increasing subsets of features. Can you match or improve performance with a smaller subset? If yes, try a non-linear predictor with that subset.
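The forward-selection-with-probes idea can be sketched as follows: append random “probe” columns to the data, greedily add the feature that most reduces the residual error of a linear least-squares fit, and stop as soon as a probe would be selected. This is a simplified illustration under those assumptions, not the paper's exact procedure.

```python
import numpy as np

def forward_select(X, y, n_probes=20, rng=None):
    """Greedy forward selection for linear least squares, stopping when a
    random probe column would be chosen (the 'probe' stopping criterion)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    probes = rng.normal(size=(X.shape[0], n_probes))
    Xa = np.column_stack([X, probes])
    n_real = X.shape[1]
    selected, remaining = [], list(range(Xa.shape[1]))

    def rss(cols):
        A = Xa[:, cols]
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        return resid @ resid

    while True:
        scores = {j: rss(selected + [j]) for j in remaining}
        best = min(scores, key=scores.get)
        if best >= n_real:            # a probe won: stop selecting
            return selected
        selected.append(best)
        remaining.remove(best)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=100)
chosen = forward_select(X, y, rng=np.random.default_rng(4))
```

On this toy problem the informative features 0 and 2 are picked up before any probe wins, and the irrelevant variables are (with high probability) screened out.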


9. Do you have new ideas, time, computational resources, and enough examples? If yes, compare several feature selection methods, including your new idea, correlation coefficients, backward selection and embedded methods (Section 4). Use linear and non-linear predictors. Select the best approach with model selection (Section 6).
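Model selection between competing feature subsets or selection methods is typically done by cross-validation. A minimal numpy sketch of comparing two candidate subsets by k-fold cross-validated error (helper names are illustrative):

```python
import numpy as np

def cv_mse(X, y, k=5):
    """k-fold cross-validated MSE of a linear least-squares predictor."""
    n = len(y)
    idx = np.arange(n)
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.mean((y[f] - X[f] @ beta) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4))
y = X[:, 0] + 0.1 * rng.normal(size=100)
# Model selection: compare two candidate feature subsets by CV error
err_good = cv_mse(X[:, [0]], y)   # informative feature
err_bad = cv_mse(X[:, [1]], y)    # irrelevant feature
```

The same loop extends directly to comparing whole pipelines (ranking vs. backward selection vs. an embedded method), as long as feature selection is performed inside each training fold to avoid selection bias.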


10. Do you want a stable solution (to improve performance and/or understanding)? If yes, sub-sample your data and redo your analysis for several “bootstraps” (Section 7.1).
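A simple stability check along these lines: resample the data with replacement many times and count how often each feature is selected. A minimal numpy sketch, using the top-correlated feature as the "selection" for illustration:

```python
import numpy as np

def bootstrap_selection_counts(X, y, n_boot=50, rng=None):
    """Count how often each feature is top-ranked (by |correlation|)
    across bootstrap resamples -- a simple selection-stability check."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, d = X.shape
    counts = np.zeros(d, dtype=int)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # sample with replacement
        Xb, yb = X[idx], y[idx]
        r = np.array([np.corrcoef(Xb[:, j], yb)[0, 1] for j in range(d)])
        counts[np.argmax(np.abs(r))] += 1
    return counts

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 3))
y = X[:, 2] + 0.2 * rng.normal(size=80)
counts = bootstrap_selection_counts(X, y, rng=np.random.default_rng(7))
```

A feature that is selected in nearly every bootstrap is a stable choice; features whose selection flickers across resamples should be interpreted with caution.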




Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.



Original article: http://blog.csdn.net/mmc2015/article/details/47449765
