
The steps that may be taken to solve a feature selection problem: a feature selection checklist

Date: 2015-08-12 21:48:59

Tags: feature selection, feature selection steps

Reference: the JMLR paper "An Introduction to Variable and Feature Selection" (Guyon and Elisseeff, 2003).


We summarize the steps that may be taken to solve a feature selection problem in a checklist:


1. Do you have domain knowledge? If yes, construct a better set of “ad hoc” features.


2. Are your features commensurate (i.e., measurable on the same scale or in the same units)? If no, consider normalizing them.
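A common way to make features commensurate is standardization (z-scoring). The following is a minimal numpy sketch, not the paper's prescription; the function name and data are illustrative.

```python
import numpy as np

def standardize(X):
    """Scale each feature (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # leave constant features unchanged
    return (X - mu) / sigma

# Two features on very different scales (e.g. metres vs. kilograms)
X = np.array([[1.70, 65.0],
              [1.80, 80.0],
              [1.60, 50.0]])
Xs = standardize(X)  # each column now has mean 0 and std 1
```

After this transformation, distance-based methods and regularized predictors treat each variable on an equal footing.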


3. Do you suspect interdependence of features? If yes, expand your feature set by constructing conjunctive features or products of features (i.e., treating several variables jointly as a single feature, or forming higher-order product terms), as much as your computer resources allow (see example of use in Section 4.4).
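One simple way to capture pairwise interdependence is to append all products x_i * x_j as new features. A minimal numpy sketch (the helper name is illustrative):

```python
import numpy as np
from itertools import combinations

def add_products(X):
    """Append all pairwise products x_i * x_j (i < j) as new features."""
    products = [X[:, i] * X[:, j]
                for i, j in combinations(range(X.shape[1]), 2)]
    return np.column_stack([X] + products)

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
Xe = add_products(X)  # 3 original features + 3 product features
```

Note that the expanded set grows quadratically in the number of variables, which is why the checklist warns about computational resources.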


4. Do you need to prune the input variables (e.g. for cost, speed or data understanding reasons)? If no, construct disjunctive features or weighted sums of features (i.e., combining several variables into a single constructed feature), e.g. by clustering or matrix factorization (see Section 5).
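Weighted sums of features can be built, for instance, by a matrix factorization such as a truncated SVD. This is a minimal numpy sketch under that assumption, not the only construction the paper discusses:

```python
import numpy as np

def reduce_features(X, k):
    """Replace d features by k weighted sums of features via truncated SVD
    (a simple matrix-factorization construction)."""
    Xc = X - X.mean(axis=0)            # centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T               # project onto the top-k components

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))
Z = reduce_features(X, 2)              # 10 samples, 2 constructed features
```

Each new column is a fixed weighted sum of the original variables, with components ordered by decreasing explained variance.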


5. Do you need to assess features individually (e.g. to understand their influence on the system, or because their number is so large that you need to do a first filtering)? If yes, use a variable ranking method (Section 2 and Section 7.2); else, do it anyway to get baseline results.
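A standard variable ranking criterion is the absolute Pearson correlation with the target. A minimal numpy sketch (function name and toy data are illustrative):

```python
import numpy as np

def rank_by_correlation(X, y):
    """Rank features by |Pearson correlation| with the target, descending."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.argsort(-np.abs(r)), r

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)  # feature 1 is informative
order, r = rank_by_correlation(X, y)            # order[0] == 1
```

This kind of ranking is cheap (one pass over the data) and, as the checklist suggests, worth computing as a baseline even when a more sophisticated method is planned.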


6. Do you need a predictor? If no, stop.


7. Do you suspect your data is “dirty” (has a few meaningless input patterns and/or noisy outputs or wrong class labels)? If yes, detect the outlier examples using the top-ranking variables obtained in step 5 as representation; check and/or discard them (note: “them” refers to the examples, not the features).
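One simple outlier check in the spirit of this step: restrict the data to the top-ranked variables and flag examples with extreme z-scores. This is only a sketch of one possible detector; the threshold and helper name are assumptions.

```python
import numpy as np

def flag_outliers(X, top, threshold=3.0):
    """Flag examples whose z-score exceeds `threshold` on any of the
    top-ranked variables (column indices listed in `top`)."""
    Xt = X[:, top]
    z = (Xt - Xt.mean(axis=0)) / Xt.std(axis=0)
    return np.abs(z).max(axis=1) > threshold

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
X[0, 0] = 50.0                        # plant one corrupted example
mask = flag_outliers(X, top=[0, 1])   # mask[0] is True
```

Flagged examples should be inspected before being discarded, since some "outliers" are legitimate rare cases rather than label or input noise.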


8. Do you know what to try first? If no, use a linear predictor. Use a forward selection method (Section 4.2) with the “probe” method as a stopping criterion (Section 6), or use the L0-norm embedded method (Section 4.3). For comparison, following the ranking of step 5, construct a sequence of predictors of the same nature using increasing subsets of features. Can you match or improve performance with a smaller subset? If yes, try a non-linear predictor with that subset.
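The forward-selection-with-probes idea can be sketched as follows: append random “probe” columns to the data, greedily add the feature that most reduces the residual error of a linear least-squares fit, and stop as soon as a probe would be selected. This is a simplified illustration under those assumptions, not the paper's exact procedure.

```python
import numpy as np

def forward_select(X, y, n_probes=20, rng=None):
    """Greedy forward selection for linear least squares, stopping when a
    random probe column would be chosen (the 'probe' stopping criterion)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    probes = rng.normal(size=(X.shape[0], n_probes))
    Xa = np.column_stack([X, probes])
    n_real = X.shape[1]
    selected, remaining = [], list(range(Xa.shape[1]))

    def rss(cols):
        A = Xa[:, cols]
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        return resid @ resid

    while True:
        scores = {j: rss(selected + [j]) for j in remaining}
        best = min(scores, key=scores.get)
        if best >= n_real:            # a probe won: stop selecting
            return selected
        selected.append(best)
        remaining.remove(best)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=100)
chosen = forward_select(X, y, rng=np.random.default_rng(4))
```

On this toy problem the informative features 0 and 2 are picked up before any probe wins, and the irrelevant variables are (with high probability) screened out.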


9. Do you have new ideas, time, computational resources, and enough examples? If yes, compare several feature selection methods, including your new idea, correlation coefficients, backward selection and embedded methods (Section 4). Use linear and non-linear predictors. Select the best approach with model selection (Section 6).
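Model selection between competing feature subsets or selection methods is typically done by cross-validation. A minimal numpy sketch of comparing two candidate subsets by k-fold cross-validated error (helper names are illustrative):

```python
import numpy as np

def cv_mse(X, y, k=5):
    """k-fold cross-validated MSE of a linear least-squares predictor."""
    n = len(y)
    idx = np.arange(n)
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.mean((y[f] - X[f] @ beta) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4))
y = X[:, 0] + 0.1 * rng.normal(size=100)
# Model selection: compare two candidate feature subsets by CV error
err_good = cv_mse(X[:, [0]], y)   # informative feature
err_bad = cv_mse(X[:, [1]], y)    # irrelevant feature
```

The same loop extends directly to comparing whole pipelines (ranking vs. backward selection vs. an embedded method), as long as feature selection is performed inside each training fold to avoid selection bias.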


10. Do you want a stable solution (to improve performance and/or understanding)? If yes, sub-sample your data and redo your analysis for several “bootstraps” (Section 7.1).
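A simple stability check along these lines: resample the data with replacement many times and count how often each feature is selected. A minimal numpy sketch, using the top-correlated feature as the "selection" for illustration:

```python
import numpy as np

def bootstrap_selection_counts(X, y, n_boot=50, rng=None):
    """Count how often each feature is top-ranked (by |correlation|)
    across bootstrap resamples -- a simple selection-stability check."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, d = X.shape
    counts = np.zeros(d, dtype=int)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # sample with replacement
        Xb, yb = X[idx], y[idx]
        r = np.array([np.corrcoef(Xb[:, j], yb)[0, 1] for j in range(d)])
        counts[np.argmax(np.abs(r))] += 1
    return counts

rng = np.random.default_rng(6)
X = rng.normal(size=(80, 3))
y = X[:, 2] + 0.2 * rng.normal(size=80)
counts = bootstrap_selection_counts(X, y, rng=np.random.default_rng(7))
```

A feature that is selected in nearly every bootstrap is a stable choice; features whose selection flickers across resamples should be interpreted with caution.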




Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.



Original article: http://blog.csdn.net/mmc2015/article/details/47449765
