Feature Selection

Date: 2016-08-10 22:22:09

# -*- coding: utf-8 -*-
"""
Created on Wed Aug 10 20:26:15 2016

@author: qqhfeng
"""

# Module 1: VarianceThreshold, selecting features by their variance
'''
Feature selector that removes all low-variance features.
This feature selection algorithm looks only at the features (X),
not the desired outputs (y), and can thus be used for unsupervised learning.

VarianceThreshold is a simple baseline approach to feature selection.
It removes all features whose variance doesn't meet some threshold.
By default, it removes all zero-variance features, i.e.
features that have the same value in all samples.
As an example, suppose that we have a dataset with boolean features,
and we want to remove all features that are either one or zero (on or off)
in more than 80% of the samples. Boolean features are Bernoulli random variables,
and the variance of such variables is given by Var[X] = p(1 - p),
so we can select using the threshold .8 * (1 - .8).
'''

from sklearn.feature_selection import VarianceThreshold
X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
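# Alternative: drop boolean features that take the same value in more than 80% of samples.
# sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
sel = VarianceThreshold()  # default threshold=0.0 removes only zero-variance features
print(sel.fit_transform(X))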
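To make the 80% rule concrete, here is a minimal sketch (not part of the original post) that computes each column's Bernoulli variance by hand and then applies the threshold from the commented-out line above. The names `p`, `variances`, `threshold`, and `X_reduced` are illustrative choices.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]])

# For a boolean column with a fraction p of ones, Var[X] = p * (1 - p).
p = X.mean(axis=0)
variances = p * (1 - p)

# A boolean feature that is constant in more than 80% of the samples
# has a variance below .8 * (1 - .8) = 0.16.
threshold = .8 * (1 - .8)

sel = VarianceThreshold(threshold=threshold)
X_reduced = sel.fit_transform(X)

print(variances)          # per-column Bernoulli variances
print(sel.get_support())  # boolean mask of the columns that were kept
print(X_reduced.shape)    # shape after dropping low-variance columns
```

With this data the first column is 0 in five of the six samples (p = 1/6, variance about 0.14), so it falls below the 0.16 threshold and is dropped, while the other two columns are kept.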




# Module 2: keeping the most important features. SelectKBest removes all but the k highest scoring features
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
iris = load_iris()
X, y = iris.data, iris.target
print(X.shape)
# chi2 scores each feature by its chi-squared statistic against the class labels
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
print(X_new.shape)
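The shapes alone do not show which features survived. As a hedged sketch (the variable names `selector` and `kept` are illustrative), fitting the selector separately exposes the per-feature chi-squared scores through `scores_` and the kept column indices through `get_support(indices=True)`:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X, y = iris.data, iris.target

# Fit first so the scores can be inspected before transforming.
selector = SelectKBest(chi2, k=2).fit(X, y)

# Print the chi-squared score of every feature.
for name, score in zip(iris.feature_names, selector.scores_):
    print(name, score)

# Indices and names of the k features with the highest scores.
kept = selector.get_support(indices=True)
print([iris.feature_names[i] for i in kept])

X_new = selector.transform(X)
print(X_new.shape)
```

Note that `chi2` expects non-negative feature values, which holds for the iris measurements.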



# Module 3: recursive feature elimination (RFE)
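The original post gives no code for this module. The following is only a sketch of how recursive feature elimination might look with scikit-learn's `RFE` class on the same iris data. The choice of `LogisticRegression` as the base estimator, the `max_iter` value, and keeping two features are assumptions made for illustration, not the author's method.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Assumption: logistic regression as the base estimator. RFE needs an
# estimator that exposes coef_ or feature_importances_ after fitting.
estimator = LogisticRegression(max_iter=1000)

# Repeatedly fit the estimator and drop the weakest feature (step=1)
# until only n_features_to_select features remain.
rfe = RFE(estimator, n_features_to_select=2, step=1)
X_rfe = rfe.fit_transform(X, y)

print(rfe.support_)  # boolean mask of the selected features
print(rfe.ranking_)  # rank 1 marks selected features; larger ranks were dropped earlier
print(X_rfe.shape)
```

Unlike `SelectKBest`, which scores each feature independently, RFE refits the model after each elimination, so the ranking depends on the chosen estimator.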

 

Original source: http://www.cnblogs.com/qqhfeng/p/5758354.html
