
[Homework 2] Machine Learning Foundations (Hsuan-Tien Lin)

Date: 2015-06-23 19:33:00


Homework 1 was scraped by bubuko; it would have been nicer if they had marked it as a repost (http://bubuko.com/infodetail-916604.html).

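For Homework 2 the focus is on the questions that require coding, Q16~Q20.

Q16 took me a while to understand, because the problem statement is not very detailed. Once the intent is clear, it turns out to be quite simple.

The key to understanding the problem is what the 's' in the statement means: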

(1) If s = 1, then x > theta is predicted as y = 1 and x < theta is predicted as y = -1;

(2) If s = -1, then x < theta is predicted as y = 1 and x > theta is predicted as y = -1.
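The two cases collapse into a single rule, h(x) = s * sign(x - theta). A minimal sketch of that rule follows. The helper name `stump_predict` is my own and not from the assignment, and putting a point with x exactly equal to theta on the negative side (before multiplying by s) is an assumption:

```python
import numpy as np

def stump_predict(x, s, theta):
    """Decision stump h(x) = s * sign(x - theta); hypothetical helper name."""
    x = np.asarray(x, dtype=float)
    # np.where avoids the 0 that np.sign would return when x == theta
    return s * np.where(x > theta, 1, -1)

x = np.array([-0.5, 0.2, 0.8])
print(stump_predict(x, 1, 0.0).tolist())   # [-1, 1, 1]
print(stump_predict(x, -1, 0.0).tolist())  # [1, -1, -1]
```

Flipping s simply negates every prediction, which is why the two cases can be treated symmetrically.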

Once that is clear, split into the cases theta > 0 and theta < 0, write out each case for s = 1 and s = -1, and then combine them to get the answer.
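A sketch of that case analysis, assuming the setup used by the Q17~Q18 code: x is uniform on [-1, 1], the target is sign(x), each label is flipped with probability 0.2, and theta lies in [-1, 1]. The symbol \mu (the error rate of h against the noiseless target) is notation introduced here for illustration. The final line agrees with the expression the Q17~Q18 code uses for E_out.

```latex
\begin{aligned}
\mu &= P\bigl[h_{s,\theta}(x) \neq \operatorname{sign}(x)\bigr]
     = \begin{cases} |\theta|/2, & s = +1 \\ 1 - |\theta|/2, & s = -1 \end{cases} \\
E_{\text{out}} &= 0.8\,\mu + 0.2\,(1 - \mu) = 0.2 + 0.6\,\mu \\
 &= \begin{cases} 0.2 + 0.3\,|\theta|, & s = +1 \\ 0.8 - 0.3\,|\theta|, & s = -1 \end{cases}
 \;=\; 0.5 + 0.3\,s\,(|\theta| - 1)
\end{aligned}
```

For s = +1 the stump disagrees with sign(x) only on the interval between 0 and theta, which has length |theta| under a density of 1/2; the s = -1 case is its complement.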


The code for Q17~Q18 is as follows:

import numpy as np

# generate input data with 20% flipping noise
def generate_input_data(time_seed):
    np.random.seed(time_seed)
    raw_X = np.sort(np.random.uniform(-1,1,20))
    noised_y = np.sign(raw_X)*np.where(np.random.random(raw_X.shape[0])<0.2,-1,1)
    return raw_X, noised_y

def calculate_Ein(x,y):
    # candidate thresholds: midpoints between neighbouring sorted x values, plus -inf and +inf
    thetas = np.array( [float("-inf")]+[ (x[i]+x[i+1])/2 for i in range(0, x.shape[0]-1) ]+[float("inf")] )
    Ein = x.shape[0]
    sign = 1
    target_theta = 0.0
    # positive and negative rays
    for theta in thetas:
        y_positive = np.where(x>theta,1,-1)
        y_negative = np.where(x<theta,1,-1)
        error_positive = sum(y_positive!=y)
        error_negative = sum(y_negative!=y)
        if error_positive>error_negative:
            if Ein>error_negative:
                Ein = error_negative
                sign = -1
                target_theta = theta
        else:
            if Ein>error_positive:
                Ein = error_positive
                sign = 1
                target_theta = theta
    # two corner cases: clamp an infinite threshold to the ends of [-1, 1]
    if target_theta==float("inf"):
        target_theta = 1.0
    if target_theta==float("-inf"):
        target_theta = -1.0
    return Ein, target_theta, sign


if __name__ == "__main__":
    T = 1000
    total_Ein = 0
    sum_Eout = 0
    for i in range(0,T):
        x,y = generate_input_data(i)
        curr_Ein, theta, sign = calculate_Ein(x,y)
        total_Ein = total_Ein + curr_Ein
        sum_Eout = sum_Eout + 0.5+0.3*sign*(abs(theta)-1)
    print(total_Ein / float(T * 20))  # average E_in over T runs of 20 points each
    print(sum_Eout / float(T))        # average E_out over T runs

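I cut a corner on the number of repetitions: these results use 1000 runs in place of 5000.

The algorithm itself is not complicated; the main gain was learning some numpy operations (such as numpy.where, numpy.sign and numpy.sort).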
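As a quick illustration of those three calls, here is a small sketch on made-up values (the array and the 0.2 threshold are arbitrary choices, not taken from the assignment):

```python
import numpy as np

x = np.array([0.3, -0.7, 0.1])

sorted_x = np.sort(x)                     # ascending copy; x itself is unchanged
signs = np.sign(sorted_x)                 # elementwise -1, 0 or +1
labels = np.where(sorted_x > 0.2, 1, -1)  # 1 where the condition holds, else -1

print(sorted_x.tolist())  # [-0.7, 0.1, 0.3]
print(signs.tolist())     # [-1.0, 1.0, 1.0]
print(labels.tolist())    # [-1, -1, 1]
```

Note that np.sign returns 0 for an input of exactly 0, which is one reason the stump predictions in the code use np.where with an explicit comparison.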

For the details I referred to a thread in the course discussion forum (https://class.coursera.org/ntumlone-002/forum/thread?thread_id=191).

 

The code for Q19~Q20 is as follows:

import numpy as np

def read_input_data(path):
    x = []
    y = []
    # each line holds the feature values followed by the label, whitespace-separated
    with open(path) as f:
        for line in f:
            items = line.split()
            if not items:
                continue  # skip blank lines
            x.append([float(v) for v in items[:-1]])
            y.append(float(items[-1]))
    return np.array(x),np.array(y)

def calculate_Ein(x,y):
    # candidate thresholds: midpoints between neighbouring sorted x values, plus -inf and +inf
    thetas = np.array( [float("-inf")]+[ ( x[i]+x[i+1] )/2 for i in range(0, x.shape[0]-1) ]+[float("inf")] )
    Ein = x.shape[0]
    sign = 1
    target_theta = 0.0
    # positive and negative rays
    for theta in thetas:
        y_positive = np.where(x>theta,1,-1)
        y_negative = np.where(x<theta,1,-1)
        error_positive = sum(y_positive!=y)
        error_negative = sum(y_negative!=y)
        if error_positive>error_negative:
            if Ein>error_negative:
                Ein = error_negative
                sign = -1
                target_theta = theta
        else:
            if Ein>error_positive:
                Ein = error_positive
                sign = 1
                target_theta = theta
    return Ein, target_theta, sign

if __name__ == "__main__":
    x,y = read_input_data("train.dat")
    # record the parameters of the best decision stump found so far
    Ein = x.shape[0]
    theta = 0
    sign = 1
    index = 0
    # search every feature dimension for the stump with the lowest E_in
    for i in range(0,x.shape[1]):
        input_x = x[:,i]
        input_data = np.transpose(np.array([input_x,y]))
        input_data = input_data[np.argsort(input_data[:,0])]
        curr_Ein,curr_theta,curr_sign = calculate_Ein(input_data[:,0],input_data[:,1])
        if Ein>curr_Ein:
            Ein = curr_Ein
            theta = curr_theta
            sign = curr_sign
            index = i
    print(Ein / float(x.shape[0]))
    # test process
    test_x,test_y = read_input_data("test.dat")
    test_x = test_x[:,index]
    predict_y = np.array([])
    if sign==1:
        predict_y = np.where(test_x>theta,1.0,-1.0)
    else:
        predict_y = np.where(test_x<theta,1.0,-1.0)
    Eout = sum(predict_y!=test_y)
    print(Eout / float(test_x.shape[0]))

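This code largely reuses the calculate_Ein function, with only a small change to the corner-case handling.

One part gave me some trouble: the input pairs (xi, y) have to be sorted by xi.

I had used lambda expressions for this before; here I learned an approach based on numpy.argsort() (http://blog.csdn.net/maoersong/article/details/21875705).

It reorders the rows of a matrix according to one of its columns (it works by returning the row indices in sorted order, which are then used to build a new matrix).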
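The argsort-based row reordering can be sketched on a toy array as follows (the values are made up for illustration; the first column plays the role of xi and the second of y):

```python
import numpy as np

# Toy (x_i, y) pairs, one per row
data = np.array([[0.9,  1.0],
                 [0.1, -1.0],
                 [0.5,  1.0]])

order = np.argsort(data[:, 0])  # row indices that sort column 0 ascending
sorted_data = data[order]       # integer-array indexing builds a new, reordered array

print(order.tolist())              # [1, 2, 0]
print(sorted_data[:, 0].tolist())  # [0.1, 0.5, 0.9]
print(sorted_data[:, 1].tolist())  # [-1.0, 1.0, 1.0]
```

Because whole rows are moved together, each label stays attached to its xi, which is what the stump search needs.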


Original article: http://www.cnblogs.com/xbf9xbf/p/4595990.html
