
Linear Regression ----- Stanford Machine Learning (by Andrew Ng) Course Notes


Andrew Ng's Machine Learning course is at: https://www.coursera.org/course/ml

The Linear Regression lectures introduce several new terms that come up repeatedly in later parts of the course:

Cost Function, Linear Regression, Gradient Descent, Normal Equation, Feature Scaling, Mean Normalization

 

 


Model Representation

  m: number of training examples

  x(i): input (features) of ith training example

  xj(i): value of feature j in ith training example

  y(i): “output” variable / “target” variable of ith training example

  n: number of features

  θ: parameters

  Hypothesis: hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn; with the convention x0 = 1 this is hθ(x) = θᵀx
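
Stacking the m training examples as rows of a design matrix X (with a leading column of ones for x0 = 1), all predictions can be computed at once as Xθ. A minimal Octave sketch; the values of X and theta below are made up purely for illustration:

% m = 3 hypothetical examples, n = 2 features, plus the x0 = 1 intercept column
X = [1 2104 3;
     1 1600 3;
     1 2400 4];            % m x (n+1) design matrix
theta = [80; 0.1; 20];     % (n+1) x 1 parameter vector (made-up values)

predictions = X * theta;   % h_theta(x(i)) for every training example at once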


Cost Function

  IDEA: Choose θ so that hθ(x) is close to y for our training examples (x, y).

A. Cost Function for Linear Regression with One Variable

  Cost Function:    J(θ0, θ1) = 1/(2m) · Σi=1..m ( hθ(x(i)) − y(i) )²

  Goal:    minimize J(θ0, θ1) over θ0 and θ1

  Contour Plot:   a contour plot of J(θ0, θ1) in the (θ0, θ1) plane (figure omitted).

B. Cost Function for Linear Regression with Multiple Variables

  Cost Function:    J(θ) = 1/(2m) · Σi=1..m ( hθ(x(i)) − y(i) )²

  Goal:    minimize J(θ) over θ


Gradient Descent 

 Outline

  Start with some θ (say θ0 = θ1 = … = θn = 0); keep changing θ to reduce J(θ), until we hopefully end up at a minimum.

 Gradient Descent Algorithm

  repeat until convergence {
      θj := θj − α · ∂J(θ)/∂θj        (simultaneously update θj for j = 0, …, n)
  }

  For linear regression the derivative works out to:

  θj := θj − α · (1/m) · Σi=1..m ( hθ(x(i)) − y(i) ) · xj(i)

The convergence of the iterations can be visualized on a contour plot of J(θ) (figure omitted): the minimum sits at the center of the contours, and the arc traced by the successive iterates is a possible convergence path.

Learning Rate α:

      1) If α is too small, gradient descent can be slow to converge;

      2) If α is too large, J(θ) may not decrease on every iteration and gradient descent may fail to converge (it can even diverge);

      3) For sufficiently small α , J(θ) should decrease on every iteration;

Choosing the learning rate α: debug by plotting J(θ) against the number of iterations, and try candidate values such as 0.001, 0.003, 0.006, 0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1.0 (a sketch of this follows below);
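
A hedged sketch of that debugging loop in Octave, assuming X and y have already been loaded and assuming a gradientDescent function like the one sketched at the end of the homework section, which also returns the per-iteration values of J(θ); the candidate α values here are illustrative:

alphas = [0.001 0.01 0.1 1.0];             % candidate learning rates (illustrative)
num_iters = 400;
figure; hold on;
for a = alphas
  theta_init = zeros(size(X, 2), 1);       % restart from theta = 0 for each alpha
  [theta, J_history] = gradientDescent(X, y, theta_init, a, num_iters);
  plot(1:num_iters, J_history);            % J(theta) should decrease on every iteration
end
xlabel('number of iterations'); ylabel('J(\theta)');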

“Batch” Gradient Descent: Each step of gradient descent uses all the training examples;

“Stochastic” Gradient Descent: Each step of gradient descent uses only one training example.
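
To make the difference concrete, here is a hedged Octave sketch of a single update step under each variant (X, y, theta, alpha as above; this illustration is not part of the original notes or homework):

m = size(X, 1);                               % number of training examples

% Batch gradient descent: one update uses all m training examples.
theta = theta - alpha * (1/m) * X' * (X*theta - y);

% Stochastic gradient descent: one update uses a single, randomly chosen example.
i  = randi(m);                                % index of one training example
xi = X(i, :)';                                % its (n+1) x 1 feature vector
theta = theta - alpha * (xi' * theta - y(i)) * xi;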


Normal Equation

IDEA: Method to solve for θ analytically.

Set ∂J(θ)/∂θj = 0 for every j, then solve:    θ = (XᵀX)⁻¹ Xᵀ y

Restriction: the Normal Equation does not work when XᵀX is non-invertible.

PS: A matrix is invertible when it has full rank: the column vectors (features) must be linearly independent, and the number of linearly independent row vectors (training examples) must exceed the number of columns (the number of features n).
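
In Octave the closed-form solution is a one-liner, where X stacks the training examples as rows (as in the sketch above) and y is the m × 1 vector of targets. Using pinv (the pseudo-inverse) instead of inv is a common safeguard, since it still returns a sensible θ when XᵀX is non-invertible, e.g. with redundant features:

theta = pinv(X' * X) * X' * y;   % (n+1) x 1, solves the normal equation in one line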


Gradient Descent Algorithm VS. Normal Equation 

Gradient Descent:

          a) Need to choose α;

          b) Needs many iterations;

          c) Works well even when n is large (e.g. n > 1000);

Normal Equation:

          a) No need to choose α;

          b) Don’t need to iterate;

          c) Needs to compute (XᵀX)⁻¹, which costs roughly O(n³);

          d) Slow if n is very large. (fine up to roughly n ≈ 1000)


Feature Scaling

IDEA: Make sure features are on a similar scale.

Benefit: fewer iterations are needed, so gradient descent converges faster.

Example: if we want every feature to lie approximately in the range -1 ≤ xi ≤ 1, ranges such as [-3, 3] or [-1/3, 1/3] are still acceptable.

Mean normalization:    xi := (xi − μi) / si, where μi is the average value of feature i over the training set and si is its range (max − min) or its standard deviation.
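
A minimal Octave sketch of mean normalization, using the standard deviation as si (dividing by the range max − min is an equally valid choice); here X is the m × n feature matrix before the column of ones is added:

m = size(X, 1);                              % number of training examples
mu = mean(X);                                % 1 x n vector of feature means
sigma = std(X);                              % 1 x n vector of feature standard deviations
X_norm = (X - repmat(mu, m, 1)) ./ repmat(sigma, m, 1);   % every feature now has mean 0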


HOMEWORK

Now that the video lectures are done, it is time for the homework. Below is the core code for the Linear Regression programming assignment:

 1.computeCost.m/computeCostMulti.m

% X is the m x (n+1) design matrix, theta is (n+1) x 1, y is m x 1.
J = 1/(2*m) * sum((X*theta - y).^2);   % vectorized squared-error cost

2.gradientDescent.m/gradientDescentMulti.m

h = X*theta - y;               % prediction errors, m x 1
grad = (1/m) * (X' * h);       % gradient of J(theta), (n+1) x 1
theta = theta - alpha * grad;  % simultaneous update of every theta_j
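
For context, a hedged sketch of how the update above sits inside the assignment's gradientDescent loop (the function signature follows the ex1 skeleton as I recall it; recording J_history is what makes the learning-rate debugging plot earlier possible):

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);                                     % number of training examples
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    theta = theta - (alpha/m) * X' * (X*theta - y);  % simultaneous update of every theta_j
    J_history(iter) = computeCost(X, y, theta);      % record J(theta) for the debugging plot
  end
end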

Original post: http://www.cnblogs.com/tec-vegetables/p/3696399.html
