
Machine Learning



Over the break I read a few books on machine learning; this post collects some of the more important core formulas.

Model Description

Under the feature-space assumption, we look for a linear coefficient vector $\theta$ so that a linear function approximates the target vector: $\hat{y} = \theta^\top x$.

How well the approximation fits is measured by a cost function; the MSE below is one example:

$$\mathrm{MSE}(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta^\top x^{(i)} - y^{(i)}\right)^2$$

Linear Regression

Gradient Descent

$$\theta^{(\text{next step})} = \theta - \eta\, \nabla_\theta\, \mathrm{MSE}(\theta)$$

where

$$\nabla_\theta\, \mathrm{MSE}(\theta) = \frac{2}{m} X^\top (X\theta - y)$$
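
A minimal NumPy sketch of this update rule on made-up toy data (the learning rate, iteration count, and data are illustrative, not from the original post):

import numpy as np

# toy data: y = 4 + 3x + noise
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(100)

X_b = np.c_[np.ones((100, 1)), X]  # prepend a bias column of 1s
eta, n_iterations = 0.1, 1000      # learning rate and step count
theta = rng.standard_normal(2)     # random initialization

for _ in range(n_iterations):
    gradients = 2 / len(X_b) * X_b.T @ (X_b @ theta - y)  # the MSE gradient above
    theta -= eta * gradients       # theta should land near (4, 3)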

With regularization terms:

  • Ridge Regression: $J(\theta) = \mathrm{MSE}(\theta) + \alpha \frac{1}{2}\sum_{i=1}^{n}\theta_i^2$
  • LASSO: $J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{i=1}^{n}\lvert\theta_i\rvert$
  • Elastic Net: $J(\theta) = \mathrm{MSE}(\theta) + r\alpha \sum_{i=1}^{n}\lvert\theta_i\rvert + \frac{1-r}{2}\alpha \sum_{i=1}^{n}\theta_i^2$
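
All three regularized models are available in scikit-learn; a minimal sketch (the alpha and l1_ratio values are illustrative):

from sklearn.linear_model import Ridge, Lasso, ElasticNet

ridge = Ridge(alpha=1.0)                       # alpha scales the L2 penalty
lasso = Lasso(alpha=0.1)                       # alpha scales the L1 penalty
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)  # l1_ratio plays the role of r above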
sklearn - Linear Regression

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X, y)              # X: feature matrix, y: target vector
lr.intercept_, lr.coef_   # fitted bias term and weight vector

# MSE as a metric
from sklearn.metrics import mean_squared_error

# the same model fitted by stochastic gradient descent
from sklearn.linear_model import SGDRegressor
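
A hedged usage sketch for SGDRegressor on the same kind of toy data (hyperparameters are illustrative):

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(100)

# one gradient step per training instance; penalty=None disables regularization
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(X, y)
sgd_reg.intercept_, sgd_reg.coef_  # should land near (4, 3)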

Logistic Regression

$\sigma(t)$ is the sigmoid function, $\sigma(t) = \dfrac{1}{1 + e^{-t}}$, and the model's probability estimate is $\hat{p} = \sigma(\theta^\top x)$.

Logistic Regression cost function (log loss):

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\hat{p}^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - \hat{p}^{(i)}\right)\right]$$

Logistic cost function partial derivatives:

$$\frac{\partial}{\partial\theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\sigma\left(\theta^\top x^{(i)}\right) - y^{(i)}\right) x_j^{(i)}$$

sklearn - Logistic Regression

from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression()
log_reg.fit(X, y)             # y must be binary labels here
log_reg.predict_proba(X[:1])  # [P(class 0), P(class 1)]

Softmax Regression
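
Softmax regression generalizes logistic regression to $K$ classes: each class $k$ gets a score $s_k(x) = \theta_k^\top x$, the scores are normalized into probabilities by the softmax function $\hat{p}_k = \exp(s_k(x)) \big/ \sum_{j=1}^{K}\exp(s_j(x))$, and training minimizes the cross-entropy loss. A minimal scikit-learn sketch on iris (hyperparameters illustrative; multi_class="multinomial" requests softmax training explicitly in older scikit-learn versions, while newer ones default to it for multiclass targets):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target  # three classes

# one theta vector per class, trained jointly with cross-entropy
softmax_reg = LogisticRegression(multi_class="multinomial", solver="lbfgs", C=10)
softmax_reg.fit(X, y)
softmax_reg.predict_proba(X[:1])  # per-class probabilities summing to 1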

Support Vector Machine

  • Decision Functions and Predictions: $\hat{y} = 0$ if $w^\top x + b < 0$, else $\hat{y} = 1$
  • Hard Margin Classification: $\min_{w,b}\ \frac{1}{2} w^\top w$

subject to $t^{(i)}\left(w^\top x^{(i)} + b\right) \ge 1$ for $i = 1, \ldots, m$

  • Soft Margin Classification: $\min_{w,b,\zeta}\ \frac{1}{2} w^\top w + C \sum_{i=1}^{m} \zeta^{(i)}$

subject to $t^{(i)}\left(w^\top x^{(i)} + b\right) \ge 1 - \zeta^{(i)}$ and $\zeta^{(i)} \ge 0$ for $i = 1, \ldots, m$

  • Dual Problem: $\min_{\alpha}\ \frac{1}{2} \sum_{i=1}^{m}\sum_{j=1}^{m} \alpha^{(i)} \alpha^{(j)} t^{(i)} t^{(j)}\, x^{(i)\top} x^{(j)} - \sum_{i=1}^{m} \alpha^{(i)}$

subject to $\alpha^{(i)} \ge 0$ for $i = 1, \ldots, m$, where $t^{(i)} \in \{-1, 1\}$ are the class targets

LinearSVC

import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]                   # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # Iris-Virginica

svm_clf = Pipeline([
    ("scaler", StandardScaler()),             # SVMs are sensitive to feature scales
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])

svm_clf.fit(X, y)

Common kernels

  • Linear: $K(a, b) = a^\top b$
  • Polynomial: $K(a, b) = \left(\gamma\, a^\top b + r\right)^d$
  • Gaussian RBF: $K(a, b) = \exp\left(-\gamma \lVert a - b \rVert^2\right)$
  • Sigmoid: $K(a, b) = \tanh\left(\gamma\, a^\top b + r\right)$
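
The kernel trick lets an SVM use these similarity functions without ever materializing the corresponding feature map. A minimal sketch with the Gaussian RBF kernel (gamma and C are illustrative; assumes the iris X, y prepared in the LinearSVC block above):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001)),  # larger gamma -> narrower influence per point
])
rbf_kernel_svm_clf.fit(X, y)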

From trees to forests.

Decision Trees

  • Gini impurity: $G_i = 1 - \sum_{k=1}^{n} p_{i,k}^2$
  • Entropy: $H_i = -\sum_{k=1,\; p_{i,k} \neq 0}^{n} p_{i,k}\, \log_2 p_{i,k}$
  • CART cost function for regression: $J(k, t_k) = \dfrac{m_{\mathrm{left}}}{m}\mathrm{MSE}_{\mathrm{left}} + \dfrac{m_{\mathrm{right}}}{m}\mathrm{MSE}_{\mathrm{right}}$

where $\mathrm{MSE}_{\mathrm{node}} = \sum_{i \in \mathrm{node}}\left(\hat{y}_{\mathrm{node}} - y^{(i)}\right)^2$ and $\hat{y}_{\mathrm{node}} = \dfrac{1}{m_{\mathrm{node}}}\sum_{i \in \mathrm{node}} y^{(i)}$

DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)
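
Because the CART cost above just swaps impurity for per-node MSE, regression needs only a different estimator class. A minimal sketch on made-up noisy quadratic data (max_depth is illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(size=(200, 1))
y = (X[:, 0] - 0.5) ** 2 + 0.025 * rng.standard_normal(200)

tree_reg = DecisionTreeRegressor(max_depth=2)  # each split chosen to minimize J(k, t_k)
tree_reg.fit(X, y)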

Random Forests

Random Forests are, in my view, the classic example of Ensemble Learning.

Take classifiers as an example: given the same data, different classifiers may reach different decisions, e.g. a Logistic Regression classifier, a Random Forest classifier, and a K-Nearest Neighbors classifier.

Naturally, a voting strategy can then be introduced to make the final decision.

Voting classifier

from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

# hard voting: the majority class across the three classifiers wins
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)  # X_train, y_train from a prior train/test split
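
For the forest itself, scikit-learn bundles bagging and per-split feature randomness into one class. A minimal sketch on iris (hyperparameters illustrative):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 500 trees, each grown on a bootstrap sample with a random feature subset per split
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(X, y)
rnd_clf.feature_importances_  # average impurity reduction contributed by each feature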

Boosting

Adaboost
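
AdaBoost trains predictors sequentially, each round up-weighting the training instances its predecessors misclassified. A minimal scikit-learn sketch on iris (the stump depth, estimator count, and learning rate are illustrative):

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 200 depth-1 trees ("stumps"), combined by weighted majority vote
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=200, learning_rate=0.5)
ada_clf.fit(X, y)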

Gradient Boosting
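
Gradient boosting also adds predictors sequentially, but each new tree is fitted to the residual errors left by the ensemble so far. A minimal sketch on made-up data (hyperparameters illustrative):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-0.5, 0.5, size=(100, 1))
y = 3 * X[:, 0] ** 2 + 0.05 * rng.standard_normal(100)  # noisy quadratic

# each shallow tree fits the residuals of the previous trees
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=100, learning_rate=0.1)
gbrt.fit(X, y)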

Performance Metrics

Metrics tell us which direction the model should converge toward; there are several for both continuous (regression) and discrete (classification) models.

Classification

Precision $= \dfrac{TP}{TP + FP}$ and recall $= \dfrac{TP}{TP + FN}$; $F_1$ is the harmonic mean of the two:

$$F_1 = \frac{2}{\frac{1}{\mathrm{precision}} + \frac{1}{\mathrm{recall}}} = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

precision_score and recall_score

from sklearn.metrics import precision_score, recall_score, f1_score
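
Hypothetical usage with made-up label arrays, to show how the three scores relate:

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
precision_score(y_true, y_pred)  # TP/(TP+FP) = 2/2 = 1.0
recall_score(y_true, y_pred)     # TP/(TP+FN) = 2/3
f1_score(y_true, y_pred)         # harmonic mean = 0.8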

Regression

  • MSE: $\mathrm{MSE}(X, h) = \dfrac{1}{m}\sum_{i=1}^{m}\left(h\left(x^{(i)}\right) - y^{(i)}\right)^2$
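
scikit-learn exposes this as mean_squared_error; a sketch with made-up numbers:

from sklearn.metrics import mean_squared_error

mean_squared_error([3.0, 5.0], [2.5, 5.0])  # ((3.0-2.5)**2 + 0**2) / 2 = 0.125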

Original post: https://www.cnblogs.com/lijianming180/p/12037887.html