[Hung-yi Lee Machine Learning] 12. Recurrent Neural Networks

Date: 2020-11-01 09:29:51


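Example: Slot Filling
- Feedforward Network
  - Input: word vector
  - Output: the probability that the word belongs to each slot
  - Problem: it cannot use information from the preceding words, so it may misclassify a word
  - Solution: add Memory to the NN so that it can remember the preceding context; this is the RNN
- word -> vector
  - 1-of-N encoding
    - Other: an extra dimension for words outside the vocabulary
    - Word hashing: for example, representing a word by its letter trigrams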
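The word -> vector step above can be sketched in a few lines. This is only an illustration: the vocabulary is hypothetical, the extra last dimension plays the role of "Other", and the boundary-free letter trigrams are one assumed form of word hashing.

```python
def one_of_n(word, vocab):
    """Return a 1-of-N vector with one extra trailing dimension for 'Other'."""
    vec = [0] * (len(vocab) + 1)
    # Known words use their own index; unknown words fall into the last slot.
    index = vocab.index(word) if word in vocab else len(vocab)
    vec[index] = 1
    return vec


def letter_trigrams(word):
    """Split a word into overlapping letter trigrams (assumed hashing scheme)."""
    return [word[i:i + 3] for i in range(len(word) - 2)]


vocab = ["apple", "bag", "cat", "dog", "elephant"]  # hypothetical vocabulary
print(one_of_n("cat", vocab))      # [0, 0, 1, 0, 0, 0]
print(one_of_n("Gandalf", vocab))  # [0, 0, 0, 0, 0, 1]
print(letter_trigrams("apple"))    # ['app', 'ppl', 'ple']
```

With word hashing, each trigram would get its own dimension, so unseen words can still share dimensions with known ones.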
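RNN
- The output of the hidden layer is stored in memory
- The data in memory is fed back as an input to the hidden layer
- Sensitive to the order of the input
- For the same word appearing in a sentence, the output differs when the preceding context differs, because of the Memory
- Can be extended to multiple layers
- Variants
  - Elman Network
    - The output of the hidden layer is stored in memory
  - Jordan Network
    - The output of the output layer is stored in memory
  - Bidirectional RNN
    - Scans the sentence in both directions; the output layer takes the outputs of the hidden layers from the two directions
  - LSTM
    - Long Short-term Memory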
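To make the memory mechanism concrete, here is a minimal sketch of an Elman-style forward pass. It assumes a toy setting that is not part of the notes: all weights equal to 1, no bias, linear activations, and memory initialized to 0. The function and variable names are hypothetical.

```python
def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def elman_forward(inputs, W_in, W_mem, W_out):
    """Run a linear Elman network over a sequence and return every output."""
    memory = [0.0] * len(W_mem)  # memory starts empty
    outputs = []
    for x in inputs:
        # The hidden layer sees the current input plus the stored memory.
        hidden = [a + b for a, b in zip(matvec(W_in, x), matvec(W_mem, memory))]
        memory = hidden  # Elman: store the hidden layer output
        outputs.append(matvec(W_out, hidden))
    return outputs


ones = [[1.0, 1.0], [1.0, 1.0]]  # toy weights, all equal to 1
print(elman_forward([[1, 1], [1, 1], [2, 2]], ones, ones, ones))
# [[4.0, 4.0], [12.0, 12.0], [32.0, 32.0]]
```

The first two inputs are identical, yet their outputs differ (4 versus 12) because the memory differs, which is the order sensitivity listed above.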
LSTM
- Each Neuron has 4 components
  - Input Gate
  - Memory Cell
  - Forget Gate
  - Output Gate
- 4 inputs
  - Input of Input Gate
  - Signal control of Input Gate
  - Signal control of Forget Gate
  - Signal control of Output Gate
- 1 output
  - Output of Output Gate
- Processing steps inside each Neuron
  - Input of Input Gate: \(z\) -> \(g(z)\)
  - Signal control of Input Gate: \(z_i\) -> \(f(z_i)\)
  - Multiply: \(g(z)f(z_i)\)
  - Signal control of Forget Gate: \(z_f\) -> \(f(z_f)\)
  - Old Memory: \(c\)
  - Multiply: \(cf(z_f)\)
  - New Memory: \(c'=g(z)f(z_i)+cf(z_f)\) -> \(h(c')\)
  - Signal control of Output Gate: \(z_o\) -> \(f(z_o)\)
  - Multiply: \(h(c')f(z_o)\)
  - Output of Output Gate: \(a=h(c')f(z_o)\)
  - \(f\) is usually a Sigmoid, so its value lies between 0 and 1 and indicates how far the gate is open
- The whole input \(x_i\) can be fed to each of the 4 inputs, or \(x_i\) can be split into 4 vectors, one per input
- peephole
  - The value of the Memory Cell is also used as one of the inputs to the Input Gate
- Multi-layer LSTM
  - LSTM has almost become a synonym for RNN
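The processing steps of one LSTM neuron can be traced with a small scalar sketch. The names below are hypothetical, tanh as the default for \(g\) and \(h\) is an assumption, and the demo uses identity functions so the numbers are easy to follow.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(v):
    return v


def lstm_cell_step(z, z_i, z_f, z_o, c, g=math.tanh, h=math.tanh, f=sigmoid):
    """One step of a single scalar LSTM cell; returns (output a, new memory c')."""
    written = g(z) * f(z_i)   # candidate value scaled by the Input Gate
    kept = c * f(z_f)         # old memory scaled by the Forget Gate
    c_new = written + kept    # new memory c'
    a = h(c_new) * f(z_o)     # memory read out through the Output Gate
    return a, c_new


# A strongly positive control signal opens a gate; a strongly negative one closes it.
OPEN, CLOSED = 100.0, -100.0

a, c = lstm_cell_step(3.0, OPEN, OPEN, OPEN, c=0.0, g=identity, h=identity)
print(a, c)  # Input Gate open: 3 is written into memory and read out

a, c = lstm_cell_step(5.0, CLOSED, OPEN, CLOSED, c=c, g=identity, h=identity)
print(a, c)  # Input and Output Gates closed: memory stays near 3, output near 0
```

Note that in this formulation an open Forget Gate (value near 1) means the old memory is kept, and a closed one means it is erased.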
Keras support
- LSTM
- GRU
  - Gated Recurrent Unit
  - Simpler than LSTM, with comparable performance
- SimpleRNN
Training method: BPTT
- Backpropagation through time
The error surface is rough
- very flat or very steep
- Intuition
  - \(w=1,w^{1000}=1\)
  - \(w=1.01,w^{1000}\approx 20000\)
  - \(w=0.99,w^{1000}\approx 0\)
  - \(w=0.01,w^{1000}\approx 0\)
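- Solutions:
  - LSTM
    - Can handle gradient vanishing, but not gradient explosion
    - As long as the Forget Gate is open, the stored memory keeps being taken into account
  - Clockwork RNN
  - SCRN
    - Structurally Constrained Recurrent Network
  - Vanilla RNN + identity-matrix initialization + ReLU activation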
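The intuition about the error surface, a recurrent weight multiplied by itself 1000 times, is easy to check numerically. This is a small sketch with a hypothetical helper name; it only reproduces the arithmetic and does not train anything.

```python
def repeated_weight(w, steps=1000):
    """Multiply w by itself `steps` times, as a recurrent weight is applied over time."""
    value = 1.0
    for _ in range(steps):
        value *= w
    return value


for w in (1.0, 1.01, 0.99, 0.01):
    print(w, repeated_weight(w))
```

A change of 0.01 around \(w=1\) moves the result from 1 to roughly 20000, while anything below 1 collapses toward 0. This is why the gradient is huge in some regions and almost zero in others.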
Other applications of RNN
- Many to one
  - Input: vector sequence
  - Output: one vector
  - Sentiment Analysis
  - Key Term Extraction
- Many to many (output is shorter)
  - Input: vector sequence
  - Output: shorter vector sequence
  - Speech Recognition
    - Trimming
    - CTC
      - Connectionist Temporal Classification
- Many to many (no limitation)
  - Input: vector sequence
  - Output: vector sequence with any length
  - Sequence to sequence learning
  - Machine Translation
  - Syntactic parsing
  - Auto-encoder
    - Single layer
    - Multiple layers
    - Encoder
    - Decoder
  - Applications
    - Chat-bot
    - Video Caption Generation
    - Image Caption Generation
- Attention-based model
- Reading Comprehension
- Visual Question Answering
- Speech Question Answering
  - TOEFL listening comprehension test
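For the speech recognition item above, the difference between Trimming and CTC can be shown on a per-frame label sequence. The sketch below assumes that Trimming means merging consecutive repeats and that CTC adds a null symbol which is dropped after merging; the "_" symbol and the function names are hypothetical, and this covers only the output collapsing step, not CTC training.

```python
from itertools import groupby

BLANK = "_"  # hypothetical stand-in for the CTC null symbol


def trim(frames):
    """Trimming: merge consecutive repeated symbols."""
    return "".join(symbol for symbol, _ in groupby(frames))


def ctc_collapse(frames, blank=BLANK):
    """CTC-style collapse: merge consecutive repeats, then drop the null symbol."""
    merged = [symbol for symbol, _ in groupby(frames)]
    return "".join(symbol for symbol in merged if symbol != blank)


print(trim(list("hhelllo")))                              # helo
print(ctc_collapse(["h", "h", "e", "l", "_", "l", "o"]))  # hello
```

Plain trimming cannot output a doubled letter, whereas a null symbol between the two "l" frames keeps them apart.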
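RNN vs. Structured Learning
- RNN
  - A unidirectional RNN does not consider the whole sequence
    - A bidirectional RNN does
  - Cost and error are not always related
  - Can be Deep
- Structured Learning
  - With the Viterbi algorithm, the whole sequence can be considered
  - Can explicitly consider the dependencies between labels
  - Cost is an upper bound of the error
- The two can be combined
  - Speech Recognition: CNN/LSTM/DNN + HMM
  - Semantic Tagging: Bi-directional LSTM + CRF/Structured SVM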
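As an illustration of how the Viterbi algorithm considers the whole sequence and the dependencies between labels, here is a minimal sketch over an HMM-style model. The labels, words, and all probabilities are hypothetical toy values chosen for this example.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable label sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at step t, previous label)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Backtrack from the best final label.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


states = ["other", "dest"]  # hypothetical slot labels
start_p = {"other": 0.6, "dest": 0.4}
trans_p = {"other": {"other": 0.7, "dest": 0.3},
           "dest": {"other": 0.4, "dest": 0.6}}
emit_p = {"other": {"arrive": 0.5, "in": 0.4, "Taipei": 0.1},
          "dest": {"arrive": 0.1, "in": 0.3, "Taipei": 0.6}}

print(viterbi(["arrive", "in", "Taipei"], states, start_p, trans_p, emit_p))
```

On this toy input the best labeling is other, other, dest. In a combined model, the emission scores would come from a network such as a Bi-directional LSTM instead of a fixed table.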
Deep and Structured will be the future
- GAN
- Conditional GAN
- Connect Energy-based model with GAN
- Deep learning model for inference
Recommended textbook: Deep Learning (《深度学习》, nicknamed the "flower book")
- Part 2: Deep Learning
- Part 3: Structured Learning

Original article: https://www.cnblogs.com/huzheyu/p/lihongyi-ml-12.html
