码迷,mamicode.com
首页 > Web开发 > 详细

卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning

时间:2020-05-15 09:17:45      阅读:79      评论:0      收藏:0      [点我收藏+]

标签:can   multitask   change   code   chm   nts   -o   init   sdn   

技术图片
Goals for the lecture:

Introduction & overview of the key methods and developments.
[Good starting point for you to start reading and understanding papers!]

原文链接:


技术图片
@

Probabilistic Graphical Models | Elements of Meta-Learning

01 Intro to Meta-Learning

技术图片

Motivation and some examples

When is standard machine learning not enough?
Standard ML finally works for well-defined, stationary tasks.
技术图片
But how about the complex dynamic world, heterogeneous data from people and the interactive robotic systems?
技术图片

General formulation and probabilistic view

What is meta-learning?
Standard learning: Given a distribution over examples (single task), learn a function that minimizes the loss:
技术图片
Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description
技术图片

A Toy Example: Few-shot Image Classification
技术图片
技术图片

Other (practical) Examples of Few-shot Learning
技术图片
技术图片
技术图片
技术图片

Gradient-based and other types of meta-learning

Model-agnostic Meta-learning (MAML) 与模型无关的元学习

  • Start with a common model initialization \(\theta\)
  • Given a new task \(T_i\) , adapt the model using a gradient step:
    技术图片
  • Meta-training is learning a shared initialization for all tasks:
    技术图片
    技术图片

Does MAML Work?
技术图片

MAML from a Probabilistic Standpoint
Training points: 技术图片
testing points:技术图片
MAML with log-likelihood loss对数似然损失:
技术图片
技术图片

One More Example: One-shot Imitation Learning 模仿学习
技术图片

Prototype-based Meta-learning
技术图片
Prototypes:
技术图片
Predictive distribution:
技术图片
Does Prototype-based Meta-learning Work?
技术图片

Rapid Learning or Feature Reuse 特征重用
技术图片
技术图片
技术图片
技术图片

Neural processes and relation of meta-learning to GPs

Drawing parallels between meta-learning and GPs
In few-shot learning:

  • Learn to identify functions that generated the data from just a few examples.
  • The function class and the adaptation rule encapsulate our prior knowledge.

Recall Gaussian Processes (GPs): 高斯过程

  • Given a few (x, y) pairs, we can compute the predictive mean and variance.
  • Our prior knowledge is encapsulated in the kernel function.

技术图片

Conditional Neural Processes 条件神经过程
技术图片
技术图片
技术图片
技术图片

On software packages for meta-learning
A lot of research code releases (code is fragile and sometimes broken)
A few notable libraries that implement a few specific methods:

技术图片
Takeaways

  • Many real-world scenarios require building adaptive systems and cannot be solved using “learn-once” standard ML approach.
  • Learning-to-learn (or meta-learning) attempts extend ML to rich multitask scenarios—instead of learning a function, learn a learning algorithm.
  • Two families of widely popular methods:
    • Gradient-based meta-learning (MAML and such)
    • Prototype-based meta-learning (Protonets, Neural Processes, ...)
    • Many hybrids, extensions, improvements (CAIVA, MetaSGD, ...)
  • Is it about adaptation or learning good representations? Still unclear and depends on the task; having good representations might be enough.
  • Meta-learning can be used as a mechanism for causal discovery.因果发现 (See Bengio et al., 2019.)

02 Elements of Meta-RL

What is meta-RL and why does it make sense?

Recall the definition of learning-to-learn
Standard learning: Given a distribution over examples (single task), learn a function that minimizes the loss:
技术图片
Learning-to-learn: Given a distribution over tasks, output an adaptation rule that can be used at test time to generalize from a task description
技术图片
Meta reinforcement learning (RL): Given a distribution over environments, train a policy update rule that can solve new environments given only limited or no initial experience.
技术图片

Meta-learning for RL
技术图片

On-policy and off-policy meta-RL

On-policy RL: Quick Recap 符合策略的RL:快速回顾
技术图片
REINFORCE algorithm:
技术图片

On-policy Meta-RL: MAML (again!)

  • Start with a common policy initialization \(\theta\)
  • Given a new task \(T_i\) , collect data using initial policy, then adapt using a gradient step:
    技术图片
  • Meta-training is learning a shared initialization for all tasks:
    技术图片
    技术图片
    Adaptation as Inference 适应推理
    Treat policy parameters, tasks, and all trajectories as random variables随机变量
    技术图片
    meta-learning = learning a prior and adaptation = inference
    技术图片
    Off-policy meta-RL: PEARL
    技术图片
    技术图片

Key points:

  • Infer latent representations z of each task from the trajectory data.
  • The inference networkq is decoupled from the policy, which enables off-policy learning.
  • All objectives involve the inference and policy networks.
    技术图片

Adaptation in nonstationary environments 不稳定环境
Classical few-shot learning setup:

  • The tasks are i.i.d. samples from some underlying distribution.
  • Given a new task, we get to interact with it before adapting.
  • What if we are in a nonstationary environment (i.e. changing over time)? Can we still use meta-learning?
    技术图片
    Example: adaptation to a learning opponent
    技术图片Each new round is a new task. Nonstationary environment is a sequence of tasks.

Continuous adaptation setup:

  • The tasks are sequentially dependent.
  • meta-learn to exploit dependencies
    技术图片

Continuous adaptation

Treat policy parameters, tasks, and all trajectories as random variables
技术图片

RoboSumo: a multiagent competitive env
an agent competes vs. an opponent, the opponent’s behavior changes over time
技术图片

Takeaways

  • Learning-to-learn (or meta-learning) setup is particularly suitable for multi-task reinforcement learning
  • Both on-policy and off-policy RL can be “upgraded” to meta-RL:
    • On-policy meta-RL is directly enabled by MAML
    • Decoupling task inference and policy learning enables off-policy methods
  • Is it about fast adaptation or learning good multitask representations? (See discussion in Meta-Q-Learning: https://arxiv.org/abs/1910.00125)
  • Probabilistic view of meta-learning allows to use meta-learning ideas beyond distributions of i.i.d. tasks, e.g., continuous adaptation.
  • Very active area of research.

卡耐基梅隆大学(CMU)元学习和元强化学习课程 | Elements of Meta-Learning

标签:can   multitask   change   code   chm   nts   -o   init   sdn   

原文地址:https://www.cnblogs.com/joselynzhao/p/12892696.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!