Exponential family of distributions

Time: 2021-06-05 18:38:14


Choi H. I. Lecture 4: Exponential family of distributions and generalized linear model (GLM).

Definition

Definition: If a distribution has a density function of the form

\[f_{\theta}(x) = \frac{1}{Z(\theta)} h(x) e^{\langle T(x), \theta \rangle}, \]

then the distribution belongs to the exponential family.
Here \(x \in \mathbb{R}^m\), \(T(x) = (T_1(x), T_2(x), \cdots, T_k(x)) \in \mathbb{R}^k\), \(\theta = (\theta_1, \theta_2,\cdots, \theta_k)\) is the vector of unknown parameters, and \(Z(\theta) = \int h(x)e^{\langle T(x), \theta \rangle} \mathrm{d}x\) is the normalizing constant.

Setting \(C(x) = \log h (x)\) and \(A(\theta) = \log Z(\theta)\), we obtain

\[f_{\theta}(x) = \exp (\langle T(x), \theta \rangle - A(\theta) + C(x)). \]

The exponential family also has a more general form:

\[f_{\theta}(x) = \exp (\frac{\langle T(x), \theta \rangle - A(\theta)}{\phi} + C(x, \phi)), \]

or, more generally still,

\[f_{\theta}(x) = \exp (\frac{\langle T(x), \lambda(\theta) \rangle - A(\theta)}{\phi} + C(x, \phi)), \]

where \(\phi\) controls the shape of the distribution.

Properties

\(A(\theta)\)

Proposition 1:

\[\nabla_{\theta}A(\theta) = \int f_{\theta}(x) T(x) \mathrm{d}x = \mathbb{E}[T(X)]. \]

proof:

We know that

\[\int f_{\theta}(x) \mathrm{d}x = \int \exp (\frac{\langle T(x), \theta \rangle - A(\theta)}{\phi} + C(x, \phi)) \mathrm{d}x = 1. \]

Taking the gradient of both sides with respect to \(\theta\) gives

\[\int f_{\theta}(x) \frac{T(x) - \nabla_{\theta} A(\theta)}{\phi} \mathrm{d}x = 0 \Rightarrow \nabla_{\theta} A(\theta) = \mathbb{E}[T(X)]. \]
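Proposition 1 can be checked numerically on a family written in the dispersion form. The sketch below assumes a one-dimensional Gaussian with mean \(\theta\) and known variance \(\phi\), so that \(T(x) = x\), \(A(\theta) = \theta^2/2\) and \(C(x, \phi) = -x^2/(2\phi) - \frac{1}{2}\log(2\pi\phi)\). The parameter values, the integration range and the trapezoid rule are illustrative choices, not part of the lecture.

```python
import math

def density(x, theta, phi):
    # f_theta(x) = exp((T(x) * theta - A(theta)) / phi + C(x, phi))
    # with T(x) = x and A(theta) = theta^2 / 2 (assumed Gaussian example).
    A = 0.5 * theta * theta
    C = -0.5 * x * x / phi - 0.5 * math.log(2.0 * math.pi * phi)
    return math.exp((x * theta - A) / phi + C)

def integrate(g, lo, hi, n=20000):
    # Composite trapezoid rule on [lo, hi] with n sub-intervals.
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * h)
    return total * h

theta, phi = 1.3, 0.49                     # illustrative values
half_width = 12.0 * math.sqrt(phi)         # tails beyond this are negligible
lo, hi = theta - half_width, theta + half_width

mass = integrate(lambda x: density(x, theta, phi), lo, hi)
mean_T = integrate(lambda x: x * density(x, theta, phi), lo, hi)
grad_A = theta                             # d/dtheta of theta^2 / 2

print(mass, mean_T, grad_A)
```

The integral of the density should be close to 1, and the integral of \(T(x) f_\theta(x)\) should agree with \(\nabla_\theta A(\theta)\) regardless of the value chosen for \(\phi\).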

Proposition 2:

\[D^2_{\theta} A = \left(\frac{\partial^2 A}{\partial\theta_i \partial \theta_j}\right) = \frac{1}{\phi}\mathrm{Cov}(T(X), T(X)) = \frac{1}{\phi}\mathrm{Cov}(T(X)). \]

proof:

\[\frac{\partial A}{\partial \theta_i} = \int \exp (\frac{\langle T(x), \theta \rangle - A(\theta)}{\phi} + C(x, \phi)) T_i(x) \mathrm{d}x. \]

\[\begin{array}{ll} \frac{\partial^2 A}{\partial \theta_i \partial \theta_j} &= \int f_{\theta}(x) \frac{T_j (x) - \frac{\partial A}{\partial \theta_j}}{\phi} T_i(x) \mathrm{d}x \\ &= \frac{1}{\phi}\int f_{\theta}(x) (T_j(x) - \frac{\partial A}{\partial \theta_j}) (T_i(x) - \frac{\partial A}{\partial \theta_i})\mathrm{d}x \\ &= \frac{1}{\phi}\mathrm{Cov}(T_i(X), T_j(X)). \end{array} \]

The second equality uses \(\int f_{\theta}(x) (T_j(x) - \frac{\partial A}{\partial \theta_j}) \mathrm{d}x = 0\), which is Proposition 1.

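Corollary 1: \(A({\theta})\) is a convex function of \(\theta\).

This holds because its Hessian is positive semidefinite: by Proposition 2 it is a covariance matrix scaled by \(1/\phi\), with \(\phi > 0\).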
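As a rough numerical illustration of Proposition 2 and Corollary 1, the sketch below builds a small discrete family and compares a finite-difference Hessian of \(A\) with \(\frac{1}{\phi}\mathrm{Cov}(T(X))\). The support \(\{0,1,2,3\}\), the statistic \(T(x) = (x, x^2)\), the choice \(C = 0\) and the value \(\phi = 2\) are all assumptions made for the example; sums replace the integrals of the text.

```python
import math

SUPPORT = (0, 1, 2, 3)   # assumed finite sample space
PHI = 2.0                # assumed dispersion parameter

def T(x):
    # Sufficient statistic T(x) = (x, x^2), an illustrative choice.
    return (float(x), float(x * x))

def inner(x, theta):
    t = T(x)
    return t[0] * theta[0] + t[1] * theta[1]

def A(theta):
    # With C = 0, the condition sum_x exp((<T(x), theta> - A) / PHI) = 1
    # gives A(theta) = PHI * log(sum_x exp(<T(x), theta> / PHI)).
    return PHI * math.log(sum(math.exp(inner(x, theta) / PHI) for x in SUPPORT))

def pmf(theta):
    a = A(theta)
    return [math.exp((inner(x, theta) - a) / PHI) for x in SUPPORT]

def covariance(theta):
    # Exact 2x2 covariance matrix of T(X) under the pmf.
    p = pmf(theta)
    ts = [T(x) for x in SUPPORT]
    mean = [sum(pi * t[k] for pi, t in zip(p, ts)) for k in range(2)]
    return [[sum(pi * (t[i] - mean[i]) * (t[j] - mean[j]) for pi, t in zip(p, ts))
             for j in range(2)] for i in range(2)]

def hessian(theta, eps=1e-4):
    # Central finite differences for the second partial derivatives of A.
    H = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            def shifted(si, sj):
                t = list(theta)
                t[i] += si * eps
                t[j] += sj * eps
                return A(t)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps * eps)
    return H

theta = (0.3, -0.2)      # illustrative parameter value
H = hessian(theta)
cov = covariance(theta)
max_gap = max(abs(H[i][j] - cov[i][j] / PHI) for i in range(2) for j in range(2))
det_H = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print(max_gap, H[0][0], det_H)
```

A small `max_gap` is consistent with Proposition 2, and a positive leading entry together with a positive determinant indicates that this 2x2 Hessian is positive definite at the chosen point, in line with the corollary.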

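Maximum likelihood estimation

Given \(n\) samples \(\{x^i\}_{i=1}^n\), the log-likelihood is

\[l(\theta) = \frac{1}{\phi}[\langle \theta, \sum_{i=1}^n T(x^i) \rangle - nA(\theta)] + \sum_{i=1}^n C(x^i, \phi). \]

Since \(A(\theta)\) is convex, \(l(\theta)\) is concave in \(\theta\) (for \(\phi > 0\)), so any stationary point is a global maximizer. Moreover,

\[\nabla_{\theta} l(\theta) = \frac{1}{\phi}[\sum_{i=1}^n T(x^i) - n \nabla_{\theta}A(\theta)], \]

so the maximum is attained at the \(\theta\) satisfying

\[\nabla_{\theta}A(\theta) = \frac{1}{n} \sum_{i=1}^n T(x^i). \]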
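To make the stationarity condition concrete, here is a minimal sketch for the exponential distribution, assuming \(\phi = 1\), \(T(x) = -x\) and \(A(\theta) = -\log\theta\). Setting \(\nabla_\theta A(\theta)\) equal to the sample mean of \(T\) then gives \(\hat\theta = 1/\bar{x}\). The data values are made up for illustration, and the term \(\sum_i C(x^i)\) is dropped because it does not depend on \(\theta\).

```python
import math

def log_likelihood(theta, xs):
    # l(theta) = <theta, sum_i T(x_i)> - n * A(theta), up to a theta-free constant,
    # with T(x) = -x and A(theta) = -log(theta).
    n = len(xs)
    return theta * sum(-x for x in xs) + n * math.log(theta)

def score(theta, xs):
    # Gradient of l: sum_i T(x_i) - n * A'(theta), where A'(theta) = -1 / theta.
    n = len(xs)
    return sum(-x for x in xs) + n / theta

def mle(xs):
    # Moment matching: A'(theta) = mean of T  <=>  -1/theta = -mean(x).
    return len(xs) / sum(xs)

xs = [0.5, 1.2, 0.3, 2.0, 1.0]   # hypothetical observations
theta_hat = mle(xs)
print(theta_hat, score(theta_hat, xs))
```

The score evaluated at `theta_hat` should vanish up to rounding, and the log-likelihood at nearby values of \(\theta\) should be lower, as expected for a concave objective.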

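Maximum entropy

See also: 最大熵原理 (the maximum entropy principle), 科学空间.

Exponential family distributions are in fact maximum-entropy distributions: among all densities consistent with the known information, they are the ones that express no further preference.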

\[\max_{f} \quad H(f) = -\int f(x)\log f(x) \mathrm{d} x. \]

This is equivalent to minimizing

\[\min_f \int f(x)\log f(x) \mathrm{d}x. \]

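Often we already know some statistics of the distribution, usually expressed as expectations:

\[\int f(x) h_i(x) \mathrm{d}x = c_i, \quad i=1,2,\cdots, s. \]

Our actual objective is then:

\[\begin{array}{ll} \min_f & \int f(x)\log f(x) \mathrm{d}x \\ \mathrm{s.t.} & \int f(x) h_i(x) \mathrm{d}x = c_i, \quad i=0,1,\cdots, s, \end{array} \]

where \(h_0 = 1, c_0 =1\), i.e. the density must satisfy \(\int f(x) \mathrm{d} x= 1\).

Introducing Lagrange multipliers gives: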

\[J(f,\lambda) = \int f(x)\log f(x) \mathrm{d}x + \lambda_0 (1 - \int f(x) \mathrm{d}x) + \sum_{i=1}^s \lambda_i [c_i - \int f(x) h_i(x) \mathrm{d}x] . \]

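The optimality condition is that the variation of \(J\) with respect to \(f\) vanishes, i.e.

\[1 + \log f(x) - \lambda_0 - \sum_{i=1}^s \lambda_i h_i(x) = 0. \]

Solving for \(f\) and writing \(Z = e^{1 - \lambda_0}\) gives

\[f(x) = \frac{1}{Z} \exp(\sum_{i=1}^s \lambda_i h_i(x)), \]

which belongs to the exponential family.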
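A discrete analogue may help here. The sketch below assumes a six-sided die with a single constraint, a prescribed mean of 4.5, so the maximum-entropy solution has the form \(f(x) \propto e^{\lambda x}\) on \(\{1,\dots,6\}\). The multiplier is found by bisection, which works because the mean is increasing in \(\lambda\). The target mean, the search interval and the comparison distribution are illustrative choices, not taken from the text.

```python
import math

FACES = (1, 2, 3, 4, 5, 6)

def maxent_pmf(lam):
    # f(x) = exp(lam * x) / Z, the exponential-family form with h_1(x) = x.
    weights = [math.exp(lam * x) for x in FACES]
    Z = sum(weights)
    return [w / Z for w in weights]

def mean(p):
    return sum(x * px for x, px in zip(FACES, p))

def entropy(p):
    # Shannon entropy, with the convention 0 * log(0) = 0.
    return -sum(px * math.log(px) for px in p if px > 0)

def solve_lambda(target, lo=-10.0, hi=10.0, iters=200):
    # Bisection on the mean constraint; the mean is monotone in lam.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(maxent_pmf(mid)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = solve_lambda(4.5)
p_star = maxent_pmf(lam)

# Another pmf with the same mean 4.5, used only for comparison.
p_other = [0.0, 0.0, 0.5, 0.0, 0.0, 0.5]

print(lam, mean(p_star), entropy(p_star), entropy(p_other))
```

Both distributions satisfy the mean constraint, and the exponential-form solution should have the larger entropy of the two.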

Examples

Bernoulli

\[P(x) = p^x (1-p)^{1-x} = \exp[x\log\frac{p}{1-p} + \log (1 - p)]. \]

\[\theta = \log \frac{p}{1-p}, \quad T(x) = x, \quad A(\theta) = \log (1 + e^{\theta}), \quad h(x) = 1. \]
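The Bernoulli parametrization can be sanity-checked by comparing the standard pmf \(p^x(1-p)^{1-x}\) with the canonical form \(\exp(x\theta - A(\theta))\), taking \(A(\theta) = \log(1 + e^{\theta})\) and a base measure equal to 1. The function names below are hypothetical helpers written for this check.

```python
import math

def natural_param(p):
    # theta = log(p / (1 - p)), the log-odds.
    return math.log(p / (1.0 - p))

def log_partition(theta):
    # A(theta) = log(1 + e^theta).
    return math.log1p(math.exp(theta))

def pmf_canonical(x, theta):
    # exp(<T(x), theta> - A(theta)) with T(x) = x and base measure 1.
    return math.exp(x * theta - log_partition(theta))

def pmf_standard(x, p):
    return p ** x * (1.0 - p) ** (1 - x)

max_gap = max(
    abs(pmf_canonical(x, natural_param(p)) - pmf_standard(x, p))
    for p in (0.1, 0.5, 0.9)
    for x in (0, 1)
)
print(max_gap)
```

The two forms should agree up to floating-point rounding for every tested \(p\) and both outcomes.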

Exponential distribution

\[p(x) = \lambda \cdot e^{-\lambda x}=\exp[-\lambda x +\log \lambda ], \quad x \ge 0. \]

\[\theta = \lambda, \quad T(x) =-x, \quad A(\theta) = \log \frac{1}{\lambda} = -\log\theta, \quad h(x) = \mathbb{I}(x\ge0). \]
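As a worked check, Propositions 1 and 2 can be applied to this parametrization, assuming \(\phi = 1\) and writing \(A(\theta) = -\log\theta\) with \(\theta = \lambda\). They recover the familiar mean and variance of the exponential distribution:

\[\begin{array}{ll} \mathbb{E}[T(X)] = A'(\theta) = -\frac{1}{\theta} & \Rightarrow \quad \mathbb{E}[X] = \frac{1}{\lambda}, \\ \mathrm{Var}(T(X)) = A''(\theta) = \frac{1}{\theta^2} & \Rightarrow \quad \mathrm{Var}(X) = \frac{1}{\lambda^2}. \end{array} \]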

Normal distribution

\[p(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp [-\frac{(x-\mu)^2}{2\sigma^2}]. \]

Treating \(\sigma\) as a known parameter:

\[p(x) = \exp [\frac{-\frac{1}{2}x^2 + x\mu - \frac{1}{2}\mu^2}{\sigma^2} - \frac{1}{2}\log (2\pi \sigma^2)]. \]

\[\theta = (\mu, 1), \quad T(x) = (x, -\frac{1}{2}x^2), \quad \phi = \sigma^2, \quad A(\theta) = \frac{1}{2}\mu^2, \quad C(x, \phi) = -\frac{1}{2} \log (2\pi \sigma^2). \]

Treating \(\sigma\) as an unknown parameter:

\[p(x) = \exp [-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}\mu^2 - \log \sigma - \frac{1}{2}\log 2\pi]. \]

\[T(x) = (x, \frac{1}{2}x^2), \quad \theta = (\frac{\mu}{\sigma^2}, -\frac{1}{\sigma^2}), \quad A(\theta) = \frac{\mu^2}{2\sigma^2} + \log\sigma, \quad C(x) = -\frac{1}{2}\log(2\pi). \]
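For the case with unknown \(\sigma\), \(A\) can be rewritten in terms of the natural parameters as \(A(\theta) = -\theta_1^2/(2\theta_2) - \frac{1}{2}\log(-\theta_2)\); this follows by substituting \(\theta_1 = \mu/\sigma^2\) and \(\theta_2 = -1/\sigma^2\) and is not stated in the notes. Under that assumption, the sketch below checks that the gradient of \(A\) reproduces \(\mathbb{E}[T(X)] = (\mu, \frac{1}{2}(\mu^2 + \sigma^2))\). The values of \(\mu\) and \(\sigma\) are illustrative.

```python
import math

def to_natural(mu, sigma):
    # theta = (mu / sigma^2, -1 / sigma^2).
    return (mu / sigma ** 2, -1.0 / sigma ** 2)

def A(theta):
    # A expressed in natural parameters (derived by substitution, see lead-in).
    t1, t2 = theta
    return -t1 * t1 / (2.0 * t2) - 0.5 * math.log(-t2)

def grad_A(theta, eps=1e-6):
    # Central finite differences in each coordinate.
    t1, t2 = theta
    d1 = (A((t1 + eps, t2)) - A((t1 - eps, t2))) / (2 * eps)
    d2 = (A((t1, t2 + eps)) - A((t1, t2 - eps))) / (2 * eps)
    return (d1, d2)

mu, sigma = 1.5, 0.8                      # illustrative values
theta = to_natural(mu, sigma)
moments = grad_A(theta)

# E[T(X)] for T(x) = (x, x^2 / 2) under N(mu, sigma^2).
expected = (mu, 0.5 * (mu * mu + sigma * sigma))
A_direct = mu * mu / (2.0 * sigma ** 2) + math.log(sigma)

print(moments, expected, A(theta), A_direct)
```

Agreement between `moments` and `expected` is what Proposition 1 predicts, and `A(theta)` should match the expression for \(A\) written in terms of \(\mu\) and \(\sigma\).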

Original article: https://www.cnblogs.com/MTandHJ/p/14852936.html
