Notes for Chapter 4. Gaussian Models

Posted on Mon 20 March 2017 in MLAPP

Matrix derivative

We use the Einstein summation convention below: repeated indices are summed over. It is convenient for evaluating matrix and tensor expressions.

\begin{align} \delta |A|&=|A+\delta A|-|A|\\ &=|A||I+\inv A\delta A|-|A|\\ &=|A|\tr(\inv A\delta A)+O(\delta A^2) \end{align}

Using this first-order result, the derivative of the determinant with respect to $A$ is

\begin{align} \pfr{|A|}{A}&=e_ie_j^T\pfr{|A|}{A_{ij}}\\ &=e_ie_j^T|A|\tr(\inv A\pfr{A}{A_{ij}})\\ &=e_ie_j^T|A|\tr(\inv Ae_ie_j^T)\\ &=e_ie_j^T|A|\inv A_{ji}\\ &=|A|\invt A \end{align}

$$\pfr{A_{ij}B_{ji}}{A}=e_me_n^T\pfr{A_{ij}B_{ji}}{A_{mn}}=B_{nm}e_me_n^T=B^T$$

$$\pfr{a^TAa}{a}=e_m\pfr{a_iA_{ij}a_j}{a_m}=e_m(A_{mj}a_j+a_i A_{im})=(A+A^T)a$$

$$\pfr{a^TAa}{A}=e_me_n^T\pfr{a_iA_{ij}a_j}{A_{mn}}=aa^T$$

$$\pfr{\tr AB}{A}=e_me_n^T\pfr{A_{ij}B_{ji}}{A_{mn}}=e_me_n^TB_{nm}=B^T$$
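The identities above can be sanity-checked numerically. The following is a minimal sketch in pure Python that compares central finite differences against $|A|A^{-T}$ and $(A+A^T)a$ for a $2\times 2$ case; the example values of `A` and `a` and all helper names are assumptions made for illustration, not part of the notes.

```python
# Finite-difference check of d|A|/dA = |A| A^{-T} and d(a^T A a)/da = (A + A^T) a
# for a 2x2 example. All names and values here are illustrative assumptions.

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d],
            [-A[1][0] / d, A[0][0] / d]]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def quad_form(A, a):
    # a^T A a written with explicit index sums (Einstein convention made explicit)
    return sum(a[i] * A[i][j] * a[j] for i in range(2) for j in range(2))

def numeric_grad_det(A, h=1e-6):
    # Central difference of |A| with respect to each entry A_ij
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[i][j] += h
            Am[i][j] -= h
            G[i][j] = (det2(Ap) - det2(Am)) / (2 * h)
    return G

def numeric_grad_quad(A, a, h=1e-6):
    # Central difference of a^T A a with respect to each component a_m
    g = [0.0, 0.0]
    for m in range(2):
        ap, am = a[:], a[:]
        ap[m] += h
        am[m] -= h
        g[m] = (quad_form(A, ap) - quad_form(A, am)) / (2 * h)
    return g

A = [[2.0, 1.0], [0.5, 3.0]]   # deliberately non-symmetric
a = [1.0, -2.0]

d = det2(A)
analytic_det = [[d * v for v in row] for row in transpose(inv2(A))]
numeric_det = numeric_grad_det(A)

analytic_quad = [sum((A[m][j] + A[j][m]) * a[j] for j in range(2)) for m in range(2)]
numeric_quad = numeric_grad_quad(A, a)

print(analytic_det, numeric_det)
print(analytic_quad, numeric_quad)
```

A non-symmetric `A` is used on purpose, so that the transpose in $|A|A^{-T}$ and the $A+A^T$ term are actually exercised.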

Using these relationships, it is straightforward to derive the maximum likelihood estimates (MLE) of the Gaussian parameters.
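As a sketch of how the identities enter, consider $N$ i.i.d. samples $x_n$ from $N(x|\mu,\Sigma)$. The symbols $N$, $x_n$, $\ell$, $\Lambda=\Sigma^{-1}$ and $S$ are introduced here for illustration and do not appear in the notes above; $\Lambda$ is assumed symmetric.

\begin{align} \ell(\mu,\Lambda)&=\frac{N}{2}\ln|\Lambda|-\frac{1}{2}\tr(\Lambda S)+\text{const},\quad S=\sum_n (x_n-\mu)(x_n-\mu)^T\\ \pfr{\ell}{\mu}&=\Lambda\sum_n(x_n-\mu)=0\ \Rightarrow\ \hat\mu=\frac{1}{N}\sum_n x_n\\ \pfr{\ell}{\Lambda}&=\frac{N}{2}\Lambda^{-T}-\frac{1}{2}S^T=0\ \Rightarrow\ \hat\Sigma=\frac{1}{N}\sum_n (x_n-\hat\mu)(x_n-\hat\mu)^T \end{align}

The first stationarity condition uses the derivative of a quadratic form with respect to a vector, and the second uses the derivatives of $|A|$ and $\tr AB$ with respect to $A$.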

Sigmoid model

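$$\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]=\exp\left(-\frac{x^2+\mu^2-2\mu x}{2\sigma^2}\right)=Af(x)\exp(\mu x/\sigma^2)$$ where $A=\exp(-\mu^2/2\sigma^2)$ and $f(x)=\exp(-x^2/2\sigma^2)$. When two classes share the same $\sigma$, the factor $f(x)$ cancels in the posterior ratio, leaving $$\frac{A\exp(ax)}{A\exp(ax)+B\exp(bx)}=\frac{1}{1+\exp(Dx+C)}$$ with $D=b-a$ and $C=\ln(B/A)$.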
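A small numerical check may make this concrete. The sketch below assumes two one-dimensional Gaussian classes with means `mu1`, `mu2`, a shared standard deviation `sigma`, and a class prior `prior1` (all hypothetical names and values), and compares the Bayes posterior with the sigmoid form $1/(1+\exp(Dx+C))$.

```python
import math

def gauss_pdf(x, mu, sigma):
    # Univariate Gaussian density N(x | mu, sigma^2)
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior_class1(x, mu1, mu2, sigma, prior1=0.5):
    # Bayes rule for two classes sharing the same sigma
    p1 = prior1 * gauss_pdf(x, mu1, sigma)
    p2 = (1 - prior1) * gauss_pdf(x, mu2, sigma)
    return p1 / (p1 + p2)

def sigmoid_form(x, mu1, mu2, sigma, prior1=0.5):
    # Same posterior written as 1 / (1 + exp(D x + C)):
    # D comes from the terms linear in x, C from the mu^2 terms and the priors.
    D = (mu2 - mu1) / sigma ** 2
    C = (mu1 ** 2 - mu2 ** 2) / (2 * sigma ** 2) + math.log((1 - prior1) / prior1)
    return 1.0 / (1.0 + math.exp(D * x + C))

# Illustrative parameter values (assumptions, not from the notes)
xs = [-3.0, -0.5, 0.0, 1.2, 4.0]
max_gap = max(
    abs(posterior_class1(x, 1.0, -0.5, 0.8, 0.3) - sigmoid_form(x, 1.0, -0.5, 0.8, 0.3))
    for x in xs
)
print(max_gap)
```

The $x^2$ term never appears in `sigmoid_form`, which reflects the cancellation of the common factor when the variances are equal.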

Maximal Entropy of Gaussian Dist from Variation

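Consider the entropy, augmented with Lagrange multipliers $a$, $b_i$ and $C_{ij}$ that enforce the normalization, zero-mean and covariance constraints: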

\begin{align} S&=-\int p\ln p \dd V-a\left(\int p\dd V-1\right)-\sum_i b_i\left(\int x_ip\dd V\right)\nl-\sum_{i,j} C_{ij}\left(\int x_ix_jp\dd V-\Sigma_{ij}\right)\\ &=-\int (\ln p+a+b^Tx+x^TCx)p \dd V+a+\tr(C\Sigma)\\ &=-\int L(p,x)\dd V+a+\tr(C\Sigma) \end{align}

With $L(p,x)=(\ln p+a+b^Tx+x^TCx)p$, which contains no derivatives of $p$, the Euler-Lagrange equation reduces to

\begin{align} \pfr{L}{p}&=1+a+b^Tx+x^TCx+\ln p=0\\ \Rightarrow p(x)&=\exp(-1-a-b^Tx-x^TCx) \end{align}

It is a Gaussian distribution, with constraints:

\begin{align} \int p\dd V&=1\\ \int x_ip\dd V&=0\\ \int x_ix_jp\dd V&=\Sigma_{ij} \end{align}

and we can determine $a$, $b$ and $C$ from these constraints. With more general constraints the algebra is harder, but the method is the same: introduce Lagrange multipliers and solve the Euler-Lagrange equation.
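As a sketch of how the constraints fix the multipliers, assume $C$ is symmetric and positive definite (so that $p$ is normalizable) and write $n$ for the dimension of $x$ (a symbol not used in the notes above). Completing the square in the exponent then gives

\begin{align} \int x_ip\dd V=0&\ \Rightarrow\ b=0\\ \int x_ix_jp\dd V=\Sigma_{ij}&\ \Rightarrow\ C=\tfrac{1}{2}\Sigma^{-1}\\ \int p\dd V=1&\ \Rightarrow\ e^{-1-a}=(2\pi)^{-n/2}|\Sigma|^{-1/2} \end{align}

so that $p(x)=N(x|0,\Sigma)$.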

Linear Gaussian system

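\begin{align} p(x)&=N(x|\mu_x, \Sigma_x)\\ p(y|x)&=N(y|Ax+b,\Sigma_y) \end{align} Assuming $A$ is invertible, $y=Ax+b$ gives $x=A^{-1}(y-b)$, so $$(y-Ax-b)^T\Sigma_y^{-1}(y-Ax-b)=[x-A^{-1}(y-b)]^TA^T\Sigma_y^{-1}A[x-A^{-1}(y-b)]$$ Thus, up to normalization factors that do not depend on $x$, and writing $N_c(x|\xi,\Lambda)$ for the canonical (information) form with $\Lambda=\Sigma^{-1}$ and $\xi=\Sigma^{-1}\mu$,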

\begin{align} N(y|Ax+b,\Sigma_y)&=N\left(x|A^{-1}(y-b), A^{-1}\Sigma_yA^{-T}\right)\\ &=N_c(x|A^T\Sigma_y^{-1}AA^{-1}(y-b), A^T\Sigma_y^{-1}A)\\ &=N_c(x|A^T\Sigma_y^{-1}(y-b), A^T\Sigma_y^{-1}A)\\ &=N_c(y|\Sigma_y^{-1}(Ax+b), \Sigma_y^{-1}) \end{align}

\begin{align} p(x,y)&= N(x|\mu_x, \Sigma_x)N(y|Ax+b,\Sigma_y)\\ &=N_c(x|\Sigma_x^{-1}\mu_x, \Sigma_x^{-1})N_c(x|A^T\Sigma_y^{-1}(y-b), A^T\Sigma_y^{-1}A)\\ &=p(y)N_c(x|\Sigma_x^{-1}\mu_x+A^T\Sigma_y^{-1}(y-b), \Sigma_x^{-1}+A^T\Sigma_y^{-1}A)\\ &=p(y)N(x|\mu_{x|y},\Sigma_{x|y})\\ &=p(y)p(x|y) \end{align} where

\begin{align} \Sigma_{x|y}^{-1}&=\Sigma_x^{-1}+A^T\Sigma_y^{-1}A\\ \mu_{x|y}&=\Sigma_{x|y}[\Sigma_x^{-1}\mu_x+A^T\Sigma_y^{-1}(y-b)]\end{align} How to calculate $p(y)$?
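The posterior formulas can be checked in the scalar case by brute force. The sketch below uses made-up values for `mu_x`, `var_x`, `A`, `b`, `var_y` and `y_obs` (all assumptions for illustration), normalizes $p(x)p(y|x)$ on a grid, and compares the resulting mean and variance with the closed-form $\mu_{x|y}$ and $\Sigma_{x|y}$.

```python
import math

def normal_pdf(z, mean, var):
    # Univariate Gaussian density N(z | mean, var)
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical scalar example (values are not from the notes)
mu_x, var_x = 1.0, 2.0          # prior p(x) = N(x | mu_x, var_x)
A, b, var_y = 3.0, -1.0, 0.5    # likelihood p(y|x) = N(y | A x + b, var_y)
y_obs = 4.0

# Closed-form posterior: precisions add, and the mean is the precision-weighted combination
post_prec = 1.0 / var_x + A * A / var_y
post_var = 1.0 / post_prec
post_mean = post_var * (mu_x / var_x + A * (y_obs - b) / var_y)

# Brute force: evaluate p(x) p(y_obs | x) on a grid and normalize numerically
lo, hi, n = -10.0, 10.0, 20001
dx = (hi - lo) / (n - 1)
xs = [lo + i * dx for i in range(n)]
w = [normal_pdf(x, mu_x, var_x) * normal_pdf(y_obs, A * x + b, var_y) for x in xs]
Z = sum(w) * dx                 # numerical estimate of p(y_obs)
grid_mean = sum(x * wi for x, wi in zip(xs, w)) * dx / Z
grid_var = sum((x - grid_mean) ** 2 * wi for x, wi in zip(xs, w)) * dx / Z

print(post_mean, grid_mean)
print(post_var, grid_var)
```

The grid range and spacing are chosen generously for these particular values; a sharper or more distant posterior would need a different grid.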

Attempt

We can rewrite $$N_c(x|A^T\Sigma_y^{-1}y+(\Sigma_x^{-1}\mu_x-A^T\Sigma_y^{-1}b), \Sigma_x^{-1}+A^T\Sigma_y^{-1}A)=N_c(y|Cx+d, xx)$$ From $p(x,y)=p(y)p(x|y)$, we find $\Sigma^{-1}=\Sigma_x^{-1}+A^T\Sigma_y^{-1}A-\Sigma_{x|y}^{-1}$
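One standard route to $p(y)$, which is not necessarily where the attempt above was heading, avoids completing the square altogether. It assumes the noise $\epsilon$ (a symbol introduced for this sketch) is independent of $x$, and uses the fact that a linear combination of jointly Gaussian variables is Gaussian:

\begin{align} y&=Ax+b+\epsilon,\quad \epsilon\sim N(0,\Sigma_y)\\ E[y]&=A\mu_x+b\\ \mathrm{Cov}[y]&=A\Sigma_xA^T+\Sigma_y\\ p(y)&=N(y|A\mu_x+b,\ \Sigma_y+A\Sigma_xA^T) \end{align}

This form does not require $A$ to be invertible.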