## # 4.1

Since $x\sim U(-1,1)$ and $y=x^2$, $$\E[x]=0,\quad \E[y]=\E[x^2]=\E[x]^2+\Var[x]=1/3$$

\begin{align} \Cov[x,y]&=\E[xy-x\bar y-\bar xy+\bar x\bar y]\\ &=\E[xy]-\E[x]\E[y]\\ &=\E[x^3]=\int_{-1}^1 \frac{x^3}{2}dx\\ &=0\\ &\Rightarrow\rho=0\end{align}
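A quick Monte Carlo check of this result, as a minimal sketch using only Python's standard library (the seed and sample size are arbitrary choices): the sample covariance of $x$ and $y=x^2$ should be close to $0$, and the sample mean of $y$ close to $1/3$, even though $y$ is fully determined by $x$.

```python
import random

random.seed(0)
n = 200_000
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [x * x for x in xs]  # y = x^2 is a deterministic function of x

mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Sample covariance E[(x - mean_x)(y - mean_y)]
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
print(f"E[y] ~ {mean_y:.4f} (exact 1/3), Cov[x,y] ~ {cov_xy:.5f} (exact 0)")
```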

## # 4.2

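With $x\sim N(0,1)$ and $y=Wx$, where $W=\pm1$ with probability $1/2$ each and independent of $x$, the density of $y$ is $$p_Y(y)=\frac{p_X(y)+p_X(-y)}{2}=p_X(y),$$ because $p_X$ is symmetric about $0$. So $y\sim N(0,1)$ as well.

Hence $\E[x]=\E[y]=0$, and

\begin{align} \Cov[x,y]&=\E[xy]-\E[x]\E[y]\\ &=P(W=1)\E[xy|W=1]+P(W=-1)\E[xy|W=-1]\\ &=(\E[x^2]+\E[-x^2])/2\\ &=0 \end{align}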
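A simulation sketch of the same construction (seed and sample size are arbitrary): $y$ looks standard normal and is uncorrelated with $x$, yet $|y|=|x|$ on every draw, so the two are clearly dependent.

```python
import random

random.seed(1)
n = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ws = [random.choice((-1, 1)) for _ in range(n)]  # W = +1 or -1, independent of x
ys = [w * x for w, x in zip(ws, xs)]

mean_y = sum(ys) / n
var_y = sum(y * y for y in ys) / n - mean_y ** 2
cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - (sum(xs) / n) * mean_y
# Dependence: knowing x pins down |y| exactly
same_magnitude = all(abs(x) == abs(y) for x, y in zip(xs, ys))
print(mean_y, var_y, cov_xy, same_magnitude)
```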

## # 4.3

We need to prove $|\rho|\leq 1$, i.e. $$\Cov[x,y]^2\leq \Cov[x,x]\Cov[y,y]$$

The $\Cov$ function satisfies

• Symmetry: $\Cov[x,y]=\Cov[y,x]$
• Bilinearity: $\Cov[\lambda x+\mu y, z]=\lambda\Cov[x, z]+\mu\Cov[y,z]$
• Positive semi-definiteness: $\Cov[x, x]\geq 0$, with equality iff $x$ is almost surely constant

So $\Cov$ is a symmetric, positive semi-definite bilinear form, and the Cauchy-Schwarz inequality, which needs only these three properties, gives the conclusion directly.
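For completeness, here is the standard discriminant argument behind Cauchy-Schwarz, written with the three properties above; it shows why semi-definiteness is enough. For every real $t$,

\begin{align} 0\leq\Cov[x+ty,\,x+ty]&=\Cov[x,x]+2t\Cov[x,y]+t^2\Cov[y,y]. \end{align}

If $\Cov[y,y]>0$, this quadratic in $t$ is never negative, so its discriminant satisfies $4\Cov[x,y]^2-4\Cov[x,x]\Cov[y,y]\leq 0$, which is the claimed inequality. If $\Cov[y,y]=0$, the right-hand side is linear in $t$ and stays nonnegative for all $t$ only when $\Cov[x,y]=0$, so the inequality holds trivially.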

## # 4.4

Adding a constant does not change covariance, so $\Cov[X,Y+c]=\Cov[X,Y]$ for any constant $c$. So for $Y=aX+b$ with $a\neq 0$,

\begin{align} \Cov[X,Y]&=\Cov[X,aX]=a\Cov[X,X]\\ \Cov[Y,Y]&=a^2\Cov[X,X] \end{align}

So we have $$\rho=\frac{\Cov[X,Y]}{\sqrt{\Cov[X,X]\Cov[Y,Y]}}=\frac{a\Cov[X,X]}{|a|\Cov[X,X]}=\sgn[a]$$
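A numerical sanity check, as a minimal sketch: the sample Pearson correlation (computed by `pearson`, a small helper defined here for illustration) of exactly linear data equals $\pm1$ up to rounding, with the sign of the slope $a$.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(2)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
rho_pos = pearson(xs, [0.5 * x + 4.0 for x in xs])   # a > 0
rho_neg = pearson(xs, [-3.0 * x + 2.0 for x in xs])  # a < 0
print(rho_pos, rho_neg)
```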

## # 4.5

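Suppose $\Lambda=\Sigma^{-1}=U^TDU$, where $U$ is orthogonal and $D$ is diagonal with positive entries $D_i$. Changing variables to $\vec y=U(\vec x-\vec{\mu})$, whose Jacobian satisfies $|\det U|=1$, turns $(\vec x-\vec\mu)^T\Lambda(\vec x-\vec\mu)$ into $\vec y^TD\vec y$ and simplifies the integral to

\begin{align} \int \exp\left(-\frac{y^TDy}{2}\right)dy&=\prod_{i=1}^d \int \exp\left(-\frac{D_iy_i^2}{2}\right)dy_i\\ &=\prod_{i=1}^d \sqrt{\frac{2\pi}{D_i}}\\ &=\frac{(2\pi)^{d/2}}{\sqrt{\det D}}\\ &=(2\pi)^{d/2}\sqrt{\det \Sigma} \end{align}

where the last step uses $\det D=\det\Lambda=(\det\Sigma)^{-1}$.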
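The closed form can be checked numerically for $d=2$. This sketch approximates $\int\exp(-\tfrac12 x^T\Sigma^{-1}x)\,dx$ with a midpoint rule on a truncated square grid; the covariance, the box size `half_width` and the step `h` are arbitrary illustrative choices, and the grid is assumed wide enough that the truncated tails are negligible.

```python
import math

def gaussian_integral_2d(sigma, half_width=10.0, h=0.05):
    """Midpoint-rule estimate of the integral of exp(-x^T Sigma^{-1} x / 2) over R^2."""
    (a, b), (_, d) = sigma  # symmetric 2x2 covariance
    det = a * d - b * b
    # Sigma^{-1} = adj(Sigma) / det(Sigma)
    l11, l12, l22 = d / det, -b / det, a / det
    n = int(round(2 * half_width / h))
    total = 0.0
    for i in range(n):
        x1 = -half_width + (i + 0.5) * h
        for j in range(n):
            x2 = -half_width + (j + 0.5) * h
            q = l11 * x1 * x1 + 2 * l12 * x1 * x2 + l22 * x2 * x2
            total += math.exp(-0.5 * q)
    return total * h * h

sigma = [[2.0, 0.6], [0.6, 1.0]]
numeric = gaussian_integral_2d(sigma)
# (2*pi)^{d/2} * sqrt(det Sigma) with d = 2
closed_form = 2 * math.pi * math.sqrt(2.0 * 1.0 - 0.6 * 0.6)
print(numeric, closed_form)
```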

## # 4.6

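With $$\Sigma=\begin{bmatrix} \sigma_1^2& \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2& \sigma_2^2 \end{bmatrix},$$ we have $\det\Sigma=\sigma_1^2\sigma_2^2-\rho^2\sigma_1^2\sigma_2^2=(1-\rho^2)\sigma_1^2\sigma_2^2$.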

The inverse matrix of $\Sigma$ is

\begin{align} \Lambda&=\adj\Sigma/\det\Sigma\\ &=\begin{bmatrix} \sigma_2^2& -\rho\sigma_1\sigma_2\\ -\rho\sigma_1\sigma_2& \sigma_1^2 \end{bmatrix}/\left[(1-\rho^2)\sigma_1^2\sigma_2^2\right]\\ &=\frac{1}{1-\rho^2}\begin{bmatrix} \dfrac{1}{\sigma_1^2}& -\dfrac{\rho}{\sigma_1\sigma_2}\\ -\dfrac{\rho}{\sigma_1\sigma_2}& \dfrac{1}{\sigma_2^2} \end{bmatrix} \end{align}

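Plugging these expressions into the general pdf $\frac{1}{(2\pi)^{d/2}\sqrt{\det\Sigma}}\exp\left(-\frac12(x-\mu)^T\Lambda(x-\mu)\right)$ with $d=2$ gives the prefactor $\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}$ and the quadratic form $$(x-\mu)^T\Lambda(x-\mu)=\frac{1}{1-\rho^2}\left(\frac{(x_1-\mu_1)^2}{\sigma_1^2}+\frac{(x_2-\mu_2)^2}{\sigma_2^2}-2\rho\frac{x_1-\mu_1}{\sigma_1}\frac{x_2-\mu_2}{\sigma_2}\right),$$ which proves (4.268).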
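The same check in code, as a sketch with illustrative parameter values and hypothetical helper names: evaluate the general pdf using the adjugate inverse above and compare it with the bivariate closed form at one point.

```python
import math

def bivariate_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Closed-form bivariate Gaussian density."""
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (z1 * z1 + z2 * z2 - 2 * rho * z1 * z2) / (1 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho))

def mvn_pdf_2d(x, mu, sigma):
    """General Gaussian density for d = 2, with Lambda = adj(Sigma) / det(Sigma)."""
    (a, b), (_, d) = sigma
    det = a * d - b * b
    lam = [[d / det, -b / det], [-b / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    q = sum(dx[i] * lam[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

mu1, mu2, s1, s2, rho = 1.0, -2.0, 1.5, 0.7, 0.4
sigma = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
general = mvn_pdf_2d((0.3, -1.1), (mu1, mu2), sigma)
closed = bivariate_pdf(0.3, -1.1, mu1, mu2, s1, s2, rho)
print(general, closed)
```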

## # 4.7 Bivariate conditioning

Completing the square in $x_1$, using $a^2+b^2-2\rho ab=(a-\rho b)^2+(1-\rho^2)b^2$ with $a=\frac{x_1-\mu_1}{\sigma_1}$ and $b=\frac{x_2-\mu_2}{\sigma_2}$:

\begin{align} p(x_1,x_2)&=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left(\frac{(x_1-\mu_1)^2}{\sigma_1^2}+\frac{(x_2-\mu_2)^2}{\sigma_2^2}-2\rho \frac{x_1-\mu_1}{\sigma_1}\frac{x_2-\mu_2}{\sigma_2}\right)\right)\\ & =\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left(\frac{x_1-\mu_1}{\sigma_1}-\rho\frac{x_2-\mu_2}{\sigma_2}\right)^2-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right)\\ & =\frac{N(x_2|\mu_2,\sigma_2^2)}{\sqrt{2\pi}\sigma_1\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2\sigma_1^2(1-\rho^2)}\left(x_1-\mu_1-\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2)\right)^2\right)\\ &=N(x_2|\mu_2,\sigma_2^2)N\left(x_1\Big|\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),\sigma_1^2(1-\rho^2)\right)\\ &=p(x_2)N\left(x_1\Big|\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),\sigma_1^2(1-\rho^2)\right)\\ \Rightarrow\quad p(x_1|x_2)&=N\left(x_1\Big|\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),\sigma_1^2(1-\rho^2)\right) \end{align}

If $\sigma_i=1$, then it is simplified to $$p(x_1|x_2)=N\left(x_1|\mu_1+\rho(x_2-\mu_2),1-\rho^2\right)$$
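A pointwise check of the factorization $p(x_1,x_2)=p(x_2)\,p(x_1|x_2)$, as a sketch with arbitrary parameters and test points: the joint density should match the product of the marginal and the conditional derived above up to floating-point rounding.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bivariate_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (z1 * z1 + z2 * z2 - 2 * rho * z1 * z2) / (1 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho * rho))

mu1, mu2, s1, s2, rho = 0.5, -1.0, 2.0, 0.8, -0.6
max_rel_err = 0.0
for x1, x2 in [(0.0, 0.0), (1.3, -2.2), (-3.0, 0.4)]:
    joint = bivariate_pdf(x1, x2, mu1, mu2, s1, s2, rho)
    # Conditional moments from the derivation
    cond_mean = mu1 + rho * s1 / s2 * (x2 - mu2)
    cond_var = s1 ** 2 * (1 - rho ** 2)
    factored = normal_pdf(x2, mu2, s2 ** 2) * normal_pdf(x1, cond_mean, cond_var)
    max_rel_err = max(max_rel_err, abs(joint - factored) / joint)
print(max_rel_err)
```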

## # 4.8 TBD

Try to use python to solve it!

## # 4.9 Sensor fusion with known variances in 1d

With a flat (zero-precision) prior on $\mu$, the posterior is proportional to the likelihood:

\begin{align} p(\mu|D)&\propto \prod_i N(y_i^{(1)}|\mu, v_1)\prod_i N(y_i^{(2)}|\mu, v_2)\\ &\propto \prod_i N(\mu|y_i^{(1)}, v_1)\prod_i N(\mu|y_i^{(2)}, v_2)\\ &\propto \prod_i N_c(\mu|v_1^{-1}y_i^{(1)}, v_1^{-1})\prod_i N_c(\mu|v_2^{-1}y_i^{(2)}, v_2^{-1})\\ &\propto N_c\left(\mu\Big|v_1^{-1}\sum_i y_i^{(1)}+v_2^{-1}\sum_i y_i^{(2)}, n_1v_1^{-1}+n_2v_2^{-1}\right)\\ &\propto N_c\left(\mu\Big|n_1v_1^{-1}\bar y^{(1)}+n_2v_2^{-1}\bar y^{(2)}, n_1v_1^{-1}+n_2v_2^{-1}\right)\\ &\propto N\left(\mu\Big|\frac{n_1\bar y^{(1)}/v_1+n_2\bar y^{(2)}/v_2}{n_1/v_1+n_2/v_2}, \frac{1}{n_1/v_1+n_2/v_2}\right) \end{align}

So the posterior mean of $\mu$ is $\dfrac{n_1\bar y^{(1)}/v_1+n_2\bar y^{(2)}/v_2}{n_1/v_1+n_2/v_2}$, a precision-weighted average of the two sample means, and the posterior variance is $\dfrac{1}{n_1/v_1+n_2/v_2}$.
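The posterior can be computed directly from the summary statistics. A minimal sketch, where `fuse` is a hypothetical helper name and the flat prior is assumed as in the derivation:

```python
def fuse(ybar1, n1, v1, ybar2, n2, v2):
    """Posterior (mean, variance) of mu from two sensors with known noise
    variances v1, v2, assuming a flat (zero-precision) prior on mu."""
    precision = n1 / v1 + n2 / v2  # precisions add
    mean = (n1 * ybar1 / v1 + n2 * ybar2 / v2) / precision  # precision-weighted average
    return mean, 1.0 / precision

# One reading from each sensor; sensor 2 is three times noisier.
mean, var = fuse(ybar1=0.0, n1=1, v1=1.0, ybar2=4.0, n2=1, v2=3.0)
print(mean, var)
```

The noisier sensor gets a quarter of the weight here, pulling the estimate from $0$ only to $1$ rather than to the midpoint $2$.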

## # 4.10 Information form marginalization formula

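Write $\Lambda=\Sigma^{-1}$ and $\xi=\Lambda\mu$. Then $$\det \Lambda=\det \Sigma^{-1}=(\det\Sigma)^{-1}$$ and

\begin{align} (x-\mu)^T\Sigma^{-1}(x-\mu)&=x^T\Lambda x-2x^T\Lambda \mu+\mu^T\Lambda\mu\\ &=x^T\Lambda x-2x^T\xi+\xi^T\Lambda^{-1}\xi \end{align}

so the Gaussian density can be written entirely in terms of $(\xi,\Lambda)$.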

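From $$\begin{bmatrix} \Sigma_{11}& \Sigma_{12}\\ \Sigma_{21}& \Sigma_{22} \end{bmatrix} \cdot\begin{bmatrix} \Lambda_{11}&\Lambda_{12}\\ \Lambda_{21}&\Lambda_{22} \end{bmatrix}=I,$$ we find $$\Sigma_{22}^{-1}=\Lambda_{22}-\Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12},\qquad \Sigma_{12}\Sigma_{22}^{-1}=-\Lambda_{11}^{-1}\Lambda_{12},\qquad \Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}=\Lambda_{11}^{-1},$$ so the conditional moments are $\mu_{1|2}=\mu_1-\Lambda_{11}^{-1}\Lambda_{12}(x_2-\mu_2)$ and $\Sigma_{1|2}=\Lambda_{11}^{-1}$.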

Using the rule $N(\mu,\Sigma)\rightarrow N_c(\Sigma^{-1}\mu,\Sigma^{-1})$

\begin{align} p(x_2)&=N(x_2|\mu_2,\Sigma_{22})\\ &=N_c(x_2|\Sigma_{22}^{-1}\mu_2, \Sigma_{22}^{-1})\\ &=N_c(x_2|(\Lambda_{22}-\Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12})\mu_2,\Sigma_{22}^{-1})\\ &=N_c(x_2|(\Lambda_{22}\mu_2+\Lambda_{21}\mu_1)-\Lambda_{21}\Lambda_{11}^{-1}(\Lambda_{11}\mu_1+\Lambda_{12}\mu_2),\Sigma_{22}^{-1})\\ &=N_c(x_2|\xi_2-\Lambda_{21}\Lambda_{11}^{-1}\xi_1,\Lambda_{22}-\Lambda_{21}\Lambda_{11}^{-1}\Lambda_{12})\\ p(x_1|x_2)&=N(x_1|\mu_{1|2}, \Sigma_{1|2})\\ &=N(x_1|\Lambda_{11}^{-1}(\Lambda_{11}\mu_1-\Lambda_{12}(x_2-\mu_2)), \Lambda_{11}^{-1})\\ &=N_c(x_1|\Lambda_{11}\mu_1-\Lambda_{12}(x_2-\mu_2),\Lambda_{11})\\ &=N_c(x_1|(\Lambda_{11}\mu_1+\Lambda_{12}\mu_2)-\Lambda_{12}x_2,\Lambda_{11})\\ &=N_c(x_1|\xi_1-\Lambda_{12}x_2,\Lambda_{11})\\ \end{align}
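A numerical check of both formulas, as a sketch that assumes the simplest case of scalar blocks (a 2-d Gaussian with illustrative $\mu$, $\Sigma$ and observed $x_2$): the information-form answers, converted back to moments, should agree with the moment-form marginal and conditional.

```python
import math

# Illustrative 2-d Gaussian split into scalar blocks x = (x1, x2).
mu1, mu2 = 1.0, -0.5
s11, s12, s22 = 2.0, 0.6, 1.0

# Information form: Lambda = Sigma^{-1}, xi = Lambda mu (Lambda21 = Lambda12 here).
det = s11 * s22 - s12 * s12
l11, l12, l22 = s22 / det, -s12 / det, s11 / det
xi1 = l11 * mu1 + l12 * mu2
xi2 = l12 * mu1 + l22 * mu2

# Marginal p(x2) = N_c(xi2 - L21 L11^{-1} xi1, L22 - L21 L11^{-1} L12)
marg_prec = l22 - l12 * l12 / l11
marg_xi = xi2 - l12 * xi1 / l11

# Conditional p(x1 | x2) = N_c(xi1 - L12 x2, L11)
x2 = 0.3
cond_prec = l11
cond_xi = xi1 - l12 * x2

# Pairs of (information-form result, moment-form result).
checks = {
    "marginal precision": (marg_prec, 1.0 / s22),
    "marginal mean": (marg_xi / marg_prec, mu2),
    "conditional variance": (1.0 / cond_prec, s11 - s12 * s12 / s22),
    "conditional mean": (cond_xi / cond_prec, mu1 + s12 / s22 * (x2 - mu2)),
}
for name, (info, moment) in checks.items():
    print(f"{name}: {info:.6f} vs {moment:.6f}")
```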

## # 4.13

$$p(\mu|D)\propto N(\mu|\mu_0,9)\prod_i N(y_i|\mu, 4)$$

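The posterior is $\mu|D\sim N(\mu_n,\sigma_n^2)$ with $\sigma_n^2=\dfrac{1}{1/9+n/4}$, so $n=4/\sigma_n^2-4/9$. A 95% central interval $\mu_n\pm1.96\sigma_n$ of width at most $1$ requires $$1.96\sigma_n\leq 1/2\quad\Rightarrow\quad n\geq 4\left(\frac{1.96}{1/2}\right)^2-\frac{4}{9}\approx 61.02,$$ so $n=62$ samples suffice.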
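A brute-force check of the sample size, as a sketch with hypothetical helper names: increase $n$ until the interval width $2\cdot1.96\,\sigma_n$ drops to $1$, using prior variance $9$ and noise variance $4$ as above.

```python
import math

def posterior_sd(n, prior_var=9.0, noise_var=4.0):
    # Posterior precision = prior precision + n / noise variance
    return math.sqrt(1.0 / (1.0 / prior_var + n / noise_var))

def min_samples(width=1.0, z=1.96, prior_var=9.0, noise_var=4.0):
    n = 0
    while 2 * z * posterior_sd(n, prior_var, noise_var) > width:
        n += 1
    return n

# Continuous bound n >= noise_var * (2z / width)^2 - noise_var / prior_var
bound = 4.0 * (2 * 1.96 / 1.0) ** 2 - 4.0 / 9.0
print(min_samples(), bound)
```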