Solutions for Chapter 2 Probability

$\def\bra#1{\mathinner{\langle{#1}|}} \def\ket#1{\mathinner{|{#1}\rangle}} \def\braket#1{\mathinner{\langle{#1}\rangle}} \newcommand{\mdef}{\overset{\mathrm{def}}{=}} \newcommand{\bm}{\mathbf} \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\det}{det} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator{\E}{E} \DeclareMathOperator{\Cov}{Cov} \DeclareMathOperator{\Beta}{B} \DeclareMathOperator{\Bdist}{Beta} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\adj}{adj} \DeclareMathOperator{\ii}{i} \DeclareMathOperator{\dd}{d} \newcommand{\pp}{\partial} \DeclareMathOperator{\rhs}{RHS} \DeclareMathOperator{\lhs}{LHS} \DeclareMathOperator{\Beta}{B}$

# 2.1

$P=1/3$
$P=1/2$, the difference is that you happen to see a child.

# 2.2 Obvious

# 2.3

\begin{align} \Var[x+y]&=\E[(x+y)^2]-\E[x+y]^2\\ &=\E[x^2]+\E[y^2]+2\E[xy]-\E[x]^2-\E[y]^2-2\E[x]\E[y]\\ &=\Var[x]+\Var[y]+2\Cov[x,y] \end{align}

# 2.4

$I, N$ means infected and not infected, $P$ means positive $$\Pr(I|P)=\frac{\Pr(IP)}{\Pr(P)}=\frac{\Pr(IP)}{\Pr(IP)+\Pr(NP)}\approx\frac{10^{-4}}{10^{-4}+0.01}\approx 0.01$$

# 2.5

The door we chose has probability $1/3$ for the prize. After the host has revealed another invalid door, the chance of the last door is $1-1/3=2/3$. Note the host is not randomly removing a door, so the chance of the two unchosen doors will fall to the last door.

# 2.6

$$P(H|e_1,e_2)=\frac{P(H,e_1,e_2)}{P(e_1,e_2}=\frac{P(e_1,e_2|H)P(H)}{P(e_1,e_2)}$$
- Only ii is sufficient
Given $P(e_1,e_2|H)=P(e_1|H)P(e_2|H)$, we have $$P(H|e_1,e_2)=\frac{P(e_1,H)P(e_2,H)}{P(H)P(e_1,e_2)}$$
- Use $P(e_1,e_2)=\sum_h P(e_1,e_2|H=h)P(H=h)$, i is equivalent to iii
- Both i, iii are sufficient now

# 2.7

Pairwise independence means $$P(X_iX_j)=P(X_i)P(X_j), \forall i\neq j\in \{1,\ldots ,n\}$$

Mutually independent means $$P(\prod_{X\in S}X)=\prod_{X\in S}, \forall S\subseteq \{1,\ldots ,n\}$$

Thus for $n\geq 3$, the two group of identities are not necessarily equivalent.

# Counter example

$$X, Y\sim \operatorname{Ber}(0.5), \quad Z=X\oplus Y\sim \operatorname{Ber}(0.5)$$

# 2.8

We want to prove $$P(x,y|z)=P(x|z)P(y|z)\Leftrightarrow P(x,y|z)=g(x,z)h(y,z)$$

The LHS to RHS is obvious. We are now going from RHS to LHS. First of all, $p(x,y|z)$ must satisfy unitary condition $$\iint p(x,y|z)dxdy=\left(\int g(x,z)dx\right)\left(\int h(y,z)dy\right)=1$$ and $$P(x|z)= g(x,z)\int h(y,z)dy,\qquad P(y|z)= h(y,z)\int h(x,z)dx$$ And the deduction to LHS is trivial now.

# 2.9

$$X\perp W|Z,Y\Rightarrow P(XWZY)=P(XZY)P(WZY)/P(ZY)$$$$X\perp Y|Z\Rightarrow P(XYZ)=P(XZ)P(YZ)/P(Z)$$$$X\perp Y|W\Rightarrow P(XYW)=P(XW)P(YW)/P(W)$$

# a. True

We need to prove $$P(XYWZ)=P(XZ)P(YWZ)/P(Z),$$ which is the result of plugging 2nd equation into 1st one.

# b. False

We need to prove $$P(XYZW)=P(XZW)P(YZW)/P(ZW),$$ which is not reachable.

# 2.10

The pdf of $y=1/x$ is

\begin{align} g(y)&=f(x(y))\left|\frac{dx}{dy}\right|=f(1/y)/y^2\\ &=\frac{b^a}{\Gamma(a)}(1/y)^{a-1}e^{-b/y}/y^2\\ &=\frac{b^a}{\Gamma(a)}y^{-(a+1)}e^{-b/y} \end{align}

# 2.11

\begin{align} Z^2&=\int_0^{2\pi}\int_0^\infty r\exp\left(-\frac{r^2}{2\sigma^2}\right)drd\theta\\ &=2\pi\int_0^\infty\exp\left(-\frac{t}{\sigma^2}\right)dt,\quad t=r^2/2\\ &=2\pi\sigma^2 \end{align}

# 2.12

Notice $-\sum_x\sum_y p(x,y)\log p(x)=-\sum_x p(x)\log p(x)=H(X)$

\begin{align} I(X,Y)&=\sum_x\sum_y p(x,y)\log\frac{p(x,y)}{p(x)p(y)}\\ &=H(X)+H(Y)-H(X,Y) \end{align}\begin{align} H(Y|X)&=-\sum_y \log p(x,y)\log\frac{p(x,y)}{p(x)}\\ &=H(X,Y)-H(X) \end{align}

So $I(X,Y)=H(X)-H(X|Y)=H(Y)-H(Y|X)$

# 2.13

Suppose we have $$X_{\pm}=\frac{X_1\pm X_2}{\sqrt{2}}$$ then $$\begin{bmatrix} X_+\\ X_- \end{bmatrix}\sim N\left(\mathbf{0},\begin{bmatrix}\sigma_+^2&\\&\sigma_-^2\end{bmatrix}\right)=N(0,\sigma_+^2)N(0,\sigma_-^2)$$

And $$\sigma_+^2\frac{(X_1+X_2)^2}{2}+\sigma_-^2\frac{(X_1-X_2)^2}{2}=\sigma^2 (X_1^2+X_2^2)-2\rho X_1X_2$$

Thus $\sigma_\pm=(1\pm\rho)\sigma^2$. And $X_+$ is independent on $X_-$. From $X_\pm$, we can get the distribution of $X_1$ and $X_2$.

If $\rho=1$, then $X_1=X_2=X/\sqrt 2$, the mutual information $I(X_1,X_2)=H(X_1)$

If $\rho=-1$, similiar.

If $\rho=0$, they are independent. So $I(X_1,X_2)=0$

# 2.14

$r=1-H(Y|X)/H(X)=(H(X)-H(Y|X))/H(X)=I(Y, X)/H(x)$
It is obvious that $H(X)>0$.
1. $r\geq 0$ because we have $I(X,Y)\geq 0$ using Jensen inequality: $$I(X,Y)=-\sum_{i,j} \Pr(i,j)\log \frac{p_ip_j}{\Pr(i,j)}\geq - \log \sum_{i,j}p_ip_j=0$$
2. We need to prove $H(Y|X)\geq 0$. And $H(Y|X)=H(X,Y)-H(Y)\geq 0$.

\begin{align} H(X,Y)&=-\sum_i\sum_j \Pr(i,j)\log \Pr(i,j)\\ &\geq -\sum_i p_i\log \frac{\sum_j p^2(i,j)}{p_i}\\ &\geq -\sum_i p_i\log \frac{p_i^2}{p_i}=H(Y) \end{align}

As a result, $r\leq 1$

$r=0$ holds iff $I(X,Y)=0$. In this case, $\dfrac{p_ip_j}{\Pr(i,j)}$ is a constant, so $X$ is independent on $Y$
$r=0$ holds iff $I(X,Y)=H(X)$. i.e. $X=Y$

# 2.15

MLE(Maximum Likehood Estimation) is to maximize the likehood $$\prod_i q(x=i|\theta)^{n_i}=\exp(n_i\log q_i)\propto\exp(\sum_ip_i\log q_i)$$

It is equivalent to minimize $-\sum_ip_i\log q_i\equiv -\sum_ip_i\log(q_i/p_i)$

# 2.16

$f=x^{a-1}(1-x)^{b-1}/\Beta(a,b)\propto x^{a-1}(1-x)^{b-1}$ $$E(x)=\int xf dx=\Beta(a+1,b)/\Beta(a,b)=\frac{a}{a+b}$$ We can prove the last step as follows:

\begin{align} \Beta(p, q+1)&=\int_0^1 x^{p-1}(1-x)^{q}dx\\ &=\int_0^1 x^{p-1}(1-x)^{q-1}dx-\int_0^1 x^{p}(1-x)^{q-1}dx\\ &=\int_0^1 x^{p-1}(1-x)^{q-1}dx-\int_0^1 x^{p}(1-x)^{q-1}dx\\ &=\int_0^1 x^{p-1}(1-x)^{q-1}dx-\frac{p}{q}\int_0^1 x^{p-1}(1-x)^{q}dx\\ &=\Beta(p,q)-\frac{p}{q}\Beta(p, q+1) \end{align}

As a result, $\Beta(p, q+1)=\dfrac{q}{p+q}\Beta(p,q)$. We can also use the relationship betweeen Beta and Gamma function, and $\Gamma(p+1)=p\Gamma(p)$ $$\Beta(p,q+1)=\frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q+1)}=\frac{q\Gamma(p)\Gamma(q)}{(p+q)\Gamma(p+q)}=\frac{q}{p+q}\Beta(p,q)$$

$$E(x^2)=\int x^2f dx=\Beta(a+2,b)/\Beta(a,b)=\frac{a}{a+b}\frac{a+1}{a+b+1}$$$$\frac{f'}{f}=\frac{a-1}{x}+\frac{b-1}{1-x}=0\Rightarrow x_m=\frac{a-1}{a-b}$$$$Var(x)=E(x^2)-E(x)^2=\frac{ab}{(a+b)^2(a+b+1)}$$

# 2.17

The cdf(cumulative distribution function) of each variable is

$$F_1(x)=\Pr(X<x)=x$$

So the cdf that the smallest one is smaller than $x$ is $$F_m(x)=1-(1-F_1(x))^2=x(2-x)$$

So the density function is $f_m=F_m'=2(1-x)$. The subscript $m$ means minimum