
CS188 Chapter 14 Probabilistic Reasoning


Mark Chen | 13 Jun 2021

14.1 Representing Knowledge in an Uncertain Domain

This section introduces a data structure called the Bayesian network to represent the dependencies between variables. A Bayesian network can represent any full joint probability distribution, and in many cases can do so very concisely.

A Bayesian network is a directed acyclic graph (DAG) in which each node is annotated with quantitative probability information.

  1. Each node corresponds to a random variable, which may be discrete or continuous.
  2. A set of directed links or arrows connects pairs of nodes. If there is an arrow from node $X$ to $Y$, $X$ is said to be a parent of $Y$.
  3. Each node $X_i$ has a conditional probability distribution $\mathbf{P}(X_i\mid Parents(X_i))$ that quantifies the effect of the parents on the node.

(Figure: an example Bayesian network; each node is annotated with a conditional probability table.)

In each node of the Bayesian network, a conditional probability table (CPT) is stored. For instance, the node “$Toothache$” stores $\mathbf{P}(Toothache \mid Cavity)$.

In general, a table for a Boolean variable with $k$ Boolean parents contains $2^k$ independently specifiable probabilities.
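
For example, if $Alarm$ is a Boolean node with the two Boolean parents $Burglary$ and $Earthquake$, its CPT has $2^2 = 4$ rows, one for each combination of parent values; each row stores a single number $P(alarm \mid \cdot)$, since $P(\neg alarm \mid \cdot) = 1 - P(alarm \mid \cdot)$.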

14.2 The Semantics of Bayesian Networks

14.2.1 Representing the full joint distribution

The full joint distribution of variables $X_1$ to $X_n$ can be represented in this way:

\[P(x_1, \cdots, x_n) = \prod_{i=1}^{n}{P(x_i \mid parents(X_i))}\]

where $parents(X_i)$ denotes the values of $Parents(X_i)$ that appear in $x_1, \cdots, x_n$. This equation defines what a given Bayesian network means.
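
As a worked instance, take the burglary network used later in these notes (with the standard textbook CPT values): the probability that both neighbors call and the alarm sounds, while neither a burglary nor an earthquake has occurred, is

\[P(j, m, a, \neg b, \neg e) = P(j \mid a)P(m \mid a)P(a \mid \neg b, \neg e)P(\neg b)P(\neg e) = 0.90 \times 0.70 \times 0.001 \times 0.999 \times 0.998 \approx 0.000628\]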

How to construct Bayesian Networks

Besides the equation above, there is another way to expand $P(x_1, \cdots, x_n)$. Using the product rule, we can write it as:

\[P(x_1, \cdots, x_n) = P(x_n \mid x_{n-1}, x_{n-2}, \cdots, x_1)P(x_{n-1}, x_{n-2}, \cdots, x_1)\]

Applying the product rule recursively to the equation above, we can write $P(x_1, \cdots, x_n)$ as a long product:

\[P(x_1, \cdots, x_n) = P(x_n \mid x_{n-1}, x_{n-2}, \cdots, x_1)P(x_{n-1} \mid x_{n-2}, \cdots, x_1)\cdots P(x_2 \mid x_1)P(x_1) \\= \prod_{i=1}^{n}{P(x_i \mid x_{i-1}, \cdots, x_1)}\]

Comparing this with the Bayesian-network factorization, we can see that

\[P(X_i\mid X_{i-1}, \cdots, X_1) = P(X_i \mid Parents(X_i))\]

This equation indicates that, given its parents, each node $X_i$ is conditionally independent of its remaining predecessors: \(X_i \;\bot\; \lbrace X_1, \cdots, X_{i-1}\rbrace \setminus Parents(X_i) \mid Parents(X_i)\).

  1. Nodes: First determine the set of variables $X_1, \cdots, X_n$ to include in the Bayesian network. The network will be more compact if the variables are ordered so that causes precede effects.
  2. Links: For $i = 1$ to $n$, do
    • Choose from $X_1, \cdots, X_{i-1}$ a minimal set of parents for $X_i$, such that the equation $P(X_i\mid X_{i-1}, \cdots, X_1) = P(X_i \mid Parents(X_i))$ is satisfied.
    • For each parent, insert a link from the parent to $X_{i}$.
    • Write down the conditional probability table, $\mathbf{P}(X_i \mid Parents(X_i))$.
The Bayesian network is a correct representation of the domain only if each node is conditionally independent of its other predecessors in the node ordering, given its parents.
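
For example, in the burglary domain the causal ordering $Burglary, Earthquake, Alarm, JohnCalls, MaryCalls$ produces the compact five-node network used in Section 14.4, whereas a diagnostic ordering such as $MaryCalls, JohnCalls, Alarm, Burglary, Earthquake$ still yields a correct network, but with more links and larger CPTs.
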
Some information is omitted here, as it requires more sophisticated knowledge of 2D Gaussian distributions, topology, etc. You can find it in Sections 14.2–14.3.

14.4 Exact Inference in Bayesian Networks

The basic task of a probabilistic inference system is to compute the posterior probability distribution for a set of query variables, given some observed event (an assignment of values to a set of evidence variables).

\[\text{All Variables} = \text{Query Variables }\cup\text{ Evidence Variables }\cup\text{ Hidden Variables}\]

A typical query asks for the posterior probability distribution $\mathbf{P}(X\mid \mathbf{e})$.

14.4.1 Inference by Enumeration

A query $\mathbf{P}(X \mid \mathbf{e})$ can be answered using the equation below:

\[\mathbf{P}(X \mid \mathbf{e}) = \alpha \mathbf{P}(X, \mathbf{e}) = \alpha \sum_y{\mathbf{P}(X,\mathbf{e}, \mathbf{y})}\]

That is, a query can be answered using a Bayesian network by computing sums of products of conditional probabilities from the network.

Example

Suppose we have the following Bayesian network:

(Figure: the burglary network, with variables $Burglary$, $Earthquake$, $Alarm$, $JohnCalls$, and $MaryCalls$.)

We use $B$ to represent the variable $Burglary$, and $j$ and $m$ to represent the known values (true or false) of $JohnCalls$ and $MaryCalls$. Now we want to calculate $\mathbf{P}(B\mid m, j)$.

\[\begin{aligned} \mathbf{P}(B\mid m, j) &= \left\langle \frac{P(b, j, m)}{P(j, m)}, \frac{P(\neg b, j, m)}{P(j, m)} \right\rangle\\ &= \alpha \left\langle \underbrace{P(b, j, m)}_{\text{Expand}}, P(\neg b, j, m)\right\rangle \end{aligned}\]

Summing over the hidden variables $E$ and $A$ and applying the chain rule, we can expand $P(b, j, m)$ into a sum over $P(b, j, m, E, A)$:

\[\begin{aligned} P(b, j, m) &= \sum_E{\sum_A{P(b, j, m, E, A)}}\\ &= \sum_E{\sum_A{P(m \mid j, A, E, b)P(j \mid A, E, b)P(A \mid E, b)P(E\mid b)P(b)}} \end{aligned}\]

Note: For simplicity,

$x\bot y$ is shorthand for $P(x)\bot P(y)$, i.e., $x$ and $y$ are independent,

$x\bot y \mid a$ is shorthand for $P(x\mid a) \bot P(y\mid a)$, and

$x\bot y, z \mid a$ is shorthand for $P(x\mid a)\bot P(y\mid a)$ and $P(x\mid a)\bot P(z\mid a)$.

According to the structure of the Bayesian network, we know that $m \bot j \mid A$, $m \bot b, E \mid A$, $j \bot b, E \mid A$, and $b \bot E$. Therefore, we can simplify the formula above:

\[\begin{aligned} P(b, j, m) &= \sum_E{\sum_A{P(m \mid j, A, E, b)P(j \mid A, E, b)P(A \mid E, b)P(E\mid b)P(b)}}\\ &= \sum_E{\sum_A{P(m\mid A)P(j \mid A)P(A\mid E, b)P(E)P(b)}} \end{aligned}\]

To simplify this formula, we can “extract the common factors” from the nested sums, moving each factor outside of any sum over variables it does not mention.

\[\begin{aligned} &P(b, j, m)\\ = &P(b)\sum_E\left({P(E)\sum_A\left({P(m \mid A)P(j\mid A)P(A\mid E, b)}\right)}\right)\\ = &P(b)\sum_E\left({P(E)\sum_A\left({\underbrace{P(m\mid Parents(m))P(j\mid Parents(j))P(A\mid Parents(A))}_{\text{Find these values in the Bayesian network}}}\right)}\right) \end{aligned}\]

The worst-case time complexity of inference by enumeration is $O(n2^n)$.

To formally describe the process of Inference by Enumeration, we can use two functions - $\text{ENUMERATION-ASK}$ and $\text{ENUMERATE-ALL}$.

(Figure: pseudocode for $\text{ENUMERATION-ASK}$ and $\text{ENUMERATE-ALL}$.)
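
Below is a minimal runnable Python sketch of the two functions. The `BooleanBayesNet` class and its `cpt`/`vars` interface are this note's own scaffolding rather than the textbook's, and the CPT numbers are the standard ones for the burglary example:

```python
class BooleanBayesNet:
    """A tiny Bayesian network over Boolean variables.

    cpts maps each variable name to (parents, table), where table maps a
    tuple of parent values to P(var = True | parent values)."""

    def __init__(self, cpts):
        self.cpts = cpts
        self.vars = list(cpts)  # assumed to be listed in topological order

    def cpt(self, var, value, e):
        """Return P(var = value | parents(var)), parent values read from e."""
        parents, table = self.cpts[var]
        p_true = table[tuple(e[p] for p in parents)]
        return p_true if value else 1.0 - p_true


def enumeration_ask(X, e, bn):
    """Return the posterior distribution P(X | e) for query variable X."""
    q = {x: enumerate_all(bn.vars, {**e, X: x}, bn) for x in (True, False)}
    alpha = 1.0 / sum(q.values())  # normalization constant
    return {x: alpha * p for x, p in q.items()}


def enumerate_all(variables, e, bn):
    """Recursively compute the sum of products of CPT entries."""
    if not variables:
        return 1.0
    y, rest = variables[0], variables[1:]
    if y in e:  # evidence or query variable: its value is fixed
        return bn.cpt(y, e[y], e) * enumerate_all(rest, e, bn)
    # hidden variable: sum over both of its values
    return sum(bn.cpt(y, v, e) * enumerate_all(rest, {**e, y: v}, bn)
               for v in (True, False))


# The burglary network from the figure above (standard textbook CPT values).
burglary_bn = BooleanBayesNet({
    'B': ((), {(): 0.001}),
    'E': ((), {(): 0.002}),
    'A': (('B', 'E'), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    'J': (('A',), {(True,): 0.90, (False,): 0.05}),
    'M': (('A',), {(True,): 0.70, (False,): 0.01}),
})

print(enumeration_ask('B', {'J': True, 'M': True}, burglary_bn))
# -> roughly {True: 0.284, False: 0.716}
```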

The $\times$ in the pseudocode represents the pointwise product of vectors, not the scalar product.
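
For example, $\langle 0.1, 0.2 \rangle \times \langle 0.3, 0.4 \rangle = \langle 0.03, 0.08 \rangle$.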

Here’s how to use $\text{ENUMERATION-ASK}$ and $\text{ENUMERATE-ALL}$ to evaluate $\text{ENUMERATION-ASK}(B, \lbrace j,m\rbrace, bn)$.

(Figure: the evaluation tree for $\text{ENUMERATION-ASK}(B, \lbrace j,m\rbrace, bn)$.)

Some information is omitted here. You can find it in Sections 14.4.2–14.4.4.

14.5 Approximate Inference in Bayesian Networks

This section describes randomized sampling algorithms, also called Monte Carlo algorithms, that provide approximate answers whose accuracy depends on the number of samples generated.

14.5.1 Direct Sampling Methods

Prior Sampling

Step 1. Topologically sort the variables of the Bayesian network, so that every node appears after its parents.

Step 2. Randomly assign a value to the first variable (a node with no parents) according to the distribution $\mathbf{P}(X_1)$.

Step 3. Assign a value to the next variable by sampling from the distribution $\mathbf{P}(X_2 \mid Parents(X_2))$; its parents' values have already been fixed in earlier steps.

Step 4. Repeat Step 3 until every variable in the Bayesian network has been assigned a value.

After Step 4, we have successfully generated one sample from the Bayesian network. By repeating the sampling many times, we can approximate the probability distribution $\mathbf{P}(X\mid \mathbf{e})$.
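
Here is a minimal Python sketch of prior sampling, reusing the hypothetical `BooleanBayesNet` scaffolding and `burglary_bn` network from the enumeration example in Section 14.4.1:

```python
import random

def prior_sample(bn):
    """Generate one full assignment by sampling each variable in topological
    order, so every node's parents already have values when it is sampled."""
    event = {}
    for var in bn.vars:
        event[var] = random.random() < bn.cpt(var, True, event)
    return event

# Estimate P(j, m) from N samples; the estimate is consistent as N grows.
N = 100_000
hits = sum(1 for _ in range(N)
           if (s := prior_sample(burglary_bn))['J'] and s['M'])
print(hits / N)  # should be close to P(j, m), about 0.002
```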

Suppose $S_{PS}(x_1, \cdots, x_n)$ represents the probability of generating a sample in which $X_1 = x_1, \cdots, X_n=x_n$:

\[S_{PS}(x_1, \cdots, x_n) = \prod_{i=1}^n{P(x_i\mid parents(X_i))}\]

Suppose we have taken $N$ direct samples, and among them there are $N_{PS}(x_1\cdots x_n)$ samples in which $X_1=x_1, \cdots, X_n=x_n$. The ratio between $N_{PS}$ and $N$ converges as $N$ approaches $\infty$:

\[\lim_{N\rightarrow\infty}{\frac{N_{PS}(x_1\cdots x_n)}{N}} = S_{PS}(x_1, \cdots, x_n)=P(x_1, \cdots, x_n)\]

The estimated probability becomes exact in the large-sample limit. Such an estimate is called consistent.

\[P(x_1, \cdots, x_m)\approx N_{PS}(x_1, \cdots, x_m)/N\]

Rejection Sampling

Rejection sampling is a general method for producing samples from a hard-to-sample distribution given an easy-to-sample distribution. It produces a consistent estimate of the true probability.

(Figure: pseudocode for the $\text{REJECTION-SAMPLING}$ algorithm.)
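
Here is a minimal Python sketch of the idea, again built on the `prior_sample` and `burglary_bn` scaffolding defined above:

```python
def rejection_sampling(X, e, bn, N):
    """Estimate P(X | e): draw N prior samples, keep only those consistent
    with the evidence e, and normalize the counts of the query variable."""
    counts = {True: 0, False: 0}
    for _ in range(N):
        sample = prior_sample(bn)
        if all(sample[var] == val for var, val in e.items()):
            counts[sample[X]] += 1  # consistent with e: tally the query value
        # samples inconsistent with e are simply rejected
    total = counts[True] + counts[False]
    if total == 0:
        raise ValueError("all samples were rejected; increase N")
    return {x: c / total for x, c in counts.items()}

print(rejection_sampling('B', {'J': True, 'M': True}, burglary_bn, 100_000))
# -> roughly {True: 0.28, False: 0.72}; only about 0.2% of samples survive
```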

The biggest problem with rejection sampling is that it rejects too many samples! The fraction of samples rejected grows exponentially as the number of evidence variables increases.

