Machine Learning: Directed Graphical Models (Bayes Nets)

Introduction
Suppose we observe multiple correlated variables (such as words in a document or genes in a microarray).

How can we represent the joint distribution?
How can we use the distribution to infer one set of variables given another in a reasonable amount of computation time?
How can we learn the parameters of this distribution with a reasonable amount of data?

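Chain Rule
By the chain rule of probability, any joint distribution over V variables can be written as a product of conditionals, for any ordering of the variables:

P(X1, X2, ..., XV) = P(X1) P(X2 | X1) P(X3 | X1, X2) ... P(XV | X1, X2, ..., XV-1)

The problem with this expression is that the conditional distributions become more and more expensive to represent as we move along the chain. If each variable has K states and we use conditional probability tables, the number of entries needed for each factor is: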

O(K) for P(X1)
O(K^2) for P(X2 | X1)
...
O(K^V) for P(XV | X1, X2, ..., XV-1)
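
To get a feel for how quickly these tables grow, the sizes can be tabulated directly. This is only a minimal sketch: the function name cpt_sizes and the choice of K = 2 states and V = 10 variables are illustrative assumptions, not part of the notes above.

```python
def cpt_sizes(num_states, num_vars):
    """Number of entries in the table for P(X_t | X_1, ..., X_{t-1}), t = 1..num_vars."""
    # The t-th table has one entry per joint setting of X_1..X_t, hence K**t.
    return [num_states ** t for t in range(1, num_vars + 1)]


sizes = cpt_sizes(num_states=2, num_vars=10)
print(sizes)        # table sizes for t = 1..10
print(sizes[-1])    # 1024 entries for the last conditional alone
print(sum(sizes))   # 2046 entries in total
```

Even with binary variables, the last table dominates the total, which is why some independence assumption is needed.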

Conditional Independence
Markov Assumption: Let us assume that "the future is independent of the past given the present", or:

Xt+1 is independent of X1:t-1 given Xt

Thus the joint distribution will now look like:

P(X1, X2, ..., XV) = P(X1) P(X2 | X1) P(X3 | X2) ... P(XV | XV-1)
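
Under the Markov assumption, the joint probability of a sequence is the probability of the first state times a product of one-step transition probabilities. The sketch below evaluates that product for a two-state chain. The function name markov_chain_joint and all the numbers are made-up assumptions for illustration.

```python
def markov_chain_joint(initial, transition, states):
    """P(X1, ..., XT) = P(X1) * product over t of P(Xt | Xt-1).

    initial[i]       = P(X1 = i)
    transition[i][j] = P(Xt = j | Xt-1 = i)
    """
    prob = initial[states[0]]
    # Multiply in one transition term per consecutive pair of states.
    for prev, cur in zip(states, states[1:]):
        prob *= transition[prev][cur]
    return prob


# Hypothetical two-state chain.
initial = [0.6, 0.4]
transition = [[0.7, 0.3],
              [0.2, 0.8]]

p = markov_chain_joint(initial, transition, [0, 0, 1, 1])
print(p)  # 0.6 * 0.7 * 0.3 * 0.8, about 0.1008
```

Only the initial distribution and one K x K transition table are needed, no matter how long the sequence is.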

Graphical Models
A graphical model is a way to represent a joint distribution by making conditional independence assumptions.
The nodes represent random variables.
The lack of edges represents conditional independence assumptions.

Directed Graphical Models
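Also known as Bayesian networks, belief networks or causal networks.
Ordered Markov Property: given an ordering of the nodes in which parents come before their children, we assume that a node depends only on its immediate parents and not on all of its predecessors in the ordering. Thus:

Xs is independent of Xpred(s) \ pa(s) given Xpa(s)

where pa(s) are the parents of node s and pred(s) are its predecessors in the ordering. Thus in general we will have:

P(X1:V) = P(X1 | Xpa(1)) P(X2 | Xpa(2)) ... P(XV | Xpa(V))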
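
In a directed graphical model the joint distribution is the product, over all nodes, of each node's conditional distribution given its parents. The sketch below evaluates this product on a small binary network. The graph (A is a parent of B and C, which are both parents of D), the probabilities and the function name joint_probability are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical graph: A -> B, A -> C, B -> D, C -> D (all variables binary).
parents = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C")}

# cpt[node][parent values] = P(node = 1 | parents); made-up numbers.
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.5, (1,): 0.1},
    "D": {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.95},
}


def joint_probability(assignment):
    """Product over nodes of P(node | its parents) for a full assignment."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


example = joint_probability({"A": 1, "B": 1, "C": 0, "D": 1})
print(example)  # 0.3 * 0.9 * 0.9 * 0.7, about 0.1701

# Sanity check: the factorized joint sums to 1 over all 2**4 assignments.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product((0, 1), repeat=len(parents))
)
print(total)
```

This network needs 1 + 2 + 2 + 4 = 9 numbers, compared with 2^4 - 1 = 15 for an unrestricted joint table over four binary variables.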

Markov and hidden Markov Models

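First-order Markov chain: the current state depends only on the immediately preceding state.

P(X1:T) = P(X1) P(X2 | X1) P(X3 | X2) ... P(XT | XT-1)

Second-order Markov chain: the current state depends on the two preceding states.

P(X1:T) = P(X1, X2) P(X3 | X2, X1) P(X4 | X3, X2) ... P(XT | XT-1, XT-2)

We could create higher-order Markov chains, but the number of parameters blows up as the order grows. Instead, we assume that there is an underlying hidden process, modeled by a first-order Markov chain, and that the data are noisy observations of this process. This is called a hidden Markov model (HMM).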
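
As a rough illustration of how the parameter count grows with the order of the chain, the sketch below counts the free entries in the transition table. It assumes a stationary chain (the same table at every time step) and ignores the initial distribution. The function name free_parameters and the choice of 10 states are illustrative assumptions.

```python
def free_parameters(num_states, order):
    """Free parameters in the transition table of a stationary Markov chain.

    There is one row per setting of the previous `order` states. Each row is
    a distribution over num_states values, so it has num_states - 1 free
    entries (the last entry is fixed by normalization).
    """
    return num_states ** order * (num_states - 1)


for order in (1, 2, 3, 4):
    print(order, free_parameters(num_states=10, order=order))
# 1 90
# 2 900
# 3 9000
# 4 90000
```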

First order Hidden Markov Model

Zt is the hidden variable at "time" t. The hidden variables represent quantities of interest, such as the identity of the word that someone is currently speaking.
Xt is the observed variable at "time" t. The observed variables are what we measure, such as the acoustic waveform.
P(Zt | Zt-1) --> transition model
P(Xt | Zt) --> observation model
P(Zt | X1:t) --> state estimation (filtering): the belief about the current hidden state given all observations so far
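
State estimation can be computed recursively from the transition and observation models with the standard forward (filtering) recursion: predict one step ahead with the transition model, then reweight by the likelihood of the new observation and renormalize. The sketch below is one minimal way to write this for discrete states and observations. The function name forward_filter and all the numbers are hypothetical.

```python
def forward_filter(initial, transition, emission, observations):
    """Return the list of beliefs P(Zt | X1:t), one per time step.

    initial[i]       = P(Z1 = i)
    transition[i][j] = P(Zt = j | Zt-1 = i)
    emission[j][x]   = P(Xt = x | Zt = j)
    """
    num_states = len(initial)
    beliefs = []
    belief = list(initial)  # prior over Z1 before any observation
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: P(Zt = j | X1:t-1) = sum_i P(Zt = j | Zt-1 = i) P(Zt-1 = i | X1:t-1)
            belief = [
                sum(belief[i] * transition[i][j] for i in range(num_states))
                for j in range(num_states)
            ]
        # Update: weight by P(Xt | Zt) and renormalize to get P(Zt | X1:t).
        unnormalized = [belief[j] * emission[j][obs] for j in range(num_states)]
        norm = sum(unnormalized)
        belief = [u / norm for u in unnormalized]
        beliefs.append(belief)
    return beliefs


# Hypothetical two-state model with two possible observation symbols.
initial = [0.5, 0.5]
transition = [[0.7, 0.3],
              [0.3, 0.7]]
emission = [[0.9, 0.1],
            [0.2, 0.8]]
observations = [0, 0, 1]

beliefs = forward_filter(initial, transition, emission, observations)
for t, b in enumerate(beliefs, start=1):
    print(t, b)
```

Each step costs O(K^2) for K hidden states, so filtering a sequence of length T takes O(T K^2) time rather than summing over all K^T hidden paths.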

Genetic Linkage Analysis
For every individual i (a person or animal in the pedigree) and location or locus j along the genome, we create 3 nodes:
The observed marker Xi,j: this can be a property such as blood type, or a fragment of DNA that can be measured.
Two hidden alleles, one inherited from each of the individual's parents (maternal and paternal).


Tuesday, 27 May 2014
