Computational Genomics Lecture 10 Hidden Markov Models (HMMs)


  • Finite, or Discrete, Markov Models

  • Hidden Markov Models

  • Three major questions:

  • Q1: Computing the probability of a given observation.

    • A1: The forward–backward (Baum–Welch) dynamic programming algorithm.
  • Q2: Computing the most probable sequence of states, given an observation.

    • A2: Viterbi’s dynamic programming algorithm.
  • Q3: Learning the best model, given an observation.

    • A3: Expectation Maximization (EM): a heuristic.

Markov Models

  • A discrete (finite) system:

    • N distinct states.
    • Begins (at time t=1) in some initial state(s).
    • At each time step (t=1,2,…) the system moves from the current state to the next state (possibly the same as the current state) according to transition probabilities associated with the current state.
  • This kind of system is called a finite, or discrete, Markov model.

  • After Andrei Andreyevich Markov (1856–1922)


  • Markov Chains (Markov Models)

  • Hidden Markov Chains (HMMs)

  • Algorithmic Questions

  • Biological Relevance

Discrete Markov Model: Example

  • Discrete Markov Model with 5 states.

  • Each aij represents the probability of moving from state i to state j

  • The aij are given in a matrix A = {aij}

  • The probability of starting in a given state i is πi. The vector π represents these start probabilities.

Markov Property

Markov Chains

Simple Minded Weather Example

Simple Minded Weather Example

Coke vs. Pepsi (a central cultural dilemma)

Coke vs. Pepsi

Coke vs. Pepsi

Coke vs. Pepsi

Equilibrium (Stationary) Distribution

  • Suppose 60% of all people now drink Coke, and 40% drink Pepsi. What fraction will be drinking Coke 10,100,1000,10000 … weeks from now?

  • For each week, probability is well defined. But does it converge to some equilibrium distribution [p0,p1]?

  • If it does, then the equations 0.9p0 + 0.2p1 = p0 and 0.8p1 + 0.1p0 = p1 must hold (together with p0 + p1 = 1), yielding p0 = 2/3, p1 = 1/3.
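The convergence claim is easy to check numerically. A minimal sketch, using the transition probabilities implied by the equations above (90% of Coke drinkers stay with Coke each week, 20% of Pepsi drinkers switch):

```python
# Iterating the Coke/Pepsi chain: rows are the current state, columns the
# next state (state 0 = Coke, state 1 = Pepsi).
P = [[0.9, 0.1],
     [0.2, 0.8]]

def step(dist, P):
    """One week of drift: new_j = sum_i dist_i * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(len(dist)))
            for j in range(len(P[0]))]

dist = [0.6, 0.4]          # 60% Coke, 40% Pepsi today
for week in range(100):
    dist = step(dist, P)

print(dist)                # converges to the equilibrium [2/3, 1/3]
```

The rate of convergence is governed by the second eigenvalue of P (0.7 here), so the distribution approaches equilibrium geometrically fast.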

Equilibrium (Stationary) Distribution

  • Whether or not a stationary distribution exists, and whether it is unique when it does, are determined by certain properties of the process. Irreducible means that every state is accessible from every other state. Aperiodic means the chain does not cycle with a fixed period; for instance, a state with a positive-probability transition to itself makes an irreducible chain aperiodic. Positive recurrent means that the expected return time to every state is finite.

Equilibrium (Stationary) Distribution

  • If the Markov chain is positive recurrent, there exists a stationary distribution. If it is positive recurrent and irreducible, there exists a unique stationary distribution, and furthermore the process constructed by taking the stationary distribution as the initial distribution is ergodic. Then the average of a function f over samples of the Markov chain equals the average with respect to the stationary distribution: (1/n) Σi f(Xi) → Σk πk f(k) as n → ∞.

Equilibrium (Stationary) Distribution

  • Writing P for the transition matrix, a stationary distribution is a vector π which satisfies the equation

    • Pπ = π.
  • In this case, the stationary distribution π is an eigenvector of the transition matrix, associated with the eigenvalue 1.
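This eigenvector property can be verified directly. A minimal pure-Python check for the Coke/Pepsi chain, written column-stochastically so that Pπ = π matches the convention above (column j of P holds the probabilities of moving out of state j):

```python
# Check that pi = (2/3, 1/3) is a fixed point of the column-stochastic
# transition matrix, i.e. an eigenvector for eigenvalue 1.
P = [[0.9, 0.2],    # each column sums to 1
     [0.1, 0.8]]

def mat_vec(P, v):
    """Compute the matrix-vector product P v."""
    return [sum(P[i][j] * v[j] for j in range(len(v)))
            for i in range(len(P))]

pi = [2/3, 1/3]
image = mat_vec(P, pi)   # equals pi, so P pi = pi holds
```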

Discrete Markov Model - Example

  • States – Rainy:1, Cloudy:2, Sunny:3

  • Matrix A –

  • Problem – given that the weather on day 1 (t=1) is sunny (3), what is the probability of the observation sequence O:

Discrete Markov Model – Example (cont.)

  • The answer is -
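The slide's matrix A and observation sequence are not reproduced here, so the sketch below uses assumed illustrative values; any row-stochastic 3×3 matrix works the same way. Given the first state, the probability of a state sequence is just the product of the transition probabilities along the path:

```python
# States (0-indexed here): 0 = Rainy, 1 = Cloudy, 2 = Sunny.
# Assumed transition probabilities, NOT the slide's actual matrix.
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]

def seq_probability(states, A):
    """P(O | q1 fixed) = product of transition probabilities along the path."""
    p = 1.0
    for i, j in zip(states, states[1:]):
        p *= A[i][j]
    return p

# e.g. the sequence "sunny, sunny, rainy, rainy, sunny, cloudy, sunny"
O = [2, 2, 0, 0, 2, 1, 2]
p = seq_probability(O, A)
```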

Types of Models

  • Ergodic model

    • Strongly connected: a directed path with positive probabilities from each state i to each state j (but not necessarily a complete directed graph)

Third Example: A Friendly Gambler

Fourth Example: A Friendly Gambler

Let Us Change Gear

  • Enough with these simple Markov chains.

  • Our next destination: Hidden Markov chains.

Hidden Markov Models (probabilistic finite state automata)

    • Often we face scenarios where the states cannot be directly observed.
    • We need an extension: Hidden Markov Models

Hidden Markov Models - HMM

Example: Dishonest Casino

Coin-Tossing Example

Loaded Coin Example

HMMs – Question I

  • Given an observation sequence O = (O1 O2 O3 … OL) and a model M = {A, B, π}, how do we efficiently compute P(O|M), the probability that the given model M produces the observation O in a run of length L?

  • This probability can be viewed as a measure of the quality of the model M. Viewed this way, it enables discrimination/selection among alternative models.

Coin-Tossing Example

C-G Islands Example

Example: CpG islands

  • In the human genome, CG dinucleotides are relatively rare

    • CG pairs undergo a process called methylation that modifies the C nucleotide
    • A methylated C mutates (with relatively high chance) to a T
  • Promoter regions are CG-rich

    • These regions are not methylated, and thus mutate less often
    • Such regions are called CpG islands

CpG Islands

  • We construct Markov chain for CpG rich and poor regions

  • Using maximum likelihood estimates from 60K nucleotides, we get two models

Ratio Test for CpG Islands

Empirical Evaluation

Finding CpG islands

  • Simple-minded approach:

  • Pick a window of size N (N = 100, for example)

  • Compute log-ratio for the sequence in the window, and classify based on that

  • Problems:

  • How do we select N?

  • What do we do when the window intersects the boundary of a CpG island?
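The window scheme above can be sketched in a few lines. The per-dinucleotide log-odds scores below are assumed placeholders; a real table would come from the two Markov chains estimated on CpG-rich and CpG-poor training data:

```python
# Slide a fixed-size window along the sequence and classify each window by
# the sign of its summed dinucleotide log-odds score.
LOG_ODDS = {('C', 'G'): 1.0}   # assumed: only the CG pair scores positively
DEFAULT = -0.1                 # assumed background score for all other pairs

def window_score(seq):
    """Sum of log-odds over consecutive dinucleotides in the window."""
    return sum(LOG_ODDS.get((a, b), DEFAULT) for a, b in zip(seq, seq[1:]))

def classify(seq, N=100):
    """Positive total score => call the window CpG-rich."""
    return [window_score(seq[i:i + N]) > 0 for i in range(len(seq) - N + 1)]

# a CG-rich stretch followed by an AT-rich stretch
calls = classify("CG" * 60 + "AT" * 60, N=20)
```

Note how the two stated problems show up directly: the result depends on the choice of N, and windows straddling the boundary between the two stretches get ambiguous scores.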

Alternative Approach

  • Build a model that includes “+” states and “−” states

  • A state “remembers” the last nucleotide and the type of region

  • A transition from a “−” state to a “+” state marks the start of a CpG island

A Different C-G Islands Model

HMM Recognition (question I)

  • For a given model M = {A, B, π} and a given state sequence Q1 Q2 Q3 … QL, the probability of an observation sequence O1 O2 O3 … OL is

  • P(O|Q,M) = bQ1O1 · bQ2O2 · bQ3O3 · … · bQLOL

  • For a given hidden Markov model M = {A, B, π}, the probability of the state sequence Q1 Q2 Q3 … QL is (the initial probability of Q1 is taken to be πQ1)

  • P(Q|M) = πQ1 · aQ1Q2 · aQ2Q3 · aQ3Q4 · … · aQL-1QL

  • So, for a given HMM M, the probability of an observation sequence O1 O2 O3 … OL is obtained by summing over all possible state sequences.
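The sum over all state sequences can be written out literally for a tiny model. This brute-force sketch (with an assumed toy two-state, two-symbol HMM) computes exactly the quantity defined above; it is only feasible for very small L:

```python
# Brute-force P(O|M): sum P(O|Q,M) * P(Q|M) over every state sequence Q.
from itertools import product

def brute_force_likelihood(O, A, B, pi):
    """A[i][j] = transition, B[i][o] = emission, pi[i] = start probability."""
    n_states = len(pi)
    total = 0.0
    for Q in product(range(n_states), repeat=len(O)):
        p = pi[Q[0]] * B[Q[0]][O[0]]                 # pi_Q1 * b_Q1,O1
        for t in range(1, len(O)):
            p *= A[Q[t-1]][Q[t]] * B[Q[t]][O[t]]     # a_Qt-1,Qt * b_Qt,Ot
        total += p
    return total

A  = [[0.7, 0.3], [0.4, 0.6]]    # assumed illustrative numbers
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
p = brute_force_likelihood([0, 1, 0], A, B, pi)
```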

HMM – Recognition (cont.)

  • P(O|M) = ΣQ P(O|Q,M) · P(Q|M)

  • = ΣQ πQ1 bQ1O1 · aQ1Q2 bQ2O2 · aQ2Q3 bQ3O3 · …

  • Requires summing over exponentially many paths

  • Can this be made more efficient?

HMM – Recognition (cont.)

  • Why isn’t it efficient? – It takes O(2L·Q^L) operations

    • For a given state sequence of length L we have about 2L calculations
      • P(Q|M) = πQ1 · aQ1Q2 · aQ2Q3 · aQ3Q4 · … · aQL-1QL
      • P(O|Q,M) = bQ1O1 · bQ2O2 · bQ3O3 · … · bQLOL
    • There are Q^L possible state sequences
    • So, if Q=5 and L=100, the direct computation requires about 200·5^100 operations
    • We can use the forward-backward (F-B) algorithm to do things efficiently
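The forward half of the algorithm already suffices for Question I. A minimal sketch (same assumed toy HMM as before): alpha[t][j] = P(O1…Ot, Qt = j), built column by column, which replaces the Q^L-term sum with O(L·Q²) work:

```python
# Forward algorithm: dynamic programming over time steps.
def forward(O, A, B, pi):
    Q = len(pi)
    # base case: alpha_1(j) = pi_j * b_j(O_1)
    alpha = [[pi[j] * B[j][O[0]] for j in range(Q)]]
    for t in range(1, len(O)):
        # recursion: alpha_t(j) = b_j(O_t) * sum_i alpha_{t-1}(i) * a_ij
        alpha.append([
            B[j][O[t]] * sum(alpha[-1][i] * A[i][j] for i in range(Q))
            for j in range(Q)
        ])
    return sum(alpha[-1])        # P(O | M): sum over final states

A  = [[0.7, 0.3], [0.4, 0.6]]    # assumed toy two-state HMM
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
p = forward([0, 1, 0], A, B, pi)
```

For this toy model the result agrees with summing over all 2³ state sequences by hand, but at a cost that grows linearly rather than exponentially in L.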

The Forward Backward Algorithm

  • A white board presentation.

The F-B Algorithm (cont.)

  • Option 1) The likelihood is measured using any sequence of states of length L

    • This is known as the “Any Path” Method
  • Option 2) We can choose an HMM by the probability generated using the best possible sequence of states

    • We’ll refer to this method as the “Best Path” Method

HMM – Question II (Harder)

  • Given an observation sequence O = (O1 O2 … OL) and a model M = {A, B, π}, how do we efficiently compute the most probable sequence(s) of states Q?

  • Namely, the sequence of states Q = (Q1 Q2 … QL) that maximizes P(O,Q|M), the joint probability that the given model M goes through the specific sequence of states Q and produces the given observation O.

  • Recall that given a model M, a sequence of observations O, and a sequence of states Q, we can efficiently compute P(O|Q,M) (should watch out for numeric underflows)

Most Probable States Sequence (Q. II)

  • Idea:

  • If we know the identity of Qi, then the most probable sequence of states for times i+1,…,n does not depend on the observations before time i

  • A white board presentation of Viterbi’s algorithm
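As a companion to the whiteboard derivation, here is a sketch of Viterbi's algorithm in log space (which also sidesteps the numeric underflow warned about above). V[t][j] is the log-probability of the best path ending in state j at time t; the toy two-state HMM is assumed for illustration:

```python
import math

def viterbi(O, A, B, pi):
    """Return the most probable state sequence for observations O."""
    Q = len(pi)
    V = [[math.log(pi[j]) + math.log(B[j][O[0]]) for j in range(Q)]]
    back = []                                  # backpointers for traceback
    for t in range(1, len(O)):
        row, ptr = [], []
        for j in range(Q):
            best = max(range(Q), key=lambda i: V[-1][i] + math.log(A[i][j]))
            ptr.append(best)
            row.append(V[-1][best] + math.log(A[best][j])
                       + math.log(B[j][O[t]]))
        V.append(row)
        back.append(ptr)
    # trace back the best final state to recover the full path
    state = max(range(Q), key=lambda j: V[-1][j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

A  = [[0.7, 0.3], [0.4, 0.6]]    # assumed toy HMM: state 0 favors symbol 0,
B  = [[0.9, 0.1], [0.2, 0.8]]    # state 1 favors symbol 1
pi = [0.5, 0.5]
path = viterbi([0, 0, 1, 1, 1], A, B, pi)
```

Summing log-probabilities instead of multiplying probabilities keeps the values in a safe numeric range even for long sequences.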

Dishonest Casino (again)

  • Computing posterior probabilities for “fair” at each point in a long sequence:

HMM – Question III (Hardest)

  • Given an observation sequence O = (O1 O2 … OL), and a class of models, each of the form M = {A, B, π}, which specific model “best” explains the observations?

  • A solution to question I enables the efficient computation of P(O|M) (the probability that a specific model M produces the observation O).

  • Question III can be viewed as a learning problem: We want to use the sequence of observations in order to “train” an HMM and learn the optimal underlying model parameters (transition and output probabilities).


  • Given sequences x1,…,xn and h1,…,hn

  • How do we learn Akl and Bka ?

  • We want to find parameters that maximize the likelihood P(x1,…,xn, h1,…,hn)

  • We simply count:

  • Nkl - number of times hi=k & hi+1=l

  • Nka - number of times hi=k & xi = a
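In this fully observed case the maximum-likelihood estimates are just normalized counts, as stated. A small sketch with hypothetical labelled data (F/L labels in the style of the dishonest-casino example):

```python
# Supervised parameter estimation: count and normalise.
from collections import Counter

def estimate(x, h):
    """Return transition estimates A[k][l] and emission estimates B[k][a]."""
    Nkl = Counter(zip(h, h[1:]))    # times h_i = k and h_{i+1} = l
    Nka = Counter(zip(h, x))        # times h_i = k and x_i = a
    states = sorted(set(h))
    A = {k: {l: Nkl[k, l] / sum(Nkl[k, m] for m in states) for l in states}
         for k in states}
    B = {k: {a: Nka[k, a] / sum(n for (s, _), n in Nka.items() if s == k)
             for a in sorted(set(x))}
         for k in states}
    return A, B

# hypothetical toy data: observations and their (known) hidden states
x = "HHTHTT"
h = "FFFLLL"                        # F = fair coin, L = loaded coin
A, B = estimate(x, h)
```

(The sketch assumes every state appears and has at least one outgoing transition; a robust version would smooth zero counts.)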


  • Given only sequence x1,…,xn

  • How do we learn Akl and Bka ?

  • We want to find parameters that maximize the likelihood P(x1,…,xn)

  • Problem:

  • Counts are inaccessible since we do not observe hi

  • If we have Akl and Bka we can compute

Expected Counts

  • We can compute expected number of times hi=k & hi+1=l

  • Similarly

Expectation Maximization (EM)

  • Choose Akl and Bka

  • E-step:

  • Compute expected counts E[Nkl], E[Nka]

  • M-Step:

  • Re-estimate:

  • Reiterate
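One EM (Baum-Welch) iteration can be sketched compactly: the E-step turns the inaccessible counts into expected counts using forward and backward messages, and the M-step renormalizes them. This is a minimal sketch with assumed toy parameters; it keeps the initial distribution π fixed for simplicity, whereas a full implementation would re-estimate it too:

```python
def em_step(O, A, B, pi):
    """One Baum-Welch iteration; returns (A', B', likelihood under A, B)."""
    Q, L = len(pi), len(O)
    # forward messages f[t][j] = P(O_1..O_t, Q_t = j)
    f = [[pi[j] * B[j][O[0]] for j in range(Q)]]
    for t in range(1, L):
        f.append([B[j][O[t]] * sum(f[-1][i] * A[i][j] for i in range(Q))
                  for j in range(Q)])
    # backward messages b[t][i] = P(O_{t+1}..O_L | Q_t = i)
    b = [[1.0] * Q]
    for t in range(L - 2, -1, -1):
        b.insert(0, [sum(A[i][j] * B[j][O[t + 1]] * b[0][j]
                         for j in range(Q)) for i in range(Q)])
    like = sum(f[-1])
    # E-step: expected transition counts E[Nkl] and emission counts E[Nka]
    ENkl = [[sum(f[t][k] * A[k][l] * B[l][O[t + 1]] * b[t + 1][l]
                 for t in range(L - 1)) / like for l in range(Q)]
            for k in range(Q)]
    ENka = [[sum(f[t][k] * b[t][k] for t in range(L) if O[t] == a) / like
             for a in range(len(B[0]))] for k in range(Q)]
    # M-step: re-estimate by normalising the expected counts
    A2 = [[ENkl[k][l] / sum(ENkl[k]) for l in range(Q)] for k in range(Q)]
    B2 = [[ENka[k][a] / sum(ENka[k]) for a in range(len(B[0]))]
          for k in range(Q)]
    return A2, B2, like

A  = [[0.7, 0.3], [0.4, 0.6]]    # assumed initial guesses
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
O  = [0, 0, 1, 1, 0, 1]
A1, B1, like1 = em_step(O, A, B, pi)
A2, B2, like2 = em_step(O, A1, B1, pi)
```

Reiterating the step illustrates the basic property of the next slide: the likelihood never decreases from one iteration to the next.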

EM - basic properties

  • P(x1,…,xn : Akl, Bka) ≤ P(x1,…,xn : A’kl, B’ka)

    • Likelihood grows in each iteration
  • If P(x1,…,xn: Akl, Bka) = P(x1,…,xn: A’kl, B’ka) then Akl, Bka is a stationary point of the likelihood

    • either a local maximum, a minimum, or a saddle point

Complexity of E-step

  • Compute forward and backward messages

    • Time complexity: O(nL^2); space complexity: O(nL) (for n observations and L hidden states)
  • Accumulate expected counts

EM - problems

  • Local Maxima:

  • Learning can get stuck in local maxima

  • Sensitive to initialization

  • Requires some method for escaping such maxima

  • Choosing L

  • We often do not know how many hidden states the model should have, or how many can be learned

Communication Example
