HHybrid Monte Carlo method for conformational sampling of proteins


Download 505 b.
Sana04.01.2018
Hajmi505 b.
#23726


HHybrid Monte Carlo method for conformational sampling of proteins


Overview



Protein: The Machinery of Life



Protein Structure



Why protein folding?

  • Huge gap: sequence data and 3D structure data

    • EMBL/GENBANK, DNA (nucleotide) sequences 15 million sequence, 15,000 million base pairs
    • SWISSPROT, protein sequences 120,000 entries
    • PDB, 3D protein structures 20,000 entries
  • Bridging the gap through prediction

    • Aim of structural genomics:
      • “Structurally characterize most of the protein sequences by an efficient combination of experiment and prediction,” Baker and Sali (2001)
    • Thermodynamics hypothesis: Native state is at the global free energy minimum Anfinsen (1973)


Questions related to folding I

  • Long time kinetics: dynamics of folding

    • only statistical correctness possible
    • ensemble dynamics
    • e.g., folding@home
  • Short time kinetics

    • strong correctness possible
    • e.g., transport properties, diffusion coefficients


Questions related to folding II

  • Sampling

    • Compute equilibrium averages by visiting all (most) of “important” conformations
    • Examples:
      • Equilibrium distribution of solvent molecules in vacancies
      • Free energies
      • Characteristic conformations (misfolded and folded states)


Overview



Classical molecular dynamics

  • Newton’s equations of motion:

  • Atoms

  • Molecules

  • CHARMM force field (Chemistry at Harvard Molecular Mechanics)











MD, MC, and HMC in sampling

  • Molecular Dynamics takes long steps in phase space, but it may get trapped

  • Monte Carlo makes a random walk (short steps), it may escape minima due to randomness

  • Can we combine these two methods?



Hybrid Monte Carlo

  • We can sample from a distribution with density p(x) by simulating a Markov chain with the following transitions:

    • From the current state, x, a candidate state x’ is drawn from a proposal distribution S(x,x’). The proposed state is accepted with prob. min[1,(p(x’) S(x’,x)) / (p(x) S(x,x’))]
    • If the proposal distribution is symmetric, S(x’,x)) = S(x,x’)), then the acceptance prob. only depends on p(x’) / p(x)


Hybrid Monte Carlo II

  • Proposal functions must be reversible:

  • if x’ = s(x), then x = s(x’)

  • Proposal functions must preserve volume

  • Jacobian must have absolute value one

  • Valid proposal: x’ = -x

  • Invalid proposals:

    • x’ = 1 / x (Jacobian not 1)
    • x’ = x + 5 (not reversible)


Hybrid Monte Carlo III

  • Hamiltonian dynamics preserve volume in phase space

  • Hamiltonian dynamics conserve the Hamiltonian H(q,p)

  • Reversible symplectic integrators for Hamiltonian systems preserve volume in phase space

  • Conservation of the Hamiltonian depends on the accuracy of the integrator

  • Hybrid Monte Carlo: Use reversible symplectic integrator for MD to generate the next proposal in MC



HMC Algorithm

  • Perform the following steps:

  • 1. Draw random values for the momenta p from normal distribution; use given positions q

  • 2. Perform cyclelength steps of MD, using a symplectic reversible integrator with timestep t, generating (q’,p’)

  • 3. Compute change in total energy

    • H = H(q’,p’) - H(q,p)
  • 4. Accept new state based on exp(- H )



Hybrid Monte Carlo IV

  • Advantages of HMC:

  • HMC can propose and accept distant points in phase space, provided the accuracy of the MD integrator is high enough

  • HMC can move in a biased way, rather than in a random walk (distance k vs sqrt(k))

  • HMC can quickly change the probability density



Hybrid Monte Carlo V

  • As the number of atoms increases, the total error in the H(q,p) increases. The error is related to the time step used in MD

  • Analysis of N replicas of multivariate Gaussian distributions shows that HMC takes O(N5/4 ) with time step t = O(N-1/4) Kennedy & Pendleton, 91



Hybrid Monte Carlo VI

  • The key problem in scaling is the accuracy of the MD integrator

  • More accurate methods could help scaling

  • Creutz and Gocksch 89 proposed higher order symplectic methods for HMC

  • In MD, however, these methods are more expensive than the scaling gain. They need more force evaluations per step



Overview



Evaluating MC methods I

  • Is method sampling from desired distribution?

    • Does it preserve detailed balance?
      • Use simple model systems that can be solved analytically. Compare to analytical results or well known solution methods. Examples, Lennard-Jones liquid, butane
    • Is it ergodic?
      • Impossible to prove for realistic problems. Instead, show self-averaging of properties


Evaluating MC methods II

  • Is system equilibrated?

    • Average values of set of properties fluctuate around mean value
    • Convergence to steady state from
      • Different initial conditions
      • Different pseudo random number generators
  • Are statistical errors small?

    • Run should be about 10 times longer than slowest relaxation in system
    • Estimate statistical errors by independent block averaging
    • Compute properties
    • Vary system sizes
  • What are the sampling rates?



Improved HMC

  • Symplectic integrators conserve exactly (within roundoff error) a modified Hamiltonian that for short MD simulations (such as in HMC) stays close to the true Hamiltonian Sanz-Serna & Calvo 94

  • Our idea is to use highly accurate approximations to the modified Hamiltonian in order to improve the scaling of HMC



Shadow Hamiltonian

  • Work by Skeel and Hardy, 2001, shows how to compute an arbitrarily accurate approximation to the modified Hamiltonian, called the Shadow Hamiltonian

  • Hamiltonian: H=1/2pTM-1p + U(q)

  • Modified Hamiltonian: HM = H + O(t p)

  • Shadow Hamiltonian: SH2p = HM + O(t 2p)

    • Arbitrary accuracy
    • Easy to compute
    • Stable energy graph
  • Example, SH4 = H – f( qn-1, qn-2, pn-1, pn-2 ,βn-1 ,βn-2)



  • See comparison of SHADOW and ENERGY



Shadow HMC

  • Replace total energy H with shadow energy

    • SH2m = SH2m (q’,p’) – SH2m (q,p)
  • Nearly linear scalability of sampling rate

  • Computational cost SHMC, N(1+1/2m), where m is accuracy order of integrator

  • Extra storage (m copies of q and p)

  • Moderate overhead (25% for small proteins)



Example Shadow Hamiltonian



ProtoMol: a framework for MD



SHMC implementation

  • Shadow Hamiltonian requires propagation of β

  • Can work for any integrator



Systems tested



Sampling Metric 1

  • Generate a plot of dihedral angle vs. energy for each angle

  • Find local maxima

  • Label ‘bins’ between maxima

  • For each dihedral angle, print the label of the energy bin that it is currently in



Sampling Metric 2



Acceptance Rates



More Acceptance Rates



Sampling rate for decalanine (dt = 2 fs)



Sampling rate for 2mlt



Sampling rate comparison

  • Cost per conformation is total simulation time divided by number of new conformations discovered (2mlt, dt = 0.5 fs)

    • HMC 122 s/conformation
    • SHMC 16 s/conformation
    • HMC discovered 270 conformations in 33000 seconds
    • SHMC discovered 2340 conformations in 38000 seconds


Conclusions

  • SHMC has a much higher acceptance rate, particularly as system size and timestep increase

  • SHMC discovers new conformations more quickly

  • SHMC requires extra storage and moderate overhead.

  • SHMC works best at relatively large timesteps



Future work

  • Multiscale problems for rugged energy surface

    • Multiple time stepping algorithms plus constraining
    • Temperature tempering and multicanonical ensemble
    • Potential smoothing
  • System size

    • Parallel Multigrid O(N) electrostatics
  • Applications



Acknowledgments

  • Dr. Thierry Matthey, co-developer of ProtoMol, University of Bergen, Norway

  • Graduate students: Qun Ma, Alice Ko, Yao Wang, Trevor Cickovski

  • Students in CSE 598K, “Computational Biology,” Spring 2002

  • Dr. Robert Skeel, Dr. Ruhong Zhou, and Dr. Christoph Schutte for valuable discussions

  • Dr. Radford Neal’s presentation “Markov Chain Sampling Using Hamiltonian Dynamics” (http://www.cs.utoronto.ca )

  • Dr. Klaus Schulten’s presentation “An introduction to molecular dynamics simulations” (http://www.ks.uiuc.edu )

  • Dr. Edward Maginn’s “Monte Carlo Primer”



Download 505 b.

Do'stlaringiz bilan baham:




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©fayllar.org 2024
ma'muriyatiga murojaat qiling