HHybrid Monte Carlo method for conformational sampling of proteins
Overview
Protein: The Machinery of Life
Protein Structure
Why protein folding? Huge gap: sequence data and 3D structure data - EMBL/GENBANK, DNA (nucleotide) sequences 15 million sequence, 15,000 million base pairs
- SWISSPROT, protein sequences 120,000 entries
- PDB, 3D protein structures 20,000 entries
Bridging the gap through prediction - Aim of structural genomics:
- “Structurally characterize most of the protein sequences by an efficient combination of experiment and prediction,” Baker and Sali (2001)
- Thermodynamics hypothesis: Native state is at the global free energy minimum Anfinsen (1973)
Questions related to folding I Long time kinetics: dynamics of folding - only statistical correctness possible
- ensemble dynamics
- e.g., folding@home
- strong correctness possible
- e.g., transport properties, diffusion coefficients
Questions related to folding II Sampling - Compute equilibrium averages by visiting all (most) of “important” conformations
- Examples:
- Equilibrium distribution of solvent molecules in vacancies
- Free energies
- Characteristic conformations (misfolded and folded states)
Overview
Classical molecular dynamics Newton’s equations of motion: Atoms Molecules CHARMM force field (Chemistry at Harvard Molecular Mechanics)
MD, MC, and HMC in sampling Molecular Dynamics takes long steps in phase space, but it may get trapped Monte Carlo makes a random walk (short steps), it may escape minima due to randomness Can we combine these two methods?
We can sample from a distribution with density p(x) by simulating a Markov chain with the following transitions: - From the current state, x, a candidate state x’ is drawn from a proposal distribution S(x,x’). The proposed state is accepted with prob. min[1,(p(x’) S(x’,x)) / (p(x) S(x,x’))]
- If the proposal distribution is symmetric, S(x’,x)) = S(x,x’)), then the acceptance prob. only depends on p(x’) / p(x)
Hybrid Monte Carlo II Proposal functions must be reversible: if x’ = s(x), then x = s(x’) Proposal functions must preserve volume Jacobian must have absolute value one Valid proposal: x’ = -x Invalid proposals: - x’ = 1 / x (Jacobian not 1)
- x’ = x + 5 (not reversible)
Hybrid Monte Carlo III Hamiltonian dynamics preserve volume in phase space Hamiltonian dynamics conserve the Hamiltonian H(q,p) Reversible symplectic integrators for Hamiltonian systems preserve volume in phase space Conservation of the Hamiltonian depends on the accuracy of the integrator Hybrid Monte Carlo: Use reversible symplectic integrator for MD to generate the next proposal in MC
HMC Algorithm Perform the following steps: 2. Perform cyclelength steps of MD, using a symplectic reversible integrator with timestep t, generating (q’,p’) 3. Compute change in total energy 4. Accept new state based on exp(- H )
Hybrid Monte Carlo IV Advantages of HMC: HMC can propose and accept distant points in phase space, provided the accuracy of the MD integrator is high enough HMC can move in a biased way, rather than in a random walk (distance k vs sqrt(k))
Hybrid Monte Carlo V As the number of atoms increases, the total error in the H(q,p) increases. The error is related to the time step used in MD Analysis of N replicas of multivariate Gaussian distributions shows that HMC takes O(N5/4 ) with time step t = O(N-1/4) Kennedy & Pendleton, 91
Hybrid Monte Carlo VI The key problem in scaling is the accuracy of the MD integrator More accurate methods could help scaling Creutz and Gocksch 89 proposed higher order symplectic methods for HMC In MD, however, these methods are more expensive than the scaling gain. They need more force evaluations per step
Overview
Evaluating MC methods I Is method sampling from desired distribution? - Does it preserve detailed balance?
- Use simple model systems that can be solved analytically. Compare to analytical results or well known solution methods. Examples, Lennard-Jones liquid, butane
- Is it ergodic?
- Impossible to prove for realistic problems. Instead, show self-averaging of properties
Evaluating MC methods II Is system equilibrated? - Average values of set of properties fluctuate around mean value
- Convergence to steady state from
- Different initial conditions
- Different pseudo random number generators
Are statistical errors small? - Run should be about 10 times longer than slowest relaxation in system
- Estimate statistical errors by independent block averaging
- Compute properties
- Vary system sizes
What are the sampling rates?
Improved HMC Symplectic integrators conserve exactly (within roundoff error) a modified Hamiltonian that for short MD simulations (such as in HMC) stays close to the true Hamiltonian Sanz-Serna & Calvo 94 Our idea is to use highly accurate approximations to the modified Hamiltonian in order to improve the scaling of HMC
Shadow Hamiltonian Work by Skeel and Hardy, 2001, shows how to compute an arbitrarily accurate approximation to the modified Hamiltonian, called the Shadow Hamiltonian Hamiltonian: H=1/2pTM-1p + U(q) Modified Hamiltonian: HM = H + O(t p) Shadow Hamiltonian: SH2p = HM + O(t 2p) - Arbitrary accuracy
- Easy to compute
- Stable energy graph
Example, SH4 = H – f( qn-1, qn-2, pn-1, pn-2 ,βn-1 ,βn-2)
See comparison of SHADOW and ENERGY
Shadow HMC Replace total energy H with shadow energy - SH2m = SH2m (q’,p’) – SH2m (q,p)
Nearly linear scalability of sampling rate Extra storage (m copies of q and p) Moderate overhead (25% for small proteins)
Example Shadow Hamiltonian
ProtoMol: a framework for MD
SHMC implementation Shadow Hamiltonian requires propagation of β Can work for any integrator
Systems tested
Sampling Metric 1 Generate a plot of dihedral angle vs. energy for each angle Find local maxima Label ‘bins’ between maxima For each dihedral angle, print the label of the energy bin that it is currently in
Sampling Metric 2 Round each dihedral angle to the nearest degree
Acceptance Rates
More Acceptance Rates
Sampling rate for decalanine (dt = 2 fs)
Sampling rate for 2mlt
Sampling rate comparison Cost per conformation is total simulation time divided by number of new conformations discovered (2mlt, dt = 0.5 fs) - HMC 122 s/conformation
- SHMC 16 s/conformation
- HMC discovered 270 conformations in 33000 seconds
- SHMC discovered 2340 conformations in 38000 seconds
Conclusions SHMC has a much higher acceptance rate, particularly as system size and timestep increase SHMC discovers new conformations more quickly SHMC requires extra storage and moderate overhead. SHMC works best at relatively large timesteps
Future work Multiscale problems for rugged energy surface - Multiple time stepping algorithms plus constraining
- Temperature tempering and multicanonical ensemble
- Potential smoothing
System size - Parallel Multigrid O(N) electrostatics
Applications
Acknowledgments Dr. Thierry Matthey, co-developer of ProtoMol, University of Bergen, Norway Graduate students: Qun Ma, Alice Ko, Yao Wang, Trevor Cickovski Students in CSE 598K, “Computational Biology,” Spring 2002 Dr. Robert Skeel, Dr. Ruhong Zhou, and Dr. Christoph Schutte for valuable discussions Dr. Radford Neal’s presentation “Markov Chain Sampling Using Hamiltonian Dynamics” (http://www.cs.utoronto.ca ) Dr. Klaus Schulten’s presentation “An introduction to molecular dynamics simulations” (http://www.ks.uiuc.edu ) Dr. Edward Maginn’s “Monte Carlo Primer”
Do'stlaringiz bilan baham: |