M4.D2: Introduction to molecular simulations

M4.D2: Introduction to molecular simulations#

Learning Objectives#

Explain the need for molecular dynamics software packages to run more accurate simulations. List all the variables that need to be taken into account when running an atomistic simulation.
Run a simulation of a water liquid. Briefly describe the forcefield, the type of intermolecular interactions, algorithm for the equations of motion, time step, length of the simulation.
Plot properties of the simulation through time. Identify the equilibration period in a molecular dynamics simulation.
Calculate the heat capacity of the liquid
Use MDAnalysis to analyze a molecular dynamics trajectory

Identifying the main charactersitics of molecular simulations#

In order to run an accurate molecular dynamics simulation there are list of technical decisions that need to be made. Using a molecular dynamics software package helps mainstream those decisions.

In class we will briefly talk about:

The Molecular Mechanics Force Field: https://en.wikipedia.org/wiki/Force_field_(chemistry)
Long-range interactions: https://en.wikibooks.org/wiki/Molecular_Simulation/Treatment_of_Long_Range_Forces
The equations of motion: integrators https://en.wikipedia.org/wiki/Molecular_dynamics#Molecular_dynamics_algorithms
Thermostats and barostats

Running a molecular dynamics simulation of water#

Optional: Watch the video on Canvas showing step by step how to run the simulation of a cube of waters

Analyzing a molecular dynamics simulation of water#

The files water.out, water-processed.pdb, and water.dcd are created after you run the simulation and they will be in the folder that you have indicated. These files will be easier to read if you select the same folder as where you have this notebook, if not, you may just need to copy them to the current folder.

It is important that your simulation paramaters include:

Write your results every 10 steps
At least 50 ps of simulation, that’s 50000 steps of 1fs timestep.
Your output should include temperature, KE, PE and Total E.

import pandas as pd

data = pd.read_csv("water_box/water.out",sep="\t")
display(data)
#using iloc I can select the 3rd column
myPE = data.iloc[:,2]
myKE = data.iloc[:,3]
myT = data.iloc[:,5]
myDens = data.iloc[:,7]

	#"Step"	Time (ps)	Potential Energy (kJ/mole)	Kinetic Energy (kJ/mole)	Total Energy (kJ/mole)	Temperature (K)	Box Volume (nm^3)	Density (g/mL)
0	10	0.11	-15852.863671	1946.993188	-13905.870483	176.001094	9.353182	0.946724
1	20	0.12	-15597.068406	1758.497530	-13838.570876	158.961773	9.353182	0.946724
2	30	0.13	-15719.284506	1757.676208	-13961.608297	158.887529	9.338105	0.948253
3	40	0.14	-15856.482367	2010.354291	-13846.128076	181.728707	9.338105	0.948253
4	50	0.15	-15583.168041	1850.758898	-13732.409143	167.301865	9.268756	0.955348
...	...	...	...	...	...	...	...	...
4995	49960	50.06	-14399.581223	3315.126929	-11084.454295	299.675402	8.586259	1.031285
4996	49970	50.07	-14448.337692	3320.166359	-11128.171332	300.130949	8.586259	1.031285
4997	49980	50.08	-14448.884486	3343.899860	-11104.984626	302.276371	8.556562	1.034865
4998	49990	50.09	-14419.638159	3334.846615	-11084.791544	301.457990	8.648801	1.023828
4999	50000	50.10	-14345.658959	3278.600705	-11067.058254	296.373565	8.568750	1.033393

5000 rows × 8 columns

import matplotlib.pyplot as plt

plt.plot(myPE)
plt.plot(myPE[200:])

[<matplotlib.lines.Line2D at 0x13f6c0140>]

_images/ac54a0de454d5b491634322112dd399aa93320bbf29562f40d22995b2f5946f8.png

plt.plot(myDens)
plt.plot(myDens[200:])

[<matplotlib.lines.Line2D at 0x13f72bb90>]

_images/f1d093eff9ef6635fcabe1f6122a0379eca5a62f8c6e23b9634af7edcb3fbae0.png

plt.plot(myT)
plt.plot(myT[200:])

[<matplotlib.lines.Line2D at 0x12f204740>]

_images/3a0a3f89eff51466eb3bcfee4cbbd2d7e55413d078a6c43d51ec5168de444efb.png

Your turn

In a similar fashion as above, plot the kinetic energy, the total energy of the system, and the temperature
Why is the kinetic energy always positive and the potential always negative?
At what portion of the simulation are your properties stable enough that allow you to start “measuring”?

The heat capacity can also be computed from the simulation using the total energy average of squares.

\[ <E>^2 - <E^2> \]

Build a loop to calculate the heat capacity (we won’t be discussing its units for now)

totE = data.iloc[:,4]

aveE = 0
aveE2 = 0
for e in totE:
    e = float(e)
    aveE += e
    aveE2 += e**2

totSteps = len(totE)

averageE_squared = (aveE/totSteps)**2
averageSquared_E = (aveE2/totSteps)

print(averageSquared_E-averageE_squared)

70213.67867276073

You can use MDanalysis to read the trajectory. Here below is an example where we monitor the distance between two oxygens in two different waters. How can you explain the big jumps in distance?

import MDAnalysis as mda
import numpy as np

u = mda.Universe('water_box/water-processed.pdb','water_box/water.dcd')

dist = []
for snap in u.trajectory:
    #selecting the central atom
    myO1 = u.select_atoms("name O and segid 2").center_of_geometry()
    myO2 = u.select_atoms("name O and resid 2 and segid 3").center_of_geometry()
    #selecting
    myDist = np.linalg.norm(myO1-myO2)
    dist.append(myDist)

plt.plot(dist);

_images/ede17469d6469905dfa14ad5a151c626410c223ab6b44e114486d40d431a08b9.png

Fun exercise: Using raw python (no extra modules) open the initial pdb file, read line by line, and find the water that is closest to first water and follow its distance throughout the trajectory.

Questions#

Submit the answers to the questions listed above