This section will explain what is needed to set up an MD simulation with Gromacs, starting with peptide coordinates in a PDB file.
The figures throughout this tutorial give an overview of the current position in the molecular dynamics workflow. They will expand by clicking on them.
It takes two ingredients for Gromacs to understand a molecular system: atom
coordinates and topology information. The topology file (typically ending in
*.top) contains information on how many molecules of which kind a system contains,
of what kind their interactions are and the parameters that are used to
compute the energy and forces.
Gromacs can generate topologies for you.
To define a topology, a force field needs to be specified.
A force field is a collection of well balanced parameters for the energy and force computation.
Several different families of force fields exist, differing in the way their parameters were derived. Within each familiy, often multiple versions exist, each of which includes corrections to the force field parameters, hopefully making the force field more accurate (within its domain of applicability). But newer versions do not necessarily perform better than their predecessors in all situations. Which force field is good for your purpose depends strongly on your system. For protein systems, the Amber and Charmm force fields are very often a good bet. If you are interested, you can have a look at this historical overview of force field evolution and a review of modern force fields. Protein force fields only contain parameters for standard amino acids, ions and sometimes for nucleic acids and some sugars. If your system contains ligands or unnatural amino acids you will need to find out how to parameterize molecules in accordance your specific choice of force field. In addition to the force field, a water model is also needed. Water is a tough molecule to simulate in classical MD simulations. This is due to the electronic structure of water that gives rise to complex interactions. Several different water models exist. A good overview of water models can be found on the web. It is usually a good idea to stick to a water model that is recommended for your choice of force field or a compatible one.
PDB Formatted Input
Download the file
kigaki.pdb from the
of this tutorial.
Open the file with VMD to have a look at the peptide.
Now generate a topology with the command below. Choose Amber99sb-ildn
as a force field together with the TIP3P water model. While the TIP4P-EW water
model performs better in combination with the Amber force fields, it consists of
four instead of three particles. To limit the simulation size we will use the TIP3P
water model here.
gmx pdb2gmx -f kigaki.pdb -o kigaki.gro -p kigaki.top -i kigaki_posre.itp -ignh
gmxand starts every command. It will help you to remember that
gmx help commandsshows you a list of all available commands with a short description for each of them, while
gmx help <command>gives a more detailed description of the tool whose name you substituded for
<command>. (Gromacs versions before 5.0 consisted of many programs that have now been merged into
gmx. For compatibility reasons it is therefore still possible to omit
gmxand directly start with e.g.
pdb2gmx, but this not guaranteed to work in the future). The command generated three files.
kigaki.grocontains residue and atom names and the coordinates of all atoms. It could in addition contain velocities for the atoms, but at this point we haven’t generated any yet. The second file is the topology file (
kigaki.top). Have a look at its contents using the
lesscommand. After some comments, this file first includes the parameters of the force field in use:
; Include forcefield parameters
#include) instructs Gromacs to substitute the contents of the specified file at that particular position in the topology file. The
forcefield.itpfile contains the force field parameters of the Amber99sb-ildn force field, which is contained in the Gromacs installation folder. The next section defines molecule types to make them known to Gromacs.
[ moleculetype ]
; Name nrexcl
The third file we have generated is
kigaki_posre.itp. It contains position restraints for all heavy atoms of the protein, i.e. harmonic potentials that couple the atoms to reference positions. This will be necessary during preparatory steps for the MD simulation to let the water equilibrate around the protein without the protein itself moving significantly, so it cannot get deformed by unequilibrated water molecules. After that, topologies for the water molecules and ions are included as well.
Finally the system is given a name and the number of each molecular species in the system is listed:
[ system ]
Protein in water
[ molecules ]
; Compound #mols
pdb2gmxto optimize hydrogen bonding networks.
We have now generated a topological description of our peptide.
Setting the Simulation Box
The next step is to create a box around our peptide that we can later fill with solvent. Gromacs can generate boxes of various shapes: triclinical cells, cubes, dodecahedra and octahedra. The triclinical cell has box vectors at right angles, but with different length, while the cubical box has in addition box vectors of equal length. The solute will be centerd in the simulation box with a minimum distance to any wall of 10 Ångstrom (1.0 nm). As the speed of the simulation decreases approximately linearly with the number of particles, we want the box to be as small as possible, but at the same time keep the solute away from its periodic images at all times. The triclinical cell does not work well, as its rectangular shape is problematic for non-globular proteins that might rotate, unless the center of mass rotation is taken care of. We will choose a dodecahedral cell, as it is close in shape to a sphere and minimizes the volume needed to keep the minimum solute to wall distance:
gmx editconf -f kigaki.gro -o kigaki_box.gro -bt dodecahedron -d 1.0
*.grofile. (You can navigate to the end of a text file in the
lessprogram with the
gbrings you back to the top) The numbers in this final line correspond to the box vector components (in nm) in the order:
v1(x) v2(y) v3(z) v1(y) v1(z) v2(x) v2(z) v3(x) v3(y). The latter 6 numbers may be omitted for rectangular boxes and Gromacs only supports boxes with
v1(y)=v1(z)=v2(z)=0. Generate some of the other box shapes too and have a look at them with VMD by loading the corresponding
*.grofile and typing
pbc boxin the VMD console. The dodecahedral and octahedral boxes appear not as you might have expected. The reason is that the boxes are always represented in a general, brick shaped unit cell representation for efficiency. Consult the section on periodic boundary conditions in the Gromacs manual to get an idea of how the different representations are transformed into each other.
Now the box can be filled with solvent molecules.
We have selected the TIP3P water model before, so we need to provide the coordinates of a three-particle water model.
As there are no TIP3P coordinates in the Amber force field directories, we will use the coordinates from
This file will be found automatically by Gromacs, without specifying a detailed path.
Note that we are still using the correct parameters of the TIP3P water model and only borrowing the coordinates from 216 water molecules,
which were produced with the SPC water model.
gmx solvate -cp kigaki_box.gro -cs spc216.gro -o kigaki_solvated.gro -p kigaki.top
It is also possible to add some ions in order to mimic simulation conditions with specified salt concentrations.
However, ions are only approximated as charged point particles and do not possess more detailed electronic structure,
which might be relevant for ion binding sites and transition metals ligated to proteins.
Though for NaCl the simple point charge description is usually sufficient.
We add enough ions to neutralize the system and add an additional concentration of 100 mM NaCl.
For doing so, we need an
*.mdp file that contains molecular dynamics parameters.
In general, an
*.mdp file contains information on the algorithms and options to use during
a simulation with Gromacs.
It is parsed by the Gromacs preprocessor (
grompp) and combined with initial coordinates
and velocities into a
*.tpr files contain the complete information that is needed for an energy minimization or
any kind of MD simulation with gromacs and is the only required input file for
Gromacs' computing engine
So why do we need an
*.mdp file at this step,
as we only want to add ions to our system?
Gromacs tries to insert the ions at optimal positions that minimize the energy of the system.
Therefore it needs to know which algorithms and parameters to use to compute the energy.
gmx grompp -f ions.mdp -po ions.out.mdp -c kigaki_solvated.gro -p kigaki.top -o ions.tpr -maxwarn 1
grompp. However, since the system will be neutralized at the next step, we can ignore this warning which is done with the flag
-maxwarn 1. The resulting
*.tprfile will be fed into a tool that acually generates the ions:
gmx genion -s ions.tpr -o kigaki_ions.gro -p kigaki.top -pname NA -nname CL -neutral -conc 0.1
SOLgroup, which contains all atoms of the solvent (water) molecules. The ions will then be substituted for water molecules at positions that minimize the overall electrostatic energy.
- Have a look at the resulting system in VMD. The ions can be made visible with the selection keyword ions, using a VdW representation.