Analysis software

From Biowerkzeug Wiki

Running simulations is often the easy bit. The hard bit is extracting meaningful information from gigabytes of trajectory data. This list can act as a starting point. For most advanced uses, however, one will probably have to write analysis code in Python, Perl, Tcl, C/C++, bash, or any other language that "gets the job done".
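As a minimal sketch of the kind of custom analysis code meant here, the following Python computes a mass-weighted radius of gyration frame by frame. Frames are consumed one at a time from an iterable, so only the current frame has to be in memory, which matters for multi-gigabyte trajectories. The function names are hypothetical, and the frame source is assumed to be any iterable of (x, y, z) coordinate lists standing in for whatever trajectory reader one actually uses.

```python
import math
from typing import Iterable, Iterator, Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord], masses: Sequence[float]) -> float:
    """Mass-weighted radius of gyration of a single frame."""
    total = sum(masses)
    # mass-weighted centre of mass
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    # mass-weighted mean squared distance from the centre of mass
    msd = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    ) / total
    return math.sqrt(msd)


def rg_time_series(frames: Iterable[Sequence[Coord]],
                   masses: Sequence[float]) -> Iterator[float]:
    """Yield Rg frame by frame; only the current frame is held in memory."""
    for coords in frames:
        yield radius_of_gyration(coords, masses)
```

For two unit masses at (0, 0, 0) and (2, 0, 0) in one frame and at (0, 0, 0) and (4, 0, 0) in the next, `list(rg_time_series(frames, [1.0, 1.0]))` gives `[1.0, 2.0]`. Because `rg_time_series` is a generator, it can be fed directly from a reader that itself yields frames lazily.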

"Native" tools

Many MD packages come with their own analysis tools or scripting language. Sometimes it is possible to convert data from one package's format to another's and then use the other package's analysis tools.

Gromacs analysis tools
One of the strengths of Gromacs is that it comes with a large number of useful analysis tools that make many standard analysis tasks simple to perform.
VMD can be used through its GUI or scripted in Tcl to great effect.
Charmm is feature-rich, but its scripting language can make for a steep learning curve.
LAMMPS/pizza is a Python library geared towards output from LAMMPS.
command-line based analysis

MD Analysis libraries

A Python library for analyzing a range of trajectory formats (e.g. DCD, XTC, TRR, XYZ) and single-frame formats (PDB, GRO, CRD, PQR).
Another Python-based framework for analysis is the Molecular Modelling Tool Kit (MMTK). However, it does not natively read CHARMM DCD files, which can make it cumbersome to use.
The Lightweight Object-Oriented Structure library (LOOS) from Alan Grossfield's lab is a lightweight C++ library for the analysis of molecular dynamics simulations. It can parse a number of PDB variants as well as the native system description and trajectory formats of CHARMM, NAMD, and Amber. LOOS is not intended to be an all-encompassing library; it is primarily geared towards reading and processing data rather than manipulating files and structures and writing them out.
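To illustrate what reading a trajectory involves at the lowest level, here is a minimal sketch of a reader for the plain-text, multi-frame XYZ format (an atom count, a comment line, then one `element x y z` line per atom, repeated for every frame). `read_xyz_frames` is a hypothetical helper, not part of any of the libraries above, and it does no error checking for truncated or malformed files.

```python
from typing import Iterator, List, TextIO, Tuple

Atom = Tuple[str, float, float, float]


def read_xyz_frames(stream: TextIO) -> Iterator[List[Atom]]:
    """Yield frames from a multi-frame XYZ stream, one list of (element, x, y, z) at a time."""
    while True:
        header = stream.readline()
        if not header.strip():
            return  # end of file (a blank line is also treated as the end)
        natoms = int(header)
        stream.readline()  # comment line, ignored in this sketch
        frame: List[Atom] = []
        for _ in range(natoms):
            fields = stream.readline().split()
            x, y, z = (float(v) for v in fields[1:4])
            frame.append((fields[0], x, y, z))
        yield frame
```

Usage would look like `for frame in read_xyz_frames(open("traj.xyz")): ...`, where `traj.xyz` is a placeholder file name. Binary formats such as DCD or the compressed XTC format are considerably more involved to parse, which is a good reason to rely on an existing library.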

Specialized tools

Oliver Smart's program for tracing out pore surfaces and estimating single-channel conductances.
CAVER provides rapid, accurate and fully automated calculation of pathways leading from buried cavities to the outside solvent in static and dynamic protein structures. The calculated pathways can be visualized in the graphics program PyMol, which helps to dissect the anatomy and dynamics of entrance tunnels. CAVER can analyze any molecular structure, including proteins, nucleic acids and inorganic materials. It is available as an online version or PyMol plugin, suitable for calculating pathways in individual protein structures, and as a stand-alone version that can analyze trajectories from molecular dynamics simulations.
Definition of secondary structure of proteins given a set of 3D coordinates. The DSSP program defines secondary structure, geometrical features and solvent exposure of proteins, given atomic coordinates in Protein Data Bank format. The program does NOT predict protein structure. According to the Science Citation Index (July 1995), the program had been cited in the scientific literature more than 1000 times.
Structural Alignment of Multiple Proteins. STAMP is a package for aligning protein sequences on the basis of their three-dimensional (3D) structure. It provides not only multiple alignments and the corresponding 'best-fit' superimpositions, but also a systematic and reproducible method for assessing the quality of such alignments. It also provides a method for scanning protein 3D structure databases. In addition to structure comparison, the STAMP package provides input for programs that display and analyse protein sequence alignments and tertiary structures. Note that although STAMP outputs a sequence alignment, it is a program for 3D structures, NOT sequences.
finds and characterizes helix hinges. Optionally, it locates the hinge point and calculates kink and swivel angles.
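The central quantity behind pore-radius profiles such as those produced by Oliver Smart's program is the radius of the largest sphere that can be centred at a point without overlapping the van der Waals sphere of any atom. The sketch below computes only that quantity, assuming atoms are given as hypothetical (x, y, z, radius) tuples with user-chosen radii. The actual program additionally optimizes the sphere centre within each plane perpendicular to the channel axis; this simplified version just evaluates the radius along a fixed straight line parallel to z.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float, float]
# (x, y, z, van der Waals radius); which radii to use is up to the user
Atom = Tuple[float, float, float, float]


def free_radius(point: Point, atoms: Sequence[Atom]) -> float:
    """Radius of the largest sphere centred at `point` that does not overlap any atom's vdW sphere."""
    return min(math.dist(point, (x, y, z)) - r for x, y, z, r in atoms)


def radius_profile(atoms: Sequence[Atom], z_values: Iterable[float],
                   x: float = 0.0, y: float = 0.0) -> List[Tuple[float, float]]:
    """(z, free radius) along a straight line parallel to z through (x, y).

    The sphere centre is NOT re-optimized within each plane, unlike the real program.
    """
    return [(z, free_radius((x, y, z), atoms)) for z in z_values]
```

A negative value means the point lies inside an atom, i.e. the line has hit the pore wall. For four atoms of radius 1.5 Å placed 5 Å from the z axis in the plane z = 0, the free radius on the axis at z = 0 is 3.5 Å.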
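DSSP's hydrogen-bond criterion is based on the Kabsch and Sander electrostatic energy between a backbone C=O group and an N-H group, E = 0.084 × 332 × (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN) kcal/mol with distances in Å, where 0.084 is the product of the partial charges 0.42e and 0.20e. A bond is counted when E < -0.5 kcal/mol. The minimal sketch below implements only this criterion (function names are hypothetical); the assignment of helices and strands from H-bond patterns is not reproduced, and hydrogen positions are assumed to be known already.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]

# Constants of the DSSP (Kabsch & Sander) electrostatic H-bond model
Q1_Q2 = 0.42 * 0.20   # partial charges on C=O and N-H, in units of e
F = 332.0             # converts to kcal/mol with distances in Angstrom
CUTOFF = -0.5         # kcal/mol; more negative energies count as H-bonds


def hbond_energy(c: Vec, o: Vec, n: Vec, h: Vec) -> float:
    """Electrostatic energy (kcal/mol) between a C=O group and an N-H group."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1_Q2 * F * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def is_hbond(c: Vec, o: Vec, n: Vec, h: Vec) -> bool:
    """True if the C=O ... H-N pair satisfies the DSSP energy criterion."""
    return hbond_energy(c, o, n, h) < CUTOFF
```

For a linear C=O...H-N arrangement with an O...N distance of 2.9 Å, the energy comes out at roughly -2.9 kcal/mol, well below the cutoff; moving the N-H group 10 Å further away brings it close to zero.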
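The 'best-fit' superimposition mentioned for STAMP is, in its simplest form, a least-squares rigid-body fit. The sketch below uses the Kabsch algorithm (via an SVD in NumPy) to compute the RMSD between two equally sized, already matched coordinate sets after optimal superposition. This is a standard technique shown for illustration, not STAMP's own procedure, which must also establish which residues are equivalent; the function name is hypothetical.

```python
import numpy as np


def superposition_rmsd(P, Q) -> float:
    """RMSD between matched (N, 3) coordinate sets P and Q after least-squares superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # remove translation by centring both sets on their centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # flip the last axis if needed so that R is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # rotate P onto Q (row vectors, hence R.T) and compare
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, while any internal deformation shows up as a positive value.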
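One common way to estimate a kink angle, assumed here and not necessarily the method used by the hinge-finding program above, is to determine the axis direction of the helix segments before and after the hinge and take the angle between them. This sketch estimates local axis directions from the bisectors of consecutive C-alpha positions (exact for an ideal helix), averages them per segment, and needs at least four residues per segment. Swivel angles are not computed, and all names, including the `ideal_helix` test helper with textbook alpha-helix parameters, are hypothetical.

```python
import numpy as np


def local_axes(ca):
    """Unit helix-axis directions estimated from bisectors of consecutive C-alpha atoms."""
    X = np.asarray(ca, dtype=float)
    # bisector at residue i: (CA[i-1] - CA[i]) + (CA[i+1] - CA[i]); points towards the axis
    b = X[:-2] - 2.0 * X[1:-1] + X[2:]
    # the cross product of two consecutive bisectors is parallel to the local axis
    axes = np.cross(b[:-1], b[1:])
    return axes / np.linalg.norm(axes, axis=1)[:, np.newaxis]


def segment_axis(ca):
    """Mean axis of a helix segment (>= 4 residues), oriented from first to last residue."""
    X = np.asarray(ca, dtype=float)
    axis = local_axes(X).mean(axis=0)
    axis /= np.linalg.norm(axis)
    if np.dot(axis, X[-1] - X[0]) < 0:
        axis = -axis
    return axis


def kink_angle(ca_before, ca_after) -> float:
    """Angle in degrees between the axes of the segments before and after a hinge."""
    a = segment_axis(ca_before)
    b = segment_axis(ca_after)
    return float(np.degrees(np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))))


def ideal_helix(n, radius=2.3, rise=1.5, twist_deg=100.0):
    """C-alpha trace of an ideal alpha-helix along +z (approximate textbook geometry)."""
    t = np.radians(twist_deg) * np.arange(n)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t), rise * np.arange(n)])
```

Taking an ideal helix and a copy tilted by 30° about the x axis, `kink_angle` recovers 30°. For real, irregular helices the local axes fluctuate, so in practice they are usually smoothed or fitted over several turns.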

General purpose mathematical packages

Scientific Python and pylab
a MATLAB-like Python module with sophisticated analysis and plotting capabilities
R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
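As an example of routine analysis with Scientific Python, here is a minimal NumPy sketch of block averaging, a common way to estimate the standard error of the mean of a correlated MD time series such as an energy or a distance recorded every frame. The function name and the default number of blocks are assumptions. In practice one checks that the error estimate levels off as the blocks get longer, i.e. once they are longer than the correlation time.

```python
import numpy as np


def block_average(x, n_blocks: int = 10):
    """Mean and block-averaging estimate of its standard error for a time series x."""
    x = np.asarray(x, dtype=float)
    if n_blocks < 2 or len(x) < n_blocks:
        raise ValueError("need at least 2 blocks with at least one point each")
    block_len = len(x) // n_blocks
    # discard leftover points at the end so that all blocks have the same length
    block_means = x[: block_len * n_blocks].reshape(n_blocks, block_len).mean(axis=1)
    # standard error of the mean estimated from the scatter of the block means
    sem = block_means.std(ddof=1) / np.sqrt(n_blocks)
    return float(block_means.mean()), float(sem)
```

The returned mean is that of the truncated series; with long trajectories the few discarded points make no practical difference.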