Commit 9f149d4e by peastman

Merge pull request #13 from rmcgibbo/package

Package up PDBFixer
parents 111fd4c0 4c104519
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
# C extensions
*.so
# Distribution / packaging
.Python
env/
bin/
build/
develop-eggs/
dist/
eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.cache
nosetests.xml
coverage.xml
# Translations
*.mo
# Mr Developer
.mr.developer.cfg
.project
.pydevproject
# Rope
.ropeproject
# Django stuff:
*.log
*.pot
# Sphinx documentation
docs/_build/
......@@ -23,14 +23,25 @@ Protein Data Bank (PDB) files often have a number of problems that must be fixed
PDBFixer can fix all of these problems for you in a fully automated way. You simply select a file, tell it which problems to fix, and it does everything else.
<p>
PDBFixer can be used in three different ways: as a desktop application with a graphical user interface; as a command line application; or as a Python API. This allows you to use it in whatever way best matches your own needs for flexibility, ease of use, and scriptability. The following sections describe how to use it in each of these ways.
<h1>2. Installation</h1>
To install PDBFixer, navigate to the root directory of the source distribution you've downloaded and type
<p>
<tt>python setup.py install</tt>
<p>
Before running PDBFixer, you must first install <a href="https://simtk.org/home/openmm">OpenMM</a> 5.2 or later. Follow the installation instructions in the OpenMM manual. It is also highly recommended that you install CUDA or OpenCL. In principle PDBFixer can use the OpenMM reference platform, but it will be prohibitively slow. Finally, PDBFixer requires that <a href="http://www.numpy.org">NumPy</a> be installed.
This will install the PDBFixer python package as well as the command line program <tt>pdbfixer</tt>.
<p>
Before running PDBFixer, you must first install <a href="https://simtk.org/home/openmm">OpenMM</a> 5.2 or later. Follow the installation instructions in the OpenMM manual. It is also highly recommended that you install CUDA or OpenCL. In principle PDBFixer can use the OpenMM reference platform, but it will be prohibitively slow. PDBFixer requires that <a href="http://www.numpy.org">NumPy</a> be installed.
<h1>2. PDBFixer as a Desktop Application</h1>
<h1>3. PDBFixer as a Desktop Application</h1>
To run PDBFixer as a desktop application, type
<p>
<tt>python pdbfixer.py</tt>
<tt>pdbfixer</tt>
<p>
on the command line. PDBFixer displays its user interface through a web browser, but it is still a single user desktop application. It should automatically launch a web browser and open a new window displaying the user interface. If for any reason this does not happen, you can launch a web browser yourself and point it to <a href="http://localhost:8000">http://localhost:8000</a>.
<p>
......@@ -70,18 +81,18 @@ This page gives you the chance to make several other optional changes:
You're all done! Click "Save File" to save the processed PDB file to disk. Then click "Process Another File" if you have more files to process, or "Quit" (in the top right corner of the page) if you are finished.
<h1>3. PDBFixer as a Command Line Application</h1>
<h1>4. PDBFixer as a Command Line Application</h1>
PDBFixer provides a simple command line interface that is especially useful if you want to script it to process many files at once. This interface is significantly less flexible than either the desktop interface or the Python API, but it is still powerful enough for many purposes.
<p>
To get usage instructions for the command line interface, type
<p>
<tt>python pdbfixer.py --help</tt>
<tt>pdbfixer --help</tt>
<p>
This displays the following information:
<tt><pre>
Usage: pdbfixer.py
pdbfixer.py [options] filename
Usage: pdbfixer
pdbfixer [options] filename
When run with no arguments, it launches the user interface. If any arguments are specified, it runs in command line mode.
......@@ -110,11 +121,11 @@ Options:
</pre></tt>
For example, consider the following command line:
<p>
<tt>python pdbfixer.py --keep-heterogens=water --replace-nonstandard --water-box=4.0 4.0 3.0 myfile.pdb</tt>
<tt>pdbfixer --keep-heterogens=water --replace-nonstandard --water-box=4.0 4.0 3.0 myfile.pdb</tt>
<p>
This will load the file "myfile.pdb". It will add any missing atoms to existing residues, but not add any missing residues (because we did not specify <tt>--add-residues</tt>). Hydrogens will be added that are appropriate at pH 7.0 (the default). If the file contains any nonstandard amino acids or nucleotides, they will be replaced with the closest equivalent standard ones. Any water molecules present in the file will be kept, but all other heterogens will be deleted. Finally a water box of size 4 by 4 by 3 nanometers will be added surrounding the structure. If necessary, counterions will be added to neutralize it (Na+ or Cl-), but no other ions will be added (because we accepted the default ionic strength of 0.0). After making all these changes, the result will be written to a file called "output.pdb".
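The command line above can also be driven from a script when many files need the same treatment. The following is a minimal sketch, assuming the <tt>pdbfixer</tt> program is on the PATH; the helper names (<tt>build_pdbfixer_command</tt>, <tt>fixed_name</tt>, <tt>fix_files</tt>) and the <tt>_fixed</tt> suffix are hypothetical and not part of PDBFixer. Because each run writes to "output.pdb" by default, the sketch renames that file after every run.

```python
import os
import shutil
import subprocess


def build_pdbfixer_command(filename, box_nm=(4.0, 4.0, 3.0)):
    """Return the argument list for the example command line shown above."""
    box = [str(x) for x in box_nm]
    return ["pdbfixer", "--keep-heterogens=water", "--replace-nonstandard",
            "--water-box=" + box[0], box[1], box[2], filename]


def fixed_name(filename):
    """Hypothetical naming scheme: myfile.pdb -> myfile_fixed.pdb."""
    stem, ext = os.path.splitext(filename)
    return stem + "_fixed" + ext


def fix_files(filenames):
    """Run pdbfixer on each file and keep each result under its own name."""
    for filename in filenames:
        subprocess.check_call(build_pdbfixer_command(filename))
        # Each run writes "output.pdb" by default, so move it out of the way.
        shutil.move("output.pdb", fixed_name(filename))
```

Calling <tt>fix_files(["a.pdb", "b.pdb"])</tt> would then be expected to leave "a_fixed.pdb" and "b_fixed.pdb" in the current directory.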
<h1>4. PDBFixer as a Python API</h1>
<h1>5. PDBFixer as a Python API</h1>
This is the most powerful way to use PDBFixer. It allows you to script the processing of PDB files while maintaining precise programmatic control of every part of the process.
<p>
......
......@@ -29,6 +29,7 @@ DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
USE OR OTHER DEALINGS IN THE SOFTWARE.
"""
from __future__ import print_function
__author__ = "Peter Eastman"
__version__ = "1.0"
......@@ -42,11 +43,11 @@ angleK = 10.0
# Create the new force field file.
print '<ForceField>'
print('<ForceField>')
# Print the atom types, while identifying types and classes to omit.
print ' <AtomTypes>'
print(' <AtomTypes>')
omitTypes = set()
omitClasses = set()
for atomType in forcefield._atomTypes:
......@@ -55,94 +56,94 @@ for atomType in forcefield._atomTypes:
omitTypes.add(atomType)
omitClasses.add(atomClass)
else:
print ' <Type name="%s" class="%s" element="%s" mass="%g"/>' % (atomType, atomClass, element.symbol, mass)
print ' </AtomTypes>'
print(' <Type name="%s" class="%s" element="%s" mass="%g"/>' % (atomType, atomClass, element.symbol, mass))
print(' </AtomTypes>')
# Print the residue templates.
print ' <Residues>'
for template in forcefield._templates.itervalues():
print ' <Residue name="%s">' % template.name
print(' <Residues>')
for template in forcefield._templates.values():
print(' <Residue name="%s">' % template.name)
atomIndex = {}
for i, atom in enumerate(template.atoms):
if atom.type not in omitTypes:
print ' <Atom name="%s" type="%s"/>' % (atom.name, atom.type)
print(' <Atom name="%s" type="%s"/>' % (atom.name, atom.type))
atomIndex[i] = len(atomIndex)
for (a1, a2) in template.bonds:
if a1 in atomIndex and a2 in atomIndex:
print ' <Bond from="%d" to="%d"/>' % (atomIndex[a1], atomIndex[a2])
print(' <Bond from="%d" to="%d"/>' % (atomIndex[a1], atomIndex[a2]))
for atom in template.externalBonds:
if atom in atomIndex:
print ' <ExternalBond from="%d"/>' % atomIndex[atom]
print ' </Residue>'
print ' </Residues>'
print(' <ExternalBond from="%d"/>' % atomIndex[atom])
print(' </Residue>')
print(' </Residues>')
# Print the harmonic bonds.
print ' <HarmonicBondForce>'
print(' <HarmonicBondForce>')
bonds = [f for f in forcefield._forces if isinstance(f, ff.HarmonicBondGenerator)][0]
for i in range(len(bonds.types1)):
type1 = iter(bonds.types1[i]).next()
type2 = iter(bonds.types2[i]).next()
type1 = next(iter(bonds.types1[i]))
type2 = next(iter(bonds.types2[i]))
if type1 not in omitTypes and type2 not in omitTypes:
class1 = forcefield._atomTypes[type1][0]
class2 = forcefield._atomTypes[type2][0]
print ' <Bond class1="%s" class2="%s" length="%g" k="%g"/>' % (class1, class2, bonds.length[i], bondK)
print ' </HarmonicBondForce>'
print(' <Bond class1="%s" class2="%s" length="%g" k="%g"/>' % (class1, class2, bonds.length[i], bondK))
print(' </HarmonicBondForce>')
# Print the harmonic angles.
print ' <HarmonicAngleForce>'
print(' <HarmonicAngleForce>')
angles = [f for f in forcefield._forces if isinstance(f, ff.HarmonicAngleGenerator)][0]
for i in range(len(angles.types1)):
type1 = iter(angles.types1[i]).next()
type2 = iter(angles.types2[i]).next()
type3 = iter(angles.types3[i]).next()
type1 = next(iter(angles.types1[i]))
type2 = next(iter(angles.types2[i]))
type3 = next(iter(angles.types3[i]))
if type1 not in omitTypes and type2 not in omitTypes and type3 not in omitTypes:
class1 = forcefield._atomTypes[type1][0]
class2 = forcefield._atomTypes[type2][0]
class3 = forcefield._atomTypes[type3][0]
print ' <Angle class1="%s" class2="%s" class3="%s" angle="%g" k="%g"/>' % (class1, class2, class3, angles.angle[i], angleK)
print ' </HarmonicAngleForce>'
print(' <Angle class1="%s" class2="%s" class3="%s" angle="%g" k="%g"/>' % (class1, class2, class3, angles.angle[i], angleK))
print(' </HarmonicAngleForce>')
# Print the periodic torsions.
print ' <PeriodicTorsionForce>'
print(' <PeriodicTorsionForce>')
torsions = [f for f in forcefield._forces if isinstance(f, ff.PeriodicTorsionGenerator)][0]
for torsion in torsions.proper:
type1 = iter(torsion.types1).next()
type2 = iter(torsion.types2).next()
type3 = iter(torsion.types3).next()
type4= iter(torsion.types4).next()
type1 = next(iter(torsion.types1))
type2 = next(iter(torsion.types2))
type3 = next(iter(torsion.types3))
type4= next(iter(torsion.types4))
if type1 not in omitTypes and type2 not in omitTypes and type3 not in omitTypes and type4 not in omitTypes:
class1 = forcefield._atomTypes[type1][0]
class2 = forcefield._atomTypes[type2][0]
class3 = forcefield._atomTypes[type3][0]
class4 = forcefield._atomTypes[type4][0]
print ' <Proper class1="%s" class2="%s" class3="%s" class4="%s"' % (class1, class2, class3, class4),
print(' <Proper class1="%s" class2="%s" class3="%s" class4="%s"' % (class1, class2, class3, class4), end=' ')
for i in range(len(torsion.k)):
print ' periodicity%d="%d" phase%d="%g" k%d="%g"' % (i+1, torsion.periodicity[i], i+1, torsion.phase[i], i+1, torsion.k[i]),
print '/>'
print(' periodicity%d="%d" phase%d="%g" k%d="%g"' % (i+1, torsion.periodicity[i], i+1, torsion.phase[i], i+1, torsion.k[i]), end=' ')
print('/>')
for torsion in torsions.improper:
type1 = iter(torsion.types1).next()
type2 = iter(torsion.types2).next()
type3 = iter(torsion.types3).next()
type4= iter(torsion.types4).next()
type1 = next(iter(torsion.types1))
type2 = next(iter(torsion.types2))
type3 = next(iter(torsion.types3))
type4= next(iter(torsion.types4))
if type1 not in omitTypes and type2 not in omitTypes and type3 not in omitTypes and type4 not in omitTypes:
class1 = forcefield._atomTypes[type1][0]
class2 = forcefield._atomTypes[type2][0]
class3 = forcefield._atomTypes[type3][0]
class4 = forcefield._atomTypes[type4][0]
print ' <Improper class1="%s" class2="%s" class3="%s" class4="%s"' % (class1, class2, class3, class4),
print(' <Improper class1="%s" class2="%s" class3="%s" class4="%s"' % (class1, class2, class3, class4), end=' ')
for i in range(len(torsion.k)):
print ' periodicity%d="%d" phase%d="%g" k%d="%g"' % (i+1, torsion.periodicity[i], i+1, torsion.phase[i], i+1, torsion.k[i]),
print '/>'
print ' </PeriodicTorsionForce>'
print(' periodicity%d="%d" phase%d="%g" k%d="%g"' % (i+1, torsion.periodicity[i], i+1, torsion.phase[i], i+1, torsion.k[i]), end=' ')
print('/>')
print(' </PeriodicTorsionForce>')
# Print the script to add the soft-core nonbonded force.
print ' <Script>'
print """import simtk.openmm as mm
print(' <Script>')
print("""import simtk.openmm as mm
nb = mm.CustomNonbondedForce('C/((r/0.2)^4+1)')
nb.addGlobalParameter('C', 1.0)
sys.addForce(nb)
......@@ -154,7 +155,7 @@ for bond in data.bonds:
for angle in data.angles:
exclusions.add((min(angle[0], angle[2]), max(angle[0], angle[2])))
for a1, a2 in exclusions:
nb.addExclusion(a1, a2)"""
print ' </Script>'
nb.addExclusion(a1, a2)""")
print(' </Script>')
print '</ForceField>'
\ No newline at end of file
print('</ForceField>')
\ No newline at end of file
......@@ -626,7 +626,7 @@ class PDBFixer(object):
return nearest
if __name__=='__main__':
def main():
if len(sys.argv) < 2:
# Display the UI.
......@@ -674,3 +674,6 @@ if __name__=='__main__':
if options.box is not None:
fixer.addSolvent(options.box*unit.nanometer, options.positiveIon, options.negativeIon, options.ionic*unit.molar)
app.PDBFile.writeFile(fixer.topology, fixer.positions, open(options.output, 'w'))
if __name__ == '__main__':
main()
\ No newline at end of file
......@@ -6,6 +6,7 @@ import uiserver
import webbrowser
import os.path
import gzip
import time
from io import BytesIO
try:
from urllib.request import urlopen
......@@ -204,4 +205,15 @@ def launchUI():
uiserver.beginServing()
uiserver.setCallback(controlsCallback, "/controls")
displayStartPage()
webbrowser.open('http://localhost:'+str(uiserver.server.server_address[1]))
url = 'http://localhost:'+str(uiserver.server.server_address[1])
print("PDBFixer running: %s " % url)
webbrowser.open(url)
# The uiserver runs in a background daemon thread, which dies whenever the
# main thread exits. To keep the whole process alive, we just sleep here in
# the main thread. When Control-C is pressed, the main thread shuts down and
# the uiserver exits with it. Without this daemon/sleep combination, the
# process cannot be killed with Control-C. Reference (Stack Overflow):
# http://stackoverflow.com/a/11816038/1079728
while True:
time.sleep(0.5)
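The daemon-thread-plus-sleep pattern described in the comment above can be illustrated in isolation. This is a minimal sketch using only the standard library, not the actual uiserver code: the function names are hypothetical, port 0 is used so the OS picks a free port, and `max_polls` is an extra parameter added here only so the loop can terminate in a demonstration.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_background_server(port=0):
    """Serve HTTP from a daemon thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("localhost", port), BaseHTTPRequestHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True  # the thread is killed when the main thread exits
    thread.start()
    return server, thread


def sleep_until_interrupted(poll_seconds=0.5, max_polls=None):
    """Keep the main thread alive until Control-C (KeyboardInterrupt).

    max_polls is a hypothetical limit for demonstration; None means loop
    forever, as in the code above. Returns the number of completed polls.
    """
    polls = 0
    try:
        while max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
            polls += 1
    except KeyboardInterrupt:
        pass  # the main thread ends here, and the daemon thread ends with it
    return polls
```

Because the server thread is a daemon, returning from `sleep_until_interrupted()` and letting the main thread finish is enough to end the process; with a non-daemon thread, `serve_forever` would keep the interpreter alive.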
......@@ -69,7 +69,9 @@ callback = {}
server = _ThreadingHTTPServer(("localhost", 8000), _Handler)
def beginServing():
Thread(target=server.serve_forever).start()
t = Thread(target=server.serve_forever)
t.daemon = True
t.start()
def setContent(newContent):
global content
......
"""pdbfixer: Fixes problems in PDB files
Protein Data Bank (PDB) files often have a number of problems that must be
fixed before they can be used in a molecular dynamics simulation. The details
vary depending on how the file was generated. Here are some of the most common
ones:
- If the structure was generated by X-ray crystallography, most or all of the
hydrogen atoms will usually be missing.
- There may also be missing heavy atoms in flexible regions that could not be
clearly resolved from the electron density. This may include anything from a
few atoms at the end of a sidechain to entire loops.
- Many PDB files are also missing terminal atoms that should be present at the
ends of chains.
- The file may include nonstandard residues that were added for crystallography
purposes, but are not present in the naturally occurring molecule you want to
simulate.
- The file may include more than what you want to simulate. For example, there
may be salts, ligands, or other molecules that were added for experimental
purposes. Or the crystallographic unit cell may contain multiple copies of a
protein, but you only want to simulate a single copy.
- There may be multiple locations listed for some atoms.
- If you want to simulate the structure in explicit solvent, you will need to
add a water box surrounding it.
PDBFixer can fix all of these problems for you in a fully automated way. You
simply select a file, tell it which problems to fix, and it does everything else.
"""
from __future__ import print_function
import os
import sys
from os.path import relpath, join
from setuptools import setup, find_packages
DOCLINES = __doc__.split("\n")
########################
__version__ = '1.0'
VERSION = __version__
ISRELEASED = False
########################
CLASSIFIERS = """\
Development Status :: 3 - Alpha
Intended Audience :: Science/Research
Intended Audience :: Developers
License :: OSI Approved :: MIT License
Programming Language :: Python
Programming Language :: Python :: 3
Topic :: Scientific/Engineering :: Bio-Informatics
Topic :: Scientific/Engineering :: Chemistry
Operating System :: Microsoft :: Windows
Operating System :: POSIX
Operating System :: Unix
Operating System :: MacOS
"""
def find_package_data():
files = []
for root, dirnames, filenames in os.walk('pdbfixer'):
for fn in filenames:
files.append(relpath(join(root, fn), 'pdbfixer'))
return files
def check_dependencies():
from distutils.version import StrictVersion
found_openmm = True
found_openmm_52_or_later = True
found_numpy = True
try:
from simtk import openmm
openmm_version = StrictVersion(openmm.Platform.getOpenMMVersion())
if openmm_version < StrictVersion('5.2'):
found_openmm_52_or_later = False
except ImportError as err:
found_openmm = False
try:
import numpy
except ImportError:
found_numpy = False
msg = None
bar = ('-' * 70) + "\n" + ('-' * 70)
if found_openmm:
if not found_openmm_52_or_later:
msg = [bar, '[Unmet Dependency] PDBFixer requires OpenMM version 5.2 or later. You have version %s.' % openmm_version, bar]
else:
msg = [bar, '[Unmet Dependency] PDBFixer requires the OpenMM python package. Refer to <http://openmm.org> for details and installation instructions.', bar]
if not found_numpy:
msg = [bar, '[Unmet Dependency] PDBFixer requires the numpy python package. Refer to <http://www.scipy.org/scipylib/download.html> for numpy installation instructions.', bar]
if msg is not None:
import textwrap
print()
print(os.linesep.join([line for e in msg for line in textwrap.wrap(e)]), file=sys.stderr)
#print('\n'.join(list(textwrap.wrap(e) for e in msg)))
setup(
name='pdbfixer',
author='Peter Eastman',
description=DOCLINES[0],
long_description="\n".join(DOCLINES[2:]),
version=__version__,
license='MIT',
url='https://github.com/peastman/pdbfixer',
platforms=['Linux', 'Mac OS-X', 'Unix', 'Windows'],
classifiers=CLASSIFIERS.splitlines(),
packages=find_packages(),
package_data={'pdbfixer': find_package_data()},
zip_safe=False,
entry_points={'console_scripts': ['pdbfixer = pdbfixer.pdbfixer:main']})
check_dependencies()
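The `console_scripts` entry `pdbfixer = pdbfixer.pdbfixer:main` is what makes the `pdbfixer` command call the new `main()` function. The sketch below shows roughly how such a string can be interpreted: the part before `=` is the command name, and the target is a `module:attribute` pair. It is a simplified illustration with hypothetical helper names, not the code that setuptools actually generates.

```python
import importlib


def parse_console_script(line):
    """Split 'name = module:attr' into (command name, target string)."""
    name, _, target = line.partition("=")
    return name.strip(), target.strip()


def resolve_entry_point(target):
    """Import the module named before ':' and return the attribute after it."""
    module_name, _, attr = target.partition(":")
    module = importlib.import_module(module_name.strip())
    return getattr(module, attr.strip())
```

For example, `resolve_entry_point("json:dumps")` returns the standard library's `json.dumps`; with PDBFixer installed, resolving `pdbfixer.pdbfixer:main` the same way would be expected to return the `main` function defined in this commit.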