Support nonstandard residues (#296)

* Can add templates for new residues * Added documentation on templates * Bug fixes

Support nonstandard residues (#296)
* Can add templates for new residues * Added documentation on templates * Bug fixes
92044f3c · Peter Eastman · GitHub · 5cf78a2b · 92044f3c · 92044f3c
Commit 92044f3c authored Jul 16, 2024 by Peter Eastman Committed by GitHub Jul 16, 2024
Showing with 147 additions and 33 deletions

Manual.html
+29 -0

pdbfixer/pdbfixer.py
+0 -0

pdbfixer/tests/data/CAS.pdb
+45 -0

pdbfixer/tests/test_add_hydrogens.py
+23 -5

pdbfixer/tests/test_build_and_simulate.py
+15 -15

pdbfixer/tests/test_mutate.py
+35 -13

No files found.
--- a/Manual.html
+++ b/Manual.html
@@ -281,5 +281,34 @@ fixer.nonstandardResidues = [(residue, 'ALA') for residue, replacement in fixer.
 <span style="color:grey">fixer.replaceNonstandardResidues()</span>
 </pre></tt>

+<h3>Nonstandard Residues</h3>
+
+PDBFixer knows how to build all the standard amino acids and nucleotides.  If you want to apply it to other residues, you need to tell it how.  This is done by defining templates.
+
+A template is a description of a residue: what atoms it contains, how they are bonded to each other, and what conformation they take on.  For residues defined in the PDB Chemical Component Dictionary, you can simply tell PDBFixer to download the definition of whatever residue you need.  For example, if you want to model a protein containing a selenocysteine (PDB code SEC), call
+
+<tt><pre>
+fixer.downloadTemplate('SEC')
+</pre></tt>
+
+Now you can call other methods such as <tt>findMissingAtoms()</tt>, <tt>addMissingAtoms()</tt>, and <tt>addMissingHydrogens()</tt> in the usual way.  PDBFixer will fix problems in the SEC residue exactly as it does for standard ones.
+<p/>
+If your structure contains molecules or residues that are not present in the PDB Chemical Component Dictionary, you can still add templates for them, but you need to define them yourself.  This is done by calling <tt>registerTemplate()</tt>.  It takes two required arguments: an OpenMM Topology object describing the structure of the residue, and a list of positions describing a typical conformation of the residue.  These can be easily obtained from, for example, a PDB or PDBx/mmCIF file containing just the single residue.
+
+<tt><pre>
+pdbx = PDBxFile('MOL.cif')
+fixer.registerTemplate(pdbx.topology, pdbx.positions)
+</pre></tt>
+
+For residues that appear as part of larger molecules rather than in isolation, you need to provide one more piece of information: which atoms should only be present in terminal residues.  For example, a C-terminal amino acid is capped with OXT and HXT atoms, but these atoms are omitted when the residue appears in the middle of a chain and is bonded to another residue.  Likewise, the nitrogen in an N-terminal amino acid has two hydrogens attached to it (H and H2), but when it appears in the middle of a chain and the nitrogen is bonded to another residue, the H2 hydrogen is omitted.  You can pass a third argument containing a list of boolean flags for each atom, specifying which ones should be omitted from non-terminal residues.
+
+<tt><pre>
+pdbx = PDBxFile('RES.cif')
+terminal = [atom.name in ('H2', 'OXT', 'HXT') for atom in pdbx.topology.atoms()]
+fixer.registerTemplate(pdbx.topology, pdbx.positions, terminal)
+</pre></tt>
+
+Strictly speaking, it is the bonds that matter, not the position in the chain.  For example, the sulfur in a CYS residue has a hydrogen that must be omitted when it forms a disulfide bond to another residue.  What matters is whether the sulfur has a bond connecting it to another residue, not the position of the CYS residue in its chain.
+
 </body>
 </html>
--- a/pdbfixer/pdbfixer.py
+++ b/pdbfixer/pdbfixer.py
--- a/pdbfixer/tests/data/CAS.pdb
+++ b/pdbfixer/tests/data/CAS.pdb
+ATOM      1  N   CAS A   1      -1.377  -1.201  -2.417  1.00 10.00           N
+ATOM      2  CA  CAS A   1      -0.664   0.081  -2.349  1.00 10.00           C
+ATOM      3  CB  CAS A   1       0.084   0.179  -1.018  1.00 10.00           C
+ATOM      4  C   CAS A   1       0.319   0.169  -3.487  1.00 10.00           C
+ATOM      5  O   CAS A   1       0.824  -0.836  -3.928  1.00 10.00           O
+ATOM      6  OXT CAS A   1       0.635   1.364  -4.010  1.00 10.00           O
+ATOM      7  SG  CAS A   1      -1.100   0.073   0.350  1.00 10.00           S
+ATOM      8 AS   CAS A   1       0.299   0.246   2.090  1.00 10.00          AS
+ATOM      9  CE1 CAS A   1      -0.737   0.161   3.787  1.00 10.00           C
+ATOM     10  CE2 CAS A   1       1.610  -1.249   2.038  1.00 10.00           C
+ATOM     11  H   CAS A   1      -0.678  -1.925  -2.343  1.00 10.00           H
+ATOM     12  H2  CAS A   1      -1.945  -1.261  -1.585  1.00 10.00           H
+ATOM     13  HA  CAS A   1      -1.380   0.899  -2.424  1.00 10.00           H
+ATOM     14  HB2 CAS A   1       0.613   1.131  -0.969  1.00 10.00           H
+ATOM     15  HB3 CAS A   1       0.800  -0.638  -0.943  1.00 10.00           H
+ATOM     16  HXT CAS A   1       1.267   1.420  -4.740  1.00 10.00           H
+ATOM     17 HE11 CAS A   1      -0.056   0.245   4.633  1.00 10.00           H
+ATOM     18 HE12 CAS A   1      -1.455   0.980   3.815  1.00 10.00           H
+ATOM     19 HE13 CAS A   1      -1.268  -0.789   3.841  1.00 10.00           H
+ATOM     20 HE21 CAS A   1       2.178  -1.202   1.109  1.00 10.00           H
+ATOM     21 HE22 CAS A   1       2.291  -1.165   2.884  1.00 10.00           H
+ATOM     22 HE23 CAS A   1       1.079  -2.200   2.092  1.00 10.00           H
+CONECT    1    2   11   12
+CONECT    2    1    3    4   13
+CONECT    3    2    7   14   15
+CONECT    4    2    5    6
+CONECT    5    4
+CONECT    6    4   16
+CONECT    7    3    8
+CONECT    8    7    9   10
+CONECT    9    8   17   18   19
+CONECT   10    8   20   21   22
+CONECT   11    1
+CONECT   12    1
+CONECT   13    2
+CONECT   14    3
+CONECT   15    3
+CONECT   16    6
+CONECT   17    9
+CONECT   18    9
+CONECT   19    9
+CONECT   20   10
+CONECT   21   10
+CONECT   22   10
+END
--- a/pdbfixer/tests/test_add_hydrogens.py
+++ b/pdbfixer/tests/test_add_hydrogens.py
 import pdbfixer
+import openmm.app as app
 from pathlib import Path
-from io import StringIO

 def test_nonstandard():
    """Test adding hydrogens to nonstandard residues."""
-    content = (Path(__file__).parent / "data" / "4JSV.pdb").read_text()
-    fixer = pdbfixer.PDBFixer(pdbfile=StringIO(content))
+    fixer = pdbfixer.PDBFixer(filename=(Path(__file__).parent / "data" / "4JSV.pdb").as_posix())
    fixer.removeChains(chainIndices=[0, 1, 2])
    fixer.addMissingHydrogens()
    for residue in fixer.topology.residues():
@@ -17,10 +16,29 @@ def test_nonstandard():

 def test_leaving_atoms():
    """Test adding hydrogens to a nonstandard residue with leaving atoms."""
-    content = (Path(__file__).parent / "data" / "1BHL.pdb").read_text()
-    fixer = pdbfixer.PDBFixer(pdbfile=StringIO(content))
+    fixer = pdbfixer.PDBFixer(filename=(Path(__file__).parent / "data" / "1BHL.pdb").as_posix())
    fixer.addMissingHydrogens()
    for residue in fixer.topology.residues():
        count = sum(1 for atom in residue.atoms() if atom.element.symbol == 'H')
        if residue.name == 'CAS':
            assert count == 10
+
+def test_registered_template():
+    """Test adding hydrogens based on a template registered by the user."""
+    fixer = pdbfixer.PDBFixer(filename=(Path(__file__).parent / "data" / "1BHL.pdb").as_posix())
+
+    # Register a template for CAS from which a single hydrogen has been removed.
+
+    pdb = app.PDBFile((Path(__file__).parent / "data" / "CAS.pdb").as_posix())
+    modeller = app.Modeller(pdb.topology, pdb.positions)
+    modeller.delete([list(modeller.topology.atoms())[-1]])
+    terminal = [atom.name in ('H2', 'OXT', 'HXT') for atom in modeller.topology.atoms()]
+    fixer.registerTemplate(modeller.topology, modeller.positions, terminal)
+
+    # See if the correct hydrogens get added.
+
+    fixer.addMissingHydrogens()
+    for residue in fixer.topology.residues():
+        count = sum(1 for atom in residue.atoms() if atom.element.symbol == 'H')
+        if residue.name == 'CAS':
+            assert count == 9
--- a/pdbfixer/tests/test_build_and_simulate.py
+++ b/pdbfixer/tests/test_build_and_simulate.py

 from __future__ import print_function
-import pdbfixer
-import openmm
+from pdbfixer.pdbfixer import PDBFixer
+from openmm import app

 import os
 import os.path
@@ -97,7 +97,7 @@ def test_build_and_simulate():
        ]

    # DEBUG: A few small test cases.
-    pdbcodes_to_build = ['110D', '116D', '117D', '118D', '134D', '135D', '136D', '138D', '143D', '148D', '151D', '152D', '159D', '177D', '17RA', '183D', '184D', '186D', '187D', '188D', '189D', '1A11', '1A13', '1A1P', '1A3P', '1A51', '1A60', '1A83', '1A9L', '1AAF', '1AB1', '1ABZ', '1AC7', '1ACW', '1AD7', '1ADX', '1AFP', '1AFT', '1AFX', '1AG7', '1AGG', '1AGL', '1AGT', '1AHL', '1AIE', '1AJ1', '1AJF', '1AJJ', '1AJU', '1AKG', '1AKX', '1AL1', '1ALE', '1ALF', '1ALG', '1AM0', '1AMB', '1AMC', '1AML', '1ANP', '1ANR', '1ANS', '1AOO']
+    pdbcodes_to_build = ['110D', '116D', '117D', '118D', '134D', '135D', '136D', '138D', '143D', '148D', '151D', '152D', '177D', '17RA', '183D', '184D', '186D', '187D', '188D', '189D', '1A11', '1A13', '1A1P', '1A3P', '1A51', '1A60', '1A83', '1A9L', '1AAF', '1AB1', '1ABZ', '1AC7', '1ACW', '1AD7', '1ADX', '1AFP', '1AFT', '1AFX', '1AG7', '1AGG', '1AGL', '1AGT', '1AHL', '1AIE', '1AJ1', '1AJF', '1AJJ', '1AKG', '1AL1', '1ALE', '1ALF', '1ALG', '1AM0', '1AMB', '1AMC', '1AML', '1ANP', '1ANR', '1ANS', '1AOO', '1BH7', '1BX8', '1CEK']

    # Don't simulate any.
    pdbcodes_to_simulate = []
@@ -105,20 +105,12 @@ def test_build_and_simulate():
    # Keep track of list of failures.
    failures = list()
        
+    forcefield = app.ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')
    for pdbcode in pdbcodes_to_build:
        print("------------------------------------------------")
        print(pdbcode)

-        output_pdb_filename = 'output.pdb'
-
-        # PDB setup parameters.
-        # TODO: Try several combinations?
-        from openmm import unit
        pH = 7.0
-        ionic = 50.0 * unit.millimolar
-        box = 10.0 * unit.angstrom
-        positiveIon = 'Na+'
-        negativeIon = 'Cl-'

        outfile = tempfile.NamedTemporaryFile(mode='w', delete=False)
        output_pdb_filename = outfile.name
@@ -126,11 +118,17 @@ def test_build_and_simulate():
        timeout_seconds = 30
        watchdog = Watchdog(timeout_seconds)
        build_successful = False
-        try:        
-            from pdbfixer.pdbfixer import PDBFixer
-            from openmm import app
+        try:
            stage = "Creating PDBFixer..."
            fixer = PDBFixer(pdbid=pdbcode)
+            stage = "Deleting hydrogens..."
+            if pdbcode in ['135D', '136D', '177D', '1A83', '1AGG', '1AJ1']:
+                # These input files include extra hydrogens that aren't supported by the force field.
+                # To avoid problems, delete all pre-existing hydrogens.
+                modeller = app.Modeller(fixer.topology, fixer.positions)
+                modeller.delete([a for a in fixer.topology.atoms() if a.element == app.element.hydrogen])
+                fixer.topology = modeller.topology
+                fixer.positions = modeller.positions
            stage = "Finding missing residues..."
            fixer.findMissingResidues()
            stage = "Finding nonstandard residues..."
@@ -147,6 +145,8 @@ def test_build_and_simulate():
            fixer.addMissingHydrogens(pH)
            stage = "Writing PDB file..."
            app.PDBFile.writeFile(fixer.topology, fixer.positions, outfile)
+            stage = "Create System..."
+            forcefield.createSystem(fixer.topology)
            stage = "Done."
            outfile.close()
            build_successful = True

--- a/pdbfixer/tests/test_mutate.py
+++ b/pdbfixer/tests/test_mutate.py
 import openmm.app as app
 import pdbfixer
-import tempfile
 from pytest import raises
+from pathlib import Path

 def test_mutate_1():
    fixer = pdbfixer.PDBFixer(pdbid='1VII')
@@ -10,11 +10,7 @@ def test_mutate_1():
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()        
    fixer.addMissingHydrogens(7.0)
-    with tempfile.NamedTemporaryFile(mode='w+') as temp_pdb:
-        app.PDBFile.writeFile(fixer.topology, fixer.positions, temp_pdb)
-        temp_pdb.flush()
-        pdb = app.PDBFile(temp_pdb.name)
-    
+
    new_residue57 = list(fixer.topology.residues())[16]
    assert new_residue57.name == "GLY", "Name of mutated residue did not change correctly!"
    assert len(list(new_residue57.atoms())) == 7, "Should have 7 atoms in GLY 56"
@@ -31,7 +27,6 @@ def test_mutate_2():
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()        
    fixer.addMissingHydrogens(7.0)
-    temp_pdb = tempfile.NamedTemporaryFile(mode='w+')
    new_residue57 = list(fixer.topology.residues())[16]
    new_residue56 = list(fixer.topology.residues())[15]
    assert new_residue57.name == "LEU", "Name of mutated residue did not change correctly!"
@@ -56,15 +51,42 @@ def test_mutate_3_fails():
 def test_mutate_4_fails():
    with raises(KeyError):
        fixer = pdbfixer.PDBFixer(pdbid='1VII')
-        fixer.applyMutations(["ALA-57-WTF", "SER-56-ALA"], "A")
-
-
-def test_mutate_5_fails():
-    with raises(KeyError):
-        fixer = pdbfixer.PDBFixer(pdbid='1VII')
        fixer.applyMutations(["ALA-1000-GLY", "SER-56-ALA"], "A")

 def test_mutate_multiple_copies_of_chain_A():
    fixer = pdbfixer.PDBFixer(pdbid='1OL5')
    fixer.applyMutations(['TPO-287-THR','TPO-288-THR'], "A")

+def test_mutate_to_nonstandard():
+    """Test mutating to a nonstandard residue defined with registerTemplate()."""
+    fixer = pdbfixer.PDBFixer(filename=(Path(__file__).parent / "data" / "1BHL.pdb").as_posix())
+    pdb = app.PDBFile((Path(__file__).parent / "data" / "CAS.pdb").as_posix())
+    terminal = [atom.name in ('H2', 'OXT', 'HXT') for atom in pdb.topology.atoms()]
+    fixer.registerTemplate(pdb.topology, pdb.positions, terminal)
+    fixer.applyMutations(["SER-57-CAS", "ILE-60-CAS", "ASP-207-CAS"], "A")
+    fixer.findMissingResidues()
+    fixer.findMissingAtoms()
+    fixer.addMissingAtoms()
+    fixer.addMissingHydrogens(7.0)
+    residues = list(fixer.topology.residues())
+    for i in (0, 3, 134):
+        assert residues[i].name == "CAS"
+        atoms = list(residues[i].atoms())
+        assert sum(1 for a in atoms if a.name == 'H2') == (1 if i == 0 else 0)
+        assert sum(1 for a in atoms if a.name == 'OXT') == (1 if i == 134 else 0)
+        assert sum(1 for a in atoms if a.name == 'HXT') == (1 if i == 134 else 0)
+
+def test_download_template():
+    """Test mutating to a nonstandard residue defined in the PDB."""
+    fixer = pdbfixer.PDBFixer(filename=(Path(__file__).parent / "data" / "1BHL.pdb").as_posix())
+    fixer.applyMutations(["SER-57-SEP", "ILE-60-SEP", "ASP-207-SEP"], "A")
+    fixer.findMissingResidues()
+    fixer.findMissingAtoms()
+    fixer.addMissingAtoms()
+    fixer.addMissingHydrogens(7.0)
+    residues = list(fixer.topology.residues())
+    for i in (0, 3, 134):
+        assert residues[i].name == "SEP"
+        atoms = list(residues[i].atoms())
+        assert sum(1 for a in atoms if a.element == app.element.phosphorus) == 1
+        assert sum(1 for a in atoms if a.name == 'OXT') == (1 if i == 134 else 0)