Commit 822438b5 by rjoosten

Added a description of the DSSP output and some references.

parent fad274bd
@@ -15,32 +15,22 @@
<nav z2:replace="~{menu :: navbar('about')}" /> <nav z2:replace="~{menu :: navbar('about')}" />
<div class="container site-content"> <div class="container site-content">
<article>
<h2>General</h2>
<!-- <p>This is the DSSP web server. Before using it, please read the <a z2:href="@{/privacy-policy}">privacy policy</a>.</p> -->
</article>
<article> <article>
<h2>References</h2> <h2>References</h2>
<!-- TODO: fill in --> <p>If you use the DSSP software or databank, please cite the appropriate paper:</p>
<p>bla bla.</p>
<div class="programListTable"> <div class="programListTable">
<table style="margin-left: 0;"> <table style="margin-left: 0;">
<tr> <tr>
<td><a href="https://doi.org/10.1107/S2052252514009324" target="_BLANK">Web&nbsp;server</a></td> <td><a href="https://doi.org/10.1093/nar/gkq1105" target="_BLANK">Current version</a></td>
<td>Joosten RP, Long F, Murshudov GN, Perrakis A. The PDB_REDO server for macromolecular <td>Joosten RP, te Beek TAH, Krieger E, Hekkelman ML, Hooft RWW, Schneider R, Sander C, Vriend G.
structure model optimization. IUCrJ. 2014; 1(4):213-220. A series of PDB related databases for everyday needs. Nucleic Acids Res. 2010; 39:D411-D419.</td>
</td>
</tr> </tr>
<tr> <tr>
<td><a href="https://doi.org/10.1002/pro.3353" target="_BLANK">Databank</a></td> <td><a href="https://doi.org/10.1002/bip.360221211" target="_BLANK">Original algorithm</a></td>
<td>van <td>Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.
Beusekom B, Touw WG, Tatineni M, Somani S, Rajagopal G, Luo J, Gilliland GL, Biopolymers 1983; 22:2577-2637.
Perrakis A, Joosten RP Homology-based hydrogen bond information improves
crystallographic structures in the PDB. Protein Science. 2018; 27:798-808.
</td> </td>
</tr> </tr>
</table> </table>
@@ -49,91 +39,169 @@
<article> <article>
<h2>The Science behind DSSP</h2> <h2>Using DSSP data</h2>
<!-- TODO: fill in --> <p>DSSP provides an elaborate description of the secondary structure elements in a protein structure, including backbone hydrogen bonding
<!-- <p>Below are other publications that describe different algorithms and concepts of PDB-REDO.</p> and the topology of &beta;-sheets. The most popular feature is the per-residue assignment of secondary structure with a single-character code:
</p>
<div class="programListTable"> <ul>
<table style="margin-left: 0;"> <li>H = &alpha;-helix</li>
<tr> <li>B = residue in isolated &beta;-bridge</li>
<td><a href="https://doi.org/10.1126/science.317.5835.195" target="_BLANK">No data, no <li>E = extended strand, participates in &beta; ladder</li>
PDB&#8209;REDO</a></td> <li>G = 3<sub>10</sub>-helix</li>
<td>Joosten RP, Vriend G. PDB improvement starts with data deposition. Science 2007; <li>I = &pi;-helix</li>
317:195-196. <li>P = &kappa;-helix (poly-proline II helix)</li>
</td> <li>T = hydrogen-bonded turn</li>
</tr> <li>S = bend</li>
<tr>
<td><a href="https://dx.doi.org/10.1107/S0907444908037591" target="_BLANK">The value of </ul>
re&#8209;refinement </a>
</td> <p>The full DSSP output is provided in two formats. The legacy DSSP format was originally designed for structure models in the
<td>Joosten RP, Womack T, Vriend G, Bricogne G. Re-refinement from deposited X&#8209;ray PDB format. Now, 40 years later, the PDB format has become obsolete because it cannot capture the large structure models that
data can deliver improved models for most PDB entries. Acta Cryst. 2009; modern structural biology methods can provide. The mmCIF format is the data format of choice for structural biology as it has no
D65:176-185. size limitations for structure models and it can hold extensive annotations and metadata. DSSP now writes its data straight to these
</td> mmCIF files by default. The legacy DSSP format can still be written, but only for structure models that fit its fixed-width columns (for example, one-letter chain IDs).</p>
</tr>
<tr> </article>
<td><a href="https://doi.org/10.1107/S0021889809008784" target="_BLANK">The first PDB-REDO</a></td>
<td>Joosten RP, et al and Vriend G. PDB_REDO: automated re&#8209;refinement of <article>
X&#8209;ray <a id="DSSP"></a>
structure models in the PDB. J. Appl. Cryst. 2009; 42:376-384. <h2>DSSP format</h2>
</td> <p>The output from DSSP contains secondary structure assignments and other information. Extract from 3kew.dssp (header):</p>
</tr><tr> <pre>
<td><a href="https://doi.org/10.1107/S0907444911054515" target="_BLANK">Decision making</a></td> ==== Secondary Structure Definition by the program DSSP, NKI version 4.3 ==== DATE=2023-06-08 .
<td>Joosten RP, Joosten K, Murshudov GN, Perrakis A. PDB_REDO: constructive validation, REFERENCE W. KABSCH AND C.SANDER, BIOPOLYMERS 22 (1983) 2577-2637 .
more HEADER TRANSFERASE 26-OCT-09 3KEW .
than just looking for errors. Acta Cryst. 2012; D68:484-496. COMPND MOL_ID: 1; MOLECULE: DHHA1 domain protein; CHAIN: A, B; FRAGMENT: N-TERMINAL FRAGMENT, RESIDUES 1-231; SYNONYM: A... .
</td> SOURCE MOL_ID: 1; GENE: ALAS, CPF_0714; STRAIN: ATCC 13124; ORGANISM_SCIENTIFIC: Clostridium perfringens; ORGANISM_TAXID... .
</tr> AUTHOR Y.Patskovsky; R.Toro; M.Gilmore; S.Miller; J.M.Sauder; S.C.Almo; S.K.Burley; New York SGX Research Center for Str... .
<tr> 458 3 0 0 0 TOTAL NUMBER OF RESIDUES, NUMBER OF CHAINS, NUMBER OF SS-BRIDGES(TOTAL,INTRACHAIN,INTERCHAIN) .
<td><a href="https://doi.org/10.1093/bioinformatics/btr590" target="_BLANK">Model rebuilding 24682.5 ACCESSIBLE SURFACE OF PROTEIN (ANGSTROM**2) .
</a> 319 69.7 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(J) , SAME NUMBER PER 100 RESIDUES .
</td> 6 1.3 TOTAL NUMBER OF HYDROGEN BONDS IN PARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES .
<td>Joosten RP, Joosten K, Cohen SX, Vriend G, Perrakis A. Automatic rebuilding and 144 31.4 TOTAL NUMBER OF HYDROGEN BONDS IN ANTIPARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES .
optimization of crystallographic structures in the Protein Data Bank. Bioinformatics 0 0.0 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-5), SAME NUMBER PER 100 RESIDUES .
2011; 27:3392-3398. 2 0.4 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-4), SAME NUMBER PER 100 RESIDUES .
</td> 12 2.6 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-3), SAME NUMBER PER 100 RESIDUES .
</tr> 0 0.0 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-2), SAME NUMBER PER 100 RESIDUES .
<tr> 0 0.0 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-1), SAME NUMBER PER 100 RESIDUES .
<td><a href="https://doi.org/10.1002/pro.3353" target="_blank">Homology restraints</a></td> 0 0.0 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+0), SAME NUMBER PER 100 RESIDUES .
<td>van 0 0.0 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+1), SAME NUMBER PER 100 RESIDUES .
Beusekom B, Touw WG, Tatineni M, Somani S, Rajagopal G, Luo J, Gilliland GL, 50 10.9 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+2), SAME NUMBER PER 100 RESIDUES .
Perrakis A, Joosten RP. Homology-based hydrogen bond information improves 34 7.4 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+3), SAME NUMBER PER 100 RESIDUES .
crystallographic structures in the PDB. Protein Science. 2018; 27:798-808. 84 18.3 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+4), SAME NUMBER PER 100 RESIDUES .
</td> 2 0.4 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+5), SAME NUMBER PER 100 RESIDUES .
</tr> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 *** HISTOGRAMS OF *** .
<tr> 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 2 0 0 0 0 0 RESIDUES PER ALPHA HELIX .
<td><a href="https://doi.org/10.1107/S2052252518010552" target="_blank">Loop building</a></td> 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PARALLEL BRIDGES PER LADDER .
<td>van 4 0 4 8 2 6 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ANTIPARALLEL BRIDGES PER LADDER .
Beusekom B, Joosten K, Hekkelman ML, Joosten RP, 2 2 2 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 LADDERS PER SHEET .
Perrakis A. Homology-based loop modeling yields more complete crystallographic protein structures. </pre>
IUCrJ 2018; 5:585-594.
</td> <p>The first few lines are taken from the input model file, then some general statistics about the model and hydrogen bonding
</tr> are given. The histograms describe the distribution of sizes of secondary structure elements. For instance, this structure has
<tr> five &alpha;-helices: a short one of 4 residues, two of 16 and 17 residues, and two of 25 residues. Note that &beta;-sheets are described
<td><a href="https://doi.org/10.1107/S2059798319003875" target="_blank">N-glycan building</a></td> as a collection of ladders, rather than strands. Ladders can be seen as two strands together with the hydrogen bonds as the rungs
<td>van of the ladder. More formal definitions are given in the Kabsch and Sander paper.</p>
Beusekom B, Wezel N, Hekkelman ML, Perrakis A, Emsley E, Joosten RP.
Building and rebuilding N-glycans in protein structure models.
Acta Cryst. 2019; D75:416-425. <p>The model statistics are followed by a detailed per-residue description. Extract from 3kew.dssp (continued):</p>
</td> <pre>
</tr> ....;....1....;....2....;....3....;....4....;....5....;....6....;....7..
<tr> .-- sequential resnumber, including chain breaks as extra residues
<td><a href="https://doi.org/10.1107/S2059798321007610" target="_blank">Nucleic acids</a></td> | .-- original resname, not necessarily sequential, may contain letters for insertion codes
<td>de Vries I, Kwakman T, Lu X-J, Hekkelman ML, Deshpande M, Velankar S, Perrakis A, Joosten RP. | | .-- one-letter chain ID
New restraints and validation approaches for nucleic acid structures in PDB-REDO. | | | .-- amino acid sequence in one letter code
Acta Cryst. 2021; D77:1127-1141. | | | | .-- secondary structure summary based on columns 19-38
</td> | | | | |.-- PPII (kappa) helix
</tr> | | | | ||.-- 3-10 helix
<tr> | | | | |||.-- alpha helix
<td><a href="https://doi.org/10.1107/S2059798316013036" target="_blank">Zinc sites</a></td> | | | | ||||.-- pi helix
<td>Touw WG, van Beusekom B, Evers JMG, Vriend G, Joosten RP. | | | | |||||.-- geometrical bend
Validation and correction of Zn–CysxHisy complexes. | | | | ||||||.-- chirality
Acta Cryst. 2016; D72:1110-1118. | | | | |||||||.-- beta bridge label
</td> | | | | ||||||||.-- beta bridge label
</tr> | | | | ||||||||| .-- beta bridge partner resnum
</table> | | | | ||||||||| | .-- beta bridge partner resnum
</div> --> | | | | ||||||||| | |.-- beta sheet label
| | | | ||||||||| | || .-- solvent accessibility
| | | | ||||||||| | || |
# RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA
1 1 A L 0 0 119 0, 0.0 2,-0.3 0, 0.0 33,-0.2 0.000 360.0 360.0 360.0 168.8 8.7 6.9 63.0
2 2 A T E -a 34 0A 66 31,-2.0 33,-2.1 1,-0.1 2,-0.7 -0.456 360.0-169.6 -87.8 130.5 7.7 8.8 59.8
3 3 A K E > -a 35 0A 66 -2,-0.3 3,-1.2 31,-0.2 4,-0.2 -0.850 8.5-179.0-111.3 94.7 7.6 7.5 56.2
4 4 A L G >> S+ 0 0 23 31,-2.5 4,-2.9 -2,-0.7 3,-2.0 0.786 71.6 72.4 -65.6 -32.5 7.1 10.6 54.1
5 5 A Y G 34 S+ 0 0 2 30,-0.8 -1,-0.3 1,-0.3 31,-0.1 0.709 101.0 46.1 -56.9 -26.7 7.0 8.7 50.7
6 6 A Y G <4 S+ 0 0 39 -3,-1.2 -1,-0.3 2,-0.1 -2,-0.2 0.439 115.4 47.1 -93.5 -4.1 3.5 7.4 51.7
7 7 A E T <4 S- 0 0 138 -3,-2.0 2,-0.3 1,-0.2 -2,-0.2 0.825 135.2 -0.3 -99.6 -48.8 2.4 10.9 52.8
8 8 A D >< - 0 0 57 -4,-2.9 3,-1.4 3,-0.1 -1,-0.2 -0.852 61.5-167.8-144.3 106.0 3.6 13.0 49.9
</pre>
<p>Below is a brief description of the data columns. More details are described in the Kabsch and Sander paper.</p>
<h3>RESIDUE</h3>
<p>Two columns of residue numbers. The first column is DSSP's sequential residue number, starting at the first residue actually present in the model and counting
each chain break as an extra residue; this number is used to refer to residues throughout the file. The second column gives the numbering used in the
structure model ('residue number', 'insertion code' and 'chain identifier'); these are given for reference only.</p>
<h3>AA</h3>
<p>One-letter amino acid code; non-standard residues are marked as <em>X</em>. Cysteines in an SS-bridge are marked with a lower-case letter: the first bridged
cysteine in the sequence and its partner elsewhere in the sequence are both marked <em>a</em>, the next bridged cysteine that is not yet marked and its partner
are both marked <em>b</em>, and so on. Unbridged cysteines remain marked as <em>C</em>.</p>
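The lettering rule for bridged cysteines can be sketched in a few lines of Python. This is an illustration, not DSSP's own code: the helper name `label_cysteines`, the input representation (a one-letter sequence plus 0-based index pairs of bridged cysteines) and the limit of 26 bridges are assumptions made for this example.

```python
from string import ascii_lowercase


def label_cysteines(sequence: str, bridges: list[tuple[int, int]]) -> str:
    """Mark SS-bridged cysteines with lower-case letters (hypothetical helper).

    sequence: one-letter amino acid sequence.
    bridges: 0-based index pairs of disulfide-bonded cysteines.
    """
    partner: dict[int, int] = {}
    for a, b in bridges:
        partner[a] = b
        partner[b] = a

    labels = list(sequence)
    next_letter = 0
    # Walk the sequence in order: the first bridged cysteine that is still
    # unmarked gets the next letter, and its partner gets the same letter.
    for i, residue in enumerate(labels):
        if residue == "C" and i in partner:
            if next_letter == len(ascii_lowercase):
                raise ValueError("this sketch supports at most 26 SS-bridges")
            letter = ascii_lowercase[next_letter]
            labels[i] = letter
            labels[partner[i]] = letter
            next_letter += 1
    # Cysteines without a partner are never touched and stay 'C'.
    return "".join(labels)


print(label_cysteines("ACGCKCWC", [(1, 5), (3, 7)]))  # AaGbKaWb
```

Because a partner is relabelled as soon as its first cysteine is reached, the loop never assigns a second letter to the same bridge.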
<h3>S (first column in STRUCTURE block)</h3>
<p>The one-letter summary of secondary structure, intended to approximate crystallographers' intuition. It is based on columns 19-38, which are the principal
result of the DSSP analysis of the atomic coordinates. More details are given in the Kabsch and Sander paper.</p>
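As a sketch of working with the summary column, the snippet below computes the fraction of residues per one-letter code. The code table mirrors the list in the "Using DSSP data" section; treating a blank as "no assigned structure" is an assumption based on residues 1 and 8 of the 3kew extract, and `ss_composition` is a hypothetical helper name.

```python
from collections import Counter

# The one-letter summary codes listed in the "Using DSSP data" section; a
# blank is assumed to mean that no secondary structure was assigned.
SS_CODES = {
    "H": "alpha-helix",
    "B": "residue in isolated beta-bridge",
    "E": "extended strand, participates in beta ladder",
    "G": "3-10 helix",
    "I": "pi-helix",
    "P": "kappa-helix (poly-proline II helix)",
    "T": "hydrogen-bonded turn",
    "S": "bend",
    " ": "no assigned secondary structure",
}


def ss_composition(summary: str) -> dict[str, float]:
    """Fraction of residues per summary code (hypothetical helper)."""
    if not summary:
        return {}
    unknown = set(summary) - set(SS_CODES)
    if unknown:
        raise ValueError(f"unexpected summary codes: {sorted(unknown)}")
    counts = Counter(summary)
    return {code: n / len(summary) for code, n in counts.items()}


# Summary column of residues 1-8 in the 3kew extract.
print(ss_composition(" EEGGGT "))  # {' ': 0.25, 'E': 0.25, 'G': 0.375, 'T': 0.125}
```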
<h3>BP1 and BP2</h3>
<p>Sequential residue numbers of the first and (if present) second &beta;-bridge partner; 0 means there is no partner. The letter that follows labels the &beta;-sheet that contains the bridges.</p>
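A minimal sketch of collecting bridge partners per sheet from already-parsed rows. The row layout `(resnum, bp1, bp2, sheet)` and the helper name are assumptions; the partner rows for residues 34 and 35 in the usage line are invented to mirror residues 2 and 3 of the 3kew extract, since those lines are not shown.

```python
from collections import defaultdict


def bridge_pairs_by_sheet(rows):
    """Group beta-bridge partner pairs by sheet label (hypothetical helper).

    rows: iterable of (resnum, bp1, bp2, sheet) tuples using DSSP's
    sequential residue numbers; a partner number of 0 means "no partner".
    """
    sheets: dict[str, set[tuple[int, int]]] = defaultdict(set)
    for resnum, bp1, bp2, sheet in rows:
        for partner in (bp1, bp2):
            if partner != 0:
                # Store each pair once, smallest residue number first, so the
                # row of each partner does not add a duplicate.
                sheets[sheet].add((min(resnum, partner), max(resnum, partner)))
    return dict(sheets)


# Residues 2 and 3 of the extract, plus assumed rows for their partners.
rows = [(2, 34, 0, "A"), (3, 35, 0, "A"), (34, 2, 0, "A"), (35, 3, 0, "A")]
print(bridge_pairs_by_sheet(rows))
```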
<h3>ACC</h3>
<p>Water-exposed surface in &Aring;<sup>2</sup>. <em>Note:</em> the values for solvent exposure may not mean what you think:</p>
<ul>
<li>Effects leading to larger than expected values: the solvent exposure calculation ignores unusual residues, such as ACE, and residues with an incomplete backbone.
It also ignores hetero atoms (HETATM records), such as heme groups or metal ligands. Also, side chains may not have all atoms explicitly modeled.</li>
<li>Effects leading to smaller than expected values: in complexes, e.g. a dimer, solvent exposure is computed for the entire assembly, not for the individual monomer.
Also, atom OXT of C-terminal residues is treated like a side-chain atom if it is listed as part of the last residue.</li>
<li>Unknown or non-standard residues are named X on output and are not checked for the expected number of side-chain atoms.</li>
<li>All explicit water molecules, like other hetero atoms, are ignored.</li>
</ul>
<h3>N-H-->O etc.</h3>
<p>Hydrogen bonds; e.g. -3,-1.4 means that this residue (i) has its N-H group H-bonded to the O of residue i-3 with an electrostatic H-bond energy of -1.4 kcal/mol.
There are two columns for each type of H-bond, to allow for bifurcated H-bonds. <em>Note:</em> the listed H-bonds are the best and second-best candidates. The second-best,
and on rare occasions even the best, may be an unrealistically poor H-bond.</p>
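Each H-bond entry packs a relative residue offset and an energy into one field. A minimal Python sketch for decoding such a field, assuming the two values are separated by a comma exactly as in the extract; the -0.5 kcal/mol cutoff is the one from the Kabsch and Sander paper, and the names `HBond` and `parse_hbond` are hypothetical.

```python
from dataclasses import dataclass

# Kabsch and Sander count an H-bond only when its electrostatic energy is
# below -0.5 kcal/mol.
HBOND_CUTOFF = -0.5


@dataclass
class HBond:
    offset: int    # partner position relative to this residue (DSSP numbering)
    energy: float  # electrostatic H-bond energy in kcal/mol

    def partner(self, residue_number: int) -> int:
        """Sequential DSSP number of the partner residue."""
        return residue_number + self.offset

    @property
    def is_hbond(self) -> bool:
        """True if the energy passes the -0.5 kcal/mol cutoff."""
        return self.energy < HBOND_CUTOFF


def parse_hbond(field: str) -> HBond:
    """Decode an offset,energy field such as '-3,-1.4' (hypothetical helper)."""
    offset, energy = field.strip().split(",")
    return HBond(int(offset), float(energy))


# Residue 6 in the 3kew extract: first N-H-->O field is "-3,-1.2".
hb = parse_hbond("-3,-1.2")
print(hb.partner(6), hb.energy, hb.is_hbond)  # 3 -1.2 True
```

Since offsets are relative to DSSP's sequential numbering, which counts chain breaks as extra residues, the partner number should be looked up in the first RESIDUE column rather than in the author numbering.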
<h3>TCO</h3>
<p>The cosine of the angle between the C=O bonds of residue i and residue i-1. For &alpha;-helices TCO is near +1; for &beta;-sheets it is near -1.
These values are descriptive and are not used to define secondary structure.</p>
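To show what TCO measures, here is a sketch that computes it from backbone coordinates. It assumes the C and O coordinates of residues i-1 and i are available as (x, y, z) tuples; the function name `tco` is hypothetical, and DSSP's own implementation may differ in detail.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def tco(c_prev: Vec, o_prev: Vec, c_i: Vec, o_i: Vec) -> float:
    """Cosine of the angle between the C=O bonds of residues i-1 and i."""
    u = _sub(o_i, c_i)        # C=O bond vector of residue i
    v = _sub(o_prev, c_prev)  # C=O bond vector of residue i-1
    return _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))


# Two C=O bonds pointing the same way give +1.
print(tco((0, 0, 0), (1, 0, 0), (5, 0, 0), (6, 0, 0)))  # 1.0
```

Carbonyls that all point the same way, as along a helix axis, give values near +1; carbonyls that alternate direction, as in an extended strand, give values near -1.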
<h3>KAPPA</h3>
<p>Virtual bond angle (bend angle) defined by the three C&alpha; atoms of residues i-2, i, and i+2. Used to define bends (structure code <em>S</em>).</p>
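A sketch of the bend-angle calculation. Judging from the 3kew extract (KAPPA 8.5 in the strand at residue 3, 71.6 at residue 4 which is marked S, 61.5 at residue 8 which is not), the value behaves as the angle between the C&alpha;(i-2)&rarr;C&alpha;(i) and C&alpha;(i)&rarr;C&alpha;(i+2) directions, so a straight chain gives 0&deg;. The sketch uses that definition with the 70&deg; bend threshold from the Kabsch and Sander paper; the function names are hypothetical. The extract shows 360.0 where the angle cannot be computed.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def kappa(ca_im2: Vec, ca_i: Vec, ca_ip2: Vec) -> float:
    """Angle in degrees between CA(i-2)->CA(i) and CA(i)->CA(i+2)."""
    u = _sub(ca_i, ca_im2)
    v = _sub(ca_ip2, ca_i)
    cos_angle = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def is_bend(kappa_degrees: float, threshold: float = 70.0) -> bool:
    """Bend (structure code S) if the chain turns by more than the threshold."""
    return kappa_degrees > threshold


print(kappa((0, 0, 0), (1, 0, 0), (2, 0, 0)))  # 0.0 for a straight chain
print(is_bend(71.6), is_bend(61.5))            # True False (residues 4 and 8)
```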
<h3>ALPHA</h3>
<p>Virtual torsion angle (dihedral angle) defined by the four C&alpha; atoms of residues i-1, i, i+1, and i+2. Used to define chirality (structure code <em>+</em> or <em>-</em>).</p>
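The C&alpha; virtual torsion angle is an ordinary dihedral angle. A sketch using the standard four-point dihedral formula, which follows the IUPAC sign convention; the function names are hypothetical. The chirality rule (positive angle gives +, negative gives -) is inferred from the extract, where residue 4 has ALPHA 72.4 and is marked +, and residue 3 has -179.0 and is marked -. The same `dihedral` function gives PHI (C(i-1), N, C&alpha;, C) and PSI (N, C&alpha;, C, N(i+1)) when given those backbone atoms.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond p1 -> p2.
    length = math.sqrt(_dot(b1, b1))
    n = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Remove the components of b0 and b2 along the central bond.
    v = _sub(b0, tuple(_dot(b0, n) * c for c in n))
    w = _sub(b2, tuple(_dot(b2, n) * c for c in n))
    # atan2 of the two in-plane coordinates gives a signed angle.
    x = _dot(v, w)
    y = _dot(_cross(n, v), w)
    return math.degrees(math.atan2(y, x))


def ca_alpha(ca_prev: Vec, ca_i: Vec, ca_next: Vec, ca_next2: Vec) -> float:
    """ALPHA: virtual torsion of CA(i-1), CA(i), CA(i+1), CA(i+2)."""
    return dihedral(ca_prev, ca_i, ca_next, ca_next2)


def chirality(alpha: float) -> str:
    """Assumed rule: '+' for a positive ALPHA, '-' otherwise."""
    return "+" if alpha > 0 else "-"
```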
<h3>PHI and PSI</h3>
<p>The peptide backbone torsion angles, as described in the IUPAC standard.</p>
<h3>X-CA, Y-CA, and Z-CA</h3>
<p>A copy of the C&alpha; atom coordinates from the structure model.</p>
</article>
<article>
<a id="mmCIF"></a>
<h2>DSSP data in mmCIF files</h2>
<p>The mmCIF-formatted DSSP output carries the same information as the DSSP format, but in a more scalable way and with a formal description captured in
an mmCIF dictionary. It is designed to be machine readable. Developers who create software to read these annotations can use our
<a href="https://github.com/PDB-REDO/dssp/blob/trunk/mmcif_pdbx/dssp-extension.dic" target="_BLANK">extension to the mmCIF dictionary</a> on GitHub.
<em>Note:</em> for the sake of speed, solvent accessibility is not calculated by default when using mmCIF output. The command-line switch
<code>--calculate-accessibility</code> can be used to switch this feature on.
</p>
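Because the mmCIF output is meant to be machine readable, a quick first check is to list which DSSP categories a file contains. The sketch below does this with plain string handling for illustration only; it assumes the DSSP categories share the `_dssp` prefix (check the extension dictionary for the actual names), and for real work a full mmCIF parser is the better choice. The function name `dssp_categories` is hypothetical.

```python
def dssp_categories(cif_text: str, prefix: str = "_dssp") -> list[str]:
    """List the mmCIF categories whose names start with `prefix`.

    Illustrative only: it recognises tags at the start of a line and skips
    multi-line text fields, but it is not a complete CIF parser.
    """
    found: dict[str, None] = {}  # dict keeps first-seen order
    in_text_field = False
    for line in cif_text.splitlines():
        # Multi-line text fields start and end with a line beginning with ';'.
        if line.startswith(";"):
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue
        fields = line.split()
        if fields and fields[0].startswith(prefix) and "." in fields[0]:
            # A tag looks like _category.item; keep the category part.
            found.setdefault(fields[0].split(".", 1)[0], None)
    return list(found)
```

Call it on the (decompressed) text of an annotated mmCIF file; changing `prefix` lets you check other category families the same way.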
</article> </article>
</div> </div>
......
@@ -38,7 +38,7 @@
For your convenience we have already run DSSP on the entire Protein Data Bank. All entries are available in <a z2:href="@{/about#mmCIF}">mmCIF format</a> For your convenience we have already run DSSP on the entire Protein Data Bank. All entries are available in <a z2:href="@{/about#mmCIF}">mmCIF format</a>
and, if they fit, in the <a z2:href="@{/about#DSSP}">legacy DSSP format.</a> and, if they fit, in the <a z2:href="@{/about#DSSP}">legacy DSSP format.</a>
<h2>Manual download (single entries) via the website or using <code>wget</code>:</h2> <h2>Manually download (single entries) via the website or using <code>wget</code>:</h2>
<ul> <ul>
<li><code>wget https://pdb-redo.eu/dssp/9xyz.cif.gz</code> downloads annotated mmCIF file</li> <li><code>wget https://pdb-redo.eu/dssp/9xyz.cif.gz</code> downloads annotated mmCIF file</li>
<li><code>wget https://pdb-redo.eu/dssp/9xyz.dssp</code> downloads the legacy DSSP file (if available)</li> <li><code>wget https://pdb-redo.eu/dssp/9xyz.dssp</code> downloads the legacy DSSP file (if available)</li>
......
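The same single-entry downloads can be scripted. A minimal Python sketch, assuming the URL pattern of the `wget` examples (https://pdb-redo.eu/dssp/ followed by the PDB ID and the extension) holds for every entry and that IDs are lower case as in those examples; `dssp_url` and `download` are hypothetical helper names.

```python
import gzip
import urllib.request

BASE_URL = "https://pdb-redo.eu/dssp/"


def dssp_url(pdb_id: str, legacy: bool = False) -> str:
    """URL of the annotated mmCIF file, or of the legacy DSSP file."""
    suffix = ".dssp" if legacy else ".cif.gz"
    return f"{BASE_URL}{pdb_id.strip().lower()}{suffix}"


def download(pdb_id: str, legacy: bool = False) -> str:
    """Fetch one entry and return its text (decompressing the mmCIF file)."""
    with urllib.request.urlopen(dssp_url(pdb_id, legacy)) as response:
        data = response.read()
    if not legacy:
        data = gzip.decompress(data)
    return data.decode("utf-8")
```

For example, `download("9xyz", legacy=True)` would request the placeholder entry used above; `urlopen` raises an `HTTPError` when the server has no such file, which is the expected outcome for entries that do not fit the legacy format.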