Commit 822438b5 by rjoosten

Added a description of the DSSP output and some references.

parent fad274bd
......@@ -15,32 +15,22 @@
<nav z2:replace="~{menu :: navbar('about')}" />
<div class="container site-content">
<article>
<h2>General</h2>
<!-- <p>This is the DSSP web server. Before using it, please read the <a z2:href="@{/privacy-policy}">privacy policy</a>.</p> -->
</article>
<article>
<h2>References</h2>
<!-- TODO: fill in -->
<p>bla bla.</p>
<p>If you use the DSSP software or databank please cite the appropriate paper:</p>
<div class="programListTable">
<table style="margin-left: 0;">
<tr>
<td><a href="https://doi.org/10.1107/S2052252514009324" target="_BLANK">Web&nbsp;server</a></td>
<td>Joosten RP, Long F, Murshudov GN, Perrakis A. The PDB_REDO server for macromolecular
structure model optimization. IUCrJ. 2014; 1(4):213-220.
</td>
<td><a href="https://doi.org/10.1093/nar/gkq1105" target="_BLANK">Current version</a></td>
<td>Joosten RP, te Beek TAH, Krieger E, Hekkelman ML, Hooft RWW, Schneider R, Sander C, Vriend G.
A series of PDB related databases for everyday needs. Nucleic Acids Res. 2010; 39:D411-D419.</td>
</tr>
<tr>
<td><a href="https://doi.org/10.1002/pro.3353" target="_BLANK">Databank</a></td>
<td>van
Beusekom B, Touw WG, Tatineni M, Somani S, Rajagopal G, Luo J, Gilliland GL,
Perrakis A, Joosten RP. Homology-based hydrogen bond information improves
crystallographic structures in the PDB. Protein Science. 2018; 27:798-808.
<td><a href="https://doi.org/10.1002/bip.360221211" target="_BLANK">Original algorithm</a></td>
<td>Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.
Biopolymers 1983; 22:2577-2637.
</td>
</tr>
</table>
......@@ -49,91 +39,169 @@
<article>
<h2>The Science behind DSSP</h2>
<h2>Using DSSP data</h2>
<!-- TODO: fill in -->
<!-- <p>Below are other publications that describe different algorithms and concepts of PDB-REDO.</p>
<p>DSSP provides an elaborate description of the secondary structure elements in a protein structure, including backbone hydrogen bonding
and the topology of &beta;-sheets. The most popular feature is the per-residue assignment of secondary structure with a single character code:
</p>
<div class="programListTable">
<table style="margin-left: 0;">
<tr>
<td><a href="https://doi.org/10.1126/science.317.5835.195" target="_BLANK">No data, no
PDB&#8209;REDO</a></td>
<td>Joosten RP, Vriend G. PDB improvement starts with data deposition. Science 2007;
317:195-196.
</td>
</tr>
<tr>
<td><a href="https://dx.doi.org/10.1107/S0907444908037591" target="_BLANK">The value of
re&#8209;refinement </a>
</td>
<td>Joosten RP, Womack T, Vriend G, Bricogne G. Re-refinement from deposited X&#8209;ray
data can deliver improved models for most PDB entries. Acta Cryst. 2009;
D65:176-185.
</td>
</tr>
<tr>
<td><a href="https://doi.org/10.1107/S0021889809008784" target="_BLANK">The first PDB-REDO</a></td>
<td>Joosten RP, et al and Vriend G. PDB_REDO: automated re&#8209;refinement of
X&#8209;ray
structure models in the PDB. J. Appl. Cryst. 2009; 42:376-384.
</td>
</tr><tr>
<td><a href="https://doi.org/10.1107/S0907444911054515" target="_BLANK">Decision making</a></td>
<td>Joosten RP, Joosten K, Murshudov GN, Perrakis A. PDB_REDO: constructive validation,
more
than just looking for errors. Acta Cryst. 2012; D68:484-496.
</td>
</tr>
<tr>
<td><a href="https://doi.org/10.1093/bioinformatics/btr590" target="_BLANK">Model rebuilding
</a>
</td>
<td>Joosten RP, Joosten K, Cohen SX, Vriend G, Perrakis A. Automatic rebuilding and
optimization of crystallographic structures in the Protein Data Bank. Bioinformatics
2011; 27:3392-3398.
</td>
</tr>
<tr>
<td><a href="https://doi.org/10.1002/pro.3353" target="_blank">Homology restraints</a></td>
<td>van
Beusekom B, Touw WG, Tatineni M, Somani S, Rajagopal G, Luo J, Gilliland GL,
Perrakis A, Joosten RP. Homology-based hydrogen bond information improves
crystallographic structures in the PDB. Protein Science. 2018; 27:798-808.
</td>
</tr>
<tr>
<td><a href="https://doi.org/10.1107/S2052252518010552" target="_blank">Loop building</a></td>
<td>van
Beusekom B, Joosten K, Hekkelman ML, Joosten RP,
Perrakis A. Homology-based loop modeling yields more complete crystallographic protein structures.
IUCrJ 2018; 5:585-594.
</td>
</tr>
<tr>
<td><a href="https://doi.org/10.1107/S2059798319003875" target="_blank">N-glycan building</a></td>
<td>van
Beusekom B, Wezel N, Hekkelman ML, Perrakis A, Emsley E, Joosten RP.
Building and rebuilding N-glycans in protein structure models.
Acta Cryst. 2019; D75:416-425.
</td>
</tr>
<tr>
<td><a href="https://doi.org/10.1107/S2059798321007610" target="_blank">Nucleic acids</a></td>
<td>de Vries I, Kwakman T, Lu X-J, Hekkelman ML, Deshpande M, Velankar S, Perrakis A, Joosten RP.
New restraints and validation approaches for nucleic acid structures in PDB-REDO.
Acta Cryst. 2021; D77:1127-1141.
</td>
</tr>
<tr>
<td><a href="https://doi.org/10.1107/S2059798316013036" target="_blank">Zinc sites</a></td>
<td>Touw WG, van Beusekom B, Evers JMG, Vriend G, Joosten RP.
Validation and correction of Zn–CysxHisy complexes.
Acta Cryst. 2016; D72:1110-1118.
</td>
</tr>
</table>
</div> -->
<ul>
<li>H = &alpha;-helix</li>
<li>B = residue in isolated &beta;-bridge</li>
<li>E = extended strand, participates in &beta; ladder</li>
<li>G = 3<sub>10</sub>-helix</li>
<li>I = &pi;-helix</li>
<li>P = &kappa;-helix (poly-proline II helix)</li>
<li>T = hydrogen-bonded turn</li>
<li>S = bend</li>
</ul>
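<p>For readers who process these codes in scripts, the list above can be captured in a small lookup table. This is only a sketch: the names <code>DSSP_CODES</code>, <code>describe</code> and <code>composition</code> are hypothetical, and treating a blank as "no assignment" is an assumption of this example.</p>

```python
from collections import Counter

# One-letter DSSP codes exactly as listed above.
DSSP_CODES = {
    "H": "alpha-helix",
    "B": "residue in isolated beta-bridge",
    "E": "extended strand, participates in beta ladder",
    "G": "3-10 helix",
    "I": "pi-helix",
    "P": "kappa-helix (poly-proline II helix)",
    "T": "hydrogen-bonded turn",
    "S": "bend",
}


def describe(code):
    """Return the description of one code; anything else (assumed here
    to be a blank) is reported as 'no assignment'."""
    return DSSP_CODES.get(code, "no assignment")


def composition(ss_string):
    """Count how often each description occurs in a per-residue string."""
    return Counter(describe(code) for code in ss_string)


if __name__ == "__main__":
    # Summary codes of the first eight residues in the 3kew extract.
    for name, count in composition(" EEGGGT ").items():
        print(f"{count:2d}  {name}")
```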
<p>The full DSSP output is provided in two formats. The legacy DSSP format was originally designed for PDB-formatted structure models.
Now, 40 years later, the PDB format has become obsolete because it cannot hold the large structure models that
modern structural biology methods produce. The mmCIF format is now the data format of choice for structural biology: it has no
size limitations for structure models and it can hold extensive annotations and metadata. DSSP writes its data straight to these
mmCIF files by default. The legacy DSSP format can still be written, but only for structure models that fit within its limits.</p>
</article>
<article>
<a id="DSSP"></a>
<h2>DSSP format</h2>
<p>The output from DSSP contains secondary structure assignments and other information. Extract from 3kew.dssp (header):</p>
<pre>
==== Secondary Structure Definition by the program DSSP, NKI version 4.3 ==== DATE=2023-06-08 .
REFERENCE W. KABSCH AND C.SANDER, BIOPOLYMERS 22 (1983) 2577-2637 .
HEADER TRANSFERASE 26-OCT-09 3KEW .
COMPND MOL_ID: 1; MOLECULE: DHHA1 domain protein; CHAIN: A, B; FRAGMENT: N-TERMINAL FRAGMENT, RESIDUES 1-231; SYNONYM: A... .
SOURCE MOL_ID: 1; GENE: ALAS, CPF_0714; STRAIN: ATCC 13124; ORGANISM_SCIENTIFIC: Clostridium perfringens; ORGANISM_TAXID... .
AUTHOR Y.Patskovsky; R.Toro; M.Gilmore; S.Miller; J.M.Sauder; S.C.Almo; S.K.Burley; New York SGX Research Center for Str... .
458 3 0 0 0 TOTAL NUMBER OF RESIDUES, NUMBER OF CHAINS, NUMBER OF SS-BRIDGES(TOTAL,INTRACHAIN,INTERCHAIN) .
24682.5 ACCESSIBLE SURFACE OF PROTEIN (ANGSTROM**2) .
319 69.7 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(J) , SAME NUMBER PER 100 RESIDUES .
6 1.3 TOTAL NUMBER OF HYDROGEN BONDS IN PARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES .
144 31.4 TOTAL NUMBER OF HYDROGEN BONDS IN ANTIPARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES .
0 0.0 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-5), SAME NUMBER PER 100 RESIDUES .
2 0.4 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-4), SAME NUMBER PER 100 RESIDUES .
12 2.6 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-3), SAME NUMBER PER 100 RESIDUES .
0 0.0 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-2), SAME NUMBER PER 100 RESIDUES .
0 0.0 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-1), SAME NUMBER PER 100 RESIDUES .
0 0.0 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+0), SAME NUMBER PER 100 RESIDUES .
0 0.0 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+1), SAME NUMBER PER 100 RESIDUES .
50 10.9 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+2), SAME NUMBER PER 100 RESIDUES .
34 7.4 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+3), SAME NUMBER PER 100 RESIDUES .
84 18.3 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+4), SAME NUMBER PER 100 RESIDUES .
2 0.4 TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+5), SAME NUMBER PER 100 RESIDUES .
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 *** HISTOGRAMS OF *** .
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 2 0 0 0 0 0 RESIDUES PER ALPHA HELIX .
0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PARALLEL BRIDGES PER LADDER .
4 0 4 8 2 6 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ANTIPARALLEL BRIDGES PER LADDER .
2 2 2 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 LADDERS PER SHEET .
</pre>
<p>The first few lines are taken from the input model file. They are followed by general statistics about the model and its hydrogen bonding.
The histograms describe the size distribution of the secondary structure elements. For instance, the example structure has
five &alpha;-helices: one short helix of 4 residues, one each of 16 and 17 residues, and two of 25 residues. Note that &beta;-sheets are described
as a collection of ladders, rather than strands. A ladder can be seen as two strands together, with the hydrogen bonds as the rungs
of the ladder. More formal definitions are given in the Kabsch and Sander paper.</p>
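<p>A histogram line can be read programmatically as sketched below. The function name <code>parse_histogram</code> is hypothetical, and the sketch assumes that the 30 bins are separated by whitespace, which holds for the extract above but may not hold in a fixed-width file if a count needs more digits than its column allows.</p>

```python
def parse_histogram(line, n_bins=30):
    """Return {element size: count} for the non-zero bins of a histogram line.

    Assumes the first n_bins whitespace-separated tokens are the counts
    for sizes 1..n_bins, followed by the label of the histogram.
    """
    counts = [int(token) for token in line.split()[:n_bins]]
    return {size: n for size, n in enumerate(counts, start=1) if n}


if __name__ == "__main__":
    # The "RESIDUES PER ALPHA HELIX" line from the 3kew header above.
    line = ("0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 2 0 0 0 0 0 "
            "RESIDUES PER ALPHA HELIX .")
    for size, n in parse_histogram(line).items():
        print(f"{n} helix/helices of {size} residues")
```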
<p>The model statistics are followed by a detailed per-residue description. Extract from 3kew.dssp (continued):</p>
<pre>
....;....1....;....2....;....3....;....4....;....5....;....6....;....7..
.-- sequential resnumber, including chain breaks as extra residues
| .-- original resname, not necessarily sequential, may contain letters for insertion codes
| | .-- one-letter chain ID
| | | .-- amino acid sequence in one letter code
| | | | .-- secondary structure summary based on columns 19-38
| | | | |.-- PPII (kappa) helix
| | | | ||.-- 3-10 helix
| | | | |||.-- alpha helix
| | | | ||||.-- pi helix
| | | | |||||.-- geometrical bend
| | | | ||||||.-- chirality
| | | | |||||||.-- beta bridge label
| | | | ||||||||.-- beta bridge label
| | | | ||||||||| .-- beta bridge partner resnum
| | | | ||||||||| | .-- beta bridge partner resnum
| | | | ||||||||| | |.-- beta sheet label
| | | | ||||||||| | || .-- solvent accessibility
| | | | ||||||||| | || |
# RESIDUE AA STRUCTURE BP1 BP2 ACC N-H-->O O-->H-N N-H-->O O-->H-N TCO KAPPA ALPHA PHI PSI X-CA Y-CA Z-CA
1 1 A L 0 0 119 0, 0.0 2,-0.3 0, 0.0 33,-0.2 0.000 360.0 360.0 360.0 168.8 8.7 6.9 63.0
2 2 A T E -a 34 0A 66 31,-2.0 33,-2.1 1,-0.1 2,-0.7 -0.456 360.0-169.6 -87.8 130.5 7.7 8.8 59.8
3 3 A K E > -a 35 0A 66 -2,-0.3 3,-1.2 31,-0.2 4,-0.2 -0.850 8.5-179.0-111.3 94.7 7.6 7.5 56.2
4 4 A L G >> S+ 0 0 23 31,-2.5 4,-2.9 -2,-0.7 3,-2.0 0.786 71.6 72.4 -65.6 -32.5 7.1 10.6 54.1
5 5 A Y G 34 S+ 0 0 2 30,-0.8 -1,-0.3 1,-0.3 31,-0.1 0.709 101.0 46.1 -56.9 -26.7 7.0 8.7 50.7
6 6 A Y G <4 S+ 0 0 39 -3,-1.2 -1,-0.3 2,-0.1 -2,-0.2 0.439 115.4 47.1 -93.5 -4.1 3.5 7.4 51.7
7 7 A E T <4 S- 0 0 138 -3,-2.0 2,-0.3 1,-0.2 -2,-0.2 0.825 135.2 -0.3 -99.6 -48.8 2.4 10.9 52.8
8 8 A D >< - 0 0 57 -4,-2.9 3,-1.4 3,-0.1 -1,-0.2 -0.852 61.5-167.8-144.3 106.0 3.6 13.0 49.9
</pre>
<p>Below is a brief description of the data columns. More details are described in the Kabsch and Sander paper.</p>
<h3>RESIDUE</h3>
<p>Two columns of residue numbers. The first column is DSSP's sequential residue number, which starts at the first residue actually present in the model
and counts chain breaks as extra residues; this number is used to refer to residues throughout the file. The second column gives the identifiers used in the
structure model: the residue number, the insertion code and the chain identifier. These are given for reference only.</p>
<h3>AA</h3>
<p>One-letter amino acid code; non-standard residues are marked as <em>X</em>. Cysteines in an SS-bridge are marked with a lower-case letter:
the first bridged cysteine in the sequence and its partner elsewhere in the sequence are both marked <em>a</em>. The next bridged cysteine
that is not yet marked and its partner are both marked <em>b</em>, etcetera. Unbridged cysteines remain marked as <em>C</em>.</p>
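<p>When the AA column is used as a sequence, the lower-case labels have to be mapped back to cysteine. A minimal sketch of that step, with hypothetical function names and a made-up example sequence:</p>

```python
def residue_type(aa):
    """Return the one-letter residue type; SS-bridge labels become 'C'."""
    return "C" if aa.islower() else aa


def ss_bridges(aa_column):
    """Group the 1-based positions of bridged cysteines by their label.

    aa_column is the AA column of consecutive residues joined into one
    string; positions are indices into that string, not DSSP numbers.
    """
    bridges = {}
    for position, aa in enumerate(aa_column, start=1):
        if aa.islower():
            bridges.setdefault(aa, []).append(position)
    return bridges


if __name__ == "__main__":
    column = "MaKCbLab"  # made-up example, not taken from 3kew
    print("".join(residue_type(aa) for aa in column))
    print(ss_bridges(column))
```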
<h3>S (first column in STRUCTURE block)</h3>
<p>The one-letter summary of secondary structure, intended to approximate crystallographers' intuition, based on columns 19-38, which are the principal
result of DSSP analysis of the atomic coordinates. More details in the Kabsch and Sander paper.</p>
<h3>BP1 and BP2</h3>
<p>Residue numbers of the first and (if available) second &beta;-bridge partner. The letter marks the &beta;-sheet that contains the bridges.</p>
<h3>ACC</h3>
<p>Water-exposed surface area in &Aring;<sup>2</sup>. <em>Note:</em> The values for solvent exposure may not mean what you think:</p>
<ul>
<li>Effects leading to larger than expected values: the solvent exposure calculation ignores unusual residues, like ACE, and residues with an incomplete backbone.
It also ignores HETATOMs, like a heme or metal ligands. Also, side chains may not have all atoms explicitly modeled.</li>
<li>Effects leading to smaller than expected values: in complexes, e.g. a dimer, solvent exposure is calculated for the entire assembly, not for the monomer.
Also, atom OXT of C-terminal residues is treated like a side chain atom if it is listed as part of the last residue.</li>
<li>Unknown or non-standard residues are named X on output and are not checked for the expected number of side chain atoms.</li>
<li>All explicit water molecules, like other HETATOMs, are ignored.</li>
</ul>
<h3>N-H-->O etc.</h3>
<p>Hydrogen bonds; e.g. -3,-1.4 means that this residue (i) has its HN atom H-bonded to the O of residue i-3 with an electrostatic H-bond energy of -1.4 kcal/mol.
There are two columns for each type of H-bond, to allow for bifurcated H-bonds. <em>Note:</em> The marked H-bonds are the best and second-best candidates. The second-best
and, on rare occasions, even the best candidate may be an unrealistically poor H-bond.</p>
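<p>The offset/energy notation can be decoded as sketched below. The function names are hypothetical; the sketch assumes that an offset of 0 means that no H-bond partner is listed, as in the first data line of the extract.</p>

```python
def parse_hbond(field):
    """Split an H-bond field such as '-3,-1.4' into (offset, energy)."""
    offset, energy = field.split(",")
    return int(offset), float(energy)


def hbond_partner(dssp_number, field):
    """Return (partner DSSP number, energy in kcal/mol), or None.

    The offset is relative to DSSP's sequential residue number (first
    column), not to the residue number of the structure model.
    """
    offset, energy = parse_hbond(field)
    if offset == 0:  # assumed to mean "no partner"
        return None
    return dssp_number + offset, energy


if __name__ == "__main__":
    # Residue 8 in the extract: first N-H-->O field is "-4,-2.9".
    print(hbond_partner(8, "-4,-2.9"))
    # Residue 4: first O-->H-N field is "4,-2.9", the same bond seen
    # from the other side.
    print(hbond_partner(4, "4,-2.9"))
```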
<h3>TCO</h3>
<p>The cosine of the angle between the C=O of residue i and the C=O of residue i-1. For &alpha;-helices, TCO is near +1; for &beta;-sheets, TCO is near -1.
These values are descriptive and are not used for structure definition.</p>
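<p>As an illustration of this definition, the cosine can be computed from the coordinates of the two carbonyl groups. This is a sketch with hypothetical names and made-up coordinates, not the code used by DSSP itself.</p>

```python
import math


def tco(c_i, o_i, c_prev, o_prev):
    """Cosine of the angle between the C=O vectors of residues i and i-1.

    Each argument is an (x, y, z) tuple with atom coordinates.
    """
    v1 = [o - c for c, o in zip(c_i, o_i)]
    v2 = [o - c for c, o in zip(c_prev, o_prev)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm


if __name__ == "__main__":
    # Carbonyls pointing the same way (helix-like) and in opposite
    # directions (strand-like); the coordinates are made up.
    print(tco((0, 0, 0), (0, 0, 1.2), (1.5, 0, 0), (1.5, 0, 1.2)))
    print(tco((0, 0, 0), (0, 0, 1.2), (1.5, 0, 0), (1.5, 0, -1.2)))
```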
<h3>KAPPA</h3>
<p>Virtual bond angle (bend angle) defined by the three C&alpha; atoms of residues i-2, i, and i+2. Used to define bends (structure code <em>S</em>).</p>
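<p>A sketch of this calculation is given below. It assumes the convention, as we understand it from the Kabsch and Sander paper, that the angle is measured between the direction C&alpha;(i-2) to C&alpha;(i) and the direction C&alpha;(i) to C&alpha;(i+2), so that a straight chain gives 0 degrees, and that a bend is assigned above 70 degrees. Function names and coordinates are hypothetical.</p>

```python
import math


def kappa(ca_prev2, ca, ca_next2):
    """Angle in degrees between CA(i-2)->CA(i) and CA(i)->CA(i+2).

    Assumed convention: a perfectly straight chain gives 0 degrees.
    """
    u = [b - a for a, b in zip(ca_prev2, ca)]
    v = [b - a for a, b in zip(ca, ca_next2)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))


def is_bend(kappa_degrees, threshold=70.0):
    """True if the angle exceeds the (assumed) bend threshold."""
    return kappa_degrees > threshold


if __name__ == "__main__":
    angle = kappa((0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (5.0, 5.0, 0.0))
    print(round(angle, 1), is_bend(angle))
```

<p>With these assumptions the KAPPA values in the extract above are consistent with the <em>S</em> column: residue 4 (71.6) is marked as a bend, residue 8 (61.5) is not.</p>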
<h3>ALPHA</h3>
<p>Virtual torsion angle (dihedral angle) defined by the four C&alpha; atoms of residues i-1, i, i+1, and i+2. Used to define chirality (structure code <em>+</em> or <em>-</em>).</p>
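<p>The torsion angle over four C&alpha; positions can be computed with the standard dihedral formula, as sketched below. The function names and the test coordinates are hypothetical; the sign follows the usual right-handed (IUPAC) convention.</p>

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dihedral(p1, p2, p3, p4):
    """Torsion angle in degrees (-180..180) defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # For ALPHA of residue i, pass the CA coordinates of residues
    # i-1, i, i+1 and i+2 (made-up coordinates here).
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```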
<h3>PHI and PSI</h3>
<p>The peptide backbone torsion angles, as defined in the IUPAC standard.</p>
<h3>X-CA, Y-CA, and Z-CA</h3>
<p>A copy of the C&alpha; atom coordinates from the structure model.</p>
</article>
<article>
<a id="mmCIF"></a>
<h2>DSSP data in mmCIF files</h2>
<p>The mmCIF-formatted DSSP output carries the same information as the DSSP format, but in a more scalable way and with a formal description captured in
an mmCIF dictionary. It is designed to be machine readable. Developers who create software to read these annotations can use our
<a href="https://github.com/PDB-REDO/dssp/blob/trunk/mmcif_pdbx/dssp-extension.dic" target="_BLANK">extension to the mmCIF dictionary</a> on GitHub.
<em>Note:</em> For the sake of speed, the solvent accessibility is not calculated by default when using mmCIF output. The command-line switch
<code>--calculate-accessibility</code> can be used to switch this feature on.
</p>
</article>
</div>
......
......@@ -38,7 +38,7 @@
For your convenience we have already run DSSP on the entire Protein Data Bank. All entries are available in <a z2:href="@{/about#mmCIF}">mmCIF format</a>
and, if they fit, in the <a z2:href="@{/about#DSSP}">legacy DSSP format.</a>
<h2>Manual download (single entries) via the website or using <code>wget</code>:</h2>
<h2>Manually download (single entries) via the website or using <code>wget</code>:</h2>
<ul>
<li><code>wget https://pdb-redo.eu/dssp/9xyz.cif.gz</code> downloads annotated mmCIF file</li>
<li><code>wget https://pdb-redo.eu/dssp/9xyz.dssp</code> downloads the legacy DSSP file (if available)</li>
......