Added a description of the DSSP output and some references.

Commit 822438b5 authored Jun 09, 2023 by rjoosten

Showing with 169 additions and 101 deletions

docroot/about.html
+168 -100

docroot/download.html
+1 -1

--- a/docroot/about.html
+++ b/docroot/about.html
@@ -15,32 +15,22 @@
 	<nav z2:replace="~{menu :: navbar('about')}" />

 	<div class="container site-content">
-
-		<article>
-			<h2>General</h2>
-
-			<!-- <p>This is the DSSP web server. Before using it, please read the <a z2:href="@{/privacy-policy}">privacy policy</a>.</p> -->
-		</article>
 				
 		<article>
 			<h2>References</h2>

-			<!-- TODO: fill in -->
-			<p>bla bla.</p>
+			<p>If you use the DSSP software or databank, please cite the appropriate paper:</p>
 			<div class="programListTable">
 				<table style="margin-left: 0;">
 					<tr>
-						<td><a href="https://doi.org/10.1107/S2052252514009324" target="_BLANK">Web&nbsp;server</a></td>
-						<td>Joosten RP, Long F, Murshudov GN, Perrakis A. The PDB_REDO server for macromolecular
-							structure model optimization. IUCrJ. 2014; 1(4):213-220.
-						</td>
+						<td><a href="https://doi.org/10.1093/nar/gkq1105" target="_BLANK">Current version</a></td>
+						<td>Joosten RP, te Beek TAH, Krieger E, Hekkelman ML, Hooft RWW, Schneider R, Sander C, Vriend G.
+						    A series of PDB related databases for everyday needs. Nuc. Acids Res. 2010; 39:D411-D419.</td>
 					</tr>
 					<tr>
-						<td><a href="https://doi.org/10.1002/pro.3353" target="_BLANK">Databank</a></td>
-						<td>van
-							Beusekom B, Touw WG, Tatineni M, Somani S, Rajagopal G, Luo J, Gilliland GL,
-							Perrakis A, Joosten RP Homology-based hydrogen bond information improves
-							crystallographic structures in the PDB. Protein Science. 2018; 27:798-808.
+						<td><a href="https://doi.org/10.1002/bip.360221211" target="_BLANK">Original algorithm</a></td>
+						<td>Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.
+							Biopolymers 1983; 22:2577-2637.
 						</td>
 					</tr>
 				</table>
@@ -49,91 +39,169 @@
 		
 		<article>

-			<h2>The Science behind DSSP</h2>
+			<h2>Using DSSP data</h2>
 			
-			<!-- TODO: fill in -->
-			<!-- <p>Below are other publications that describe different algorithms and concepts of PDB-REDO.</p>
+			<p>DSSP provides an elaborate description of the secondary structure elements in a protein structure, including backbone hydrogen bonding 
+			and the topology of &beta;-sheets. The most popular feature is the per-residue assignment of secondary structure with a single character code:
+			</p>

-			<div class="programListTable">
-				<table style="margin-left: 0;">
-                                        <tr>
-						<td><a href="https://doi.org/10.1126/science.317.5835.195" target="_BLANK">No data, no
-								PDB&#8209;REDO</a></td>
-						<td>Joosten RP, Vriend G. PDB improvement starts with data deposition. Science 2007;
-							317:195-196.
-						</td>
-					</tr>					
-					<tr>
-						<td><a href="https://dx.doi.org/10.1107/S0907444908037591" target="_BLANK">The value of
-								re&#8209;refinement </a>
-						</td>
-						<td>Joosten RP, Womack T, Vriend G, Bricogne G. Re-refinement from deposited X&#8209;ray
-							data can deliver improved models for most PDB entries. Acta Cryst. 2009;
-							D65:176-185.
-						</td>
-					</tr>					
-					<tr>
-						<td><a href="https://doi.org/10.1107/S0021889809008784" target="_BLANK">The first PDB-REDO</a></td>
-						<td>Joosten RP, et al and Vriend G. PDB_REDO: automated re&#8209;refinement of
-							X&#8209;ray
-							structure models in the PDB. J. Appl. Cryst. 2009; 42:376-384.
-						</td>
-					</tr><tr>
-						<td><a href="https://doi.org/10.1107/S0907444911054515" target="_BLANK">Decision making</a></td>
-						<td>Joosten RP, Joosten K, Murshudov GN, Perrakis A. PDB_REDO: constructive validation,
-							more
-							than just looking for errors. Acta Cryst. 2012; D68:484-496.
-						</td>
-					</tr>
-					<tr>
-						<td><a href="https://doi.org/10.1093/bioinformatics/btr590" target="_BLANK">Model rebuilding
-							</a>
-						</td>
-						<td>Joosten RP, Joosten K, Cohen SX, Vriend G, Perrakis A. Automatic rebuilding and
-							optimization of crystallographic structures in the Protein Data Bank. Bioinformatics
-							2011; 27:3392-3398.
-						</td>
-					</tr>
-                                        <tr>
-						<td><a href="https://doi.org/10.1002/pro.3353" target="_blank">Homology restraints</a></td>
-						<td>van
-							Beusekom B, Touw WG, Tatineni M, Somani S, Rajagopal G, Luo J, Gilliland GL,
-							Perrakis A, Joosten RP. Homology-based hydrogen bond information improves
-							crystallographic structures in the PDB. Protein Science. 2018; 27:798-808.
-						</td>
-					</tr>
-                                        <tr>
-						<td><a href="https://doi.org/10.1107/S2052252518010552" target="_blank">Loop building</a></td>
-						<td>van
-							Beusekom B, Joosten K, Hekkelman ML, Joosten RP,
-							Perrakis A. Homology-based loop modeling yields more complete crystallographic protein structures.
-							IUCrJ 2018; 5:585-594.
-						</td>
-					</tr>
-                                        <tr>
-						<td><a href="https://doi.org/10.1107/S2059798319003875" target="_blank">N-glycan building</a></td>
-						<td>van
-							Beusekom B, Wezel N, Hekkelman ML, Perrakis A, Emsley E, Joosten RP. 
-							Building and rebuilding N-glycans in protein structure models.
-							Acta Cryst. 2019; D75:416-425.
-						</td>
-					</tr>
-                                        <tr>
-						<td><a href="https://doi.org/10.1107/S2059798321007610" target="_blank">Nucleic acids</a></td>
-						<td>de Vries I, Kwakman T, Lu X-J, Hekkelman ML, Deshpande M, Velankar S, Perrakis A, Joosten RP. 
-							New restraints and validation approaches for nucleic acid structures in PDB-REDO.
-							Acta Cryst. 2021; D77:1127-1141.
-						</td>
-					</tr>
-                                        <tr>
-						<td><a href="https://doi.org/10.1107/S2059798316013036" target="_blank">Zinc sites</a></td>
-						<td>Touw WG, van Beusekom B, Evers JMG, Vriend G, Joosten RP. 
-							Validation and correction of Zn–CysxHisy complexes.
-							Acta Cryst. 2016; D72:1110-1118.
-						</td>
-					</tr>
-				</table>
-			</div> -->
+			<ul>
+				<li>H = &alpha;-helix</li>
+				<li>B = residue in isolated &beta;-bridge</li>
+				<li>E = extended strand, participates in &beta; ladder</li>
+				<li>G = 3<sub>10</sub>-helix</li>
+				<li>I = &pi;-helix</li>
+				<li>P = &kappa;-helix (poly-proline II helix)</li>
+				<li>T = hydrogen-bonded turn</li>
+				<li>S = bend</li>
+
+			</ul>
+   
+			<p>The full DSSP output is provided in two formats. The legacy DSSP format was originally designed for structure models stored in 
+			the PDB format. Now, 40 years later, the PDB format has become obsolete because it cannot hold the large structure models that 
+			modern structural biology methods produce. The mmCIF format is the data format of choice for structural biology: it has no 
+			size limits for structure models and it can hold extensive annotations and metadata. DSSP now writes its data straight to 
+			mmCIF files by default. The legacy DSSP format can still be written, but only for structure models that fit within its limits.</p>
+
+		</article>	
+
+                <article>
+			<a id="DSSP"></a>
+			<h2>DSSP format</h2>
+			<p>The output from DSSP contains secondary structure assignments and other information. Extract from 3kew.dssp (header):</p>
+			<pre>
+==== Secondary Structure Definition by the program DSSP, NKI version 4.3                           ==== DATE=2023-06-08        .
+REFERENCE W. KABSCH AND C.SANDER, BIOPOLYMERS 22 (1983) 2577-2637                                                              .
+HEADER    TRANSFERASE                             26-OCT-09   3KEW                                                             .
+COMPND    MOL_ID: 1; MOLECULE: DHHA1 domain protein; CHAIN: A, B; FRAGMENT: N-TERMINAL FRAGMENT, RESIDUES 1-231; SYNONYM: A... .
+SOURCE    MOL_ID: 1; GENE: ALAS, CPF_0714; STRAIN: ATCC 13124; ORGANISM_SCIENTIFIC: Clostridium perfringens; ORGANISM_TAXID... .
+AUTHOR    Y.Patskovsky; R.Toro; M.Gilmore; S.Miller; J.M.Sauder; S.C.Almo; S.K.Burley; New York SGX Research Center for Str... .
+  458  3  0  0  0 TOTAL NUMBER OF RESIDUES, NUMBER OF CHAINS, NUMBER OF SS-BRIDGES(TOTAL,INTRACHAIN,INTERCHAIN)                .
+ 24682.5   ACCESSIBLE SURFACE OF PROTEIN (ANGSTROM**2)                                                                         .
+  319 69.7   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(J)  , SAME NUMBER PER 100 RESIDUES                              .
+    6  1.3   TOTAL NUMBER OF HYDROGEN BONDS IN     PARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES                              .
+  144 31.4   TOTAL NUMBER OF HYDROGEN BONDS IN ANTIPARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES                              .
+    0  0.0   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-5), SAME NUMBER PER 100 RESIDUES                              .
+    2  0.4   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-4), SAME NUMBER PER 100 RESIDUES                              .
+   12  2.6   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-3), SAME NUMBER PER 100 RESIDUES                              .
+    0  0.0   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-2), SAME NUMBER PER 100 RESIDUES                              .
+    0  0.0   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-1), SAME NUMBER PER 100 RESIDUES                              .
+    0  0.0   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+0), SAME NUMBER PER 100 RESIDUES                              .
+    0  0.0   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+1), SAME NUMBER PER 100 RESIDUES                              .
+   50 10.9   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+2), SAME NUMBER PER 100 RESIDUES                              .
+   34  7.4   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+3), SAME NUMBER PER 100 RESIDUES                              .
+   84 18.3   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+4), SAME NUMBER PER 100 RESIDUES                              .
+    2  0.4   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+5), SAME NUMBER PER 100 RESIDUES                              .
+  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30     *** HISTOGRAMS OF ***           .
+  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0  0  2  0  0  0  0  0    RESIDUES PER ALPHA HELIX         .
+  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    PARALLEL BRIDGES PER LADDER      .
+  4  0  4  8  2  6  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    ANTIPARALLEL BRIDGES PER LADDER  .
+  2  2  2  0  0  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    LADDERS PER SHEET                .
+</pre>
+
+			<p>The first few lines are taken from the input model file, followed by general statistics about the model and its hydrogen bonding. 
+			The histograms describe the size distribution of the secondary structure elements. For instance, this structure has 
+			five &alpha;-helices: one short helix of 4 residues, two of 16 and 17 residues, and two of 25 residues. Note that &beta;-sheets are described
+			as a collection of ladders, rather than strands. A ladder can be seen as two strands, with the hydrogen bonds between them as the rungs. 
+			More formal definitions are given in the Kabsch and Sander paper.</p>
+
+
+			<p>The model statistics are followed by a detailed per-residue description. Extract from 3kew.dssp (continued):</p>
+<pre>
+....;....1....;....2....;....3....;....4....;....5....;....6....;....7..
+    .-- sequential resnumber, including chain breaks as extra residues
+    |    .-- original resname, not necessarily sequential, may contain letters for insertion codes
+    |    | .-- one-letter chain ID
+    |    | | .-- amino acid sequence in one letter code
+    |    | | |  .-- secondary structure summary based on columns 19-38
+    |    | | |  |.-- PPII (kappa) helix
+    |    | | |  ||.-- 3-10 helix
+    |    | | |  |||.-- alpha helix
+    |    | | |  ||||.-- pi helix
+    |    | | |  |||||.-- geometrical bend
+    |    | | |  ||||||.-- chirality
+    |    | | |  |||||||.-- beta bridge label
+    |    | | |  ||||||||.-- beta bridge label
+    |    | | |  |||||||||   .-- beta bridge partner resnum
+    |    | | |  |||||||||   |   .-- beta bridge partner resnum
+    |    | | |  |||||||||   |   |.-- beta sheet label
+    |    | | |  |||||||||   |   ||   .-- solvent accessibility
+    |    | | |  |||||||||   |   ||   |
+  #  RESIDUE AA STRUCTURE BP1 BP2  ACC     N-H-->O    O-->H-N    N-H-->O    O-->H-N    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA
+    1    1 A L              0   0  119      0, 0.0     2,-0.3     0, 0.0    33,-0.2   0.000 360.0 360.0 360.0 168.8    8.7    6.9   63.0
+    2    2 A T  E     -a   34   0A  66     31,-2.0    33,-2.1     1,-0.1     2,-0.7  -0.456 360.0-169.6 -87.8 130.5    7.7    8.8   59.8
+    3    3 A K  E >   -a   35   0A  66     -2,-0.3     3,-1.2    31,-0.2     4,-0.2  -0.850   8.5-179.0-111.3  94.7    7.6    7.5   56.2
+    4    4 A L  G >> S+     0   0   23     31,-2.5     4,-2.9    -2,-0.7     3,-2.0   0.786  71.6  72.4 -65.6 -32.5    7.1   10.6   54.1
+    5    5 A Y  G 34 S+     0   0    2     30,-0.8    -1,-0.3     1,-0.3    31,-0.1   0.709 101.0  46.1 -56.9 -26.7    7.0    8.7   50.7
+    6    6 A Y  G <4 S+     0   0   39     -3,-1.2    -1,-0.3     2,-0.1    -2,-0.2   0.439 115.4  47.1 -93.5  -4.1    3.5    7.4   51.7
+    7    7 A E  T <4 S-     0   0  138     -3,-2.0     2,-0.3     1,-0.2    -2,-0.2   0.825 135.2  -0.3 -99.6 -48.8    2.4   10.9   52.8
+    8    8 A D    ><  -     0   0   57     -4,-2.9     3,-1.4     3,-0.1    -1,-0.2  -0.852  61.5-167.8-144.3 106.0    3.6   13.0   49.9
+</pre>			
+
+    			<p>Below is a brief description of the data columns. More details are described in the Kabsch and Sander paper.</p>
+
+    			<h3>RESIDUE</h3>
+			<p>Two columns of residue numbers. The first column is DSSP's sequential residue number, which starts at the first residue actually present in the model 
+			and counts chain breaks as extra residues; this number is used to refer to residues throughout the file. The second column gives the numbering as used in the 
+			structure model: residue number, insertion code and chain identifier. These are given for reference only.</p>
+
+   			<h3>AA</h3>
+			<p>One-letter amino acid code; non-standard residues are marked as <em>X</em>. Cysteines in an SS-bridge are marked with a lower case letter: 
+			the first bridged cysteine in the sequence and its partner elsewhere in the sequence are both marked <em>a</em>, the next bridged cysteine 
+			that is not yet marked and its partner are both marked <em>b</em>, etcetera. Unbridged cysteines remain marked as <em>C</em>.</p>
+
+   			<h3>S (first column in STRUCTURE block)</h3>
+			<p>The one-letter summary of secondary structure, intended to approximate crystallographers' intuition, based on columns 19-38, which are the principal
+			result of DSSP analysis of the atomic coordinates. More details in the Kabsch and Sander paper.</p>
+
+   			<h3>BP1 and BP2</h3>
+			<p>Residue numbers of the first and (if available) second &beta;-bridge partner. The letter that follows marks the &beta;-sheet that contains the bridges.</p> 
+
+   
+			<h3>ACC</h3>
+			<p>Water-exposed surface in Angstrom**2. <em>Note:</em> The values for solvent exposure may not mean what you think:
+                        	<ul>
+					<li>Effects leading to larger than expected values: the solvent exposure calculation ignores unusual residues, like ACE, and residues with an incomplete backbone. 
+					It also ignores HETATOMs, like a heme or metal ligands. In addition, side chains may not have all atoms explicitly modeled.</li>
+					<li>Effects leading to smaller than expected values: in complexes, e.g. a dimer, solvent exposure is calculated for the entire assembly, not for the monomer. 
+					Also, atom OXT of C-terminal residues is treated like a side chain atom if it is listed as part of the last residue.</li>
+					<li>Unknown or non-standard residues are named X on output and are not checked for the expected number of sidechain atoms.</li>
+					<li>All explicit water molecules, like other hetatoms, are ignored.</li>
+				</ul>
+			</p>	
+
+   			<h3>N-H-->O etc.</h3>
+			<p>Hydrogen bonds; e.g. -3,-1.4 means that this residue (i) has its HN atom H-bonded to O of residue i-3 with an electrostatic H-bond energy of -1.4 kcal/mol. 
+			There are two columns for each type of H-bond, to allow for bifurcated H-bonds. <em>Note:</em> The listed H-bonds are the best and second best candidates. The second best 
+			and, on rare occasions, even the best may be unrealistically poor H-bonds.</p>
+
+			<h3>TCO</h3>
+			<p>The cosine of the angle between C=O of residue i and C=O of residue i-1. For &alpha;-helices, TCO is near +1; for &beta;-sheets, TCO is near -1. 
+			These values are descriptive and not used for structure definition.</p>
+
+			<h3>KAPPA</h3>
+			<p>Virtual bond angle (bend angle) defined by the three C&alpha; atoms of residues i-2, i, and i+2. Used to define bends (structure code <em>S</em>).</p>
+
+   			<h3>ALPHA</h3>
+			<p>Virtual torsion angle (dihedral angle) defined by the four C&alpha; atoms of residues i-1, i, i+1, and i+2. Used to define chirality (structure code <em>+</em> or <em>-</em>).</p>
+
+   			<h3>PHI and  PSI</h3>
+			<p>The peptide backbone torsion angles as described in the IUPAC standard.</p>
+
+   			<h3>X-CA, Y-CA, and Z-CA</h3> 
+			<p>A copy of the C&alpha; atom coordinates in the structure model.</p>
+		</article>	
+
+		<article>
+                        <a id="mmCIF"></a>
+			<h2>DSSP data in mmCIF files</h2>
+
+			<p>The mmCIF-formatted DSSP output carries the same information as the DSSP format, but in a more scalable way and with a formal description captured in 
+			an mmCIF dictionary. It is designed to be machine readable. Developers who create software to read these annotations can use our 
+			<a href="https://github.com/PDB-REDO/dssp/blob/trunk/mmcif_pdbx/dssp-extension.dic" target="_BLANK">extension to the mmCIF dictionary</a> on GitHub.
+			<em>Note:</em> For the sake of speed, the solvent accessibility is not calculated by default when using mmCIF output. The command-line switch 
+			<code>--calculate-accessibility</code> can be used to switch this feature on.
+			</p>
 		</article>

 	</div>
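
Reviewer note on the legacy format described in this hunk: the per-residue table is fixed-width, so it can be read by slicing each line. The sketch below is only an illustration, not part of the DSSP code base. The character offsets are read off the column ruler and the sample rows in the new `about.html` text and should be checked against real files; the names `DsspResidue` and `parse_dssp_residues` are hypothetical, and skipping rows with `!` in the AA column assumes that is how chain-break records are marked.

```python
from typing import Iterable, Iterator, NamedTuple


class DsspResidue(NamedTuple):
    seq_number: int      # DSSP's own sequential residue number
    residue_number: int  # residue number from the structure model
    insertion_code: str  # empty string if there is none
    chain_id: str
    amino_acid: str      # one-letter code, X for non-standard residues
    sec_struct: str      # one-letter summary (H, B, E, G, I, P, T, S or blank)
    accessibility: int   # ACC column


def parse_dssp_residues(lines: Iterable[str]) -> Iterator[DsspResidue]:
    """Yield one record per residue row of a legacy DSSP file."""
    in_table = False
    for line in lines:
        # The per-residue table starts after the "  #  RESIDUE AA ..." header.
        if line.startswith("  #  RESIDUE"):
            in_table = True
            continue
        if not in_table or len(line) < 38:
            continue
        # Assumption: chain breaks are stored as rows with '!' as amino acid.
        if line[13] == "!":
            continue
        yield DsspResidue(
            seq_number=int(line[0:5]),
            residue_number=int(line[5:10]),
            insertion_code=line[10].strip(),
            chain_id=line[11],
            amino_acid=line[13],
            sec_struct=line[16],
            accessibility=int(line[34:38]),
        )


# First 38 characters of the row for residue 2 in the 3kew.dssp extract.
SAMPLE_ROW = "    2    2 A T  E     -a   34   0A  66"
TABLE_HEADER = "  #  RESIDUE AA STRUCTURE BP1 BP2  ACC"

if __name__ == "__main__":
    for residue in parse_dssp_residues([TABLE_HEADER, SAMPLE_ROW]):
        print(residue)
```

For a real file, the same function can be fed an open file handle, e.g. `parse_dssp_residues(open("3kew.dssp"))`. For the mmCIF output a dedicated mmCIF reader is the better choice, since that format is not fixed-width.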

--- a/docroot/download.html
+++ b/docroot/download.html
@@ -38,7 +38,7 @@
 			For your convenience we have already run DSSP on the entire Protein Data Bank. All entries are available in <a z2:href="@{/about#mmCIF}">mmCIF format</a> 
 			and, if they fit, in the <a z2:href="@{/about#DSSP}">legacy DSSP format.</a>

-			<h2>Manual download (single entries) via the website or using <code>wget</code>:</h2>
+			<h2>Manually download (single entries) via the website or using <code>wget</code>:</h2>
 			<ul>
 				<li><code>wget https://pdb-redo.eu/dssp/9xyz.cif.gz</code> downloads annotated mmCIF file</li>
 				<li><code>wget https://pdb-redo.eu/dssp/9xyz.dssp</code> downloads the legacy DSSP file (if available)</li>
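
Reviewer note: for scripted downloads, the two `wget` commands above can be mirrored in Python. This is a minimal sketch that assumes the URL pattern is exactly the one shown in the `wget` examples (`9xyz` is a placeholder entry ID). The helper names `dssp_url` and `download_dssp` are hypothetical, and lower-casing the ID is an assumption based on the lower-case example.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the wget examples on the download page.
BASE_URL = "https://pdb-redo.eu/dssp"


def dssp_url(pdb_id: str, legacy: bool = False) -> str:
    """Build the download URL for one entry.

    legacy=False gives the annotated mmCIF file (.cif.gz),
    legacy=True gives the legacy DSSP file (.dssp), if it exists.
    """
    suffix = ".dssp" if legacy else ".cif.gz"
    # Assumption: entry IDs in the URL are lower case, as in the examples.
    return f"{BASE_URL}/{pdb_id.lower()}{suffix}"


def download_dssp(pdb_id: str, dest_dir: str = ".", legacy: bool = False) -> Path:
    """Download one entry into dest_dir and return the local path."""
    url = dssp_url(pdb_id, legacy)
    target = Path(dest_dir) / url.rsplit("/", 1)[1]
    urlretrieve(url, str(target))
    return target
```

Calling `download_dssp("9xyz", legacy=True)` would raise an `HTTPError` when no legacy file is available for that entry, which matches the "(if available)" caveat in the list above.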