Even more documentation

Commit b737dd7d authored Sep 13, 2023 by Maarten L. Hekkelman
Showing with 148 additions and 29 deletions

CMakeLists.txt
+2 -2

README.md
+28 -15

docs/bitsandpieces.rst
+49 -0

docs/compound.rst
+13 -0

docs/index.rst
+3 -0

docs/model.rst
+37 -0

include/cif++/compound.hpp
+3 -0

include/cif++/model.hpp
+11 -11

src/model.cpp
+2 -1

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -53,7 +53,7 @@ endif()
 option(BUILD_SHARED_LIBS "Build a shared library instead of a static one" OFF)
 # Build documentation?
-option(BUILD_DOC "Build the documentation" OFF)
+option(BUILD_DOCUMENTATION "Build the documentation" OFF)
 # We do not want to write an export file for all our symbols...
 set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
@@ -480,7 +480,7 @@ if(CIFPP_INSTALL_UPDATE_SCRIPT)
 	target_compile_definitions(cifpp PUBLIC CACHE_DIR="${CIFPP_CACHE_DIR}")
 endif()
-if(BUILD_DOC)
+if(BUILD_DOCUMENTATION)
 	add_subdirectory(docs)
 endif()

--- a/README.md
+++ b/README.md
-libcifpp
+# libcifpp
-========
 This library contains code to work with mmCIF and legacy PDB files.
-Synopsis
+## Synopsis
--------
 ```c++
 // A simple program counting residues with an OXT atom
@@ -57,8 +55,14 @@ int main(int argc, char *argv[])
 }
 ```
-Requirements
+## Installation
------------
+You may be able to install libcifpp using the package manager of your
+OS distribution, but that package is likely to be out of date.
+It is therefore recommended to build *libcifpp* from source, which is
+not hard to do.
+### Requirements
 The code for this library was written in C++17. You therefore need a
 recent compiler to build it. For the development gcc 9.4 and clang 9.0
@@ -66,6 +70,7 @@ have been used as well as MSVC version 2019.
 Other requirements are:
+- [cmake](https://cmake.org), a build tool.
 - [mrc](https://github.com/mhekkel/mrc), a resource compiler that
  allows including data files into the executable making them easier to
  install. Strictly speaking this is optional, but at the expense of
@@ -76,20 +81,20 @@ Other requirements are:
  `libeigen3-dev`
 - zlib, the development version of this library. On Debian/Ubuntu this
  is the package `zlib1g-dev`.
- [boost](https://www.boost.org). The boost libraries are only needed if
+- [boost](https://www.boost.org).
-  you want to build the testing code.
 When building using MS Visual Studio, you will also need [libzeep](https://github.com/mhekkel/libzeep)
 since MSVC does not yet provide a C++ template required by libcifpp.
-Building
+The Boost libraries are only needed in case you want to build the test
--------
+code or if you are using GCC. The latter is due to a long-standing
+bug in the implementation of std::regex, which crashes on the regular
+expressions used in the mmcif_pdbx dictionary, so the Boost regex
+implementation is used instead.
-This library uses [cmake](https://cmake.org). The usual way of building
+### Building
-and installing is to create a `build` directory and run cmake there.
-On linux e.g. you would issue the following commands to build and install
+Building the code is as simple as typing:
-libcifpp in your `$HOME/.local` folder:
 ```console
 git clone https://github.com/PDB-REDO/libcifpp.git --recurse-submodules
@@ -104,4 +109,12 @@ where cmake stores its files. Run a configure, build the code and then
 it installs the library and auxiliary files.
 If you want to run the tests before installing, you should add `-DENABLE_TESTING=ON`
-to the first cmake command.
+to the first cmake command. So that would be:
+```console
+ git clone https://github.com/PDB-REDO/libcifpp.git --recurse-submodules
+ cd libcifpp
+ cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$HOME/.local -DCMAKE_BUILD_TYPE=Release -DENABLE_TESTING=ON
+ cmake --build build
+ ctest --test-dir build
+```
--- a/docs/bitsandpieces.rst
+++ b/docs/bitsandpieces.rst
+Bits & Pieces
+=============
+The *libcifpp* library offers some extra code that makes the life of developers a bit easier.
+gzio
+----
+To work with compressed data files a *std::streambuf* implementation was added based on the code in `gxrio <https://github.com/mhekkel/gxrio>`_. This allows you to read and write compressed data streams transparently.
+When working with files you can use :cpp:class:`cif::gzio::ifstream` and :cpp:class:`cif::gzio::ofstream`. The selection of whether to use compression or not is based on the file extension. If it is ``.gz`` gzip compression is used:
+.. code-block:: cpp
+	cif::gzio::ifstream file("my-file.txt.gz");
+	std::string line;
+	while (std::getline(file, line))
+		std::cout << line << '\n';
+Writing is equally easy:
+.. code-block:: cpp
+	cif::gzio::ofstream file("/tmp/output.txt.gz");
+	file << "Hello, world!";
+	file.close();
+You can also use the :cpp:class:`cif::gzio::istream` and feed it a *std::streambuf* object that may or may not contain compressed data. In that case the first bytes of the input are sniffed and, if they indicate gzip compressed data, the input is decompressed.
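The sniffing step described above can be illustrated with a minimal standard-library sketch. This is not the libcifpp implementation: the helper name `looks_like_gzip` is hypothetical, and the sketch assumes a seekable stream so the read position can be restored. It only checks for the two-byte gzip magic number (0x1f 0x8b).

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of libcifpp: returns true when the next two
// bytes of a seekable stream are the gzip magic number (0x1f 0x8b).
// The read position is restored afterwards.
inline bool looks_like_gzip(std::istream &is)
{
	const auto start = is.tellg();

	const int b1 = is.get();
	const int b2 = is.get();

	// Reading past the end sets eofbit/failbit; clear them before seeking back.
	is.clear();
	is.seekg(start);

	return b1 == 0x1f && b2 == 0x8b;
}
```

A caller could use the result to decide whether to wrap the underlying buffer in a decompressing stream or to read it as plain text.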
+A progress bar
+--------------
+Applications based on *libcifpp* may have a longer run time. To give some feedback to the user running your application in a terminal you can use the :cpp:class:`cif::progress_bar`. This class will display an ASCII progress bar along with optional status messages, but only if output is to a real TTY (terminal).
+To avoid flashing progress bars for short actions, a progress bar is also only shown if the duration is more than two seconds.
+The progress bar uses an internal progress counter that starts at zero and ends when the max value has been reached, after which the bar is removed from the screen. This internal counter can be updated either by adding a number of steps with :cpp:func:`cif::progress_bar::consumed` or by setting its exact value with :cpp:func:`cif::progress_bar::progress`.
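The two ways of updating the counter can be sketched with a small stand-alone class. The class `progress_counter` below is hypothetical and is not `cif::progress_bar`; it models only the counting rules from the text. Clamping the value to the range from zero to the maximum is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the counter inside a progress bar; this is not
// the cif::progress_bar class, only the counting rules described in the text.
class progress_counter
{
  public:
	explicit progress_counter(std::int64_t max)
		: m_max(max)
	{
	}

	// Add a number of steps to the counter.
	void consumed(std::int64_t n) { set(m_value + n); }

	// Set the counter to an exact value.
	void progress(std::int64_t value) { set(value); }

	std::int64_t value() const { return m_value; }

	// True once the maximum has been reached.
	bool done() const { return m_value >= m_max; }

  private:
	// Assumption of this sketch: keep the value within [0, max].
	void set(std::int64_t v)
	{
		m_value = std::max<std::int64_t>(0, std::min(v, m_max));
	}

	std::int64_t m_max;
	std::int64_t m_value = 0;
};
```

Relative updates suit loops that process items one at a time, while absolute updates suit cases where the position is already known, for instance a byte offset in a file.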
+Colouring output
+----------------
+It is also nice to emphasise some output in the terminal by using colours. For this you can create output manipulators using :cpp:func:`cif::coloured`. To write a string in bold white letters on a red background you can do:
+.. code-block:: cpp
+	using namespace cif::colour;
+	std::cout << cif::coloured("Hello, world!", white, red, bold) << '\n';
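For readers unfamiliar with terminal colours, the general technique is to wrap text in ANSI SGR escape sequences. The sketch below shows that technique with the standard library only. The function `ansi_coloured` and its integer colour indices are hypothetical and say nothing about how `cif::coloured` is implemented.

```cpp
#include <string>

// Hypothetical helper, not the cif::coloured API: wrap text in an ANSI SGR
// sequence. Foreground and background are colour indices 0..7
// (0 black, 1 red, 2 green, 3 yellow, 4 blue, 5 magenta, 6 cyan, 7 white).
inline std::string ansi_coloured(const std::string &text, int fg, int bg, bool bold)
{
	std::string result = "\033[";
	if (bold)
		result += "1;";
	result += std::to_string(30 + fg) + ';' + std::to_string(40 + bg) + 'm';
	result += text;
	result += "\033[0m"; // reset all attributes
	return result;
}
```

With this helper, `ansi_coloured("Hello, world!", 7, 1, true)` yields the text in bold white on a red background on terminals that support these sequences.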
--- a/docs/compound.rst
+++ b/docs/compound.rst
+Chemical Compounds
+==================
+The data in *CIF* and *mmCIF* files often describes the structure of some chemical compounds. The structure is recorded in the categories *atom_site* and friends. Records in these categories refer to chemical compounds using a compound ID. This compound ID is the ID field of the *chem_comp* category. For all of the known compounds in the PDB there is an entry in the Chemical Component Dictionary or `CCD <https://www.wwpdb.org/data/ccd>`_. If *libcifpp* was properly installed you have a copy of this file somewhere on your disk. And if you have installed the update scripts, a fresh version of this file will be retrieved weekly.
+As an alternative to CCD there are the monomer library files from `CCP4 <https://www.ccp4.ac.uk/>`_. These contain somewhat different data but the overlap is good enough for usage in *libcifpp*.
+Information about compounds is captured in the :cpp:class:`cif::compound`. An instance of a compound object for a certain compound ID can be obtained by using the singleton :cpp:class:`cif::compound_factory`.
+If the compound you want to use is not available in the CCD or in CCP4, you can add that information yourself. For this you can use the method :cpp:func:`cif::compound_factory::push_dictionary`.
+So, given that we have the CCD, the CCP4 monomer library and user-defined compound definitions, what will you get when you try to retrieve a compound by ID? The answer is that the factory has a stack of compound generators. The first one pushed on the stack is the one for a CCD file (*components.cif*), if it can be found. Then, if the *CLIBD_MON* environment variable is defined, a generator for monomer library files is added to the stack. After that, all generators for files you added using *push_dictionary* are added in order. The generators are searched in the reverse of the order in which they were added, until one of them creates a compound object for the ID. If no generator creates a compound, nullptr is returned.
\ No newline at end of file
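The lookup order described for the compound factory, last added generator first, can be modelled in a few lines of standard C++. None of the names below are libcifpp API: `generator`, `generator_stack`, `push` and `create` are hypothetical, and the sketch returns an optional string where the real factory returns a compound pointer.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the lookup order; none of these names are libcifpp API.
// A generator returns a description for an ID it knows, or std::nullopt.
using generator = std::function<std::optional<std::string>(const std::string &)>;

class generator_stack
{
  public:
	// Generators are stored in the order in which they are added.
	void push(generator g) { m_generators.push_back(std::move(g)); }

	// Search from the most recently added generator back to the first one.
	std::optional<std::string> create(const std::string &id) const
	{
		for (auto it = m_generators.rbegin(); it != m_generators.rend(); ++it)
		{
			if (auto result = (*it)(id))
				return result;
		}
		return std::nullopt; // no generator knows this ID
	}

  private:
	std::vector<generator> m_generators;
};
```

Searching in reverse means a definition pushed later takes precedence over an earlier one for the same ID, while IDs it does not know still fall through to the earlier generators.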
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -36,8 +36,11 @@ Using *libcifpp* is easy, if you are familiar with modern C++:
   self
   basics.rst
+   compound.rst
+   model.rst
   resources.rst
   symmetry.rst
+   bitsandpieces.rst
   api/library_root.rst
   genindex.rst
--- a/docs/model.rst
+++ b/docs/model.rst
+Molecular Model
+===============
+Theoretically it is possible to get along with only the classes *cif::file*, *cif::datablock* and *cif::category*. But to keep your data complete and valid you then have to update lots of categories for all but the simplest manipulations. For this *libcifpp* comes with a higher level API modelling atoms, residues, monomers, polymers and complete structures in their respective classes.
+Note that these classes only work properly if you are using *mmCIF* files and have an mmcif_pdbx dictionary available, either compiled in using `mrc <https://github.com/mhekkel/mrc.git>`_ or installed in the proper location.
+.. note::
+	This part of *libcifpp* is the least developed part. What is available should work but functionality should eventually be extended.
+Atom
+----
+The :cpp:class:`cif::mm::atom` is a lightweight proxy class giving access to the data stored in *atom_site* and *atom_site_anisotrop*. It only caches the most often used item data and every modification is directly written back into the *mmCIF* categories.
+Atoms can be copied by value with low cost. The atom class only contains a pointer to an implementation that is reference counted.
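Why copying is cheap can be seen in a small stand-alone sketch of a proxy that holds only a reference-counted pointer. The class `atom_proxy` is hypothetical and is not `cif::mm::atom`; using `std::shared_ptr` for the reference counting is an assumption of this sketch.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical illustration, not the cif::mm::atom class: a proxy that holds
// only a reference-counted pointer to its implementation.
class atom_proxy
{
  private:
	struct impl
	{
		std::string label;
	};

  public:
	explicit atom_proxy(std::string label)
		: m_impl(std::make_shared<impl>(impl{ std::move(label) }))
	{
	}

	const std::string &label() const { return m_impl->label; }

	// A change made through one proxy is visible through all of its copies.
	void set_label(std::string label) { m_impl->label = std::move(label); }

	// Number of proxies sharing this implementation.
	long use_count() const { return m_impl.use_count(); }

  private:
	std::shared_ptr<impl> m_impl;
};
```

Copying such a proxy copies one pointer and increments a counter, so the data itself is never duplicated.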
+Residue, Monomer and Polymer
+----------------------------
+The :cpp:class:`cif::mm::residue`, :cpp:class:`cif::mm::monomer` and :cpp:class:`cif::mm::polymer` implement what you'd expect. A monomer is a residue that is part of a polymer and thus has a sequence number and siblings.
+Sugars & Branches
+-----------------
+There are also classes for modelling sugars and sugar branches. You can create sugar branches
+Structure
+---------
+The :cpp:class:`cif::mm::structure` can be used to load one of the models from an *mmCIF* file. By default the first model is loaded. (Multiple models are usually only found in files containing structures determined using NMR.)
+A structure holds a reference to a *cif::datablock* and retrieves its data from this datablock and writes any modification back into that datablock.
+One of the most useful parts of the structure class is the ability to create and modify residues. This updates related *chem_comp* and *entity* categories as well.
\ No newline at end of file
--- a/include/cif++/compound.hpp
+++ b/include/cif++/compound.hpp
@@ -48,6 +48,9 @@
 /// may also be generated from the CCP4 monomer library.
 ///
 /// Note that the information in CCP4 and CCD is not equal.
+///
+/// See also :doc:`/compound` for more information.
 namespace cif
 {

--- a/include/cif++/model.hpp
+++ b/include/cif++/model.hpp
@@ -1031,17 +1031,17 @@ class structure
 	/// \brief Create a new and empty (sugar) branch
 	branch &create_branch();
-	/// \brief Create a new (sugar) branch with one first NAG containing atoms constructed from \a atoms
+	// /// \brief Create a new (sugar) branch with one first NAG containing atoms constructed from \a atoms
-	branch &create_branch(std::vector<row_initializer> atoms);
+	// branch &create_branch(std::vector<row_initializer> atoms);
-	/// \brief Extend an existing (sugar) branch identified by \a asymID with one sugar containing atoms constructed from \a atom_info
+	// /// \brief Extend an existing (sugar) branch identified by \a asymID with one sugar containing atoms constructed from \a atom_info
-	///
+	// ///
-	/// \param asym_id      The asym id of the branch to extend
+	// /// \param asym_id      The asym id of the branch to extend
-	/// \param atom_info    Array containing the info for the atoms to construct for the new sugar
+	// /// \param atom_info    Array containing the info for the atoms to construct for the new sugar
-	/// \param link_sugar   The sugar to link to, note: this is the sugar number (1 based)
+	// /// \param link_sugar   The sugar to link to, note: this is the sugar number (1 based)
-	/// \param link_atom    The atom id of the atom linked in the sugar
+	// /// \param link_atom    The atom id of the atom linked in the sugar
-	branch &extend_branch(const std::string &asym_id, std::vector<row_initializer> atom_info,
+	// branch &extend_branch(const std::string &asym_id, std::vector<row_initializer> atom_info,
-		int link_sugar, const std::string &link_atom);
+	// 	int link_sugar, const std::string &link_atom);
 	/// \brief Remove \a branch
 	void remove_branch(branch &branch);

--- a/src/model.cpp
+++ b/src/model.cpp
@@ -2840,7 +2840,8 @@ void reconstruct_pdbx(datablock &db)
 	if (db.get("atom_site") == nullptr)
 		throw std::runtime_error("Cannot reconstruct PDBx file, atom data missing");
+	assert(false);
+	throw std::runtime_error("not implemented yet");
 }
 } // namespace pdbx