More documentation

Version bump

Commit 84af564a authored Sep 13, 2023 by Maarten L. Hekkelman
13 changed files
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -404,7 +404,7 @@ install(FILES
 set(cifpp_MAJOR_VERSION ${CMAKE_PROJECT_VERSION_MAJOR})
 set_target_properties(cifpp PROPERTIES
 	VERSION ${PROJECT_VERSION}
-	SOVERSION ${cifpp_MAJOR_VERSION}
+	SOVERSION "${cifpp_MAJOR_VERSION}.${cifpp_MINOR_VERSION}"
 	INTERFACE_cifpp_MAJOR_VERSION ${cifpp_MAJOR_VERSION})
 set_property(TARGET cifpp APPEND PROPERTY

--- a/README.md
+++ b/README.md
 libcifpp
 ========
-This library contains code to work with mmCIF and PDB files.
+This library contains code to work with mmCIF and legacy PDB files.
 Synopsis
 --------
@@ -55,7 +55,6 @@ int main(int argc, char *argv[])
    return 0;
 }
 ```
 Requirements

--- a/docs/basics.rst
+++ b/docs/basics.rst
+Basic usage
+===========
+This library, *libcifpp*, is a generic *CIF* library with some specific additions to work with *mmCIF* files. The main focus of this library is to make sure that files read or written are valid. That is, they are syntactically valid *and* their content is valid with respect to a CIF dictionary, if such a dictionary is available and specified.
+Reading a file is as simple as:
+.. code-block:: cpp
+    #include <cif++.hpp>
+    cif::file f("/path/to/file.cif");
+The file may also be compressed using *gzip* which is detected automatically.
+Writing the file out again is just as simple. To write it to the terminal you can do:
+.. code-block:: cpp
+    std::cout << f;
+    // or
+    f.save(std::cout);
+    // or write a compressed file using gzip compression:
+    f.save("/tmp/f.cif.gz");
+CIF files contain one or more datablocks. To print out the names of all datablocks in our file:
+.. code-block:: cpp
+    for (auto &db : f)
+        std::cout << db.name() << '\n';
+Most often *libcifpp* is used to read in structure files in mmCIF format. These files only contain one datablock and so you can safely use code like this:
+.. code-block:: cpp
+    // get a reference to the first datablock in f
+    auto &db = f.front();
+But if you know the name of the datablock, this also works:
+.. code-block:: cpp
+    // get a reference to the datablock named '1CBS'
+    auto &db = f["1CBS"];
+Now, each datablock contains categories. To print out all their names:
+.. code-block:: cpp
+    for (auto &cat : db)
+        std::cout << cat.name() << '\n';
+But you probably know which category you need, so let's fetch it by name:
+.. _atom_site-label:
+.. code-block:: cpp
+    // get a reference to the atom_site category in db
+    auto &atom_site = db["atom_site"];
+    // and make sure there's some data in it:
+    assert(not atom_site.empty());
+.. note::
+    Note that we omit the leading underscore in the name of the category here.
+Categories contain rows of data and each row has fields or items. Referencing a row in a category results in a :cpp:class:`cif::row_handle` object which you can use to request or manipulate item data.
+.. code-block:: cpp
+    // Get the first row in atom_site
+    auto rh = atom_site.front();
+    // Get the label_atom_id value from this row handle as a std::string
+    std::string atom_id = rh["label_atom_id"].as<std::string>();
+    // Get the x, y and z coordinates using structured bindings
+    const auto &[x, y, z] = rh.get<float,float,float>("Cartn_x", "Cartn_y", "Cartn_z");
+    // Assign a new value to the x coordinate of our atom
+    rh["Cartn_x"] = x + 1;
+Querying
+--------
+Walking over the rows in a category is often not very useful. More often you are interested in specific rows in a category. The function :cpp:func:`cif::category::find` and friends are here to help.
+What these functions have in common is that they return data based on a query implemented by :cpp:class:`cif::condition`. These condition objects are built in code using regular C++ syntax. The most basic example of a query is:
+.. code-block:: cpp
+    cif::condition c = cif::key("id") == 1;
+Here the condition is that all rows returned should have a value of 1 in the item named *id*. Likewise you can use other data types and even combine those. Oh, and I said we use regular C++ syntax for conditions, so you may as well use other operators to compare values:
+.. code-block:: cpp
+    // condition for C-alpha atoms having an occupancy less than 1.0
+    cif::condition c = cif::key("occupancy") < 1.0f and cif::key("label_atom_id") == "CA";
+Using the namespace *cif::literals* that code becomes a little less verbose:
+.. code-block:: cpp
+    using namespace cif::literals;
+    cif::condition c = "occupancy"_key < 1.0f and "label_atom_id"_key == "CA";
+Conditions can also be combined:
+.. code-block:: cpp
+    cif::condition c = "occupancy"_key < 1.0f and "label_atom_id"_key == "CA";
+    // extend the condition by requiring the compound ID to be unequal to PRO
+    c = std::move(c) and "label_comp_id"_key != "PRO";
+.. note::
+    Note the use of std::move here. 
+Using queries constructed in this way is simple:
+.. code-block:: cpp
+    cif::condition c = ...
+    auto result = atom_site.find(std::move(c));
+    // or construct a condition inline:
+    auto result = atom_site.find("label_atom_id"_key == "CA");
+In the example above the result is a range of :cpp:class:`cif::row_handle` objects. Often, using individual field values is more useful:
+.. code-block:: cpp
+    // Requesting a single item:
+    for (auto id : atom_site.find<std::string>("label_atom_id"_key == "CA", "id"))
+        std::cout << "ID for CA: " << id << '\n';
+    // Requesting multiple items:
+    for (const auto &[id, x, y, z] : atom_site.find<std::string,float,float,float>("label_atom_id"_key == "CA",
+            "id", "Cartn_x", "Cartn_y", "Cartn_z"))
+    {
+        std::cout << "Atom " << id << " is at [" << x << ", " << y << ", " << z << "]\n";
+    }
+Returning a complete set is often not required. If you only want the first result, you can use :cpp:func:`cif::category::find_first` as shown here:
+.. code-block:: cpp
+    // return the ID item for the first C-alpha atom
+    std::string v1 = atom_site.find_first<std::string>("label_atom_id"_key == "CA", "id");
+    // If you're not sure the row exists, use std::optional
+    auto v2 = atom_site.find_first<std::optional<std::string>>("label_atom_id"_key == "CA", "id");
+    if (v2.has_value())
+        ...
+There are cases when you really need exactly one result. The function :cpp:func:`cif::category::find1` can be used in that case; it throws an exception if the query does not result in exactly one row.
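To make the idea of conditions built with ordinary C++ operators more concrete, here is a minimal standalone sketch. It is not the libcifpp implementation: the types `row`, `key` and `condition` and the helper `find_rows` are hypothetical stand-ins defined only for this illustration, and a row is simply a map from item name to text.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row: item name -> text value.
using row = std::map<std::string, std::string>;

// A condition wraps a predicate over a row. It is move-only in this sketch,
// which is why combining conditions needs std::move.
class condition
{
  public:
	explicit condition(std::function<bool(const row &)> pred)
		: m_pred(std::move(pred))
	{
		assert(static_cast<bool>(m_pred)); // an empty predicate is not allowed here
	}

	condition(condition &&) = default;
	condition &operator=(condition &&) = default;
	condition(const condition &) = delete;
	condition &operator=(const condition &) = delete;

	bool operator()(const row &r) const { return m_pred(r); }

  private:
	std::function<bool(const row &)> m_pred;
	friend condition operator&&(condition &&a, condition &&b);
};

// 'a and b' resolves to this overload; both operands are consumed.
condition operator&&(condition &&a, condition &&b)
{
	return condition([pa = std::move(a.m_pred), pb = std::move(b.m_pred)](const row &r)
		{ return pa(r) and pb(r); });
}

// Names the item that a comparison refers to.
struct key
{
	std::string name;
};

// True when the item exists and holds exactly 'value'.
condition operator==(const key &k, const std::string &value)
{
	return condition([n = k.name, value](const row &r)
		{
			auto it = r.find(n);
			return it != r.end() and it->second == value;
		});
}

// True when the item is missing or holds something other than 'value'.
condition operator!=(const key &k, const std::string &value)
{
	return condition([n = k.name, value](const row &r)
		{
			auto it = r.find(n);
			return it == r.end() or it->second != value;
		});
}

// Return copies of all rows for which the condition holds.
std::vector<row> find_rows(const std::vector<row> &rows, condition &&cond)
{
	std::vector<row> result;
	for (const auto &r : rows)
		if (cond(r))
			result.push_back(r);
	return result;
}
```

With these definitions, `condition c = key{"label_atom_id"} == "CA"; c = std::move(c) and key{"label_comp_id"} != "PRO";` builds a combined query that can be passed to `find_rows`. Note that an overloaded `and` does not short-circuit at the point where the condition is built; the short-circuit happens later, inside the lambda, when a row is tested.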
+Validation
+----------
+CIF files can have a dictionary attached. Based on such a dictionary, a :cpp:class:`cif::validator` object can be constructed, which in turn can be used to validate the content of the file.
+A simple case:
+.. code-block:: cpp
+    #include <cif++.hpp>
+    cif::file f("1cbs.cif.gz");
+    f.load_dictionary("mmcif_pdbx");
+    if (not f.is_valid())
+        std::cout << "This file is not valid\n";
+If you want to know why it is not valid, set the global variable :cpp:var:`cif::VERBOSE` to a value higher than zero. Depending on the value, more or less diagnostic output is sent to std::cerr.
+In the case above we load a dictionary based on its name. You can also load a dictionary from a specific file, but that is a bit more work:
+.. code-block:: cpp
+    std::ifstream dictFile("/tmp/my-dictionary.dic");
+    auto &validator = cif::parse_dictionary("my-dictionary", dictFile);
+    cif::file f("1cbs.cif.gz");
+    // assign the validator
+    f.set_validator(&validator);
+    // alternatively, load it by name
+    f.load_dictionary("my-dictionary");
+    if (not f.is_valid())
+        std::cout << "This file is not valid\n";
+Creating your own dictionary is a lot of work, especially if you are only extending an existing dictionary with a couple of new categories or items. So, what you can do is extend a loaded validator like this (code taken from DSSP):
+.. code-block:: cpp
+    // db is a cif::datablock reference containing an mmCIF file with DSSP annotations
+    auto &validator = const_cast<cif::validator &>(*db.get_validator());
+    if (validator.get_validator_for_category("dssp_struct_summary") == nullptr)
+    {
+        auto dssp_extension = cif::load_resource("dssp-extension.dic");
+        if (dssp_extension)
+            cif::extend_dictionary(validator, *dssp_extension);
+    }
+.. note::
+    In the example above we're loading the data using :doc:`/resources`. See the documentation on that for more information.
+If a validator has been assigned to a file, assignments to items are checked for valid data. So the following code will throw an exception (see :ref:`the atom_site example <atom_site-label>`):
+.. code-block:: cpp
+    auto rh = atom_site.front();
+    rh["Cartn_x"] = "foo";
+Linking
+-------
+Based on information recorded in dictionary files (see :ref:`Validation`) you can locate linked records in parent or child categories.
+To keep this example simple, let's assume the following example file:
+.. code-block:: cif
+    data_test
+    loop_
+    _cat_1.id
+    _cat_1.name
+    _cat_1.desc
+    1 aap  Aap
+    2 noot Noot
+    3 mies Mies
+    loop_
+    _cat_2.id
+    _cat_2.name
+    _cat_2.num
+    _cat_2.desc
+    1 aap  1 'Een dier'
+    2 aap  2 'Een andere aap'
+    3 noot 1 'walnoot bijvoorbeeld'
+And we have a dictionary containing the following link definition:
+.. code-block:: cif
+    loop_
+    _pdbx_item_linked_group_list.parent_category_id
+    _pdbx_item_linked_group_list.link_group_id
+    _pdbx_item_linked_group_list.parent_name
+    _pdbx_item_linked_group_list.child_name
+    _pdbx_item_linked_group_list.child_category_id
+    cat_1 1 '_cat_1.name' '_cat_2.name' cat_2
+So, there are links between *cat_1* and *cat_2* based on the value in items named *name*. Using this information, we can now locate children and parents:
+.. code-block:: cpp
+    // Assuming the file was loaded in f:
+    auto &cat1 = f.front()["cat_1"];
+    auto &cat2 = f.front()["cat_2"];
+    auto &cat3 = f.front()["cat_3"];
+    // Loop over all apes in cat2
+    for (auto r : cat1.get_children(cat1.find1("name"_key == "aap"), cat2))
+        std::cout << r.get<std::string>("desc") << '\n';
+Updating a value in an item in a parent category will update the corresponding value in all related children:
+.. code-block:: cpp
+    auto r1 = cat1.find1("id"_key == 1);
+    r1["name"] = "aapje";
+    auto rs1 = cat2.find("name"_key == "aapje");
+    assert(rs1.size() == 2);
+However, changing a value in a child record will not update the parent. This may result in an invalid file since you may then have a child that has no parent:
+.. code-block:: cpp
+    auto r2 = cat2.find1("id"_key == 3);
+    r2["name"] = "wim";
+    assert(f.is_valid() == false);
+So you have to fix this yourself by inserting a new row in cat1 with the new value.
+.. _splitting-rows:
+Another situation is when you change a value in a parent and updating children might introduce a situation where you need to split a child. To give an example, consider this:
+.. code-block:: cif
+    data_test
+    loop_
+    _cat_1.id
+    _cat_1.name
+    _cat_1.desc
+    1 aap  Aap
+    2 noot Noot
+    3 mies Mies
+    loop_
+    _cat_2.id
+    _cat_2.name
+    _cat_2.num
+    _cat_2.desc
+    1 aap  1 'Een dier'
+    2 aap  2 'Een andere aap'
+    3 noot 1 'walnoot bijvoorbeeld'
+    loop_
+    _cat_3.id
+    _cat_3.name
+    _cat_3.num
+    1 aap 1
+    2 aap 2
+And we have a dictionary containing the following link definition (reversed compared to the previous example):
+.. code-block:: cif
+    loop_
+    _pdbx_item_linked_group_list.parent_category_id
+    _pdbx_item_linked_group_list.link_group_id
+    _pdbx_item_linked_group_list.parent_name
+    _pdbx_item_linked_group_list.child_name
+    _pdbx_item_linked_group_list.child_category_id
+    cat_2 1 '_cat_2.name' '_cat_1.name' cat_1
+    cat_3 1 '_cat_3.name' '_cat_2.name' cat_2
+    cat_3 1 '_cat_3.num'  '_cat_2.num'  cat_2
+So *cat3* is a parent of *cat2* and *cat2* is a parent of *cat1*. Now, if you change the *name* value of the first row of *cat3* to 'aapje', the corresponding row in *cat2* is updated as well. But when you update *cat2* you have to update *cat1* too. And simply changing the name field in row 1 of *cat1* is wrong. The default behaviour in libcifpp is to split the record in *cat1* and have a new child with the new name whereas the other remains as is.
+The new *cat1* will thus be like:
+.. code-block:: cif
+    loop_
+    _cat_1.id
+    _cat_1.name
+    _cat_1.desc
+    1 aapje Aap
+    2 noot  Noot
+    3 mies  Mies
+    5 aap   Aap
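The basic propagation rule from the Linking section (a change in a parent cascades to its children, a change in a child does not travel upwards) can be modelled in a few lines. This is only a sketch under simplifying assumptions: `record`, `rename_parent` and `has_parent` are hypothetical names, a record carries just an id and the linked *name* item, and the splitting behaviour described above is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal record: an id plus the linked 'name' item.
struct record
{
	int id;
	std::string name;
};

// Rename the parent with the given id and cascade the new value to every
// child whose linked item still holds the old value.
// Returns the number of child records that were updated.
std::size_t rename_parent(std::vector<record> &parents, std::vector<record> &children,
	int parent_id, const std::string &new_name)
{
	assert(not new_name.empty()); // the linked item must keep a value

	std::size_t updated = 0;
	for (auto &p : parents)
	{
		if (p.id != parent_id)
			continue;

		const std::string old_name = p.name;
		p.name = new_name;

		for (auto &c : children)
		{
			if (c.name == old_name)
			{
				c.name = new_name;
				++updated;
			}
		}
	}
	return updated;
}

// A child is orphaned when no parent carries the same linked value.
bool has_parent(const std::vector<record> &parents, const record &child)
{
	for (const auto &p : parents)
		if (p.name == child.name)
			return true;
	return false;
}
```

Renaming parent 1 from `aap` to `aapje` in the first example file updates both matching children, whereas assigning `wim` directly to a child leaves it without a parent, which is the situation the documentation warns about.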
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -3,9 +3,9 @@ Introduction
 Information on 3D structures of proteins originally came formatted in `PDB <http://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html>`_ files. Although the specification for this format had some real restrictions like a mandatory HEADER and CRYST line, many programs implemented this very poorly often writing out only ATOM records. And users became used to this.
-The PDB format has some severe limitations rendering it useless for all but very small protein structures. A new format called `mmCIF <https://mmcif.wwpdb.org/>`_ has been around for decades and now is the default format for the Protein Data Bank.
+The legacy PDB format has some severe limitations rendering it useless for all but very small protein structures. A new format called `mmCIF <https://mmcif.wwpdb.org/>`_ has been around for decades and now is the default format for the Protein Data Bank.
-The software developed in the `PDB-REDO <https://pdb-redo.eu/>`_ project aims at improving 3D models based on original experimental data. For this, the tools need to be able to work with both PDB and mmCIF files. A decision was made to make mmCIF leading internally in all programs and convert PDB directly into mmCIF before processing the data. A robust conversion had to be developed to make this possible since, as noted above, files can come with more or less information making it sometimes needed to do a sequence alignment to find out the exact residue numbers.
+The software developed in the `PDB-REDO <https://pdb-redo.eu/>`_ project aims at improving 3D models based on original experimental data. For this, the tools need to be able to work with both legacy PDB and mmCIF files. A decision was made to make mmCIF leading internally in all programs and convert legacy PDB directly into mmCIF before processing the data. A robust conversion had to be developed to make this possible since, as noted above, files can come with more or less information making it sometimes needed to do a sequence alignment to find out the exact residue numbers.
 And so libcif++ came to life, a library to work with mmCIF files. Work on this library started early 2017 and has developed quite a bit since then. To reduce dependency on other libraries, some functionality was added that is not strictly related to reading and writing mmCIF files but may be useful nonetheless. This is mostly code that is used in 3D calculations and symmetry operations.
@@ -18,16 +18,26 @@ The main part of the library is a set of classes that work with mmCIF files. The
 * :cpp:class:`cif::datablock`
 * :cpp:class:`cif::category`
-The :cpp:class:`cif::file` class encapsulates, you guessed it, the contents of a mmCIF file. In such a file there are one or more :cpp:class:`cif::datablock`s and each datablock contains one or more :cpp:class:`cif::category`s.
+The :cpp:class:`cif::file` class encapsulates the contents of a mmCIF file. In such a file there are one or more :cpp:class:`cif::datablock` objects and each datablock contains one or more :cpp:class:`cif::category` objects.
+Synopsis
+--------
+Using *libcifpp* is easy, if you are familiar with modern C++:
+.. literalinclude:: ../README.md
+	:language: c++
+	:start-after: ```c++
+	:end-before: ```
 .. toctree::
   :maxdepth: 2
   :caption: Contents
   self
+   basics.rst
   resources.rst
+   symmetry.rst
   api/library_root.rst
   genindex.rst
--- a/docs/symmetry.rst
+++ b/docs/symmetry.rst
+Symmetry & Geometry
+===================
+Although not really core *CIF* functionality, working with *mmCIF* files often requires symmetry information. Symmetry operates on points in space, so geometry calculations are needed often as well. Earlier versions of *libcifpp* used `clipper <http://www.ysbl.york.ac.uk/~cowtan/clipper/doc/index.html>`_ for many of these calculations, but that introduced a dependency and, besides, the way clipper numbers symmetry operations is not completely compatible with the way this is done in the PDB.
+Points
+------
+The most basic type in use is :cpp:type:`cif::point`. It can be thought of as a point in space with three coordinates, but it is also often used as a vector in 3D space. To keep the interface simple there's no separate vector type.
+Many functions are available in :ref:`file_cif++_point.hpp` that work on points. There are functions to calculate the :cpp:func:`cif::distance` between two points and also functions to calculate dot products, cross products and dihedral angles between sets of points.
+Quaternions
+-----------
+All operations inside *libcifpp* that perform some kind of rotation use :cpp:type:`cif::quaternion`. Quaternions are used not only because they are cool: they are also faster than multiplying with a matrix and the results suffer less from numerical instability.
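For readers who have not used quaternions for rotations before, the following standalone sketch shows the underlying arithmetic. It does not use `cif::quaternion`; the types `vec3` and `quat` and the functions `from_axis_angle` and `rotate` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>

struct vec3
{
	double x, y, z;
};

struct quat
{
	double w, x, y, z;
};

vec3 cross(const vec3 &a, const vec3 &b)
{
	return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Build a unit quaternion for a rotation of 'angle' radians around 'axis'.
// The axis does not need to be normalised, but it must not be the zero vector.
quat from_axis_angle(const vec3 &axis, double angle)
{
	const double len = std::sqrt(axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
	assert(len > 0);

	const double s = std::sin(angle / 2) / len;
	return { std::cos(angle / 2), axis.x * s, axis.y * s, axis.z * s };
}

// Rotate v by the unit quaternion q, using
//   t  = 2 (u x v)
//   v' = v + w t + u x t
// where u is the vector part of q and w its scalar part.
vec3 rotate(const quat &q, const vec3 &v)
{
	const vec3 u{ q.x, q.y, q.z };

	vec3 t = cross(u, v);
	t = { 2 * t.x, 2 * t.y, 2 * t.z };

	const vec3 ut = cross(u, t);
	return { v.x + q.w * t.x + ut.x, v.y + q.w * t.y + ut.y, v.z + q.w * t.z + ut.z };
}
```

Rotating the point (1, 0, 0) by a quarter turn around the z axis with `rotate(from_axis_angle({0, 0, 1}, pi / 2), {1, 0, 0})` yields (0, 1, 0) up to rounding.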
+Matrix
+------
+Although Quaternions are the preferred way of doing rotations, not every manipulation is a rotation and thus we need a matrix class as well. Matrices and their operations are encoded as matrix_expressions in *libcifpp* allowing the compiler to generate very fast code. See the :ref:`file_cif++_matrix.hpp` for what is on offer.
+Symmetry operations
+-------------------
+Each basic symmetry operation in the crystallographic world consists of a matrix multiplication followed by a translation. To apply such an operation on a Cartesian coordinate you first have to convert the point into a fractional coordinate with respect to the unit cell of the crystal, then apply the matrix and translation operations and then convert the result back into Cartesian coordinates. This is all done by the proper routines in *libcifpp*.
+Symmetry operations are encoded as a string in *mmCIF* PDBx files. The format is a string with the rotational number followed by an underscore and then the encoded translation in each direction where 5 means no translation. So, the identity operator is ``1_555`` meaning that we have rotational number 1 (which is always the identity rotation, point multiplied with the identity matrix) and a translation of zero in each direction.
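The string encoding described above is easy to decode by hand. The sketch below is a standalone illustration, not the `cif::sym_op` implementation: `parsed_sym_op`, `parse_sym_op` and `require_sym_op` are hypothetical names, and the parser assumes exactly one digit per translation direction, as in ``1_555``.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

struct parsed_sym_op
{
	int rotation_number;            // 1 is the identity rotation
	std::array<int, 3> translation; // whole unit cells along a, b and c
};

// Parse strings like "1_555" or "12_465".
// Returns std::nullopt when the string does not have the expected shape.
std::optional<parsed_sym_op> parse_sym_op(const std::string &s)
{
	const auto us = s.find('_');
	if (us == std::string::npos or us == 0 or s.size() != us + 4)
		return std::nullopt;

	parsed_sym_op result{};

	// Everything before the underscore is the rotational number.
	for (std::size_t i = 0; i < us; ++i)
	{
		if (not std::isdigit(static_cast<unsigned char>(s[i])))
			return std::nullopt;
		result.rotation_number = result.rotation_number * 10 + (s[i] - '0');
	}

	// Each of the three digits after it encodes a translation, with 5 meaning zero.
	for (std::size_t i = 0; i < 3; ++i)
	{
		const char ch = s[us + 1 + i];
		if (not std::isdigit(static_cast<unsigned char>(ch)))
			return std::nullopt;
		result.translation[i] = (ch - '0') - 5;
	}

	return result;
}

// Convenience wrapper for strings that are known to be well-formed.
parsed_sym_op require_sym_op(const std::string &s)
{
	auto op = parse_sym_op(s);
	assert(op.has_value());
	return *op;
}
```

For example, ``12_465`` decodes to rotational number 12 with a translation of -1, +1 and 0 unit cells along the three axes, while ``1_555`` decodes to the identity.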
+To give an idea how this works, here's a piece of code copied from one of the unit tests in *libcifpp*. It takes the *struct_conn* records in a certain PDB file and checks whether the distances in each row correspond to what we can calculate.
+.. code:: cpp
+    // Load the file
+    cif::file f(gTestDir / "2bi3.cif.gz");
+    auto &db = f.front();
+    cif::mm::structure s(db);
+    cif::crystal c(db);
+    auto struct_conn = db["struct_conn"];
+    for (const auto &[
+            asym1, seqid1, authseqid1, atomid1, symm1,
+            asym2, seqid2, authseqid2, atomid2, symm2,
+            dist] : struct_conn.find<
+                std::string,int,std::string,std::string,std::string,
+                std::string,int,std::string,std::string,std::string,
+                float>(
+            cif::key("ptnr1_symmetry") != "1_555" or cif::key("ptnr2_symmetry") != "1_555",
+            "ptnr1_label_asym_id", "ptnr1_label_seq_id", "ptnr1_auth_seq_id", "ptnr1_label_atom_id", "ptnr1_symmetry", 
+            "ptnr2_label_asym_id", "ptnr2_label_seq_id", "ptnr2_auth_seq_id", "ptnr2_label_atom_id", "ptnr2_symmetry", 
+            "pdbx_dist_value"
+        ))
+    {
+        auto &r1 = s.get_residue(asym1, seqid1, authseqid1);
+        auto &r2 = s.get_residue(asym2, seqid2, authseqid2);
+        auto a1 = r1.get_atom_by_atom_id(atomid1);
+        auto a2 = r2.get_atom_by_atom_id(atomid2);
+        auto sa1 = c.symmetry_copy(a1.get_location(), cif::sym_op(symm1));
+        auto sa2 = c.symmetry_copy(a2.get_location(), cif::sym_op(symm2));
+        BOOST_TEST(cif::distance(sa1, sa2) == dist);
+        auto pa1 = a1.get_location();
+        const auto &[d, p, so] = c.closest_symmetry_copy(pa1, a2.get_location());
+        BOOST_TEST(p.m_x == sa2.m_x);
+        BOOST_TEST(p.m_y == sa2.m_y);
+        BOOST_TEST(p.m_z == sa2.m_z);
+        BOOST_TEST(d == dist);
+        BOOST_TEST(so.string() == symm2);
+    }
\ No newline at end of file
--- a/include/cif++/item.hpp
+++ b/include/cif++/item.hpp
@@ -283,7 +283,15 @@ struct item_handle
 	// conversion helper class
 	template <typename T, typename = void>
 	struct item_value_as;
+	/** @endcond */
+	/**
+	 * @brief Assign value @a value to the item referenced
+	 * 
+	 * @tparam T Type of the value
+	 * @param value The value
+	 * @return reference to this item_handle
+	 */
 	template <typename T>
 	item_handle &operator=(const T &value)
 	{
@@ -291,7 +299,6 @@ struct item_handle
 		assign_value(v);
 		return *this;
 	}
-	/** @endcond */
 	/**
 	 * @brief A method with a variable number of arguments that will be concatenated and

--- a/include/cif++/iterator.hpp
+++ b/include/cif++/iterator.hpp
@@ -576,7 +576,7 @@ class conditional_iterator_proxy
 			{
 				if (++mBegin == mEnd)
 					break;
 				if (m_condition->operator()(mBegin))
 					break;
 			}
@@ -678,6 +678,8 @@ conditional_iterator_proxy<Category, Ts...>::conditional_iterator_impl::conditio
 	, mEnd(cat.end(), cix)
 	, m_condition(&cond)
 {
+	if (m_condition == nullptr or m_condition->empty())
+		mBegin = mEnd;
 }
 template <typename Category, typename... Ts>
@@ -702,10 +704,15 @@ conditional_iterator_proxy<Category, Ts...>::conditional_iterator_proxy(Category
 {
 	static_assert(sizeof...(Ts) == sizeof...(Ns), "Number of column names should be equal to number of requested value types");
-	m_condition.prepare(cat);
+	if (m_condition)
+	{
+		m_condition.prepare(cat);
-	while (mCBegin != mCEnd and not m_condition(*mCBegin))
+		while (mCBegin != mCEnd and not m_condition(*mCBegin))
-		++mCBegin;
+			++mCBegin;
+	}
+	else
+		mCBegin = mCEnd;
 	uint16_t i = 0;
 	((mCix[i++] = m_cat->get_column_ix(names)), ...);

--- a/include/cif++/row.hpp
+++ b/include/cif++/row.hpp
@@ -300,7 +300,7 @@ class row_handle
 	/** \brief assign the value @a value to the column named @a name 
 	 * 
 	 * If updateLinked is true, linked records are updated as well.
-	 * That means that if column @name is part of the link definition
+	 * That means that if column @a name is part of the link definition
 	 * and the link results in a linked record in another category
 	 * this record in the linked category is updated as well.
 	 * 

--- a/include/cif++/validate.hpp
+++ b/include/cif++/validate.hpp
@@ -191,6 +191,7 @@ struct category_validator
 {
 	std::string m_name;                         ///< The name of the category
 	std::vector<std::string> m_keys;            ///< The list of items that make up the key
+	cif::iset m_groups;							///< The category groups this category belongs to
 	cif::iset m_mandatory_fields;               ///< The mandatory fields for this category
 	std::set<item_validator> m_item_validators; ///< The item validators for the items in this category

--- a/src/category.cpp
+++ b/src/category.cpp
@@ -921,25 +921,30 @@ condition category::get_parents_condition(row_handle rh, const category &parentC
 	condition result;
-	for (auto &link : m_validator->get_links_for_child(m_name))
+	auto links = m_validator->get_links_for_child(m_name);
+	links.erase(remove_if(links.begin(), links.end(), [n=parentCat.m_name](auto &l) { return l->m_parent_category != n; }), links.end());
+	if (not links.empty())
 	{
-		if (link->m_parent_category != parentCat.m_name)
+		for (auto &link : links)
-			continue;
+		{
+			condition cond;
-		condition cond;
+			for (size_t ix = 0; ix < link->m_child_keys.size(); ++ix)
+			{
+				auto childValue = rh[link->m_child_keys[ix]];
-		for (size_t ix = 0; ix < link->m_child_keys.size(); ++ix)
+				if (childValue.empty())
-		{
+					continue;
-			auto childValue = rh[link->m_child_keys[ix]];
-			if (childValue.empty())
+				cond = std::move(cond) and key(link->m_parent_keys[ix]) == childValue.text();
-				continue;
+			}
-			cond = std::move(cond) and key(link->m_parent_keys[ix]) == childValue.text();
+			result = std::move(result) or std::move(cond);
 		}
-		result = std::move(result) or std::move(cond);
 	}
+	else if (cif::VERBOSE > 0)
+		std::cerr << "warning: no child to parent links were found for child " << name() << " and parent " << parentCat.name() << '\n';
 	return result;
 }
@@ -956,30 +961,35 @@ condition category::get_children_condition(row_handle rh, const category &childC
 	if (childCatValidator != nullptr)
 		mandatoryChildFields = childCatValidator->m_mandatory_fields;
-	for (auto &link : m_validator->get_links_for_parent(m_name))
+	auto links = m_validator->get_links_for_parent(m_name);
+	links.erase(remove_if(links.begin(), links.end(), [n=childCat.m_name](auto &l) { return l->m_child_category != n; }), links.end());
+	if (not links.empty())
 	{
-		if (link->m_child_category != childCat.m_name)
+		for (auto &link : links)
-			continue;
+		{
+			condition cond;
-		condition cond;
+			for (size_t ix = 0; ix < link->m_parent_keys.size(); ++ix)
+			{
+				auto childKey = link->m_child_keys[ix];
+				auto parentKey = link->m_parent_keys[ix];
-		for (size_t ix = 0; ix < link->m_parent_keys.size(); ++ix)
+				auto parentValue = rh[parentKey];
-		{
-			auto childKey = link->m_child_keys[ix];
-			auto parentKey = link->m_parent_keys[ix];
-			auto parentValue = rh[parentKey];
+				if (parentValue.empty())
+					cond = std::move(cond) and key(childKey) == null;
+				else if (link->m_parent_keys.size() > 1 and not mandatoryChildFields.contains(childKey))
+					cond = std::move(cond) and (key(childKey) == parentValue.text() or key(childKey) == null);
+				else
+					cond = std::move(cond) and key(childKey) == parentValue.text();
+			}
-			if (parentValue.empty())
+			result = std::move(result) or std::move(cond);
-				cond = std::move(cond) and key(childKey) == null;
-			else if (link->m_parent_keys.size() > 1 and not mandatoryChildFields.contains(childKey))
-				cond = std::move(cond) and (key(childKey) == parentValue.text() or key(childKey) == null);
-			else
-				cond = std::move(cond) and key(childKey) == parentValue.text();
 		}
-		result = std::move(result) or std::move(cond);
 	}
+	else if (cif::VERBOSE > 0)
+		std::cerr << "warning: no parent to child links were found for parent " << name() << " and child " << childCat.name() << '\n';
 	return result;
 }
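The change to both functions above follows the same pattern: filter the list of links first with the erase-remove idiom, then either loop over what is left or emit a warning when nothing is left. A minimal standalone sketch of that filtering step, using a hypothetical `category_link` struct and a hypothetical helper `links_for_child` rather than the validator types from the library, could look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a link definition between two categories.
struct category_link
{
	std::string parent_category;
	std::string child_category;
};

// Keep only the links that point at the requested child category.
// The vector is taken by value, so the caller's list is left untouched.
std::vector<const category_link *> links_for_child(std::vector<const category_link *> links,
	const std::string &child)
{
	links.erase(
		std::remove_if(links.begin(), links.end(),
			[&child](const category_link *l)
			{
				assert(l != nullptr);
				return l->child_category != child;
			}),
		links.end());

	return links;
}
```

An empty result is then a clear signal that no link exists between the two categories, which is the case the new warning in the diff reports. `std::remove_if` keeps the relative order of the remaining elements, so the order in which conditions are combined does not change.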

--- a/test/REA_v2.cif
+++ b/test/REA_v2.cif
@@ -181,7 +181,7 @@ _pdbx_chem_comp_audit.comp_id
 _pdbx_chem_comp_audit.action_type 
 _pdbx_chem_comp_audit.date 
 _pdbx_chem_comp_audit.processing_site 
-REA_v2 "CREA_v2te component"   1999-07-08 RCSB 
+REA_v2 "Create component"   1999-07-08 RCSB 
 REA_v2 "Modify descriptor"  2011-06-04 RCSB 
 REA_v2 "Other modification" 2016-10-18 RCSB 
 # 
--- a/test/unit-3d-test.cpp
+++ b/test/unit-3d-test.cpp
@@ -517,6 +517,55 @@ BOOST_AUTO_TEST_CASE(symm_2bi3_1, *utf::tolerance(0.1f))
 	}
 }
+BOOST_AUTO_TEST_CASE(symm_2bi3_1a, *utf::tolerance(0.1f))
+{
+	using namespace cif::literals;
+	cif::file f(gTestDir / "2bi3.cif.gz");
+	auto &db = f.front();
+	cif::crystal c(db);
+	auto struct_conn = db["struct_conn"];
+	auto atom_site = db["atom_site"];
+	for (const auto &[
+			asym1, seqid1, authseqid1, atomid1, symm1,
+			asym2, seqid2, authseqid2, atomid2, symm2,
+			dist] : struct_conn.find<
+				std::string,int,std::string,std::string,std::string,
+				std::string,int,std::string,std::string,std::string,
+				float>(
+			cif::key("ptnr1_symmetry") != "1_555" or cif::key("ptnr2_symmetry") != "1_555",
+			"ptnr1_label_asym_id", "ptnr1_label_seq_id", "ptnr1_auth_seq_id", "ptnr1_label_atom_id", "ptnr1_symmetry", 
+			"ptnr2_label_asym_id", "ptnr2_label_seq_id", "ptnr2_auth_seq_id", "ptnr2_label_atom_id", "ptnr2_symmetry", 
+			"pdbx_dist_value"
+		))
+	{
+		cif::point p1 = atom_site.find1<float,float,float>(
+			"label_asym_id"_key == asym1 and "label_seq_id"_key == seqid1 and "auth_seq_id"_key == authseqid1 and "label_atom_id"_key == atomid1,
+			"cartn_x", "cartn_y", "cartn_z");
+		cif::point p2 = atom_site.find1<float,float,float>(
+			"label_asym_id"_key == asym2 and "label_seq_id"_key == seqid2 and "auth_seq_id"_key == authseqid2 and "label_atom_id"_key == atomid2,
+			"cartn_x", "cartn_y", "cartn_z");
+		auto sa1 = c.symmetry_copy(p1, cif::sym_op(symm1));
+		auto sa2 = c.symmetry_copy(p2, cif::sym_op(symm2));
+		BOOST_TEST(cif::distance(sa1, sa2) == dist);
+		const auto &[d, p, so] = c.closest_symmetry_copy(p1, p2);
+		BOOST_TEST(p.m_x == sa2.m_x);
+		BOOST_TEST(p.m_y == sa2.m_y);
+		BOOST_TEST(p.m_z == sa2.m_z);
+		BOOST_TEST(d == dist);
+		BOOST_TEST(so.string() == symm2);
+	}
+}
 BOOST_AUTO_TEST_CASE(symm_3bwh_1, *utf::tolerance(0.1f))
 {

--- a/test/unit-v2-test.cpp
+++ b/test/unit-v2-test.cpp
@@ -1865,6 +1865,15 @@ _test.name
 	BOOST_TEST(db["test"].find_first<int>(cif::key("id") == 1, "id") == 1);
 	BOOST_TEST(db["test"].find_first<int>(cif::all(), "id") == 1);
+	std::optional<int> v;
+	v = db["test"].find_first<std::optional<int>>(cif::key("id") == 1, "id");
+	BOOST_TEST(v.has_value());
+	BOOST_TEST(*v == 1);
+	v = db["test"].find_first<std::optional<int>>(cif::key("id") == 6, "id");
+	BOOST_TEST(not v.has_value());
 	// find1 tests
 	BOOST_TEST(db["test"].find1<int>(cif::key("id") == 1, "id") == 1);
 	BOOST_CHECK_THROW(db["test"].find1<int>(cif::all(), "id"), cif::multiple_results_error);
@@ -1882,7 +1891,7 @@ BOOST_AUTO_TEST_CASE(r1)
 	    of pdbx_nonpoly_scheme which itself is a parent of pdbx_entity_nonpoly. If I want to rename a residue
 	    I cannot update pdbx_nonpoly_scheme since changing a parent changes children, but not vice versa.
-	    But if I change the comp_id in atom_site, the pdbx_nonpoly_scheme is update, that's good, and then
+	    But if I change the comp_id in atom_site, the pdbx_nonpoly_scheme is updated, that's good, and then
 	    pdbx_entity_nonpoly is updated and that's bad.
 	    The idea is now that if we update a parent and a child that must change as well, we first check
@@ -2168,6 +2177,228 @@ _cat_3.num
 	// f.save(std::cout);
 }
+BOOST_AUTO_TEST_CASE(pc_1)
+{
+	/*
+	    Parent/child tests
+		Note that the dictionary is different than the one in test r1
+	*/
+	const char dict[] = R"(
+data_test_dict.dic
+    _datablock.id	test_dict.dic
+    _datablock.description
+;
+    A test dictionary
+;
+    _dictionary.title           test_dict.dic
+    _dictionary.datablock_id    test_dict.dic
+    _dictionary.version         1.0
+     loop_
+    _item_type_list.code
+    _item_type_list.primitive_code
+    _item_type_list.construct
+               code      char
+               '[][_,.;:"&<>()/\{}'`~!@#$%A-Za-z0-9*|+-]*'
+               text      char
+               '[][ \n\t()_,.;:"&<>/\{}'`~!@#$%?+=*A-Za-z0-9|^-]*'
+               int       numb
+               '[+-]?[0-9]+'
+save_cat_1
+    _category.description     'A simple test category'
+    _category.id              cat_1
+    _category.mandatory_code  no
+    _category_key.name        '_cat_1.id'
+    save_
+save__cat_1.id
+    _item.name                '_cat_1.id'
+    _item.category_id         cat_1
+    _item.mandatory_code      yes
+    _item_linked.child_name   '_cat_2.parent_id'
+    _item_linked.parent_name  '_cat_1.id'
+    _item_type.code           int
+    save_
+save__cat_1.name
+    _item.name                '_cat_1.name'
+    _item.category_id         cat_1
+    _item.mandatory_code      yes
+    _item_type.code           code
+    save_
+save__cat_1.desc
+    _item.name                '_cat_1.desc'
+    _item.category_id         cat_1
+    _item.mandatory_code      yes
+    _item_type.code           text
+    save_
+save_cat_2
+    _category.description     'A second simple test category'
+    _category.id              cat_2
+    _category.mandatory_code  no
+    _category_key.name        '_cat_2.id'
+    save_
+save__cat_2.id
+    _item.name                '_cat_2.id'
+    _item.category_id         cat_2
+    _item.mandatory_code      yes
+    _item_type.code           int
+    save_
+save__cat_2.name
+    _item.name                '_cat_2.name'
+    _item.category_id         cat_2
+    _item.mandatory_code      yes
+    _item_type.code           code
+    save_
+save__cat_2.num
+    _item.name                '_cat_2.num'
+    _item.category_id         cat_2
+    _item.mandatory_code      yes
+    _item_type.code           int
+    save_
+save__cat_2.desc
+    _item.name                '_cat_2.desc'
+    _item.category_id         cat_2
+    _item.mandatory_code      yes
+    _item_type.code           text
+    save_
+save_cat_3
+    _category.description     'A third simple test category'
+    _category.id              cat_3
+    _category.mandatory_code  no
+    _category_key.name        '_cat_3.id'
+    save_
+save__cat_3.id
+    _item.name                '_cat_3.id'
+    _item.category_id         cat_3
+    _item.mandatory_code      yes
+    _item_type.code           int
+    save_
+save__cat_3.name
+    _item.name                '_cat_3.name'
+    _item.category_id         cat_3
+    _item.mandatory_code      yes
+    _item_type.code           code
+    save_
+save__cat_3.num
+    _item.name                '_cat_3.num'
+    _item.category_id         cat_3
+    _item.mandatory_code      yes
+    _item_type.code           int
+    save_
+loop_
+_pdbx_item_linked_group_list.parent_category_id
+_pdbx_item_linked_group_list.link_group_id
+_pdbx_item_linked_group_list.parent_name
+_pdbx_item_linked_group_list.child_name
+_pdbx_item_linked_group_list.child_category_id
+cat_1 1 '_cat_1.name' '_cat_2.name' cat_2
+cat_2 1 '_cat_2.name' '_cat_3.name' cat_3
+cat_2 1 '_cat_2.num'  '_cat_3.num'  cat_3
+    )";
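+	// Read-only stream buffer over the in-memory dictionary text. The const_cast below is
+	// only needed because setg takes non-const pointers; the text is never written to.
+	struct membuf : public std::streambuf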
+	{
+		membuf(char *text, size_t length)
+		{
+			this->setg(text, text, text + length);
+		}
+	} buffer(const_cast<char *>(dict), sizeof(dict) - 1);
+	std::istream is_dict(&buffer);
+	auto validator = cif::parse_dictionary("test", is_dict);
+	cif::file f;
+	f.set_validator(&validator);
+	// --------------------------------------------------------------------
+	const char data[] = R"(
+data_test
+loop_
+_cat_1.id
+_cat_1.name
+_cat_1.desc
+1 aap  Aap
+2 noot Noot
+3 mies Mies
+loop_
+_cat_2.id
+_cat_2.name
+_cat_2.num
+_cat_2.desc
+1 aap  1 'Een dier'
+2 aap  2 'Een andere aap'
+3 noot 1 'walnoot bijvoorbeeld'
+loop_
+_cat_3.id
+_cat_3.name
+_cat_3.num
+1 aap 1
+2 aap 2
+    )";
+	using namespace cif::literals;
+	// Same kind of read-only stream buffer as membuf above, now wrapping the test data
+	struct data_membuf : public std::streambuf
+	{
+		data_membuf(char *text, size_t length)
+		{
+			this->setg(text, text, text + length);
+		}
+	} data_buffer(const_cast<char *>(data), sizeof(data) - 1);
+	std::istream is_data(&data_buffer);
+	f.load(is_data);
+	auto &cat1 = f.front()["cat_1"];
+	auto &cat2 = f.front()["cat_2"];
+	auto &cat3 = f.front()["cat_3"];
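+	// Parent/child tests. In the dictionary above, cat_1 and cat_2 are linked on 'name',
+	// cat_2 and cat_3 on the combination of 'name' and 'num'.
+	// Find all children in cat2 of the row with id == 1 in cat1 ('aap' occurs twice in cat2)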
+	auto rs1 = cat1.get_children(cat1.find1("id"_key == 1), cat2);
+	BOOST_TEST(rs1.size() == 2);
+	auto rs2 = cat1.get_children(cat1.find1("id"_key == 2), cat2);
+	BOOST_TEST(rs2.size() == 1);
+	auto rs3 = cat1.get_children(cat1.find1("id"_key == 3), cat2);
+	BOOST_TEST(rs3.size() == 0);
+	// Finding parents: each of these child rows has exactly one matching parent row
+	auto rs4 = cat2.get_parents(cat2.find1("id"_key == 1), cat1);
+	BOOST_TEST(rs4.size() == 1);
+	auto rs5 = cat3.get_parents(cat3.find1("id"_key == 1), cat2);
+	BOOST_TEST(rs5.size() == 1);
+	// The dictionary defines no link between cat_1 and cat_3, so no parents are found:
+	auto rs6 = cat3.get_parents(cat3.find1("id"_key == 1), cat1);
+	BOOST_TEST(rs6.size() == 0);
+}
 // --------------------------------------------------------------------
 // BOOST_AUTO_TEST_CASE(bondmap_1)