Commit 84af564a by Maarten L. Hekkelman

More documentation

Version bump
parent 86d95767
...@@ -404,7 +404,7 @@ install(FILES ...@@ -404,7 +404,7 @@ install(FILES
set(cifpp_MAJOR_VERSION ${CMAKE_PROJECT_VERSION_MAJOR}) set(cifpp_MAJOR_VERSION ${CMAKE_PROJECT_VERSION_MAJOR})
set_target_properties(cifpp PROPERTIES set_target_properties(cifpp PROPERTIES
VERSION ${PROJECT_VERSION} VERSION ${PROJECT_VERSION}
SOVERSION ${cifpp_MAJOR_VERSION} SOVERSION "${cifpp_MAJOR_VERSION}.${cifpp_MINOR_VERSION}"
INTERFACE_cifpp_MAJOR_VERSION ${cifpp_MAJOR_VERSION}) INTERFACE_cifpp_MAJOR_VERSION ${cifpp_MAJOR_VERSION})
set_property(TARGET cifpp APPEND PROPERTY set_property(TARGET cifpp APPEND PROPERTY
......
libcifpp libcifpp
======== ========
This library contains code to work with mmCIF and PDB files. This library contains code to work with mmCIF and legacy PDB files.
Synopsis Synopsis
-------- --------
...@@ -55,7 +55,6 @@ int main(int argc, char *argv[]) ...@@ -55,7 +55,6 @@ int main(int argc, char *argv[])
return 0; return 0;
} }
``` ```
Requirements Requirements
......
Basic usage
===========
This library, *libcifpp*, is a generic *CIF* library with some specific additions to work with *mmCIF* files. The main focus of this library is to make sure that files read or written are valid. That is, they are syntactically valid *and* their content is valid with respect to a CIF dictionary, if such a dictionary is available and specified.
Reading a file is as simple as:
.. code-block:: cpp
#include <cif++.hpp>
cif::file f("/path/to/file.cif");
The file may also be compressed using *gzip* which is detected automatically.
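How the automatic detection works is not described here. A common technique, shown below purely as an assumption about how such a check could be done, is to look at the first two bytes of the data: a gzip stream starts with the magic number ``0x1f 0x8b``. The helper name ``looks_like_gzip`` is hypothetical and not part of *libcifpp*.

```cpp
#include <cassert> // only needed for the checks in the usage note
#include <string_view>

// Hypothetical helper, NOT part of libcifpp: returns true when the buffer
// starts with the two-byte gzip magic number 0x1f 0x8b.
inline bool looks_like_gzip(std::string_view data)
{
	return data.size() >= 2 and
		   static_cast<unsigned char>(data[0]) == 0x1f and
		   static_cast<unsigned char>(data[1]) == 0x8b;
}
```

For instance, ``assert(not looks_like_gzip("data_1CBS"));`` holds, since plain CIF text starts with a ``data_`` line rather than the magic number.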
Writing out the file again is also simple. To write to the terminal you can do:
.. code-block:: cpp
std::cout << f;
// or
f.save(std::cout);
// or write a compressed file using gzip compression:
f.save("/tmp/f.cif.gz");
CIF files contain one or more datablocks. To print out the names of all datablocks in our file:
.. code-block:: cpp
for (auto &db : f)
std::cout << db.name() << '\n';
Most often *libcifpp* is used to read in structure files in mmCIF format. These files only contain one datablock and so you can safely use code like this:
.. code-block:: cpp
// get a reference to the first datablock in f
auto &db = f.front();
But if you know the name of the datablock, this also works:
.. code-block:: cpp
// get a reference to the datablock name '1CBS'
auto &db = f["1CBS"];
Now, each datablock contains categories. To print out all their names:
.. code-block:: cpp
for (auto &cat : db)
std::cout << cat.name() << '\n';
But you probably know what category you need to use, so let's fetch it by name:
.. _atom_site-label:
.. code-block:: cpp
// get a reference to the atom_site category in db
auto &atom_site = db["atom_site"];
// and make sure there's some data in it:
assert(not atom_site.empty());
.. note::
Note that we omit the leading underscore in the name of the category here.
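To make the naming rule concrete: in an mmCIF file a full tag such as ``_atom_site.label_atom_id`` consists of an underscore, the category name, a dot and the item name. The sketch below splits such a tag using only the standard library. The function ``split_tag`` is a hypothetical helper written for this illustration, it is not a *libcifpp* function.

```cpp
#include <cassert> // only needed for the checks in the usage note
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper, NOT part of libcifpp: split a full mmCIF tag like
// "_atom_site.label_atom_id" into { "atom_site", "label_atom_id" }.
inline std::pair<std::string, std::string> split_tag(std::string_view tag)
{
	if (tag.empty() or tag.front() != '_')
		throw std::invalid_argument("a tag should start with an underscore");

	auto dot = tag.find('.');
	if (dot == std::string_view::npos)
		throw std::invalid_argument("expected a tag of the form _category.item");

	// skip the leading underscore for the category part
	return { std::string(tag.substr(1, dot - 1)), std::string(tag.substr(dot + 1)) };
}
```

With this helper, ``split_tag("_atom_site.label_atom_id")`` yields the pair ``atom_site`` and ``label_atom_id``, which are the names you would pass to ``db[...]`` and ``rh[...]`` respectively.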
Categories contain rows of data and each row has fields or items. Referencing a row in a category results in a :cpp:class:`cif::row_handle` object which you can use to request or manipulate item data.
.. code-block:: cpp
// Get the first row in atom_site
auto rh = atom_site.front();
// Get the label_atom_id value from this row handle as a std::string
std::string atom_id = rh["label_atom_id"].as<std::string>();
// Get the x, y and z coordinates using structured binding
const auto &[x, y, z] = rh.get<float,float,float>("Cartn_x", "Cartn_y", "Cartn_z");
// Assign a new value to the x coordinate of our atom
rh["Cartn_x"] = x + 1;
Querying
--------
Walking over the rows in a category is often not very useful. More often you are interested in specific rows in a category. The function :cpp:func:`cif::category::find` and friends are here to help.
What these functions have in common is that they return data based on a query implemented by :cpp:class:`cif::condition`. These condition objects are built in code using regular C++ syntax. The most basic example of a query is:
.. code-block:: cpp
cif::condition c = cif::key("id") == 1;
Here the condition is that all rows returned should have the value 1 in their item named *id*. Likewise you can use other data types and even combine those. And since conditions use regular C++ syntax, you can use other operators to compare values as well:
.. code-block:: cpp
// condition for C-alpha atoms having an occupancy less than 1.0
cif::condition c = cif::key("occupancy") < 1.0f and cif::key("label_atom_id") == "CA";
Using the namespace *cif::literals* that code becomes a little less verbose:
.. code-block:: cpp
using namespace cif::literals;
cif::condition c = "occupancy"_key < 1.0f and "label_atom_id"_key == "CA";
Conditions can also be combined:
.. code-block:: cpp
cif::condition c = "occupancy"_key < 1.0f and "label_atom_id"_key == "CA";
// extend the condition by requiring the compound ID to be unequal to PRO
c = std::move(c) and "label_comp_id"_key != "PRO";
.. note::
Note the use of std::move here.
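If you wonder how conditions can be written in regular C++ syntax at all, the toy model below may help. It is a minimal sketch using operator overloading and ``std::function``. All names (``toy_row``, ``toy_key``, ``toy_condition``) are invented for this illustration and the real :cpp:class:`cif::condition` is implemented differently. The sketch assumes that the combining operator takes its operands by rvalue reference, which would explain why a named condition has to be passed through ``std::move``.

```cpp
#include <cassert> // only needed for the checks in the usage note
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy types for illustration only, these are NOT the libcifpp classes.
using toy_row = std::map<std::string, std::string>;

struct toy_condition
{
	std::function<bool(const toy_row &)> test;
	bool operator()(const toy_row &r) const { return test(r); }
};

struct toy_key
{
	std::string name;
};

// key == value: true when the row has the item and it equals value
inline toy_condition operator==(const toy_key &k, const std::string &value)
{
	return { [name = k.name, value](const toy_row &r)
		{
			auto i = r.find(name);
			return i != r.end() and i->second == value;
		} };
}

// key != value: true when the item is missing or differs from value
inline toy_condition operator!=(const toy_key &k, const std::string &value)
{
	return { [name = k.name, value](const toy_row &r)
		{
			auto i = r.find(name);
			return i == r.end() or i->second != value;
		} };
}

// Both operands are consumed, so a named condition needs std::move.
inline toy_condition operator&&(toy_condition &&a, toy_condition &&b)
{
	return { [a = std::move(a), b = std::move(b)](const toy_row &r)
		{ return a(r) and b(r); } };
}
```

Usage mirrors the example above: ``toy_condition c = toy_key{ "label_atom_id" } == "CA";`` followed by ``c = std::move(c) and toy_key{ "label_comp_id" } != "PRO";``.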
Using queries constructed in this way is simple:
.. code-block:: cpp
cif::condition c = ...
auto result = atom_site.find(std::move(c));
// or construct a condition inline:
auto result = atom_site.find("label_atom_id"_key == "CA");
In the example above the result is a range of :cpp:class:`cif::row_handle` objects. Often, using individual field values is more useful:
.. code-block:: cpp
// Requesting a single item:
for (auto id : atom_site.find<std::string>("label_atom_id"_key == "CA", "id"))
std::cout << "ID for CA: " << id << '\n';
// Requesting multiple items:
for (const auto &[id, x, y, z] : atom_site.find<std::string,float,float,float>("label_atom_id"_key == "CA",
"id", "Cartn_x", "Cartn_y", "Cartn_z"))
{
std::cout << "Atom " << id << " is at [" << x << ", " << y << ", " << z << "]\n";
}
Returning a complete set is often not required. If you only want the first result, you can use :cpp:func:`cif::category::find_first` as shown here:
.. code-block:: cpp
// return the ID item for the first C-alpha atom
std::string v1 = atom_site.find_first<std::string>("label_atom_id"_key == "CA", "id");
// If you're not sure the row exists, use std::optional
auto v2 = atom_site.find_first<std::optional<std::string>>("label_atom_id"_key == "CA", "id");
if (v2.has_value())
...
There are cases when you really need exactly one result. The function :cpp:func:`cif::category::find1` can be used in that case; it throws an exception if the query does not result in exactly one row.
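The difference between "first match, if any" and "exactly one match" can be sketched with a plain ``std::vector``. The functions ``toy_find_first`` and ``toy_find1`` below are hypothetical stand-ins written for this illustration. They only mimic the behaviour described above, and the exception type used here is an assumption, not the one *libcifpp* throws.

```cpp
#include <cassert> // only needed for the checks in the usage note
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in: return the first element matching pred, if any.
template <typename T, typename Pred>
std::optional<T> toy_find_first(const std::vector<T> &rows, Pred pred)
{
	for (const auto &r : rows)
	{
		if (pred(r))
			return r;
	}
	return std::nullopt;
}

// Hypothetical stand-in: return the single element matching pred and
// throw when there are zero matches or more than one.
template <typename T, typename Pred>
T toy_find1(const std::vector<T> &rows, Pred pred)
{
	std::optional<T> found;
	for (const auto &r : rows)
	{
		if (not pred(r))
			continue;
		if (found.has_value())
			throw std::runtime_error("query matched more than one row");
		found = r;
	}
	if (not found.has_value())
		throw std::runtime_error("query matched no rows");
	return *found;
}
```

Given ``std::vector<int> ids{ 1, 2, 3, 2 }``, looking for the value 2 succeeds with ``toy_find_first`` but throws with ``toy_find1`` because two elements match.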
Validation
----------
CIF files can have a dictionary attached. And based on such a dictionary a :cpp:class:`cif::validator` object can be constructed which in turn can be used to validate the content of the file.
A simple case:
.. code-block:: cpp
#include <cif++.hpp>
cif::file f("1cbs.cif.gz");
f.load_dictionary("mmcif_pdbx");
if (not f.is_valid())
std::cout << "This file is not valid\n";
If you want to know why it is not valid, you should set the global variable :cpp:var:`cif::VERBOSE` to something higher than zero. Depending on the value, more or less diagnostic output is sent to std::cerr.
In the case above we load a dictionary based on its name. You can of course also load dictionaries based on a specific file, that's a bit more work:
.. code-block:: cpp
std::ifstream dictFile("/tmp/my-dictionary.dic"); // requires #include <fstream>
auto &validator = cif::parse_dictionary("my-dictionary", dictFile);
cif::file f("1cbs.cif.gz");
// assign the validator
f.set_validator(&validator);
// alternatively, load it by name
f.load_dictionary("my-dictionary");
if (not f.is_valid())
std::cout << "This file is not valid\n";
Creating your own dictionary is a lot of work, especially if you are only extending an existing dictionary with a couple of new categories or items. So, what you can do is extend a loaded validator like this (code taken from DSSP):
.. code-block:: cpp
// db is a cif::datablock reference containing an mmCIF file with DSSP annotations
auto &validator = const_cast<cif::validator &>(*db.get_validator());
if (validator.get_validator_for_category("dssp_struct_summary") == nullptr)
{
auto dssp_extension = cif::load_resource("dssp-extension.dic");
if (dssp_extension)
cif::extend_dictionary(validator, *dssp_extension);
}
.. note::
In the example above we're loading the data using :doc:`/resources`. See the documentation on that for more information.
If a validator has been assigned to a file, assignments to items are checked for valid data. So the following code will throw an exception (see the :ref:`atom_site example <atom_site-label>`):
.. code-block:: cpp
auto rh = atom_site.front();
rh["Cartn_x"] = "foo";
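Conceptually, such a check can be thought of as matching the new value against the regular expression the dictionary defines for the item's type. The sketch below illustrates that idea with ``std::regex``. The function ``check_value`` is hypothetical, and the pattern used in the usage note is the one the test dictionary in this commit gives for type ``int``. The pattern for a real floating point item such as ``Cartn_x`` in *mmcif_pdbx* is different.

```cpp
#include <cassert> // only needed for the checks in the usage note
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper, NOT part of libcifpp: throw when value does not
// completely match the regular expression of the item's type.
inline void check_value(const std::string &value, const std::regex &type_rx)
{
	if (not std::regex_match(value, type_rx))
		throw std::invalid_argument("value '" + value + "' does not match the item type");
}
```

With ``std::regex int_rx("[+-]?[0-9]+");`` the call ``check_value("-42", int_rx)`` returns normally whereas ``check_value("foo", int_rx)`` throws.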
Linking
-------
Based on information recorded in dictionary files (see :ref:`Validation`) you can locate linked records in parent or child categories.
To keep this example simple, let's assume the following file:
.. code-block:: cif
data_test
loop_
_cat_1.id
_cat_1.name
_cat_1.desc
1 aap Aap
2 noot Noot
3 mies Mies
loop_
_cat_2.id
_cat_2.name
_cat_2.num
_cat_2.desc
1 aap 1 'Een dier'
2 aap 2 'Een andere aap'
3 noot 1 'walnoot bijvoorbeeld'
And we have a dictionary containing the following link definition:
.. code-block:: cif
loop_
_pdbx_item_linked_group_list.parent_category_id
_pdbx_item_linked_group_list.link_group_id
_pdbx_item_linked_group_list.parent_name
_pdbx_item_linked_group_list.child_name
_pdbx_item_linked_group_list.child_category_id
cat_1 1 '_cat_1.name' '_cat_2.name' cat_2
So, there are links between *cat_1* and *cat_2* based on the value in items named *name*. Using this information, we can now locate children and parents:
.. code-block:: cpp
// Assuming the file was loaded in f:
auto &cat1 = f.front()["cat_1"];
auto &cat2 = f.front()["cat_2"];
auto &cat3 = f.front()["cat_3"];
// Loop over all apes in cat2
for (auto r : cat1.get_children(cat1.find1("name"_key == "aap"), cat2))
std::cout << r.get<std::string>("desc") << '\n';
Updating a value in an item in a parent category will update the corresponding value in all related children:
.. code-block:: cpp
auto r1 = cat1.find1("id"_key == 1);
r1["name"] = "aapje";
auto rs1 = cat2.find("name"_key == "aapje");
assert(rs1.size() == 2);
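The propagation from a parent to its children can be pictured with a small stand-alone model. The sketch below uses a plain struct and a ``std::vector`` and assumes a single link on the item *name*, as in the example dictionary. The names ``toy_record`` and ``rename_with_children`` are invented for this illustration and say nothing about how *libcifpp* implements linked updates.

```cpp
#include <cassert> // only needed for the checks in the usage note
#include <cstddef>
#include <string>
#include <vector>

// Toy record for illustration only, NOT a libcifpp type.
struct toy_record
{
	int id;
	std::string name;
};

// Rename the parent and every child that is linked to it by name.
// Returns the number of child records that were updated.
inline std::size_t rename_with_children(toy_record &parent,
	std::vector<toy_record> &children, const std::string &new_name)
{
	std::size_t updated = 0;
	for (auto &child : children)
	{
		if (child.name == parent.name)
		{
			child.name = new_name;
			++updated;
		}
	}
	parent.name = new_name;
	return updated;
}
```

Applied to the data above, renaming the parent *aap* to *aapje* updates the two children named *aap* and leaves the child named *noot* untouched.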
However, changing a value in a child record will not update the parent. This may result in an invalid file since you may then have a child that has no parent:
.. code-block:: cpp
auto r2 = cat2.find1("id"_key == 3);
r2["name"] = "wim";
assert(f.is_valid() == false);
So you have to fix this yourself by inserting a new row in *cat_1* with the new value.
.. _splitting-rows:
Another situation is when you change a value in a parent and updating children might introduce a situation where you need to split a child. To give an example, consider this:
.. code-block:: cif
data_test
loop_
_cat_1.id
_cat_1.name
_cat_1.desc
1 aap Aap
2 noot Noot
3 mies Mies
loop_
_cat_2.id
_cat_2.name
_cat_2.num
_cat_2.desc
1 aap 1 'Een dier'
2 aap 2 'Een andere aap'
3 noot 1 'walnoot bijvoorbeeld'
loop_
_cat_3.id
_cat_3.name
_cat_3.num
1 aap 1
2 aap 2
And we have a dictionary containing the following link definition (reversed compared to the previous example):
.. code-block:: cif
loop_
_pdbx_item_linked_group_list.parent_category_id
_pdbx_item_linked_group_list.link_group_id
_pdbx_item_linked_group_list.parent_name
_pdbx_item_linked_group_list.child_name
_pdbx_item_linked_group_list.child_category_id
cat_2 1 '_cat_2.name' '_cat_1.name' cat_1
cat_3 1 '_cat_3.name' '_cat_2.name' cat_2
cat_3 1 '_cat_3.num' '_cat_2.num' cat_2
So *cat3* is a parent of *cat2* and *cat2* is a parent of *cat1*. Now, if you change the *name* value of the first row of *cat3* to 'aapje', the corresponding row in *cat2* is updated as well. But when you update *cat2* you have to update *cat1* too. And simply changing the name field in row 1 of *cat1* is wrong. The default behaviour in libcifpp is to split the record in *cat1* and have a new child with the new name whereas the other remains as is.
The new *cat1* will thus be like:
.. code-block:: cif
loop_
_cat_1.id
_cat_1.name
_cat_1.desc
1 aapje Aap
2 noot Noot
3 mies Mies
5 aap Aap
...@@ -3,9 +3,9 @@ Introduction ...@@ -3,9 +3,9 @@ Introduction
Information on 3D structures of proteins originally came formatted in `PDB <http://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html>`_ files. Although the specification for this format had some real restrictions like a mandatory HEADER and CRYST line, many programs implemented this very poorly often writing out only ATOM records. And users became used to this. Information on 3D structures of proteins originally came formatted in `PDB <http://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html>`_ files. Although the specification for this format had some real restrictions like a mandatory HEADER and CRYST line, many programs implemented this very poorly often writing out only ATOM records. And users became used to this.
The PDB format has some severe limitations rendering it useless for all but very small protein structures. A new format called `mmCIF <https://mmcif.wwpdb.org/>`_ has been around for decades and now is the default format for the Protein Data Bank. The legacy PDB format has some severe limitations rendering it useless for all but very small protein structures. A new format called `mmCIF <https://mmcif.wwpdb.org/>`_ has been around for decades and now is the default format for the Protein Data Bank.
The software developed in the `PDB-REDO <https://pdb-redo.eu/>`_ project aims at improving 3D models based on original experimental data. For this, the tools need to be able to work with both PDB and mmCIF files. A decision was made to make mmCIF leading internally in all programs and convert PDB directly into mmCIF before processing the data. A robust conversion had to be developed to make this possible since, as noted above, files can come with more or less information making it sometimes needed to do a sequence alignment to find out the exact residue numbers. The software developed in the `PDB-REDO <https://pdb-redo.eu/>`_ project aims at improving 3D models based on original experimental data. For this, the tools need to be able to work with both legacy PDB and mmCIF files. A decision was made to make mmCIF leading internally in all programs and convert legacy PDB directly into mmCIF before processing the data. A robust conversion had to be developed to make this possible since, as noted above, files can come with more or less information making it sometimes needed to do a sequence alignment to find out the exact residue numbers.
And so libcif++ came to life, a library to work with mmCIF files. Work on this library started early 2017 and has developed quite a bit since then. To reduce dependency on other libraries, some functionality was added that is not strictly related to reading and writing mmCIF files but may be useful nonetheless. This is mostly code that is used in 3D calculations and symmetry operations. And so libcif++ came to life, a library to work with mmCIF files. Work on this library started early 2017 and has developed quite a bit since then. To reduce dependency on other libraries, some functionality was added that is not strictly related to reading and writing mmCIF files but may be useful nonetheless. This is mostly code that is used in 3D calculations and symmetry operations.
...@@ -18,16 +18,26 @@ The main part of the library is a set of classes that work with mmCIF files. The ...@@ -18,16 +18,26 @@ The main part of the library is a set of classes that work with mmCIF files. The
* :cpp:class:`cif::datablock` * :cpp:class:`cif::datablock`
* :cpp:class:`cif::category` * :cpp:class:`cif::category`
The :cpp:class:`cif::file` class encapsulates, you guessed it, the contents of a mmCIF file. In such a file there are one or more :cpp:class:`cif::datablock`s and each datablock contains one or more :cpp:class:`cif::category`s. The :cpp:class:`cif::file` class encapsulates the contents of a mmCIF file. In such a file there are one or more :cpp:class:`cif::datablock` objects and each datablock contains one or more :cpp:class:`cif::category` objects.
Synopsis
--------
Using *libcifpp* is easy, if you are familiar with modern C++:
.. literalinclude:: ../README.md
:language: c++
:start-after: ```c++
:end-before: ```
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
:caption: Contents :caption: Contents
self self
basics.rst
resources.rst resources.rst
symmetry.rst
api/library_root.rst api/library_root.rst
genindex.rst genindex.rst
Symmetry & Geometry
===================
Although not really a core *CIF* functionality, when working with *mmCIF* files you often need to work with symmetry information. Symmetry operates on points in space, so geometry calculations are needed often as well. Earlier versions of *libcifpp* used `clipper <http://www.ysbl.york.ac.uk/~cowtan/clipper/doc/index.html>`_ for many of these calculations, but that introduced a dependency, and the way clipper numbers symmetry operations is not completely compatible with the way this is done in the PDB.
Points
------
The most basic type in use is :cpp:type:`cif::point`. It can be thought of as a point in space with three coordinates, but it is also often used as a vector in 3d space. To keep the interface simple there's no separate vector type.
Many functions are available in :ref:`file_cif++_point.hpp` that work on points. There are functions to calculate the :cpp:func:`cif::distance` between two points and also functions to calculate dot products, cross products and dihedral angles between sets of points.
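For readers who want to see the underlying math, here is a self-contained sketch of a distance and a dihedral angle calculation. It uses its own minimal ``vec3`` type rather than :cpp:type:`cif::point`, and the function names are made up for this example. The dihedral uses the common ``atan2`` formulation, and the sign convention of the result is an assumption that may differ from what *libcifpp* returns.

```cpp
#include <cassert> // only needed for the checks in the usage note
#include <cmath>

// Minimal stand-in for a point or vector, NOT the libcifpp type.
struct vec3
{
	double x, y, z;
};

inline vec3 operator-(const vec3 &a, const vec3 &b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

inline double dot(const vec3 &a, const vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline vec3 cross(const vec3 &a, const vec3 &b)
{
	return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

inline double distance_between(const vec3 &a, const vec3 &b)
{
	vec3 d = a - b;
	return std::sqrt(dot(d, d));
}

// Dihedral angle in degrees around the p2-p3 bond, in the range -180..180.
inline double dihedral_angle(const vec3 &p1, const vec3 &p2, const vec3 &p3, const vec3 &p4)
{
	vec3 b1 = p2 - p1, b2 = p3 - p2, b3 = p4 - p3;
	vec3 n1 = cross(b1, b2); // normal of the plane through p1, p2, p3
	vec3 n2 = cross(b2, b3); // normal of the plane through p2, p3, p4

	double y = std::sqrt(dot(b2, b2)) * dot(b1, n2);
	double x = dot(n1, n2);

	const double pi = std::acos(-1.0);
	return std::atan2(y, x) * 180.0 / pi;
}
```

Four points in a cis arrangement give an angle of zero and a trans arrangement gives an angle with absolute value 180.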
Quaternions
-----------
All operations inside *libcifpp* that perform some kind of rotation use :cpp:type:`cif::quaternion`. Quaternions are not only cool, they are also faster than multiplying with a matrix, and the results suffer less from numerical instability.
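As a reminder of how a quaternion rotation works, the sketch below rotates a point around an axis through the origin. It defines its own tiny ``quat`` and ``vec3`` types, so nothing here is *libcifpp* API, and the function names are assumptions chosen for the example.

```cpp
#include <cassert> // used to guard against a zero-length axis
#include <cmath>

// Minimal stand-ins, NOT the libcifpp types.
struct vec3
{
	double x, y, z;
};

struct quat
{
	double w, x, y, z;
};

inline vec3 cross(const vec3 &a, const vec3 &b)
{
	return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Build a unit quaternion for a rotation of the given angle around axis.
inline quat from_axis_angle(const vec3 &axis, double degrees)
{
	double len = std::sqrt(axis.x * axis.x + axis.y * axis.y + axis.z * axis.z);
	assert(len > 0);

	const double pi = std::acos(-1.0);
	double half = degrees * pi / 360.0; // half the angle, in radians
	double s = std::sin(half) / len;

	return { std::cos(half), axis.x * s, axis.y * s, axis.z * s };
}

// Rotate v by the unit quaternion q using v' = v + w t + u x t,
// where u is the vector part of q and t = 2 (u x v).
inline vec3 rotate(const quat &q, const vec3 &v)
{
	vec3 u{ q.x, q.y, q.z };
	vec3 c = cross(u, v);
	vec3 t{ 2 * c.x, 2 * c.y, 2 * c.z };
	vec3 ut = cross(u, t);
	return { v.x + q.w * t.x + ut.x, v.y + q.w * t.y + ut.y, v.z + q.w * t.z + ut.z };
}
```

Rotating the point (1, 0, 0) by 90 degrees around the z axis with these functions gives (0, 1, 0), up to rounding.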
Matrix
------
Although Quaternions are the preferred way of doing rotations, not every manipulation is a rotation and thus we need a matrix class as well. Matrices and their operations are encoded as matrix_expressions in *libcifpp* allowing the compiler to generate very fast code. See the :ref:`file_cif++_matrix.hpp` for what is on offer.
Symmetry operations
-------------------
Each basic symmetry operation in the crystallographic world consists of a matrix multiplication followed by a translation. To apply such an operation to a Cartesian coordinate you first have to convert the point into a fractional coordinate with respect to the unit cell of the crystal, then apply the matrix and translation operations and then convert the result back into Cartesian coordinates. This is all done by the proper routines in *libcifpp*.
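The three steps can be sketched as follows. To keep the example short it assumes a unit cell in which all angles are 90 degrees, so converting to fractional coordinates is a simple division by the cell lengths. For a general cell you need a full orthogonalisation matrix instead. The types and function names are invented for this illustration and are not the *libcifpp* interface.

```cpp
#include <array>
#include <cassert> // only needed for the checks in the usage note

// Minimal stand-ins, NOT the libcifpp types.
struct vec3
{
	double x, y, z;
};

// Simplifying assumption: a cell with all angles equal to 90 degrees.
struct ortho_cell
{
	double a, b, c;
};

using mat3 = std::array<std::array<double, 3>, 3>;

inline vec3 to_fractional(const ortho_cell &cell, const vec3 &p)
{
	return { p.x / cell.a, p.y / cell.b, p.z / cell.c };
}

inline vec3 to_cartesian(const ortho_cell &cell, const vec3 &f)
{
	return { f.x * cell.a, f.y * cell.b, f.z * cell.c };
}

// Apply rotation matrix rot and fractional translation trans to the
// Cartesian point p and return the result in Cartesian coordinates.
inline vec3 apply_sym_op(const ortho_cell &cell, const mat3 &rot, const vec3 &trans, const vec3 &p)
{
	vec3 f = to_fractional(cell, p);
	vec3 g{
		rot[0][0] * f.x + rot[0][1] * f.y + rot[0][2] * f.z + trans.x,
		rot[1][0] * f.x + rot[1][1] * f.y + rot[1][2] * f.z + trans.y,
		rot[2][0] * f.x + rot[2][1] * f.y + rot[2][2] * f.z + trans.z
	};
	return to_cartesian(cell, g);
}
```

In a cell of 10 by 20 by 30, applying the identity matrix with a translation of one cell along *b* moves the point (1, 2, 3) to (1, 22, 3).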
Symmetry operations are encoded as a string in *mmCIF* PDBx files. The format is a string with the rotational number followed by an underscore and then the encoded translation in each direction where 5 means no translation. So, the identity operator is ``1_555`` meaning that we have rotational number 1 (which is always the identity rotation, point multiplied with the identity matrix) and a translation of zero in each direction.
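Decoding such a string by hand is straightforward, as the sketch below shows. It is a stand-alone illustration of the encoding described above. The struct ``toy_sym_op`` and the function ``parse_sym_op`` are hypothetical names and not the :cpp:class:`cif::sym_op` implementation.

```cpp
#include <array>
#include <cassert> // only needed for the checks in the usage note
#include <cstddef>
#include <stdexcept>
#include <string>

// Toy result type, NOT the libcifpp class.
struct toy_sym_op
{
	int rotation;             // the rotational number, 1 is the identity
	std::array<int, 3> shift; // translation in unit cells along a, b and c
};

// Parse a string like "1_555" or "2_565".
inline toy_sym_op parse_sym_op(const std::string &s)
{
	auto us = s.find('_');
	if (us == std::string::npos or us == 0 or s.size() != us + 4)
		throw std::invalid_argument("expected a symmetry operator like 1_555");

	for (std::size_t i = 0; i < s.size(); ++i)
	{
		if (i != us and (s[i] < '0' or s[i] > '9'))
			throw std::invalid_argument("unexpected character in symmetry operator");
	}

	toy_sym_op result{};
	result.rotation = std::stoi(s.substr(0, us));
	for (std::size_t i = 0; i < 3; ++i)
		result.shift[i] = s[us + 1 + i] - '5'; // 5 means no translation

	return result;
}
```

Parsing ``2_565`` with this function yields rotational number 2 and a translation of one unit cell in the second direction.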
To give an idea how this works, here's a piece of code copied from one of the unit tests in *libcifpp*. It takes the *struct_conn* records in a certain PDB file and checks whether the distances in each row correspond to what we can calculate.
.. code:: cpp
// Load the file
cif::file f(gTestDir / "2bi3.cif.gz");
auto &db = f.front();
cif::mm::structure s(db);
cif::crystal c(db);
auto struct_conn = db["struct_conn"];
for (const auto &[
asym1, seqid1, authseqid1, atomid1, symm1,
asym2, seqid2, authseqid2, atomid2, symm2,
dist] : struct_conn.find<
std::string,int,std::string,std::string,std::string,
std::string,int,std::string,std::string,std::string,
float>(
cif::key("ptnr1_symmetry") != "1_555" or cif::key("ptnr2_symmetry") != "1_555",
"ptnr1_label_asym_id", "ptnr1_label_seq_id", "ptnr1_auth_seq_id", "ptnr1_label_atom_id", "ptnr1_symmetry",
"ptnr2_label_asym_id", "ptnr2_label_seq_id", "ptnr2_auth_seq_id", "ptnr2_label_atom_id", "ptnr2_symmetry",
"pdbx_dist_value"
))
{
auto &r1 = s.get_residue(asym1, seqid1, authseqid1);
auto &r2 = s.get_residue(asym2, seqid2, authseqid2);
auto a1 = r1.get_atom_by_atom_id(atomid1);
auto a2 = r2.get_atom_by_atom_id(atomid2);
auto sa1 = c.symmetry_copy(a1.get_location(), cif::sym_op(symm1));
auto sa2 = c.symmetry_copy(a2.get_location(), cif::sym_op(symm2));
BOOST_TEST(cif::distance(sa1, sa2) == dist);
auto pa1 = a1.get_location();
const auto &[d, p, so] = c.closest_symmetry_copy(pa1, a2.get_location());
BOOST_TEST(p.m_x == sa2.m_x);
BOOST_TEST(p.m_y == sa2.m_y);
BOOST_TEST(p.m_z == sa2.m_z);
BOOST_TEST(d == dist);
BOOST_TEST(so.string() == symm2);
}
\ No newline at end of file
...@@ -283,7 +283,15 @@ struct item_handle ...@@ -283,7 +283,15 @@ struct item_handle
// conversion helper class // conversion helper class
template <typename T, typename = void> template <typename T, typename = void>
struct item_value_as; struct item_value_as;
/** @endcond */
/**
* @brief Assign value @a value to the item referenced
*
* @tparam T Type of the value
* @param value The value
* @return reference to this item_handle
*/
template <typename T> template <typename T>
item_handle &operator=(const T &value) item_handle &operator=(const T &value)
{ {
...@@ -291,7 +299,6 @@ struct item_handle ...@@ -291,7 +299,6 @@ struct item_handle
assign_value(v); assign_value(v);
return *this; return *this;
} }
/** @endcond */
/** /**
* @brief A method with a variable number of arguments that will be concatenated and * @brief A method with a variable number of arguments that will be concatenated and
......
...@@ -576,7 +576,7 @@ class conditional_iterator_proxy ...@@ -576,7 +576,7 @@ class conditional_iterator_proxy
{ {
if (++mBegin == mEnd) if (++mBegin == mEnd)
break; break;
if (m_condition->operator()(mBegin)) if (m_condition->operator()(mBegin))
break; break;
} }
...@@ -678,6 +678,8 @@ conditional_iterator_proxy<Category, Ts...>::conditional_iterator_impl::conditio ...@@ -678,6 +678,8 @@ conditional_iterator_proxy<Category, Ts...>::conditional_iterator_impl::conditio
, mEnd(cat.end(), cix) , mEnd(cat.end(), cix)
, m_condition(&cond) , m_condition(&cond)
{ {
if (m_condition == nullptr or m_condition->empty())
mBegin = mEnd;
} }
template <typename Category, typename... Ts> template <typename Category, typename... Ts>
...@@ -702,10 +704,15 @@ conditional_iterator_proxy<Category, Ts...>::conditional_iterator_proxy(Category ...@@ -702,10 +704,15 @@ conditional_iterator_proxy<Category, Ts...>::conditional_iterator_proxy(Category
{ {
static_assert(sizeof...(Ts) == sizeof...(Ns), "Number of column names should be equal to number of requested value types"); static_assert(sizeof...(Ts) == sizeof...(Ns), "Number of column names should be equal to number of requested value types");
m_condition.prepare(cat); if (m_condition)
{
m_condition.prepare(cat);
while (mCBegin != mCEnd and not m_condition(*mCBegin)) while (mCBegin != mCEnd and not m_condition(*mCBegin))
++mCBegin; ++mCBegin;
}
else
mCBegin == mCEnd;
uint16_t i = 0; uint16_t i = 0;
((mCix[i++] = m_cat->get_column_ix(names)), ...); ((mCix[i++] = m_cat->get_column_ix(names)), ...);
......
...@@ -300,7 +300,7 @@ class row_handle ...@@ -300,7 +300,7 @@ class row_handle
/** \brief assign the value @a value to the column named @a name /** \brief assign the value @a value to the column named @a name
* *
* If updateLinked it true, linked records are updated as well. * If updateLinked it true, linked records are updated as well.
* That means that if column @name is part of the link definition * That means that if column @a name is part of the link definition
* and the link results in a linked record in another category * and the link results in a linked record in another category
* this record in the linked category is updated as well. * this record in the linked category is updated as well.
* *
......
...@@ -191,6 +191,7 @@ struct category_validator ...@@ -191,6 +191,7 @@ struct category_validator
{ {
std::string m_name; ///< The name of the category std::string m_name; ///< The name of the category
std::vector<std::string> m_keys; ///< The list of items that make up the key std::vector<std::string> m_keys; ///< The list of items that make up the key
cif::iset m_groups; ///< The category groups this category belongs to
cif::iset m_mandatory_fields; ///< The mandatory fields for this category cif::iset m_mandatory_fields; ///< The mandatory fields for this category
std::set<item_validator> m_item_validators; ///< The item validators for the items in this category std::set<item_validator> m_item_validators; ///< The item validators for the items in this category
......
...@@ -921,25 +921,30 @@ condition category::get_parents_condition(row_handle rh, const category &parentC ...@@ -921,25 +921,30 @@ condition category::get_parents_condition(row_handle rh, const category &parentC
condition result; condition result;
for (auto &link : m_validator->get_links_for_child(m_name)) auto links = m_validator->get_links_for_child(m_name);
links.erase(remove_if(links.begin(), links.end(), [n=parentCat.m_name](auto &l) { return l->m_parent_category != n; }), links.end());
if (not links.empty())
{ {
if (link->m_parent_category != parentCat.m_name) for (auto &link : links)
continue; {
condition cond;
condition cond; for (size_t ix = 0; ix < link->m_child_keys.size(); ++ix)
{
auto childValue = rh[link->m_child_keys[ix]];
for (size_t ix = 0; ix < link->m_child_keys.size(); ++ix) if (childValue.empty())
{ continue;
auto childValue = rh[link->m_child_keys[ix]];
if (childValue.empty()) cond = std::move(cond) and key(link->m_parent_keys[ix]) == childValue.text();
continue; }
cond = std::move(cond) and key(link->m_parent_keys[ix]) == childValue.text(); result = std::move(result) or std::move(cond);
} }
result = std::move(result) or std::move(cond);
} }
else if (cif::VERBOSE > 0)
std::cerr << "warning: no child to parent links were found for child " << parentCat.name() << " and parent " << name() << '\n';
return result; return result;
} }
...@@ -956,30 +961,35 @@ condition category::get_children_condition(row_handle rh, const category &childC ...@@ -956,30 +961,35 @@ condition category::get_children_condition(row_handle rh, const category &childC
if (childCatValidator != nullptr) if (childCatValidator != nullptr)
mandatoryChildFields = childCatValidator->m_mandatory_fields; mandatoryChildFields = childCatValidator->m_mandatory_fields;
for (auto &link : m_validator->get_links_for_parent(m_name)) auto links = m_validator->get_links_for_parent(m_name);
links.erase(remove_if(links.begin(), links.end(), [n=childCat.m_name](auto &l) { return l->m_child_category != n; }), links.end());
if (not links.empty())
{ {
if (link->m_child_category != childCat.m_name) for (auto &link : links)
continue; {
condition cond;
condition cond; for (size_t ix = 0; ix < link->m_parent_keys.size(); ++ix)
{
auto childKey = link->m_child_keys[ix];
auto parentKey = link->m_parent_keys[ix];
for (size_t ix = 0; ix < link->m_parent_keys.size(); ++ix) auto parentValue = rh[parentKey];
{
auto childKey = link->m_child_keys[ix];
auto parentKey = link->m_parent_keys[ix];
auto parentValue = rh[parentKey]; if (parentValue.empty())
cond = std::move(cond) and key(childKey) == null;
else if (link->m_parent_keys.size() > 1 and not mandatoryChildFields.contains(childKey))
cond = std::move(cond) and (key(childKey) == parentValue.text() or key(childKey) == null);
else
cond = std::move(cond) and key(childKey) == parentValue.text();
}
if (parentValue.empty()) result = std::move(result) or std::move(cond);
cond = std::move(cond) and key(childKey) == null;
else if (link->m_parent_keys.size() > 1 and not mandatoryChildFields.contains(childKey))
cond = std::move(cond) and (key(childKey) == parentValue.text() or key(childKey) == null);
else
cond = std::move(cond) and key(childKey) == parentValue.text();
} }
result = std::move(result) or std::move(cond);
} }
else if (cif::VERBOSE > 0)
std::cerr << "warning: no parent to child links were found for parent " << name() << " and child " << childCat.name() << '\n';
return result; return result;
} }
......
...@@ -181,7 +181,7 @@ _pdbx_chem_comp_audit.comp_id ...@@ -181,7 +181,7 @@ _pdbx_chem_comp_audit.comp_id
_pdbx_chem_comp_audit.action_type _pdbx_chem_comp_audit.action_type
_pdbx_chem_comp_audit.date _pdbx_chem_comp_audit.date
_pdbx_chem_comp_audit.processing_site _pdbx_chem_comp_audit.processing_site
REA_v2 "CREA_v2te component" 1999-07-08 RCSB REA_v2 "Create component" 1999-07-08 RCSB
REA_v2 "Modify descriptor" 2011-06-04 RCSB REA_v2 "Modify descriptor" 2011-06-04 RCSB
REA_v2 "Other modification" 2016-10-18 RCSB REA_v2 "Other modification" 2016-10-18 RCSB
# #
...@@ -517,6 +517,55 @@ BOOST_AUTO_TEST_CASE(symm_2bi3_1, *utf::tolerance(0.1f)) ...@@ -517,6 +517,55 @@ BOOST_AUTO_TEST_CASE(symm_2bi3_1, *utf::tolerance(0.1f))
} }
} }
BOOST_AUTO_TEST_CASE(symm_2bi3_1a, *utf::tolerance(0.1f))
{
using namespace cif::literals;
cif::file f(gTestDir / "2bi3.cif.gz");
auto &db = f.front();
cif::crystal c(db);
auto struct_conn = db["struct_conn"];
auto atom_site = db["struct_conn"];
for (const auto &[
asym1, seqid1, authseqid1, atomid1, symm1,
asym2, seqid2, authseqid2, atomid2, symm2,
dist] : struct_conn.find<
std::string,int,std::string,std::string,std::string,
std::string,int,std::string,std::string,std::string,
float>(
cif::key("ptnr1_symmetry") != "1_555" or cif::key("ptnr2_symmetry") != "1_555",
"ptnr1_label_asym_id", "ptnr1_label_seq_id", "ptnr1_auth_seq_id", "ptnr1_label_atom_id", "ptnr1_symmetry",
"ptnr2_label_asym_id", "ptnr2_label_seq_id", "ptnr2_auth_seq_id", "ptnr2_label_atom_id", "ptnr2_symmetry",
"pdbx_dist_value"
))
{
cif::point p1 = atom_site.find1<float,float,float>(
"label_asym_id"_key == asym1 and "label_seq_id"_key == seqid1 and "auth_seq_id"_key == authseqid1 and "label_atom_id"_key == atomid1,
"cartn_x", "cartn_y", "cartn_z");
cif::point p2 = atom_site.find1<float,float,float>(
"label_asym_id"_key == asym2 and "label_seq_id"_key == seqid2 and "auth_seq_id"_key == authseqid2 and "label_atom_id"_key == atomid2,
"cartn_x", "cartn_y", "cartn_z");
auto sa1 = c.symmetry_copy(p1, cif::sym_op(symm1));
auto sa2 = c.symmetry_copy(p2, cif::sym_op(symm2));
BOOST_TEST(cif::distance(sa1, sa2) == dist);
const auto &[d, p, so] = c.closest_symmetry_copy(p1, p2);
BOOST_TEST(p.m_x == sa2.m_x);
BOOST_TEST(p.m_y == sa2.m_y);
BOOST_TEST(p.m_z == sa2.m_z);
BOOST_TEST(d == dist);
BOOST_TEST(so.string() == symm2);
}
}
BOOST_AUTO_TEST_CASE(symm_3bwh_1, *utf::tolerance(0.1f)) BOOST_AUTO_TEST_CASE(symm_3bwh_1, *utf::tolerance(0.1f))
{ {
......
...@@ -1865,6 +1865,15 @@ _test.name ...@@ -1865,6 +1865,15 @@ _test.name
BOOST_TEST(db["test"].find_first<int>(cif::key("id") == 1, "id") == 1); BOOST_TEST(db["test"].find_first<int>(cif::key("id") == 1, "id") == 1);
BOOST_TEST(db["test"].find_first<int>(cif::all(), "id") == 1); BOOST_TEST(db["test"].find_first<int>(cif::all(), "id") == 1);
std::optional<int> v;
v = db["test"].find_first<std::optional<int>>(cif::key("id") == 1, "id");
BOOST_TEST(v.has_value());
BOOST_TEST(*v == 1);
v = db["test"].find_first<std::optional<int>>(cif::key("id") == 6, "id");
BOOST_TEST(not v.has_value());
// find1 tests // find1 tests
BOOST_TEST(db["test"].find1<int>(cif::key("id") == 1, "id") == 1); BOOST_TEST(db["test"].find1<int>(cif::key("id") == 1, "id") == 1);
BOOST_CHECK_THROW(db["test"].find1<int>(cif::all(), "id"), cif::multiple_results_error); BOOST_CHECK_THROW(db["test"].find1<int>(cif::all(), "id"), cif::multiple_results_error);
...@@ -1882,7 +1891,7 @@ BOOST_AUTO_TEST_CASE(r1) ...@@ -1882,7 +1891,7 @@ BOOST_AUTO_TEST_CASE(r1)
of pdbx_nonpoly_scheme which itself is a parent of pdbx_entity_nonpoly. If I want to rename a residue
I cannot update pdbx_nonpoly_scheme since changing a parent changes children, but not vice versa.
But if I change the comp_id in atom_site, the pdbx_nonpoly_scheme is updated, that's good, and then
pdbx_entity_nonpoly is updated and that's bad.
The idea is now that if we update a parent and a child that must change as well, we first check
...@@ -2168,6 +2177,228 @@ _cat_3.num
// f.save(std::cout);
}
BOOST_AUTO_TEST_CASE(pc_1)
{
/*
Parent/child tests
Note that this dictionary differs from the one used in test r1
*/
const char dict[] = R"(
data_test_dict.dic
_datablock.id test_dict.dic
_datablock.description
;
A test dictionary
;
_dictionary.title test_dict.dic
_dictionary.datablock_id test_dict.dic
_dictionary.version 1.0
loop_
_item_type_list.code
_item_type_list.primitive_code
_item_type_list.construct
code char
'[][_,.;:"&<>()/\{}'`~!@#$%A-Za-z0-9*|+-]*'
text char
'[][ \n\t()_,.;:"&<>/\{}'`~!@#$%?+=*A-Za-z0-9|^-]*'
int numb
'[+-]?[0-9]+'
save_cat_1
_category.description 'A simple test category'
_category.id cat_1
_category.mandatory_code no
_category_key.name '_cat_1.id'
save_
save__cat_1.id
_item.name '_cat_1.id'
_item.category_id cat_1
_item.mandatory_code yes
_item_linked.child_name '_cat_2.parent_id'
_item_linked.parent_name '_cat_1.id'
_item_type.code int
save_
save__cat_1.name
_item.name '_cat_1.name'
_item.category_id cat_1
_item.mandatory_code yes
_item_type.code code
save_
save__cat_1.desc
_item.name '_cat_1.desc'
_item.category_id cat_1
_item.mandatory_code yes
_item_type.code text
save_
save_cat_2
_category.description 'A second simple test category'
_category.id cat_2
_category.mandatory_code no
_category_key.name '_cat_2.id'
save_
save__cat_2.id
_item.name '_cat_2.id'
_item.category_id cat_2
_item.mandatory_code yes
_item_type.code int
save_
save__cat_2.name
_item.name '_cat_2.name'
_item.category_id cat_2
_item.mandatory_code yes
_item_type.code code
save_
save__cat_2.num
_item.name '_cat_2.num'
_item.category_id cat_2
_item.mandatory_code yes
_item_type.code int
save_
save__cat_2.desc
_item.name '_cat_2.desc'
_item.category_id cat_2
_item.mandatory_code yes
_item_type.code text
save_
save_cat_3
_category.description 'A third simple test category'
_category.id cat_3
_category.mandatory_code no
_category_key.name '_cat_3.id'
save_
save__cat_3.id
_item.name '_cat_3.id'
_item.category_id cat_3
_item.mandatory_code yes
_item_type.code int
save_
save__cat_3.name
_item.name '_cat_3.name'
_item.category_id cat_3
_item.mandatory_code yes
_item_type.code code
save_
save__cat_3.num
_item.name '_cat_3.num'
_item.category_id cat_3
_item.mandatory_code yes
_item_type.code int
save_
loop_
_pdbx_item_linked_group_list.parent_category_id
_pdbx_item_linked_group_list.link_group_id
_pdbx_item_linked_group_list.parent_name
_pdbx_item_linked_group_list.child_name
_pdbx_item_linked_group_list.child_category_id
cat_1 1 '_cat_1.name' '_cat_2.name' cat_2
cat_2 1 '_cat_2.name' '_cat_3.name' cat_3
cat_2 1 '_cat_2.num' '_cat_3.num' cat_3
)";
struct membuf : public std::streambuf
{
membuf(char *text, size_t length)
{
this->setg(text, text, text + length);
}
} buffer(const_cast<char *>(dict), sizeof(dict) - 1);
std::istream is_dict(&buffer);
auto validator = cif::parse_dictionary("test", is_dict);
cif::file f;
f.set_validator(&validator);
// --------------------------------------------------------------------
const char data[] = R"(
data_test
loop_
_cat_1.id
_cat_1.name
_cat_1.desc
1 aap Aap
2 noot Noot
3 mies Mies
loop_
_cat_2.id
_cat_2.name
_cat_2.num
_cat_2.desc
1 aap 1 'Een dier'
2 aap 2 'Een andere aap'
3 noot 1 'walnoot bijvoorbeeld'
loop_
_cat_3.id
_cat_3.name
_cat_3.num
1 aap 1
2 aap 2
)";
using namespace cif::literals;
struct data_membuf : public std::streambuf
{
data_membuf(char *text, size_t length)
{
this->setg(text, text, text + length);
}
} data_buffer(const_cast<char *>(data), sizeof(data) - 1);
std::istream is_data(&data_buffer);
f.load(is_data);
auto &cat1 = f.front()["cat_1"];
auto &cat2 = f.front()["cat_2"];
auto &cat3 = f.front()["cat_3"];
// some parent/child tests
// find all children in cat2 for the row with id == 1 in cat1
auto rs1 = cat1.get_children(cat1.find1("id"_key == 1), cat2);
BOOST_TEST(rs1.size() == 2);
auto rs2 = cat1.get_children(cat1.find1("id"_key == 2), cat2);
BOOST_TEST(rs2.size() == 1);
auto rs3 = cat1.get_children(cat1.find1("id"_key == 3), cat2);
BOOST_TEST(rs3.size() == 0);
// finding parents
auto rs4 = cat2.get_parents(cat2.find1("id"_key == 1), cat1);
BOOST_TEST(rs4.size() == 1);
auto rs5 = cat3.get_parents(cat3.find1("id"_key == 1), cat2);
BOOST_TEST(rs5.size() == 1);
// This link is not defined:
auto rs6 = cat3.get_parents(cat3.find1("id"_key == 1), cat1);
BOOST_TEST(rs6.size() == 0);
}
// --------------------------------------------------------------------
// BOOST_AUTO_TEST_CASE(bondmap_1)
......
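The counts asserted in `pc_1` can be reasoned about without the library. The following is a minimal standalone sketch, not libcifpp code: `row`, `table`, `link_group`, `example_links`, `sketch_children` and `sketch_parents` are hypothetical names introduced here. It assumes that a child row belongs to a parent row when every item pair of a link group between the two categories has equal values, and it only models the two groups from the `pdbx_item_linked_group_list` loop (the `_item_linked` entry for `_cat_2.parent_id` is left out, since that item does not occur in the test data).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for this sketch only; these are not libcifpp types.
using row = std::map<std::string, std::string>; // item name -> value
using table = std::vector<row>;

// One link group: every (parent item, child item) pair must match.
struct link_group
{
	std::string parent_cat;
	std::string child_cat;
	std::vector<std::pair<std::string, std::string>> items;
};

// The two link groups listed in the pdbx_item_linked_group_list loop of pc_1.
std::vector<link_group> example_links()
{
	link_group g1;
	g1.parent_cat = "cat_1";
	g1.child_cat = "cat_2";
	g1.items.push_back(std::make_pair(std::string("name"), std::string("name")));

	link_group g2;
	g2.parent_cat = "cat_2";
	g2.child_cat = "cat_3";
	g2.items.push_back(std::make_pair(std::string("name"), std::string("name")));
	g2.items.push_back(std::make_pair(std::string("num"), std::string("num")));

	std::vector<link_group> links;
	links.push_back(g1);
	links.push_back(g2);
	return links;
}

// True when all linked items agree between the parent row and the child row.
// at() throws std::out_of_range if a row lacks one of the linked items.
bool linked(const link_group &lg, const row &parent, const row &child)
{
	for (const auto &item : lg.items)
	{
		if (parent.at(item.first) != child.at(item.second))
			return false;
	}
	return true;
}

// All rows in `children` that are linked to `parent` by some link group.
table sketch_children(const std::vector<link_group> &links,
	const std::string &parent_cat, const row &parent,
	const std::string &child_cat, const table &children)
{
	table result;
	for (const auto &lg : links)
	{
		if (lg.parent_cat != parent_cat or lg.child_cat != child_cat)
			continue;
		for (const auto &c : children)
		{
			if (linked(lg, parent, c))
				result.push_back(c);
		}
	}
	return result;
}

// All rows in `parents` that `child` is linked to by some link group.
table sketch_parents(const std::vector<link_group> &links,
	const std::string &child_cat, const row &child,
	const std::string &parent_cat, const table &parents)
{
	table result;
	for (const auto &lg : links)
	{
		if (lg.parent_cat != parent_cat or lg.child_cat != child_cat)
			continue;
		for (const auto &p : parents)
		{
			if (linked(lg, p, child))
				result.push_back(p);
		}
	}
	return result;
}
```

Under these assumptions, the `cat_1` row named `aap` has two children in `cat_2`, `noot` has one and `mies` has none. A lookup from `cat_3` to `cat_1` yields nothing because no link group connects those two categories directly, which is the case the `rs6` check covers.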