Major upgrade to WebCSD – The small molecule crystal structure database

[Update 16th June: Added Note about Pymol 1.8.4]

The new version of WebCSD has arrived, and with it a few important new features, such as improved substructure searching and advanced text searching. The Cambridge Crystallographic Data Centre (CCDC) houses the Cambridge Structural Database (CSD), a repository of over 875,000 small-molecule structures from X-ray and neutron diffraction analyses.

Growth in the CSD. Graphic copyright The Cambridge Crystallographic Data Centre. The Cambridge Structural Database, C. R. Groom, I. J. Bruno, M. P. Lightfoot and S. C. Ward, Acta Cryst. (2016). B72, 171-179. DOI:

Traditionally, the CSD has been accessed through an institutional licence for the desktop client Conquest, with periodic updates downloaded locally. The current release is May 2017. More recently, a Java/web interface, WebCSD, has become available. It has fewer features than Conquest but is useful for accessing structures when you are out of the office or working from home. Unfortunately, I frequently had problems with the previous version, which used to throw up lots of spurious Java errors on the Mac claiming that the user had an out-of-date Java version. Happily, this annoying bug seems to have gone in the new version.

Java Error with WebCSD v1.1.2 on Mac

Another confusing thing about the old WebCSD was that it used a different login system from the CCDC data deposition interface, requiring crystallographers or IT departments to maintain separate logins. In the new version you log in with your data deposition username and password on the main CCDC home page. The first thing you will notice is that the new CCDC home page now has two large buttons: Deposit Structures and Access Structures.

New CCDC home page

Choosing Access Structures takes you to the new WebCSD text and structure search interfaces, laid out in a familiar tab format.

Text Query Interface in the new WebCSD

Substructure search in the new WebCSD

I first gave it a quick try in text mode, looking for structures of my current favourite mushroom toxin, the cyclic peptide α-amanitin. The results page has all the structures preselected with tickboxes and, encouragingly, there is a “Download Selected” button.

Hits Page in WebCSD

This may not seem like a big deal, but one of my major gripes with previous versions of WebCSD was that it was not possible to bulk-download hits. For that you had to use the Conquest interface, which is impractical off-site.

So finally you can download all the hits, but currently only as a multiCIF file. This is not the easiest format to work with, but it is better than nothing. I’m currently trying to find reliable ways to convert or read multiCIFs. Pymol 1.6 loads all the entries but only adds bonds for the first one (see pictures below). Update: newer versions of Pymol (I used 1.8.4) read all the entries in a multiCIF correctly.

MultiCIF files opened in Pymol (v.1.6). First entry shows bonds, later entries just show atoms.

Newer versions of Pymol (v.1.8.4) show all atoms and bonds correctly*
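Until every viewer handles multiCIFs, one workaround is to split the download into one CIF per entry. In a CIF, each entry is its own data block starting with a `data_` line, so a minimal Python sketch only has to start a new file at each such line, while ignoring any `data_` line that happens to sit inside a semicolon-delimited text field. The function names `split_multicif` and `write_blocks` are hypothetical, the sketch assumes CIF 1.1-style syntax, and any comment lines before the first data block are simply dropped.

```python
from __future__ import annotations

from pathlib import Path


def split_multicif(text: str) -> list[tuple[str, str]]:
    """Return (block_name, block_text) for every data_ block in a multiCIF string."""
    blocks: list[tuple[str, list[str]]] = []
    in_text_field = False
    for line in text.splitlines(keepends=True):
        if line.startswith(";"):
            # A line starting with ';' opens or closes a multi-line text field.
            in_text_field = not in_text_field
        elif not in_text_field and line.lower().startswith("data_"):
            # New data block; the name is whatever follows 'data_'.
            blocks.append((line.split()[0][len("data_"):], []))
        if blocks:
            blocks[-1][1].append(line)
    return [(name, "".join(lines)) for name, lines in blocks]


def write_blocks(path: str, out_dir: str) -> list[Path]:
    """Write each block of a multiCIF to its own numbered .cif file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    text = Path(path).read_text(encoding="utf-8")
    for index, (name, block) in enumerate(split_multicif(text), start=1):
        # Index prefix keeps files distinct even if two blocks share a name.
        target = out / f"{index:03d}_{name}.cif"
        target.write_text(block, encoding="utf-8")
        written.append(target)
    return written
```

For example, `write_blocks("hits.cif", "split_hits")` (both paths hypothetical) would leave one file per hit that single-entry readers such as Maestro can open one at a time.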

Schrödinger’s Maestro reads only the first entry of a multiCIF file. The CCDC’s own software, Mercury, reads them all into the viewer window, from which you can select and export structures in a variety of formats, but you still can’t bulk-export them all to another format.

Amanitin hits from WebCSD displayed in Mercury
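For a do-it-yourself bulk conversion, the main hurdle is that CIF atom sites are usually stored as fractional coordinates (`_atom_site_fract_x` and so on) relative to the unit cell, whereas formats such as XYZ or SD expect Cartesian coordinates in Å. Below is a minimal Python sketch of the standard conversion (a along x, b in the xy plane), assuming the cell parameters and atom sites have already been read from the CIF; parsing the CIF loops is not shown, symmetry is not applied (only the listed atoms are converted), and the function names are hypothetical.

```python
import math


def cif_number(value: str) -> float:
    """Parse a CIF number, dropping any standard uncertainty: '10.123(4)' -> 10.123."""
    return float(value.split("(")[0])


def frac_to_cart_matrix(a, b, c, alpha, beta, gamma):
    """Rows of M such that cart = M . frac (a along x, b in the xy plane); angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume from the six cell parameters.
    volume = a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_cart(matrix, frac):
    """Multiply the 3x3 matrix by a fractional coordinate triple."""
    return tuple(sum(matrix[i][k] * frac[k] for k in range(3)) for i in range(3))


def to_xyz(title, atoms, matrix):
    """Format (element, (fx, fy, fz)) pairs as XYZ text with Cartesian coordinates in Å."""
    lines = [str(len(atoms)), title]
    for element, frac in atoms:
        x, y, z = frac_to_cart(matrix, frac)
        lines.append(f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"
```

A quick sanity check of the matrix is that the third column always has length c and the second column length b, whatever the cell angles.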

I next tried a substructure search for a cyclic octapeptide equivalent containing four thiazoles. Excuse the drawing; I’m still coming to grips with the interface for rotating etc. This came up with one hit, which happens to be a symmetric compound with four prolines occupying the remaining four amino acid positions. (Link to paper)

Substructure search interface. Had trouble finding rotate or cleanup tools

Nicely, there is also a rotating JSmol 3D model of the structure. This hit also shows how modified peptide macrocycles of this class frequently bind solvent molecules and metal ions. In this case I have caught the rotating JSmol model at the right angle to see a nice molecule of acetonitrile bound in the centre of the macrocycle.

WebCSD substructure hit interface

This has been only a very short overview of the new version; stay tuned for anything else I discover. One of the best things about the new version is that it is much, much faster. I look forward to the product’s continued evolution, particularly the ability to export hits to SD files, which is a standard feature of the desktop version.
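In the meantime, writing SD files yourself is not hard once you have coordinates and bonds. An SD file is a sequence of MOL (V2000) blocks, each optionally followed by `> <NAME>` data items and terminated by a `$$$$` line. Below is a minimal Python sketch of a writer, assuming you already have Cartesian coordinates in Å and a bond list with bond orders; a CIF downloaded from WebCSD may not contain explicit bond orders, so those would have to come from elsewhere. The function names and the `REFCODE` property in the usage note are hypothetical, and charges, stereo and the other V2000 fields are all written as zero.

```python
def mol_block(name, atoms, bonds):
    """Build a V2000 MOL block.

    atoms: list of (element, (x, y, z)) with Cartesian coordinates in Å.
    bonds: list of (i, j, order) with 1-based atom indices (1 = single, 2 = double, ...).
    """
    lines = [name, "", ""]  # header: molecule name, program line, comment line
    # Counts line: atoms, bonds, eight zero fields, 999, version tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for element, (x, y, z) in atoms:
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {element:<3} 0" + "  0" * 11)
    for i, j, order in bonds:
        lines.append(f"{i:3d}{j:3d}{order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines)


def sd_file(records):
    """records: iterable of (name, atoms, bonds, properties dict). Returns SD-file text."""
    out = []
    for name, atoms, bonds, props in records:
        out.append(mol_block(name, atoms, bonds))
        for key, value in props.items():
            out.append(f"> <{key}>\n{value}\n")  # data item followed by a blank line
        out.append("$$$$")
    return "\n".join(out) + "\n"
```

For example, `sd_file([("demo", [("C", (0.0, 0.0, 0.0)), ("C", (1.53, 0.0, 0.0))], [(1, 2, 1)], {"REFCODE": "DEMO01"})])` produces a single two-atom record carrying the refcode as an SD data item.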

*I should note that multiCIF files often contain coordinates for disordered atoms, and these may display weirdly in Pymol. Pymol uses interatomic distances to estimate bonds, so some downloaded structures may appear to have chemically unfeasible connectivity.
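To see why disorder causes this, here is a minimal Python sketch of generic distance-based bond guessing: two atoms are bonded if they are closer than the sum of their covalent radii plus a tolerance. This is an illustration of the general technique, not Pymol’s actual algorithm; the 0.4 Å tolerance, the short radius table, the `guess_bonds` function and the example fragment are all assumptions chosen for illustration. The optional disorder check mirrors the idea behind the CIF `_atom_site_disorder_group` item, where alternative positions belong to different groups.

```python
import itertools
import math

# Approximate single-bond covalent radii in Å (Cordero et al., 2008); only a few elements listed.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def guess_bonds(atoms, tolerance=0.4, respect_disorder=False):
    """Distance-based bond guessing.

    atoms: list of (label, element, (x, y, z), disorder_group) with Cartesian coordinates in Å;
           disorder_group is 0 for ordered atoms and 1, 2, ... for alternative positions.
    Returns (label_i, label_j) pairs closer than r_i + r_j + tolerance.
    """
    bonds = []
    for (li, ei, pos_i, gi), (lj, ej, pos_j, gj) in itertools.combinations(atoms, 2):
        if respect_disorder and gi and gj and gi != gj:
            continue  # alternative disordered positions never coexist in one molecule
        cutoff = COVALENT_RADII[ei] + COVALENT_RADII[ej] + tolerance
        if math.dist(pos_i, pos_j) < cutoff:
            bonds.append((li, lj))
    return bonds


# Hypothetical fragment: C3 is disordered over two sites 0.6 Å apart.
fragment = [
    ("C1", "C", (0.0, 0.0, 0.0), 0),
    ("C2", "C", (1.53, 0.0, 0.0), 0),
    ("C3A", "C", (2.04, 1.44, 0.0), 1),
    ("C3B", "C", (2.04, 1.44, 0.6), 2),
]
```

Without the disorder check, `guess_bonds(fragment)` also bonds C3A to C3B, closing a C2–C3A–C3B three-membered ring; with `respect_disorder=True` only the C1–C2, C2–C3A and C2–C3B bonds remain.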

About martin

almost on holidays
This entry was posted in Chem, software, Uncategorized.