[Update 16th June: Added Note about Pymol 1.8.4]
The new version of WebCSD has arrived and with it a few important new features, such as improved substructure searching and advanced text searching. The Cambridge Crystallographic Data Centre (CCDC) houses the Cambridge Structural Database (CSD), a repository of over 875,000 small molecule structures from X-ray and neutron diffraction analyses.
Traditionally, the CSD has been accessed through an institutional licence for the desktop client Conquest, with periodic updates downloaded separately. The current release is May 2017. More recently a Java/Web interface, WebCSD, has become available. It has fewer features than Conquest, but is useful for accessing structures if you are out of the office or working from home. Unfortunately, I frequently had problems with the previous version, which used to throw up lots of spurious Java errors on Mac, claiming that the user had an out-of-date Java version. Happily this annoying bug seems to have gone with the new version.
Another confusing thing about the old WebCSD was that it used a different login system from the CCDC data deposition interface, requiring crystallographers or IT departments to maintain separate logins. In the new version you log in with your data deposition username/password at the main CCDC home page. The first thing you will notice is that the new CCDC home page now has two large button options:
Deposit Structures and Access Structures.
Choosing Access Structures takes you to the new WebCSD text and structure search interfaces in a familiar tab format.
I first gave it a quick try in text mode looking for structures of my current favourite mushroom toxin cyclic peptide α-amanitin. The results page has all the structures preselected in a tickbox format and encouragingly there is a “Download Selected” button.
This may not seem like a big deal, but one of my major gripes with previous versions of WebCSD was that it was not possible to do a bulk download of hits. For that you had to use the Conquest interface, something that is impractical to do off-site.
So finally you can download all the hits, but currently only as a multiCIF file. This is not the easiest option but better than nothing. I’m currently trying to find reliable ways to convert or read multiCIFs. Pymol loads them all but only puts bonds in for the first entry (see pictures below). Update: Newer versions of Pymol (I used 1.8.4) read in all the files in a multiCIF correctly.
Schrödinger’s Maestro reads only the first entry of a multiCIF file. The CCDC’s own software Mercury reads them all into the viewer window, from which you can select and export individual entries in a variety of formats, but you still can’t bulk export them all to another format.
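For programs that only read the first entry, one workaround is to split the multiCIF into one file per entry before loading. Below is a minimal sketch in plain Python, assuming the download is a standard CIF file in which each entry starts with a line beginning `data_`. The function name `split_multicif` is my own and is not part of any CCDC tool. It skips over semicolon-delimited text fields so that a stray `data_` inside a free-text comment is not mistaken for a new entry.

```python
def split_multicif(text):
    """Split a multi-entry CIF string into {entry_name: entry_text}.

    Assumes each entry begins with a line starting 'data_' (standard CIF).
    Anything before the first 'data_' line is dropped. If two entries
    share a name, the later one overwrites the earlier one.
    """
    blocks = {}
    name = None
    lines = []
    in_text_field = False  # True while inside a ';' ... ';' text field
    for line in text.splitlines():
        if line.startswith(";"):
            # A ';' in column one opens or closes a multi-line text field
            in_text_field = not in_text_field
        elif not in_text_field and line.lower().startswith("data_"):
            # Start of a new entry: store the previous one first
            if name is not None:
                blocks[name] = "\n".join(lines) + "\n"
            name = line.split()[0][5:]
            lines = []
        if name is not None:
            lines.append(line)
    if name is not None:
        blocks[name] = "\n".join(lines) + "\n"
    return blocks
```

To use it, read the downloaded file into a string, call `split_multicif` on it, and write each value out to its own `.cif` file named after the key. I have not tested this against every CIF the CSD can produce, so treat it as a starting point rather than a robust parser.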
I next tried searching for a substructure, a cyclic octapeptide equivalent with four thiazoles. Excuse the drawing; I’m still getting to grips with the interface, rotating and so on. This came up with one hit, which happens to be a symmetric compound with four prolines occupying the remaining four amino acid positions. (Link to paper)
Nicely, there is a rotating JSmol 3D model of the structure as well. This hit also shows how this class of modified peptide macrocycles frequently binds solvent molecules and metal ions. In this case I have caught the rotating JSmol at the right angle to see a nice molecule of acetonitrile bound in the centre of the macrocycle.
This has been only a very short overview of the new version. Stay tuned for anything else I discover. In particular, one of the best things about the new version is that it is much, much faster. I look forward to the product’s continued evolution, particularly the ability to extract hits to SD files, which is a standard feature of the desktop version.
*I should note that multiCIF files often contain coordinates for disordered atoms and these may display weirdly in Pymol. Pymol uses distances to estimate bonds, so some downloaded structures may appear to have chemically unfeasible connectivities.
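To illustrate why distance-based bonding goes wrong with disorder, here is a rough sketch of the general idea. This is not Pymol's actual algorithm, just a common textbook-style heuristic: two atoms are called bonded if they are closer than the sum of their covalent radii plus a tolerance. The radii are approximate published covalent radii in Å, and the 0.45 Å tolerance and the function name `guess_bonds` are my own assumptions.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstroms (only a few elements, for illustration)
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}

def guess_bonds(atoms, tolerance=0.45):
    """Guess bonds from distances alone.

    atoms is a list of (element, (x, y, z)) tuples, coordinates in angstroms.
    Returns a list of (i, j) index pairs judged to be bonded.
    """
    bonds = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        # Bonded if closer than the sum of covalent radii plus a tolerance
        cutoff = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + tolerance
        if math.dist(xyz_i, xyz_j) <= cutoff:
            bonds.append((i, j))
    return bonds

# Atom 0 is a carbon; atoms 1 and 2 are two alternative (disordered)
# positions for the same neighbouring carbon, about 0.6 angstroms apart.
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.54, 0.0, 0.0)),
    ("C", (2.0, 0.4, 0.0)),
]
print(guess_bonds(atoms))
```

The two alternative positions sit well inside the cutoff of each other, so a purely distance-based method draws a bond between them even though they are really the same atom in two places. A viewer that reads the occupancy or disorder group information from the CIF can avoid this; one that uses distances alone cannot.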