The push towards open publishing of academic research continues to gather pace, with proponents of openness encouraged by signs that government research bodies now often require publications to be made available via one of the various open access models.
As we enter this brave new world of free and open publishing we will of course encounter unforeseen roadblocks that make the process a bit less smooth than it could otherwise be. The first of these that I wish to discuss is indexing. There is little point in putting a copy of your thesis or a preprint of your paper into an institutional repository if that research cannot be found. At present there is no consistent way in which Australian universities warehouse, index and abstract such works. In my own field of organic chemistry, the most important information conveyed is the new chemical structures and their transformations. In the current paradigm the main chemical indexing of published papers is a paid service performed by the Chemical Abstracts Service (CAS). Chemical structures and their reactions are painstakingly extracted and put into a searchable database, including the Registry File, which currently contains details of some 91 million chemical substances.
CAS does not index institutional repositories, as it has no way of knowing what information is available, where it is, and how to get it. Other chemical structure databases such as ChemSpider face similar challenges in indexing chemical knowledge. So if a researcher wishes to synthesise a compound, often the first step is to see whether it has been made before by searching for that structure, or related substructures, in the CAS Registry File. If the substance is not found there, the researcher has to assume it has never been made and design a new synthesis from scratch. If that structure actually exists in a document in an institutional repository, effectively hidden from abstracting services, the resulting synthetic effort is essentially wasted.
Similarly, in the biological sphere the first port of call is often NCBI PubMed, which republishes the abstracts of published works in a searchable manner. Valuable work hidden in repositories or preprint servers that never appears in a final published journal article may likewise be lost to science. Some years ago NCBI branched out into the chemical realm with its PubChem database, which after initial opposition from CAS has come to be a mostly reliable database of chemical structures, often with associated information such as biological activities. PubChem is an aggregator of chemical structures and does no curation of the contents. As such, it is very difficult to correct entries in PubChem.
The physics community has for many years been well served by the arXiv preprint server. Articles are uploaded there in advance of their publication in conventional journals so that the scientific community can immediately learn of the new results and build on them. Some work in arXiv may never end up in final published form, but the scientific community can still call upon it, cite it and put it to use. Importantly, CAS does index physical chemistry papers in arXiv.
As of now there is no way to comprehensively search chemical structures and their transformations without using a paid service like CAS, something hard-core OA advocates may have trouble with. But not everything in life can be free. So what is the solution? I’m afraid I don’t know. If I did, I’d have patented it (that was a joke, folks).