Letter sent by Peter Willett to the EPSRC

Peter Willett sent a letter in early December 2006 to Carmine Ruggiero at the EPSRC about the proposed closure of the Chemical Database Service. He also sent a copy to John Baird, the Chemistry Programme Manager.

Willett (P.Willett@sheffield.ac.uk) is a professor in the Department of Information Studies at Sheffield University. He is a world-leading expert in chemoinformatics, and was the Chair (1993) of the panel of an SERC Working Party on the provision of chemical databases to UK academia. He was a committee member (1994-96) and Chairman (1996-1999) of the Management Advisory Panel of the EPSRC Chemical Database Service.

When contacted just before Christmas he had had no reply from the EPSRC. He is happy for the letter to be distributed more widely, and we within the Service certainly wish as many people as possible to raise their heads above the parapet!

We welcome feedback on the views of the community (both supportive and otherwise). There is also still time to make your views known to the EPSRC (Carmine.Ruggiero@epsrc.ac.uk & John.Baird@epsrc.ac.uk). You can contact the CDS Team at cdsbb@dl.ac.uk.

Incidentally, Willett mentioned that he could have written much more but chose to focus on the points that he did, because many of his other potential criticisms had already been addressed in the CDS response. The details of the various Reports and Responses are available at the following link:

http://cds.dl.ac.uk/epsrc_reports/review2006.html

Below is a copy of the Willett letter:

Dear Carmine

I have now seen a copy of the Panel's report on the CDS website, and
wish to make the following comments on the report. I must emphasise
that I am not a synthetic, physical or structural chemist, but I have
been associated with one of the world's leading chemoinformatics
groups for three decades and thus have some familiarity with factors
considered by the panel. In particular, it seems to me that
insufficient attention may have been paid to the practicalities of
database access for the current users of CDS if the Service is to be
terminated as planned next year. In essence, it is suggested that
access could be provided by the GRID, by institutional repositories,
by alternative national licensing schemes, by charging database costs
using the FEC mechanism, or by a third party. Each of these points is
addressed below.

The GRID is an access mechanism, not a content mechanism: databases
can of course be made available via a grid but the latter requires
that the data is available somewhere on the network, i.e. somebody has
to provide the content.

Institutional repositories are, or will shortly be, amassing large
volumes of institutional data but accessing multiple archives is the
complete antithesis of the database approach, where somebody, i.e.,
the database provider, has done all the data collection, validation
and formatting and made it available in a coherent, supported form.

Alternative licensing schemes to those negotiated by CDS are obviously
possible but EPSRC should surely ensure that such licenses are in
place and operational before terminating the current ones: the report
suggests some organisations that might be able to do this but there is
nothing there to suggest that one of these will agree to it and/or be
able to obtain better terms than those already available.  Similar
comments apply to the provision of access by a third party (i.e.
splitting the licensing of the databases from their provision, rather
than having them done by one organisation as at present). There may
well be a case for a further competitive tendering exercise to identify
the provider who can provide the most cost-effective access - I helped
organise a previous such competition in which CDS was successful - but
that exercise should be carried out prior to termination of the
current provision to ensure continuity of access for the current,
large user population.

Charging the costs of database access to research grants via FEC has
two problems associated with it. First, it would mean that only those
who were successful in obtaining EPSRC research awards would be able
to gain access to the data, and I would assume that the user base is
far larger than the number of holders of current EPSRC grants. Second,
it would mean that each successful applicant would have to pay for a
full academic licence and to develop the expertise necessary to access
the particular database(s) of interest.  Even for a dedicated
chemoinformatics group such as the one I work in, the plethora of data
formats and command types mean that we restrict the number of systems
with which we work on grounds of feasibility: this data variability is
likely to be a significant problem for non-specialist users.

I make two final points. First, the report states that "the service is
competing against commercial database providers": if this was correct
then the providers would not be willing to make their data available
for repackaging via CDS and subsequent processing - most conventional
database licenses strictly forbid any such manipulations. Second, the
impact of CDS is evident in a recent citation analysis of the most
important chemoinformatics journal, the American Chemical Society's
Journal of Chemical Information and Computer Sciences, which showed
that the standard CDS reference (Fletcher, D.A. et al., J. Chem. Inf.
Comput. Sci. 1996, 36, 746-749) has been the second most-cited of all
the papers published in the journal (see 
http://www.warr.com/25years.html).

I trust that these comments are helpful.

Yours sincerely 

Peter Willett


Incidentally, there is a more up-to-date survey of the current Service, published as a chapter of a recent book:

Bob McMeeking & Dave Fletcher, in Cheminformatics Developments: History, Reviews and Current Research (Ed. J. H. Noordik)