Databases in academia
University research isn't always up on the latest in business IT
Last week I was at Cambridge, learning what Henslow taught Darwin (Kohn, Murrell, Parker and Whitehorn, "What Henslow taught Darwin", Nature, vol. 436, 4 August 2005, p. 643; available online if you subscribe or register).
Henslow, elected Professor of Botany at Cambridge in 1825, was a careful scientist, the first university lecturer to illustrate his lectures (yes, even before PowerPoint), and a creationist who investigated the variation within species in order to show that species were created as fundamentally stable things that just varied widely in response to conditions.
Darwin was his pupil (Henslow helped arrange Darwin's place on the Beagle), but it was Darwin who made the intellectual leap that let him read Henslow's records of variation not as evidence of a fixed set of created species with variations, but as evidence of new species evolving in action.
Why was I there representing Reg Developer? Well, John Parker’s research establishing exactly what Henslow was doing and its importance to Darwin’s work was assisted by Mark Whitehorn, Reg Developer columnist and database expert, who got his PhD with Parker many years ago.
The research team was cross-disciplinary in the first place. It included David Kohn, a historian from Drew University in New Jersey, USA (who "went white" when he learnt what Henslow had been doing, since it meant rewriting a chunk of his as-yet-unpublished book on Darwin); Gina Murrell from the Cambridge University Herbarium; and Parker, who is from the Cambridge University Botanic Garden.
However, it was largely chance that Mark was around to point out that using a card index to correlate Henslow's plant collections with the time of collection, the people involved, Darwin's published work and so on was woefully inefficient. He designed a database to hold all the information available from Henslow's collections (found in sheds and attics around Cambridge, as I remember it) and advised on, and assisted with, the extensive data cleansing needed.
He chose Microsoft SQL Server to store the data (although he says any reasonable relational database would have done), because he considers its query and analysis facilities unparalleled today. He used SQL Server 2005 in its beta incarnation, simply because it made managing and analysing the database very much easier than the previous version did. And the research team's enthusiasm for the way they could now ask questions of their data and get immediate answers and visualisations was palpable.
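To give a flavour of what "asking questions of the data" means compared with leafing through a card index, here is a minimal sketch. It is not Mark's actual design: the two tables (`collector`, `specimen`), their columns and every row are invented for illustration, and Python's built-in SQLite stands in for SQL Server so the example is self-contained. The point is simply that once specimens and collectors live in related tables, "who collected this species, and over what dates?" becomes a single join rather than a pass through every card in the box.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the Henslow project's real design.
SCHEMA = """
CREATE TABLE collector (
    collector_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE specimen (
    specimen_id  INTEGER PRIMARY KEY,
    species      TEXT NOT NULL,
    locality     TEXT,
    collected_on TEXT,  -- ISO date string, so text ordering matches date ordering
    collector_id INTEGER REFERENCES collector(collector_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO collector VALUES (?, ?)",
                     [(1, "Collector A"), (2, "Collector B")])
    # All specimen rows below are invented placeholders, not historical records.
    conn.executemany(
        "INSERT INTO specimen VALUES (?, ?, ?, ?, ?)",
        [(1, "Species X", "Locality 1", "1830-05-14", 1),
         (2, "Species X", "Locality 2", "1830-06-02", 2),
         (3, "Species Y", "Locality 1", "1831-04-20", 1)])
    return conn


def specimens_per_collector(conn, species):
    """For one species, return (collector, count, earliest date, latest date)
    per collector, sorted by collector name."""
    rows = conn.execute(
        """SELECT c.name, COUNT(*), MIN(s.collected_on), MAX(s.collected_on)
           FROM specimen AS s
           JOIN collector AS c ON c.collector_id = s.collector_id
           WHERE s.species = ?
           GROUP BY c.name
           ORDER BY c.name""",
        (species,))
    return rows.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in specimens_per_collector(db, "Species X"):
        print(row)
```

With the sample rows above, `specimens_per_collector(db, "Species X")` returns one tuple each for Collector A and Collector B. Adding a new question (by locality, by year, against a table of Darwin's publications) means writing another query, not re-sorting the cards.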
Of course, Henslow's sheets of paper with collections of plants stuck to them, illustrating variations within a single species, are also a database of sorts. These days, we'd photograph the plants and store the images in an electronic database as an extended datatype (although whether recreating that database from a set of CDs in a box in a cupboard some 150 years later would be as feasible as recreating Henslow's work is moot). But perhaps we wouldn't.
Although computers are widely used in theoretical physics and similar research, tools that business takes as routine are often overlooked in academia. If Mark hadn't taken a PhD with John Parker and then moved into databases (he's in the Department of Applied Computing at the University of Dundee), this research would have been based on shuffling cards in an index box (or, at best, on something like a spreadsheet).
Makes you think. And one thing it makes me think is that there are still unexplored opportunities for database specialists out there. Frankly, 20 years or more after James Martin first excited me with the potential of relational databases, that rather surprises me.
Photographs by David Norfolk, who is also the author of IT Governance, published by Thorogood.
BI Killer Feature?
"Sometimes 'the latest in business IT' isn't of very much use w.r.t. academic use of databases."
Fair comment. Of course, I didn't actually google for "BI in academic databases"; I talked to an academic who is getting enquiries about BI techniques from other academics <grin>.
And FileMaker is a fine product (far better than Excel used as a database) and, I'm sure, fit for an awful lot of academic work where SQL Server would be overkill.
My point was that advanced BI may be neglected not because it isn't necessary for a particular research project, but because some academics aren't aware of what is now available and what it can do. You sound as if this doesn't apply to you, but the general feedback we're getting isn't doing much to change my opinion. But it is only an opinion.
"Register research isn't always up on the latest in Academic Research" ;-)
I found it interesting that someone else brought up the FileMaker thing. In my experience of working with a number of academics in various fields, FMPro is often used as a 'better Excel' for manipulating data, and is seen more as an 'everyday tool': something they use for more than one task, to knock up a way of capturing data, and to test hypotheses quickly.
A syllogistic search for database use in academia that starts off with 'BI is the latest killer feature in database tools' is bound to reach the conclusion "University research isn't always up on the latest in business IT". Sometimes 'the latest in business IT' isn't of very much use w.r.t. academic use of databases.
For example, one bit of work I was involved with ended up as part of a PhD submission (back in 1996) and was partly written in FileMaker, in Art & Design Ceramics of all things! It definitely had to be done in a database, but I'm not sure how BI would have applied to it ;-)
Thanks, I enjoyed the article, which corroborated a discussion I had last month with a physicist at the ETH in Zurich. It started with my asking whether universities appreciated the recent batch of free Express databases. Put politely, his institute was well equipped with hardware and software, cost being no issue, but the sophistication of the academics' use varied. (He had at least improved his own somewhat by volunteering to be an IT liaison.)
A main point in the article is that wonderful things happen when programmers' skills and users' needs meet and amalgamate. I'd suggest that it takes several years with RDBs for serious users to appreciate what they can do. Worse, in my non-Finance neck of the large-company woods, the OLAP capabilities of the latest DB versions are inaccessible, because current applications have fixed reporting, set up like card files or basic RDBs. So users can't even imagine what improvements are possible.
Any suggestions on, or good examples of, improving RDB and OLAP use?