More on databases in academia
Can I tread on your blue suede shoes?
Well, my (this is David Norfolk writing) Blog on Databases in Academia excited some robust comment – and, strangely, it didn’t come from the Intelligent Design people I was expecting to upset. As the implications of some of the comments were pretty personal, I felt I should give my main informant, Mark Whitehorn, the “right to reply”, and at the same time go into some of the detail I skipped over.
So, Mark, exactly what is your background in databases?
As both the person who actually did the database work on this project and (strangely) a columnist for Reg Developer, I feel sure that my contribution to this debate will cause yet more confusion and controversy. Sigh.
As well as working as a writer and a consultant, I'm also an academic. I was once a geneticist (hence my Ph.D. with John Parker many moons ago) but then I turned to the dark side and went to play with databases. Amongst other academic posts, I am an honorary lecturer at the University of Dundee and I teach advanced data structures there.
And, how did you get involved with this research effort?
I was delighted when John approached me about the Henslow work and it rapidly became clear that the database side of the problem was trivial. Perhaps 100,000 data items needed to be collected in total. We certainly could have used Access, possibly not CardBox; but you never know. This is why I said to you that “any reasonable relational database” would have done. It was true. We could have used Oracle, we could have used DB2. The problem wasn’t collecting the data; it was the subsequent analysis. The data is essentially multi-dimensional in nature (each plant specimen has many characteristics that we wanted to cross correlate) so I needed to be able to structure the data as a multi-dimensional set.
Aren’t there many multi-dimensional BI tools that you could have used for this?
In my opinion (and it is only an opinion) SQL Server was the best tool for the job because it has a world class multi-dimensional engine included in the box. Is it, in my opinion, not only the best, but the easiest to set up and use? Yes. Cost was not a consideration here because I was on the beta test program for 2005 and had a copy. However, had it been an issue, there is no doubt in my mind that this combination is also the most cost effective. All the other major vendors charge significant sums for their BI (Business Intelligence) tools.
Could we have achieved the same end result using, say, the set of BI tools that can be purchased along with DB2 or Oracle? Without doubt. Would it have been as easy, fast, and cost-effective? No. (Could CardBox have been used? Well, as far as I am aware it is a fine tool but currently lacks a multi-dimensional database engine. Perhaps that is coming in the next release.)
As a side issue, the main reason for using the beta of SQL Server 2005 rather than SQL Server 2000 was that 2005 allows one-to-many relationships (one herbarium sheet can have many plants) to be modelled very easily and elegantly in a multi-dimensional engine. This is a feature that is still currently rare in BI systems.
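As a rough illustration of the one-to-many shape described above (one herbarium sheet, many plants), here is a minimal sketch in Python using the standard-library sqlite3 module. It is not the SQL Server 2005 multi-dimensional engine used on the project, and it is not the project's schema: the table names, column names and sample rows are all hypothetical, invented purely to show how plant measurements might be rolled up along two dimensions (collection year and species).

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hypothetical sheet/plant schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sheet (
            sheet_id  INTEGER PRIMARY KEY,
            collector TEXT,
            year      INTEGER
        );
        -- Many plants can point at the same sheet: the one-to-many link.
        CREATE TABLE plant (
            plant_id  INTEGER PRIMARY KEY,
            sheet_id  INTEGER NOT NULL REFERENCES sheet(sheet_id),
            species   TEXT,
            height_cm REAL
        );
        """
    )
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO sheet VALUES (?, ?, ?)",
        [(1, "Collector A", 1825), (2, "Collector B", 1826)],
    )
    conn.executemany(
        "INSERT INTO plant VALUES (?, ?, ?, ?)",
        [
            (1, 1, "species-1", 10.0),
            (2, 1, "species-1", 14.0),
            (3, 1, "species-2", 30.0),
            (4, 2, "species-1", 12.0),
        ],
    )
    return conn


def cross_tab(conn):
    """Aggregate plants by (year, species): count and mean height."""
    rows = conn.execute(
        """
        SELECT s.year, p.species, COUNT(*), AVG(p.height_cm)
        FROM plant AS p
        JOIN sheet AS s ON s.sheet_id = p.sheet_id
        GROUP BY s.year, p.species
        ORDER BY s.year, p.species
        """
    ).fetchall()
    return {(year, species): (n, mean) for year, species, n, mean in rows}


if __name__ == "__main__":
    for key, value in cross_tab(build_demo()).items():
        print(key, value)
```

A dedicated multi-dimensional engine would pre-compute and navigate such roll-ups across many more characteristics; the join-and-group query here only hints at the idea.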
OK, but I imagine that the main focus of this research wasn’t the multidimensional BI tools. Just why was it carried out?
The reason our paper was considered important enough for acceptance in Nature was that it essentially re-writes our understanding of how Darwin came to develop the theory of evolution – nothing at all to do with the analysis. Indeed, we don’t even mention the tools in that paper. However, the paper caused a considerable stir in the academic community, not just for the Darwin angle, but also amongst academics keen to know how we had achieved the end result. Given that I am the database geek on the team, I fielded most of these enquiries. These tended to confirm my opinion that many academics are not, currently, making the best use of the BI tools that are now available.
So, the angle I picked up on, the use of advanced business technology in Academia, is a real issue?
Despite what some commentators seem to be reading into your article, you didn’t say that academics don’t use databases. You said, “Although computers are widely used in theoretical physics and such research, the tools taken as routine in business are being overlooked in academia.” This is my experience.
Hence the Press Day at the Botanic Garden. As a research team, we wanted to show off our work; I wanted to try to convince the academic world that it should at least consider looking at the new analytical tools that have been developed for the business world.
Well, you certainly succeeded in stirring up some discussion. I also had an interesting email exchange with one Tom Finnie, who was defending academics’ use of databases. In the end, however, he conceded that advanced BI tools might not be used in academia because of the costs associated with what are, usually, commercial products; and because much academic research is based on analysing restricted data sets targeted on answering specific questions. “This means that you are probably looking at significantly less than 100,000 (in many cases less than 1,000) records,” he says, “easily within the comfort zone of the resident statistician and powerful modern stats packages”. Fair points, although I don’t think that they invalidate what you’re saying.
Finally, I suppose I must ask whether you see any conflicts of interest between being an academic, a professional database consultant and a writer on technology?
For me, this raises a very important point. We are all aware that software tends to polarise opinions and, amongst developers, the choice of database engine is blue-suede-shoe territory.
When I joined The Register, I asked Drew Cullen about the Register’s policy towards the different companies out there. He told me that I could praise or criticise any company/product, without let or hindrance, as long as I had my facts right and the opinions I expressed were ones I honestly held. The Register “bites the hand that feeds IT” but that doesn’t mean that it’s pro open-source, or anti Microsoft. It means that we are cynical about marketing hype but genuinely interested in technical excellence.
We must be independent enough to criticise Microsoft whenever it falls short of the behaviour it should display. But if we were ever to move to a position where we were frightened to highlight it when Microsoft does do things well, how honestly would we be serving our readership?®
Some good points, but, yes, seriously FileMaker - when it is appropriate <g> It's not designed to compete with mainframe DB2, with thousands of concurrent users.
As always, you really need to look at the requirements first - and only then choose whatever technology is appropriate...
Filemaker may be a decent product for small business or Mac users, but as far as supportability, scalability and reliability go, it is a far, far cry from MS SQL on Windows, or even Oracle on Windows (Oracle on Unix or Linux would be more reliable and scalable than Oracle on Windows, which is buggy, and often poorly ported). I, and all of the colleagues with whom I have discussed Filemaker, people who have supported and designed database implementations with thousands of GB of data and thousands of concurrent users, have universally panned this product. In my experience with a few implementations, the reliability and scalability are shameful, and Access on a server performed better.
On their website, Filemaker compares their product with Access, and their server product supports "up to 100 simultaneous users". It isn't even an advanced DB, much less does it have a robust BI package, if it has one at all.
Businesses generally choose MS SQL or Oracle, and there are plenty of good reasons, mainly that:
a) budget is generally available and they can spend money on projects or processes which are important to them
b) efficiency, scalability and stability are usually requirements.
Hopefully some of the good IT practices used in business will migrate over time into other industries, such as academia, non-profits, health care, and public sector.
Have you tried Google-ing for a database?????
I'm amazed by the lack of research you carried out in trying to find "any reasonable relational database". What about trying to Google for "database" - there at the top (a sponsored link, yes) is the "world's best selling database". And you didn't even know about it? I hope your research methods within your chosen academic subject are a tad better!
MS SQL easiest to use?!?! God help us! There are millions of users of FileMaker out there, especially within academia - it's a shame you didn't realise this before you took the route of "if it's hard to do it must be powerful" and "if it's MS it's probably OK".
It's not too late for you, though - I'm sure you must have other projects on which you could actually try the best of breed apps rather than the de facto MS offering.