Educating Verity

Original URL: https://www.theregister.com/2008/09/13/verity_stob_open_university/

A breeze is riffling academia's pubic hair

Posted in Columnists, 13th September 2008 17:51 GMT

Stob It's my own fault. If you've told me once, you've told me a hundred times to ignore them. You know the sort of thing:

Bacheelor, MasteerMBA, and Doctoraate diplomas available in the field of your choice that's right, you can even become a Doctor and receive all the benefits that comes with it!

Last year, I fell victim to a hankering for a MasteerMSC in Software Development and receiving all the benefits that comes with it. I clicked the link and authorised a payment for the wrong end of a £thou from my Visa card.

But I should admit I have kidded you a little. The advertisement was not worded quite as I have indicated. For the website where I found it was none other than El Reg itself, and the organisation pushing the qualifications was the Open University.

(Pause for a Proustian moment while everybody of my generation plays this trumpet involuntary on their mental televisions.)

M885 Analysis and design of enterprise systems: an object oriented approach arrives as a plastic, self-assembly box file, a three-volume set of A4 books containing the course itself, an infuriating and dispiriting tome called The Good Study Guide full of timekeeping and studying advice that is meant to encourage but actually would put off all but the most hardened swot, and a vast amount of miscellaneous paperwork containing many assorted further instructions and prohibitions.

Especially homework. One might expect a week or two's grace at the start of a course since, not yet having learned anything, one has nothing to regurgitate. However, at M885 they subscribe to the principle, familiar from schooldays, that if the hockey pitch is waterlogged, you can always send the kids on a run. Out of the frying pan and into the mud.

Week two's question required us to read a paper, originally published in the prestigious, peer-reviewed journal IEEE Software, called 'Open Source Reuse in Commercial Firms' by T. R. Madanmohan and Rahul De', associate professors at the Indian Institute of Management, Bangalore. When we were done reading it, we had to answer questions. (You may wonder what open source use has to do with object-oriented analysis and design. That would be both of us, then.)

Now I am going to have to ask you to be very brave, because if anything of what I have to say is to make sense then you, too, are going to taste a little of this paper. Calm down. By its own account, it's mostly just a survey asking people how they use open source software - how hard to read can that be? The following is one of the passages we were instructed to discuss in our homework. It purports to explain how companies choose open source software.

Open source projects that are too platform-specific aren't good either. For example, many open source content management system developers have based their spawning, multiple (often competing), and derivative projects on a single platform. To develop them into useful applications requires excessive, code-based customization. While extensibility is important, customers expect to see the inclusion of core features, together with the ability to configure key settings. Open source components with low code volatility, high platform heterogeneity, and high configuration and optimization space are the best choices. Robust test cases and user credibility are other dimensions developers must consider to identify the right components.

I assure you that context improves the sense of the above not one pica-jot. If you don't believe me, you can read the whole thing for yourself… for $29.

My analysis

So what did you make of that? I made nothing at all of it [like this]:

Open source projects that are too platform-specific aren't good either. [Use of 'either' here implies that the previous paragraph discussed an alternative - most likely open source projects that are NOT too platform-specific. It doesn't. Nothing remotely like it. I swear.] For example, [Aha, a clarifying example] many open source content management system developers have based their spawning, multiple (often competing), and derivative projects [Ooh goody, we're playing Pile Up Adjectives: spawning, multiple (often competing), derivative and ginger. Your turn] on a single platform [Still banging on about that single platform, I see. I bet we are going to find out why it is a problem in a moment. Pass me the chocs]. To develop them into useful applications [Sorry, are they not only spawning, multiple (often competing), derivative but also not useful? Where did that spring from? How did you sneak that one in?] requires excessive, code-based customization. While extensibility is important, [What? What? How did we start discussing extensibility? I thought we were discussing platform specificity. By the way, have we had the example yet?] customers expect to see the inclusion of core features, together with the ability to configure key settings. [So why won't their expectations be met? What is it about being platform-specific that has anything to do with the inclusion or otherwise of core features?] Open source components with low code volatility, ['There's a funny smell in here!' 'Excuse me, it's my code. It's a bit volatile.' Yes, I know; but if they mean 'it doesn't change much', why not just say so, for heaven's sake?] high platform heterogeneity, [Hello - are we lapsing back to the nominal topic?] and high configuration and optimization space are the best choices. [Nope.] Robust test cases [Please tell: what is a 'robust' test case? As opposed to a 'delicate' test case, I suppose] and user credibility are other dimensions developers must consider to identify the right components. [First 'optimization space', now 'dimensions'. Giving us both barrels today.]

Protracted analysis of this and many other similar passages benefited me not at all. I began to panic. I printed Open Source Reuse out and spent hours reading it with furrowed brow and moving lips, and annotating its margins liberally with comments such as 'What?' and 'Eh?' and 'Huh?'. Didn't help. I experimented with its odd language in conversations with colleagues ('Your test cases are looking very robust this morning, Brian'), but they were baffled too. My OU tutor also blew me out; he volunteered that he hadn't 'read the paper in detail yet'.

You will think me silly, but I got really worked up about this. I began to wonder if I had not bitten off more than I could chew with this whole OU business. It seemed to me that if I could not make head nor tail of more or less the first thing I had been given to read, I should perhaps give up and tackle something more my level - say GCSE Wii Computing for Media Studies.

So it was in the small hours of one morning that I began to Google phrases from Madanmohan and De''s paper. My idea - fairly desperate, but there you go - was to see if I could find somebody else using the same words, but in a comprehensible manner. I started off trying to discover if phrases that appeared in Open Source Reuse had a technical meaning not immediately apparent. I had no luck with this, so after a while I began googling longer sentences, including some from the passage I have just quoted to you.

Always call it 'research'

I can't remember, now, the exact phrase that led me to Tony Byrne's article Open-Source CMS: Prohibitively Fractured? on the CMS Watch site. But something, perhaps 'spawning, multiple (often competing)', led me to this:

Many leading open-source CMS projects have resigned themselves to becoming development "platforms," spawning multiple (often competing) derivative projects to undertake the difficult work of actually fashioning products that will appeal to real business users. To be sure, building a good platform is hard, too. It takes a lot of architectural savvy, trial and error, and constant refactoring. (Certain Apache projects fulfill important lower-level functions and properly remain platforms rather than polished products.) But a platform doth not a CMS application make.

This trend is ironic, because much of the criticism of bloated or failed CMS projects has centered around the commercial products involved being too "platform-oriented" and therefore requiring excessive, code-based customization to convert into practical CMS applications. Extensibility is important, but savvy customers expect to see the inclusion of core features, together with the ability to configure key settings via simple browser interfaces.

It seems familiar, does it not? Except this time, the words are arranged in such a way as to convey meaning.

For example, in Madanmohan and De', the phrase 'to configure key settings' is one of those dangling, sounds-as-though-it-means-something-but-not-quite-sure-what phrases with which their paper abounds. Tony Byrne's version makes things clear: you must be able to configure the damn software with your browser, rather than faff around with a text editor, or write more code. By deleting the key words 'via simple browser interfaces', Madanmohan and De' convert a straightforward observation into a non-committal abstraction.

I should point out, because you are bound to be wondering by now, that Mr Byrne's article is dated May 2003, whereas Open Source Reuse was published in the November/December 2004 issue of IEEE Software. Mr Byrne is not cited in the Madanmohan and De' paper.

After this I got interested. The next paragraph in Madanmohan and De' turned out to be hacked out of another passage of Mr Byrne's, with a sentence in the middle bizarrely plucked from a much earlier paragraph. As before, the boys from Bangalore made an exemplary job of murdering the meaning, while allowing technical words to live.

I went back to the paper, and started googling other dubious-looking passages. A bizarre earlier section (although very readable by Madanmohan and De''s standards) caught my eye; it claimed that the surveyed companies used, among other things, artificial intelligence to locate open source code.

Just think about that for a moment. 'Which open source programmer's editor do you recommend, K9?' 'I hear Notepad++ is very good, master. I have transmitted its URL to the Tardis's browser.' 'Good dog, K9!'

I googled diligently, and discovered a PDF presentation. This (earlier, uncited) publication by a group of Spanish and Italian academics lists various techniques for locating open source code (with no claim that they were in commercial use). Madanmohan and De' present this list as their own - indeed as their own results, established by their survey.

Two hits felt like enough. I retired to bed, triumphant.

Spreading the news

I don't know what one should do in this situation. What I have done doesn't seem to have got me anywhere, so you should probably treat the following as a recipe for failure.

The first thing that occurred to me was that, before bandying the word 'plagiarism' about, I had better check my conclusions with a grown up. Kevlin Henney needs no introduction to the serious C++ community; his is a computing name of considerable weight, and experience showed him to be a soft touch when invited to perform unrewarding favours.

I sent Kevlin the paper, and a longer, more detailed version of the explanation above. He replied, concurring that Madanmohan and De' had indeed copied their paper as I have described.

I sent my data to my OU tutor. His reply left me with the strong impression that he had still read neither the paper nor my critique of it:

I'm not qualified to comment on whether a paper published by the IEEE is acceptable or not. If you have an issue with the paper, it would be worth contacting the OU e.g. the Course Manager […] As regards your study, I'd suggest taking the paper at face value irrespective of any doubts you may have over its provenance, and continuing to answer the question.

I approached the course manager, and, after a few days, got a long reply from the chair of the course team, expertly disclaiming any responsibility for the article, and advising me to bog off and buckle down:

[...] the course team agrees that you should follow your tutor's advice and answer the TMA question as set. Note that we have used the article […] in good faith (IEEE Software, the journal in which the article is published, is a prestigious peer reviewed journal). Note also that when we choose an article for the reader it does not necessarily mean that we agree with it or even think it is a good article […] You may be right about plagiarism and we would encourage you to raise the issue with IEEE Software, as it is the responsibility of the journal's editorial board to assure the quality of all their published material…

The admission that the OU course M885 knowingly gives its students 'non-good' articles to read notwithstanding, this took me no further forward. I submitted my homework without answering Question 2 - for how could I answer it, believing it to be based on deliberate gibberish? Should I have constructed more gibberish of my own? - and duly paid a 15% mark forfeit for my trouble.

So I sent my complaint to IEEE Software, and here, at last, I found someone who took the complaint seriously. The current editor in chief, appointed long after publication of the article in question and therefore quite innocent of direct involvement, took the trouble to examine the Madanmohan and De' paper, and grokked it. However, I soon discovered he was limited in the action he could take.

The cogs of the IEEE turn at a similar rate to those engraved on the reverse side of a £2 coin. The editor in chief initiated two inquiries: one to determine whether the article was plagiarised, and a second to determine if the plagiarism invalidated the paper as a whole.

After six months, Madanmohan and De' were deemed guilty of Type 4 plagiarism by the first inquiry. (Type 1 plagiarism, the worst, is stealing a whole paper, for example publishing Jeremy Bottlewasher's Theory of Relativity. Type 3 is theft of paragraph-sized chunks; Type 4 is just swiping the odd sentence. The penalty for Type 4 plagiarism is simply to apologise to the 'plagiarees'.)

A few months after that, the second committee, the committee that was to determine if Open Source Reuse was invalidated, determined that

53% of the statements are actual well supported pieces of evidence, 20% are weakly supported statement, [sic] and eventually we have 27% of statements that are pure claims […] The framework of OSS usage in industry depicted in the paper is close to reality. Although [sic, I think they mean 'altogether' - VS] circa half of the pretended [sic, presumably a Freud-influenced attempt at 'presented' - VS] pieces of evidence are actually supported by the empirical investigation.

We cannot recommend the removal of the paper from the digital library.

A fascinating judgement, but not without precedent. For was there not once a curate who, in so many words, famously observed that 53% of his egg was good?

Aftermath

So, despite my best efforts, for 29 bucks you too can enjoy Open Source Reuse in the privacy of your own internet café. Other authors of other papers will presumably continue to cite it. The OU course M885 is free to include it in its selection of 'bad' papers, and more unfortunate students everywhere must continue, unknowing, to struggle through the bastardised, meaningless version of Tony Byrne's words.

The main practical upshot that I am aware of is that the little dialog box that the OU's website puts up when you upload homework, threatening dire consequences for submitting plagiarised work, now makes me laugh like a drain.

I realise that I am being naïve, but I really have been quite shocked by this experience. It is a commonplace belief that less good academics, securely tenured, can publish any old rubbish. But, sharing xkcd's discipline snobbery, I had previously associated this sort of thing with fashionable humanities, not our own beloved Comp. Sci. And it is one thing to suspect in a vague, generalised way that local government can be corrupt, quite another to witness your own local councillor trousering a wad of £50 notes.

Oh, and by the way, despite Open Source Reuse: I passed. ®