Running queries on the HMRC database fiasco

Dis-information systems management

Why wasn't the sensitive, non-required, data removed?

According to the Telegraph, the National Audit Office (NAO) asked for the names, National Insurance numbers, and child benefit numbers of every child so that it could select 100 cases at random for its annual audit of Revenue and Customs. The NAO asked for bank and other details to be removed, but an HMRC official replied that, to keep costs down, the HMRC could only provide all of the details on the database.

Now, let's go back to Database 101 for a moment (not a theoretical Database 101; I actually designed and teach on the database course at Dundee University). The first example in the first lecture on querying shows the students how to subset the data by column and then by row. Database engines are built, from the ground up, to perform subsetting. It doesn't get any easier than this. So, where's the problem?
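For readers who have not sat through that first lecture, here is a minimal sketch of subsetting by column and then by row, using Python's built-in sqlite3 module. The table name child_benefit, its columns, and the sample rows are all invented for illustration; nothing here reflects the real HMRC schema or engine.

```python
import sqlite3

# Hypothetical table and columns, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE child_benefit (
           name TEXT, ni_number TEXT, benefit_number TEXT,
           bank_account TEXT, address TEXT)"""
)
conn.executemany(
    "INSERT INTO child_benefit VALUES (?, ?, ?, ?, ?)",
    [
        ("Child A", "NI-0001", "CHB-0001", "ACC-1111", "1 Example Street"),
        ("Child B", "NI-0002", "CHB-0002", "ACC-2222", "2 Example Street"),
        ("Child C", "NI-0003", "CHB-0003", "ACC-3333", "3 Example Street"),
    ],
)

# Subset by column: name only the columns that were asked for.
by_column = conn.execute(
    "SELECT name, ni_number, benefit_number "
    "FROM child_benefit ORDER BY benefit_number"
).fetchall()

# Subset by row as well: add a WHERE clause to the same query.
by_row = conn.execute(
    "SELECT name, ni_number, benefit_number "
    "FROM child_benefit WHERE benefit_number = ?",
    ("CHB-0002",),
).fetchall()
```

The bank account and address columns never leave the database because the query never names them. The same two ideas, a column list and a WHERE clause, carry over to any SQL engine.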

But, let's assume the worst: that the HMRC uses a very complex, unwieldy engine that cannot subset data easily. Well, no matter how complex the original database was, it is reasonable to assume that it was reduced to a simple format for the CDs. In which case, importing it into an engine that can subset easily is trivial.
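To give a feel for how little is involved in that import step, the sketch below loads a delimited flat file into SQLite and then subsets it. It assumes the export is a CSV with a header row; the column names, the records table name, and the helper load_into_sqlite are hypothetical, and the real format on the CDs is not known.

```python
import csv
import io
import sqlite3

# Stand-in for the flat file on the CDs; the column names are hypothetical.
FLAT_FILE = io.StringIO(
    "name,ni_number,benefit_number,bank_account\n"
    "Child A,NI-0001,CHB-0001,ACC-1111\n"
    "Child B,NI-0002,CHB-0002,ACC-2222\n"
)


def load_into_sqlite(source, conn, table="records"):
    """Create a table from the CSV header and bulk-insert every row."""
    reader = csv.reader(source)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # executemany consumes the reader lazily, one row at a time.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_into_sqlite(FLAT_FILE, conn)

# Once the data is in an engine, dropping the bank details is one query.
subset = conn.execute(
    "SELECT name, ni_number, benefit_number FROM records ORDER BY benefit_number"
).fetchall()
```

For a real file you would pass an open file handle instead of the StringIO stand-in and connect to a database on disk rather than in memory.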

So, unless there are some very odd circumstances to which we are not privy, I find it impossible to believe that removing the bank and other details would have involved significant cost.

Go on, Mark, stick your neck out. How much?

Well, the Telegraph has an estimate:

However The Telegraph has established that a typical clean-up operation would cost around £5,000 and take a software engineer less than a week. A spokesman for HMRC said that the £5,000 cost of removing the information "was not a figure we recognise" and declined to discuss the cost because the matter is the subject of a review.

I don't recognise the figure either, but that's because I think the Telegraph is being far, far too generous to the HMRC. Assuming 25 million CSV records, I would estimate half a day's work to subset by column. If I were familiar with the data structure and had done the job before, maybe an hour. Any competent DBA could do it in the same time. Now DBAs are expensive, but not £10,000 per day, which is the rate that £5,000 for half a day's work implies. I'd do it for £500.
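As a rough indication of what the job looks like when no database is involved at all, here is a sketch that subsets a CSV by column using only Python's standard csv module. The function name subset_columns and the column names are assumptions made for the example; the actual layout of the HMRC export is not public.

```python
import csv
import io


def subset_columns(source, dest, keep):
    """Copy only the `keep` columns from source to dest, one row at a time."""
    reader = csv.DictReader(source)
    missing = [c for c in keep if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"columns not found in input: {missing}")
    # extrasaction="ignore" silently drops every column not listed in `keep`.
    writer = csv.DictWriter(dest, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


# Tiny stand-in for the real file; the column names are hypothetical.
source = io.StringIO(
    "name,ni_number,benefit_number,bank_account\n"
    "Child A,NI-0001,CHB-0001,ACC-1111\n"
)
dest = io.StringIO()
rows_written = subset_columns(
    source, dest, ["name", "ni_number", "benefit_number"]
)
```

Because the rows are streamed rather than loaded in one go, memory use stays flat however many records there are, so 25 million rows should be a matter of run time rather than engineering effort.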

A spokesperson for the HMRC said: "We don't have infinite resources, we have to use our resources rationally."

One has to wonder about the definition of the word "rationally" here.
