
Running queries on the HMRC database fiasco

Dis-information systems management


Why wasn't the sensitive, non-required data removed?

According to the Telegraph, the National Audit Office (NAO) asked for the names, National Insurance numbers, and child benefit numbers of every child so that it could select 100 cases at random for its annual audit of Revenue and Customs. The NAO asked for bank and other details to be removed, but an HMRC official replied that, to keep costs down, the HMRC could only provide all of the details on the database.

Now, let's go back to Database 101 for a moment (not a theoretical Database 101; I actually designed and teach on the database course at Dundee University). The first example in the first lecture on querying shows the students how to subset the data by column and then by row. Database engines are built, from the ground up, to perform sub-setting. It doesn't get any easier than this. So, where's the problem?
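To make the point concrete, here is a minimal sketch of that first-lecture example, using Python's built-in sqlite3 module. The table name, column names and rows are all invented for illustration; the real HMRC schema is not public and this is not a claim about how their system works.

```python
import sqlite3

# Hypothetical table and column names; the real HMRC schema is not public.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE child_benefit (
           child_name TEXT,
           ni_number TEXT,
           benefit_number TEXT,
           address TEXT,
           bank_account TEXT
       )"""
)
conn.executemany(
    "INSERT INTO child_benefit VALUES (?, ?, ?, ?, ?)",
    [
        ("Child A", "NI-0001", "CB-0001", "1 Example Street", "ACC-1111"),
        ("Child B", "NI-0002", "CB-0002", "2 Example Street", "ACC-2222"),
        ("Child C", "NI-0003", "CB-0003", "3 Example Street", "ACC-3333"),
    ],
)

# Subset by column: name only the fields that were asked for.
by_column = conn.execute(
    "SELECT child_name, ni_number, benefit_number "
    "FROM child_benefit ORDER BY benefit_number"
).fetchall()

# Subset by row as well: add a WHERE clause.
by_row = conn.execute(
    "SELECT child_name, ni_number, benefit_number "
    "FROM child_benefit WHERE benefit_number = ?",
    ("CB-0002",),
).fetchall()

print(by_column)  # address and bank_account never leave the database
print(by_row)
```

The column list in the SELECT is the whole of the "clean-up": anything not named is simply never returned.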

But, let's assume the worst: that the HMRC uses a very complex, unwieldy engine that cannot subset data easily. Well, no matter how complex the original database was, it is reasonable to assume that it was reduced to a simple format for the CDs. In which case, importing it into an engine that can subset easily is trivial.
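As a rough sketch of that fallback route, the snippet below loads a flat export into an in-memory SQLite database and then subsets it. It assumes the export is a comma-separated file with a header row, which is a guess; the column names and values are made up, and `flat_export` stands in for a real file.

```python
import csv
import io
import sqlite3

# Stand-in for a flat export file; column names and values are invented.
flat_export = io.StringIO(
    "child_name,ni_number,benefit_number,bank_account\n"
    "Child A,NI-0001,CB-0001,ACC-1111\n"
    "Child B,NI-0002,CB-0002,ACC-2222\n"
)

reader = csv.reader(flat_export)
header = next(reader)  # first line holds the column names

# Create a table whose columns mirror the export's header.
engine = sqlite3.connect(":memory:")
column_defs = ", ".join('"{}"'.format(name) for name in header)
engine.execute("CREATE TABLE export ({})".format(column_defs))

# Bulk-load the remaining lines straight from the CSV reader.
placeholders = ", ".join("?" for _ in header)
engine.executemany(
    "INSERT INTO export VALUES ({})".format(placeholders), reader
)

# Once loaded, dropping the bank details is one SELECT.
audit_rows = engine.execute(
    "SELECT child_name, ni_number, benefit_number "
    "FROM export ORDER BY benefit_number"
).fetchall()

print(audit_rows)
```

For a real export you would pass an open file instead of the `io.StringIO` object and probably use an on-disk database, but the shape of the job stays the same.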

So, unless there are some very odd circumstances to which we are not privy, I find it impossible to believe that removing the bank and other details would have involved significant cost.

Go on, Mark, stick your neck out. How much?

Well, the Telegraph has an estimate:

However The Telegraph has established that a typical clean-up operation would cost around £5,000 and take a software engineer less than a week. A spokesman for HMRC said that the £5,000 cost of removing the information "was not a figure we recognise" and declined to discuss the cost because the matter is the subject of a review.

I don't recognise the figure either, but that's because I think the Telegraph is being far, far too generous to the HMRC. Assuming 25 million CSV records, I would estimate half a day's work to subset by column. If I were familiar with the data structure and had done the job before, maybe an hour. Any competent DBA could do it in the same time. Now DBAs are expensive, but not £10,000 per day. I'd do it for £500.
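For a sense of how small the job is, here is one way the column subsetting might look with nothing but Python's standard csv module. It is a sketch under the same assumption as the estimate (CSV records with a header row); the function name `subset_columns` and the field names are hypothetical.

```python
import csv
import io


def subset_columns(src, dst, keep):
    """Copy only the `keep` columns from src to dst, one row at a time.

    Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops every column not listed in `keep`.
    writer = csv.DictWriter(
        dst, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


# Tiny invented sample standing in for the full file.
source = io.StringIO(
    "child_name,ni_number,benefit_number,bank_account\n"
    "Child A,NI-0001,CB-0001,ACC-1111\n"
    "Child B,NI-0002,CB-0002,ACC-2222\n"
)
target = io.StringIO()
rows_written = subset_columns(
    source, target, ["child_name", "ni_number", "benefit_number"]
)
print(target.getvalue())
```

Because rows are read and written one at a time, memory use does not grow with the size of the file, so the same loop should cope with 25 million records as readily as with two.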

A spokesperson for the HMRC said: "We don't have infinite resources, we have to use our resources rationally."

One has to wonder about the definition of the word "rationally" here.
