Achtung! Use maths to smash the German tank problem – and your rival

Leaky data is a double agent in intelligence analysis

Leaky data spills your secrets

For a start, not all tanks may have serial numbers drawn from the same pool – different makes and even marks of tanks may use different sequences of numbers. They may not start at one.

Then there is the problem of random (or not) sampling. Older tanks might be sent to Africa and, if we are disabling more tanks there, then the sample will be biased towards the lower end of the population. And so on.
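To see how much damage that kind of bias can do, here is a minimal simulation sketch. It assumes the standard textbook estimator for the problem (largest serial seen, plus the largest serial divided by the sample size, minus one) and an invented fleet of 1,000 tanks numbered from 1; the fleet size, sample size and trial count are illustrative assumptions, not wartime figures.

```python
import random


def estimate_total(serials):
    """Standard estimate, assuming numbering starts at 1: m + m/k - 1."""
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


def mean_estimate(pool, sample_size, trials, rng):
    """Average the estimate over many random samples drawn from pool."""
    total = 0.0
    for _ in range(trials):
        total += estimate_total(rng.sample(pool, sample_size))
    return total / trials


rng = random.Random(0)                      # fixed seed so runs are repeatable
true_total = 1000                           # hypothetical fleet size
all_tanks = list(range(1, true_total + 1))
older_tanks = all_tanks[: true_total // 2]  # only the older half ever gets captured

fair = mean_estimate(all_tanks, 10, 2000, rng)
biased = mean_estimate(older_tanks, 10, 2000, rng)

print(f"sampling the whole fleet: about {fair:.0f}")
print(f"sampling older tanks only: about {biased:.0f}")
```

When every capture comes from the older half of the fleet, the estimate settles near half the true total, which is why the geographical spread of the sample matters.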

But that’s OK: it is possible to compensate for these factors. We can check whether different makes of tank share serial numbers (if any do, then we know there are multiple series), look at the geographical distribution of serial numbers, identify the more modern marks of the same tank, and so on. We can also apply simple logic. The serial number of the engine of my first motorbike was CB450E 1008841. Logic tells me that a production run of 8,841 is more likely than one of 1,008,841, so I would guess that the bike maker, Honda, started the serial numbers at 1,000,000 rather than 1.
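The starting-number problem can also be handled arithmetically. The sketch below compares the standard estimator, which assumes numbering starts at 1, with a variant that uses only the spread between the lowest and highest serials seen, so a constant offset drops out. The serial numbers and function names are made up for illustration; this is one way to do it under those assumptions, not a record of what the wartime analysts actually did.

```python
def estimate_from_one(serials):
    """Standard estimate, assuming numbering starts at 1: m + m/k - 1."""
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


def estimate_unknown_start(serials):
    """Estimate the population when the first serial number is unknown.

    Uses only the spread between the lowest and highest serials seen,
    so adding the same offset to every serial leaves the result unchanged.
    """
    k = len(serials)
    if k < 2:
        raise ValueError("need at least two serial numbers")
    spread = max(serials) - min(serials)
    return spread * (k + 1) / (k - 1) - 1


captured = [19, 40, 42, 60]                  # hypothetical serials
shifted = [s + 1_000_000 for s in captured]  # same tanks, numbering offset by a million

print(estimate_from_one(captured))       # 74.0
print(estimate_from_one(shifted))        # over a million: the offset wrecks it
print(estimate_unknown_start(shifted))   # about 67.3, same as for the unshifted list
```

The spread-based version throws away some information (the absolute value of the largest serial), so it is a little less precise when the numbering really does start at 1, but it does not care whether the maker began at 1 or at 1,000,000.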

And in practice, there are certain factors in the real example that helped considerably. It turned out that the German tanks had different serial numbers on different components; the gear box numbers were the most helpful, but all contributed to the overall picture.

And it worked. As an example, for June 1941, conventional intelligence put tank production at 1,550; the estimate from serial numbers was 244. The actual value (verified after the war) was 271.

In essence the Germans were unwittingly “publishing” a set of data; we could say that they had a data “leak.” That data could be extrapolated to yield valuable information that the Germans certainly did not wish to disclose. All that was required was some early data science work by the boffins of the day.

What resonance does that have for us 70 years later in a much more data-savvy world?

I would argue that there is “leaky” data all around us; all we have to do is to think outside the box. Companies often don’t think about the data they publish, and we can either extrapolate from that data (as in the German tank problem) or simply extract useful information from it.

For example, it is rumoured that a store in America has commissioned frequent aerial photographs of its competitor’s parking lot; it counts the cars and uses this to estimate footfall. There are also opportunities where the leaked data itself is innocuous but yields valuable information when combined with another set.

So, why is this worth thinking about?

First, your career. Now that you know it is possible, all you have to do is to get used to spotting leaked data – once you are sensitised to it you will find it all around you. Your competitors' websites can be a valuable hunting ground. Think about whether you can use it to estimate some missing data (as with the serial numbers) and/or combine that data with other, seemingly innocuous, sets to produce some vital information. If that information gives your company a commercial advantage then you deserve a bonus and a promotion.

Another reason to look at this is to ponder whether your own company is leaking data in some way that could be turned into useful information by the opposition. If so, plug the hole. You are far less likely to be promoted for this but at least you should get some credit.

Finally, if all else fails, send in the tanks. ®
