Loads of mis-sold PPI, but WHO will claim? This man's paid to find out

Data mining to fathom the depths of banking's balls-up


Called to account

While all these banks seek to utilise big data to both harmonise accounts and clarify their position regarding the PPI payouts, one of the major players has another big data task on its hands. The UK government bailed out Lloyds during the financial crisis of 2009, taking a 43.4 per cent stake in the ailing bank. However, European Union law regarded this as state aid and demanded a sell-off to comply with competition rules.

In a project codenamed Verde, Lloyds set about divesting some 630 branches. Its attempts to sell them off to the Co-operative Bank failed recently, as the potential buyer got cold feet in the current economic climate. Yet Lloyds continues the work unabated and intends to offer this ready-made bank, branded TSB, as an IPO instead.

Lloyds has had its own PPI issues to deal with, but this is an entirely different project. It is nonetheless interesting because it is the reverse of merging – a necessary process in order to select the 630 branches for the sell-off and work out what to do with their customers. Lloyds has even set up its own bank transfer website to explain the situation to its various account-holders.

Cole has his own take on the issues that this task involves. “Customers don’t think of a branch, they think they are a customer of a bank. Now they’re going to be banking with a newly formed company. So there will be cases where you have a joint account, your wife can have an account in another company, but you have a joint mortgage and things like that, it’s massively complicated. From what I can see, it’s an equally difficult exercise to split up the data as it is to merge it.”

Quality control

Regardless of whether you’re separating out the data or bringing it all together, data quality is the biggest issue that needs to be addressed before any major number-crunching begins. Cole also speaks of "holes" in the data, where information is missing – such as home address or date of birth. “You’d be surprised to see how many unknown genders there are. That’s interesting,” says Cole.

Determining different data classifications is another aspect that clarifies the picture that’s being built up around a customer. Cole says he usually distinguishes between two types of data: behavioural and profile. Behavioural data is typically an accumulation of transactions relating to customer activity, such as purchases or website visits. It ends up in a database, where existing records remain unchanged and new ones simply build up over time. According to Cole, it’s probably the most valuable source of data that can be collected.

“You can ask someone how often they shop in that supermarket and they will say once a week or twice a month but behavioural data will show exactly how often they shop and what they buy.”

By contrast, profile data or research data is data that can change: marital status, where you live and what you do. Working on filling in these gaps is just one aspect of a data-mining project, as Cole explains.

“One part of the process is to try to make your data better. So where there is missing information, you try to guess. This includes what the gender would be or if you don’t know the income for that person, you make an estimate or you model it based on all the information [you have] on all the other customers.”
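As a rough illustration of the kind of estimate Cole describes, the sketch below fills in a missing income with the median income of customers who share some attribute. The field names (`postcode_area`, `income`), the sample records and the choice of a group median are assumptions made for the example, not details of Cole's method; a real project would more likely fit a model over many fields.

```python
from statistics import median


def impute_income(customers, group_key="postcode_area"):
    """Fill missing incomes with the median income of similar customers.

    Falls back to the overall median when a customer has no peers with
    a known income. All field names here are hypothetical.
    """
    known = [c["income"] for c in customers if c["income"] is not None]
    overall = median(known) if known else None

    # Collect the known incomes for each group.
    by_group = {}
    for c in customers:
        if c["income"] is not None:
            by_group.setdefault(c[group_key], []).append(c["income"])

    filled = []
    for c in customers:
        row = dict(c)  # copy, so the original records stay untouched
        if row["income"] is None:
            peers = by_group.get(row[group_key])
            row["income"] = median(peers) if peers else overall
            row["income_imputed"] = True  # flag the guess for later analysis
        else:
            row["income_imputed"] = False
        filled.append(row)
    return filled


# Made-up sample records for illustration only.
customers = [
    {"id": 1, "postcode_area": "SW", "income": 30000},
    {"id": 2, "postcode_area": "SW", "income": 34000},
    {"id": 3, "postcode_area": "SW", "income": None},
    {"id": 4, "postcode_area": "M", "income": 25000},
    {"id": 5, "postcode_area": "LS", "income": None},
]
filled = impute_income(customers)
```

Flagging each imputed value keeps guesses distinguishable from recorded facts, which matters when the filled-in field later feeds another model.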

This goes beyond just using a post code and can draw on particular spending patterns. When it comes to filling in the gaps, nothing goes to waste. While there are exceptions, it’s far too time-consuming to laboriously go through every customer profile with missing income details to fathom out a likely figure.

“That’s where the data mining comes in,” says Cole. “You would then build algorithms that will use all the data to make that prediction. Alternatively, you can look at the average or examine a certain range of data – there are a lot of different ways to approach it. In your application you are cleaning the data. That means filling out the blanks and simply checking for errors. For example, a phone number typed into the age field, things like that. Looking for outliers. Again in the analytics you’re interested in the breadth, but you’re also interested in what is coming across.”
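The two checks Cole mentions, catching a value that plainly belongs in another field and looking for outliers, might be sketched as follows. The plausible age range of 0 to 120, the 1.5 × interquartile-range rule and the sample figures are assumptions chosen for the example; the article does not say which rules Cole uses.

```python
from statistics import quantiles


def valid_age(value, low=0, high=120):
    """Return the age as an int if it is plausible, otherwise None."""
    try:
        age = int(str(value).strip())
    except ValueError:
        return None  # blank, missing or non-numeric entry
    return age if low <= age <= high else None


def iqr_outliers(values, k=1.5):
    """Return values lying more than k interquartile ranges outside Q1..Q3."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]


# A phone number typed into the age field is rejected as implausible.
ages = ["42", " 35 ", "07700900123", "", None]
cleaned_ages = [valid_age(a) for a in ages]

# Made-up monthly spend figures, with one extreme entry.
monthly_spend = [120, 135, 128, 140, 131, 126, 9000]
suspects = iqr_outliers(monthly_spend)
```

Values flagged this way are candidates for review rather than automatic deletion, since an extreme figure can be a genuine record.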

Meaningful relationships

"Data quality is the biggest issue when you start getting into your task and working with the data. You have a lot of data and you look for relationships, but if you then have something extreme [an outlier] appearing then that could change the whole relationship and create an inaccurate picture. So it’s all about cleaning. Then, by creating these other factors from thousands of data [fields], you’re creating a more manageable amount of factors. You know what you’re looking at in terms of data on the screen.

[Image: Coding with the open source Revolution Analytics R Studio]

"The big exercise in the data prep is to get to understand the distribution in the data and the variation. You need variation between two things in order to assess if there is a relationship. If there is no variation, you can’t really say anything from that data. So there’s a lot of preparation going on and you’re also normalising data – you’re splitting it up – all sorts of statistical things. You have to massage the data to put it in a form that you can use to run your algorithms. All that you’re doing is programming, writing code.

"You then run your algorithms and select your best algorithms. You get statistics on your screen and you make decisions – it’s often a rigid process. The output could be a credit score or just a number. Or it can be a segment which you would then profile after that. You would send that segment to a marketeer who would then come up with a fancy name for it.

"There’s an operation, a commercial aspect and there’s an insight. And you always try to gain insights because that will help you next time you do the same exercise."
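Three of Cole's points about data preparation can be shown with a short sketch: a single outlier can reverse an apparent relationship, a field with no variation says nothing, and normalising puts fields on a common scale. Pearson correlation and z-scores are assumed here as the illustrative measures, and the figures are invented; the article does not name the specific statistics Cole uses.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation, or None when either series has no variation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return None  # no variation, so no relationship can be assessed
    return cov / (sx * sy)


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1 (None if constant)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return None
    return [(v - m) / s for v in values]


# Invented figures: store visits against spend, perfectly related.
visits = [1, 2, 3, 4, 5]
spend = [10, 20, 30, 40, 50]
r_clean = pearson(visits, spend)

# One extreme record (for instance a large refund) flips the sign.
r_outlier = pearson(visits + [6], spend + [-400])

# A constant field cannot be related to anything.
r_constant = pearson([1, 2, 3], [5, 5, 5])

normalised_spend = z_scores(spend)
```

Here `r_clean` is a perfect positive correlation while `r_outlier` is negative, which is the "inaccurate picture" an uncleaned outlier can create.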


