Loads of mis-sold PPI, but WHO will claim? This man's paid to find out

Data mining to fathom the depths of banking's balls-up

Called to account

While all these banks seek to use big data both to harmonise accounts and to clarify their position on PPI payouts, one of the major players has another big data task on its hands. The UK government bailed out Lloyds during the financial crisis of 2009, taking a 43.4 per cent stake in the ailing bank. However, under European Union law this counted as state aid, and the EU demanded a sell-off to comply with its competition rules.

In a project codenamed Verde, Lloyds set about divesting some 630 branches. Its attempt to sell them to the Co-operative Bank recently failed when the would-be buyer got cold feet in the current economic climate. Lloyds is nonetheless pressing on, and now intends to float this ready-made bank, branded TSB, through an IPO instead.

Lloyds has had its own PPI issues to deal with, but this is an entirely different project, and an interesting one, because it is the reverse of merging: a process needed to select the 630 branches for the sell-off and to work out what can be done with their customers. Lloyds has even set up its own bank transfer website to explain the situation to its account-holders.

Cole has his own take on the issues that this task involves. “Customers don’t think of a branch, they think they are a customer of a bank. Now they’re going to be banking with a newly formed company. So there will be cases where you have a joint account, your wife can have an account in another company, but you have a joint mortgage and things like that, it’s massively complicated. From what I can see, it’s an equally difficult exercise to split up the data as it is to merge it.”
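
Cole’s joint-account example shows why splitting is hard. The following is a minimal Python sketch of one way to surface the problem, not a description of how Lloyds actually did it. It assumes, hypothetically, that every account carries a branch code and a tuple of holder IDs, assigns each account to the divested bank by branch, and flags customers left holding accounts on both sides of the split. `Account`, `split_by_branch` and the sample data are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    account_id: str
    branch: str               # hypothetical branch code
    holders: tuple[str, ...]  # customer IDs; joint accounts have several


def split_by_branch(accounts, divested_branches):
    """Partition accounts by branch and find customers straddling both banks."""
    retained, divested = [], []
    for acct in accounts:
        (divested if acct.branch in divested_branches else retained).append(acct)

    def holders_of(accts):
        # Every customer ID holding at least one account in the group
        return {h for a in accts for h in a.holders}

    # Customers who hold, alone or jointly, accounts on both sides
    straddling = holders_of(retained) & holders_of(divested)
    return retained, divested, straddling


accounts = [
    Account("A1", "B001", ("cust1", "cust2")),  # joint current account
    Account("A2", "B630", ("cust2",)),          # spouse's own account, other branch
    Account("M1", "B001", ("cust1", "cust2")),  # joint mortgage
    Account("A3", "B630", ("cust3",)),
]
retained, divested, straddling = split_by_branch(accounts, {"B630"})
print(sorted(a.account_id for a in divested))  # ['A2', 'A3']
print(sorted(straddling))                      # ['cust2']
```

Customers in the `straddling` set are the awkward cases Cole describes, such as a joint mortgage at a retained branch alongside a spouse’s account at a divested one, and would need a business rule or a manual decision before the split.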

Quality control

Regardless of whether you’re separating out the data or bringing it all together, data quality is the biggest issue that needs to be addressed before any major number-crunching begins. Cole also speaks of “holes” in the data, where information such as a home address or date of birth is missing. “You’d be surprised to see how many unknown genders there are. That’s interesting,” says Cole.
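
A first pass at finding these holes is simply to count missing values per field. This is a minimal sketch assuming customer records arrive as Python dicts; the field names, and the choice of what counts as “missing” (an absent key, `None`, an empty string or an `"unknown"`/`"U"` placeholder), are assumptions for illustration.

```python
from collections import Counter

# Values treated as missing; an assumption, real extracts use many placeholders
MISSING = {None, "", "unknown", "U"}


def missing_counts(records, fields):
    """Count, for each field, how many records lack a usable value."""
    counts = Counter()
    for rec in records:
        for field in fields:
            # An absent key comes back as None from .get(), so it counts too
            if rec.get(field) in MISSING:
                counts[field] += 1
    return {field: counts[field] for field in fields}


customers = [
    {"id": 1, "gender": "F", "dob": "1970-03-02", "address": "1 High St"},
    {"id": 2, "gender": "U", "dob": None, "address": "9 Mill Lane"},
    {"id": 3, "gender": "unknown", "dob": "1982-11-20"},
]
print(missing_counts(customers, ["gender", "dob", "address"]))
# {'gender': 2, 'dob': 1, 'address': 1}
```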

Classifying the data is another step that sharpens the picture being built up of each customer. Cole says he usually distinguishes between two types of data: behavioural and profile. Behavioural data is typically an accumulation of transactions relating to customer activity, such as purchases or website visits. Once it lands in a database it remains unchanged and simply builds up over time. According to Cole, it’s probably the most valuable source of data that can be collected.

“You can ask someone how often they shop in that supermarket and they will say once a week or twice a month but behavioural data will show exactly how often they shop and what they buy.”
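
The gap between what customers say and what their transactions show can be sketched by treating behavioural data as an append-only log and deriving visit frequency and favourite items from it. The log layout, the helper names and the sample visits below are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

# Behavioural data as an append-only log of (customer_id, visit_date, items).
# Entries are only ever appended, never edited (hypothetical layout).
log = []


def record_visit(customer_id, visit_date, items):
    log.append((customer_id, visit_date, tuple(items)))


def visits_per_week(customer_id, start, end):
    """Observed visits per week in the half-open window [start, end)."""
    visits = sum(1 for c, d, _ in log if c == customer_id and start <= d < end)
    return visits / ((end - start).days / 7)


def top_items(customer_id, n=3):
    """Most frequently bought items across all of a customer's visits."""
    counts = Counter(item for c, _, items in log if c == customer_id for item in items)
    return counts.most_common(n)


# A customer who says "once a week" but actually visits twice a week
start = date(2013, 1, 7)
for week in range(4):
    monday = start + timedelta(weeks=week)
    record_visit("cust1", monday, ["milk", "bread"])
    record_visit("cust1", monday + timedelta(days=4), ["milk", "wine"])

print(visits_per_week("cust1", start, start + timedelta(weeks=4)))  # 2.0
print(top_items("cust1", 2))  # [('milk', 8), ('bread', 4)]
```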

By contrast, profile (or research) data can change: marital status, where you live and what you do. Filling in the gaps in this kind of data is just one aspect of a data-mining project, as Cole explains.

“One part of the process is to try to make your data better. So where there is missing information, you try to guess. This includes what the gender would be or if you don’t know the income for that person, you make an estimate or you model it based on all the information [you have] on all the other customers.”

Such modelling goes beyond simply using a postcode; it can also draw on particular spending patterns. When it comes to filling in the gaps, nothing goes to waste. There are exceptions, but in general it is far too time-consuming to go through every customer profile with missing income details by hand to work out a likely figure.
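
One very simple version of modelling a missing value “based on all the other customers” is to fit a least-squares line from a behavioural feature (here a hypothetical annual spend) to income on the customers whose income is known, then use it to fill the blanks. This is a deliberately minimal sketch, not the method Cole’s team uses; a real model would draw on many more features and be validated before use. Field names and figures are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def impute_income(customers):
    """Fill missing 'income' values from 'annual_spend', fitted on known incomes."""
    known = [c for c in customers if c.get("income") is not None]
    a, b = fit_line([c["annual_spend"] for c in known],
                    [c["income"] for c in known])
    for c in customers:
        if c.get("income") is None:
            c["income"] = round(a + b * c["annual_spend"])
            c["income_imputed"] = True  # keep a flag: this value is an estimate
    return customers


# Hypothetical records; field names are illustrative
customers = [
    {"id": 1, "annual_spend": 5000, "income": 20000},
    {"id": 2, "annual_spend": 10000, "income": 30000},
    {"id": 3, "annual_spend": 15000, "income": 40000},
    {"id": 4, "annual_spend": 12000, "income": None},
]
impute_income(customers)
print(customers[3]["income"])  # 34000
```

Flagging imputed values keeps estimates distinguishable from observed data in later analysis.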

“That’s where the data mining comes in,” says Cole. “You would then build algorithms that will use all the data to make that prediction. Alternatively, you can look at the average or examine a certain range of data – there are a lot of different ways to approach it. In your application you are cleaning the data. That means filling out the blanks and simply checking for errors. For example, a phone number typed into the age field, things like that. Looking for outliers. Again in the analytics you’re interested in the breadth, but you’re also interested in what is coming across.”
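
The checks Cole lists can be sketched as a validation rule followed by an outlier screen. This minimal example assumes ages arrive as raw strings and rejects anything that is not a whole number within hypothetical bounds of 18 to 120 (which catches a phone number typed into the age field), then flags remaining values outside Tukey’s interquartile-range fences. Both rules are common defaults chosen for illustration; the source does not prescribe them.

```python
import statistics


def clean_ages(raw_ages, lo=18, hi=120):
    """Split raw age strings into valid ints and rejected entries."""
    valid, rejected = [], []
    for raw in raw_ages:
        text = raw.strip()
        if text.isdecimal() and lo <= int(text) <= hi:
            valid.append(int(text))
        else:
            rejected.append(raw)  # e.g. a phone number typed into the age field
    return valid, rejected


def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


raw = ["34", "41", "07700 900123", "29", "38", "", "45", "36", "119"]
ages, bad = clean_ages(raw)
print(bad)                 # ['07700 900123', '']
print(iqr_outliers(ages))  # [119]
```

The value 119 passes validation but is still flagged as an outlier, which is the point at which an analyst decides whether it is a genuine record or an error.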

Meaningful relationships

“Data quality is the biggest issue when you start getting into your task and working with the data. You have a lot of data and you look for relationships, but if you then have something extreme [an outlier] appearing then that could change the whole relationship and create an inaccurate picture. So it’s all about cleaning. Then, by creating these other factors from thousands of data [fields], you’re creating a more manageable amount of factors. You know what you’re looking at in terms of data on the screen.
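
Cole’s warning that a single extreme value can change the whole relationship is easy to demonstrate with Pearson’s correlation coefficient. The figures below (monthly card spend against number of products held) are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical: monthly card spend (GBP) versus number of products held
spend = [200, 350, 500, 650, 800, 950]
products = [1, 2, 2, 3, 3, 4]
print(round(pearson(spend, products), 2))  # 0.97: strong positive relationship

# Add one mis-keyed record: a spend of 95,000 for a single-product customer
print(round(pearson(spend + [95000], products + [1]), 2))  # -0.5: the sign flips
```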

[Image: Coding with the open source Revolution Analytics R Studio]

“The big exercise in the data prep is to get to understand the distribution in the data and the variation. You need variation between two things in order to assess if there is a relationship. If there is no variation, you can’t really say anything from that data. So there’s a lot of preparation going on and you’re also normalising data – you’re splitting it up – all sorts of statistical things. You have to massage the data to put it in a form that you can use to run your algorithms. All that you’re doing is programming, writing code.
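
Two of the preparation steps Cole mentions, checking that a field actually varies and normalising it, can be sketched as follows. This assumes numeric columns held in a dict keyed by hypothetical field names. It drops any column whose population standard deviation is zero, since a constant field cannot reveal a relationship, and z-score standardises the rest so that fields on very different scales become comparable. Z-scoring is one common normalisation among several, not necessarily the one Cole uses.

```python
import statistics


def prepare(columns):
    """Drop zero-variance columns and z-score standardise the remainder."""
    prepared, dropped = {}, []
    for name, values in columns.items():
        sd = statistics.pstdev(values)
        if sd == 0:
            dropped.append(name)  # no variation: nothing to learn from this field
            continue
        mean = statistics.fmean(values)
        prepared[name] = [(v - mean) / sd for v in values]
    return prepared, dropped


cols = {
    "age": [25, 35, 45, 55],
    "balance": [100.0, 300.0, 500.0, 700.0],
    "country_code": [44, 44, 44, 44],  # every customer is in the UK
}
prepared, dropped = prepare(cols)
print(dropped)  # ['country_code']
```

After standardisation `age` and `balance` end up on the same scale (mean 0, standard deviation 1), even though one is measured in years and the other in pounds.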

“You then run your algorithms and select your best algorithms. You get statistics on your screen and you make decisions – it’s often a rigid process. The output could be a credit score or just a number. Or it can be a segment which you would then profile after that. You would send that segment to a marketeer who would then come up with a fancy name for it.
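
When the output is a segment rather than a score, one widely used technique is k-means clustering. The source does not say which algorithms Cole’s team runs, so the sketch below is only an illustration: plain k-means (Lloyd’s algorithm) on two hypothetical, already-standardised features, using the first k points as starting centroids so the result is deterministic.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm); returns (centroids, labels)."""
    # Deterministic initialisation: the first k points become the centroids
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed segment
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical customers described by two standardised features,
# e.g. (spend, online visits); real segmentations use many more.
points = [(-1.0, -1.0), (1.0, 1.0), (-1.2, -0.8), (0.9, 1.1), (-0.8, -1.1), (1.1, 0.9)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Each label is a segment that would then be profiled and, in Cole’s words, handed to a marketeer to name.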

“There’s an operation, a commercial aspect and there’s an insight. And you always try to gain insights because that will help you next time you do the same exercise.”