Loads of mis-sold PPI, but WHO will claim? This man's paid to find out

Data mining to fathom the depths of banking's balls-up

Called to account

While all these banks seek to use big data both to harmonise accounts and to clarify their position on the PPI payouts, one of the major players has another big data task on its hands. The UK government bailed out Lloyds during the financial crisis of 2009, taking a 43.4 per cent stake in the ailing bank. However, European Union law regarded this as state aid and demanded a sell-off to comply with competition rules.

In a project codenamed Verde, Lloyds set about divesting some 630 branches. Its attempt to sell them to the Co-operative Bank failed recently when the would-be buyer got cold feet in the current economic climate. Yet Lloyds continues the work unabated and intends to float this ready-made bank, branded TSB, through an IPO instead.

Lloyds has had its own PPI issues to deal with, but this is an entirely different project. It is interesting nonetheless because it is the reverse of a merger: Lloyds has to select the 630 branches for the sell-off and work out what to do with their customers. Lloyds has even set up its own bank transfer website to explain the situation to its various account-holders.

Cole has his own take on the issues that this task involves. “Customers don’t think of a branch, they think they are a customer of a bank. Now they’re going to be banking with a newly formed company. So there will be cases where you have a joint account, your wife can have an account in another company, but you have a joint mortgage and things like that, it’s massively complicated. From what I can see, it’s an equally difficult exercise to split up the data as it is to merge it.”
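To make the complication concrete, here is a minimal Python sketch that flags customers who would end up with accounts on both sides of a split. The account records, customer names and the "bank" field are invented for illustration and say nothing about how Lloyds actually holds its data.

```python
from collections import defaultdict

# Hypothetical records: which bank each account lands in after the split.
accounts = [
    {"account": "A1", "bank": "TSB", "holders": ["alice", "bob"]},     # joint account
    {"account": "A2", "bank": "Lloyds", "holders": ["bob"]},
    {"account": "A3", "bank": "TSB", "holders": ["carol"]},
    {"account": "M1", "bank": "Lloyds", "holders": ["alice", "bob"]},  # joint mortgage
]

def customers_straddling_split(rows):
    """Return customers who would hold accounts at both banks."""
    banks_by_customer = defaultdict(set)
    for row in rows:
        for holder in row["holders"]:
            banks_by_customer[holder].add(row["bank"])
    return sorted(c for c, banks in banks_by_customer.items() if len(banks) > 1)

straddlers = customers_straddling_split(accounts)
print(straddlers)  # ['alice', 'bob']
```

Even in this toy case a single joint mortgage is enough to pull an otherwise tidy customer across the dividing line.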

Quality control

Regardless of whether you’re separating out the data or bringing it all together, data quality is the biggest issue that needs to be addressed before any major number-crunching begins. Cole also speaks of “holes” in the data, where information such as home address or date of birth is missing. “You’d be surprised to see how many unknown genders there are. That’s interesting,” says Cole.

Classifying the data is another step that sharpens the picture being built up around a customer. Cole says he usually distinguishes between two types of data: behavioural and profile. Behavioural data is typically an accumulation of transactions relating to customer activity, such as purchases or website visits. Once it lands in a database it remains unchanged and simply builds up over time. According to Cole, it’s probably the most valuable source of data that can be collected.

“You can ask someone how often they shop in that supermarket and they will say once a week or twice a month but behavioural data will show exactly how often they shop and what they buy.”
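As a rough illustration of that point, the Python sketch below counts distinct shopping days per customer from a transaction log. The log, the customer IDs and the field layout are invented for the example and are not taken from Cole's work.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount)
transactions = [
    ("C001", date(2014, 3, 3), 42.10),
    ("C001", date(2014, 3, 3), 5.25),    # second basket on the same day
    ("C001", date(2014, 3, 10), 38.75),
    ("C001", date(2014, 3, 21), 51.00),
    ("C002", date(2014, 3, 7), 12.40),
]

def visits_per_customer(rows):
    """Count distinct shopping days per customer."""
    days = defaultdict(set)
    for customer_id, day, _amount in rows:
        days[customer_id].add(day)
    return {customer_id: len(d) for customer_id, d in days.items()}

visit_counts = visits_per_customer(transactions)
print(visit_counts)  # {'C001': 3, 'C002': 1}
```

The count comes straight from what the customer did, not from what they remember doing.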

By contrast, profile data, or research data, is data that can change: marital status, where you live and what you do. Filling in the gaps in this data is just one aspect of a data-mining project, as Cole explains.

“One part of the process is to try to make your data better. So where there is missing information, you try to guess. This includes what the gender would be or if you don’t know the income for that person, you make an estimate or you model it based on all the information [you have] on all the other customers.”
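One simple way to make such an estimate, sketched here in Python, is to fill a missing income with the median income of similar customers. The records, the "spend_band" grouping and the fallback rule are all assumptions made for the example. Cole does not say which model he uses, and a real project would likely use something richer.

```python
from statistics import median

# Hypothetical customer records; None marks a hole in the data.
customers = [
    {"id": 1, "spend_band": "high", "income": 62000},
    {"id": 2, "spend_band": "high", "income": 58000},
    {"id": 3, "spend_band": "high", "income": None},
    {"id": 4, "spend_band": "low", "income": 21000},
    {"id": 5, "spend_band": "low", "income": 25000},
    {"id": 6, "spend_band": "low", "income": None},
    {"id": 7, "spend_band": "mid", "income": None},
]

def impute_income(records):
    """Fill missing income with the median of the same spend band,
    falling back to the overall median if the band has no known incomes."""
    known = [r["income"] for r in records if r["income"] is not None]
    overall = median(known)
    by_band = {}
    for r in records:
        if r["income"] is not None:
            by_band.setdefault(r["spend_band"], []).append(r["income"])
    filled = []
    for r in records:
        row = dict(r)  # copy, so the original records stay untouched
        row["income_imputed"] = row["income"] is None
        if row["income_imputed"]:
            band_values = by_band.get(row["spend_band"])
            row["income"] = median(band_values) if band_values else overall
        filled.append(row)
    return filled

filled = impute_income(customers)
for row in filled:
    print(row["id"], row["income"], row["income_imputed"])
```

Keeping an "income_imputed" flag alongside the guess lets later analysis tell estimated values from recorded ones.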

This goes beyond just using a postcode and can draw on particular spending patterns. When it comes to filling in the gaps, nothing goes to waste. While there are exceptions, it’s far too time-consuming to go through every customer profile with missing income details by hand to work out a likely figure.

“That’s where the data mining comes in,” says Cole. “You would then build algorithms that will use all the data to make that prediction. Alternatively, you can look at the average or examine a certain range of data – there are a lot of different ways to approach it. In your application you are cleaning the data. That means filling out the blanks and simply checking for errors. For example, a phone number typed into the age field, things like that. Looking for outliers. Again in the analytics you’re interested in the breadth, but you’re also interested in what is coming across.”
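The kind of error check Cole describes can be sketched in a few lines of Python. The field names, the 0 to 120 age range and the "U" code for unknown gender are assumptions chosen for the example, and the phone number comes from the range reserved for fictional use.

```python
def find_field_errors(record):
    """Return a list of problems found in one customer record."""
    problems = []
    age = record.get("age")
    if age is None:
        problems.append("age missing")
    elif not (isinstance(age, int) and age in range(0, 121)):
        # Catches a phone number typed into the age field, among other things.
        problems.append("age out of range")
    if record.get("gender") in (None, "", "U"):
        problems.append("gender unknown")
    return problems

# Hypothetical records, one of them with a phone number in the age field.
records = [
    {"id": 1, "age": 34, "gender": "F"},
    {"id": 2, "age": 7700900123, "gender": "M"},
    {"id": 3, "age": None, "gender": "U"},
]

report = {r["id"]: find_field_errors(r) for r in records}
print(report)
# {1: [], 2: ['age out of range'], 3: ['age missing', 'gender unknown']}
```

Checks like these only flag suspect values. Deciding whether to correct, estimate or discard them is a separate step.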

Meaningful relationships

"Data quality is the biggest issue when you start getting into your task and working with the data. You have a lot of data and you look for relationships, but if you then have something extreme [an outlier] appearing then that could change the whole relationship and create an inaccurate picture. So it’s all about cleaning. Then, by creating these other factors from thousands of data [fields], you’re creating a more manageable amount of factors. You know what you’re looking at in terms of data on the screen.

Coding with the open source Revolution Analytics R Studio

"The big exercise in the data prep is to get to understand the distribution in the data and the variation. You need variation between two things in order to assess if there is a relationship. If there is no variation, you can’t really say anything from that data. So there’s a lot of preparation going on and you’re also normalising data – you’re splitting it up – all sorts of statistical things. You have to massage the data to put it in a form that you can use to run your algorithms. All that you’re doing is programming, writing code.

"You then run your algorithms and select your best algorithms. You get statistics on your screen and you make decisions – it’s often a rigid process. The output could be a credit score or just a number. Or it can be a segment which you would then profile after that. You would send that segment to a marketeer who would then come up with a fancy name for it.

"There’s an operation, a commercial aspect and there’s an insight. And you always try to gain insights because that will help you next time you do the same exercise."
