Gold in the BI hills

Savvy developers are keeping beady eyes on the BI job market

  • alert
  • submit to reddit

Protecting users from Firesheep and other Sidejacking attacks with SSL

Comment Organisations are rarely short of data, but the information it contains is often elusive.

Business Intelligence (BI) gives the business user an amazing tool: it turns data into information, making BI an area of huge growth and one where skilled developers are in short supply. It is worth knowing the job roles in a standard BI project so that you can see where you fit in and can secure a lucrative (sorry, useful and effective) role on a BI project.

ETL developer

The first task in any BI system is pulling data from a variety of disparate source systems (Finance, HR, Sales etc), cleaning it, standardising it, and finally loading it into a central repository (often known as a data warehouse). This whole process is known as ETL, Extract, Transform and Load. A new and crucial role, therefore, is that of ETL Developer.

There are many tools for this work: Integration Services within Microsoft’s SQL Server 2005, ETI Solution and Cognos DecisionStream to name but three. Apart from knowing how to drive a particular ETL application (and once you’re familiar with one, conversion to the next is relatively easy) you need to be good at devising the algorithms that will effectively clean a particular set of data.

Proficiency with SQL and the ability to use it imaginatively are vital as you tease out inconsistencies, duplications and other horrors from source system data. It’s a fascinating challenge, especially to anyone with a penchant for problem-solving: I love it. There is no step-by-step ‘do this, this and this and the data’ll all be fine’ approach available for data cleaning; instead you build up a collection of techniques which you apply as and when seems appropriate. If this all sounds a tad informal and lacking rigour, that’s because data cleaning is as much an art as a science.

Before the source data can enter the central repository, someone (or probably a committee of someones) has to agree a set of terminology and business rules to which the cleaned data must comply. It’s common for different systems within an organisation to use the same term to mean different things, and the differences can be fundamental. What is a ‘product’ within the organisation? One that’s manufactured in-house? One you buy in? Both? Neither?

Business rules describe limits or processes: for instance, orders for less than £500 can never attract more than five per cent discount. Without an all-pervasive set of definitions to which all data must conform, information derived from the central data store will lack accuracy and consistency.

Once a set of rules is agreed, cleaned data from the sources are ‘transformed’ in order to meet those rules: achieving this consistency is likely to involve tweaking data types, extensive use of look-up tables and domain checking. Finally the data is ready for loading into the data warehouse.

Once the data warehouse has been constructed, it requires maintenance. The data in the data warehouse is refreshed on a scheduled basis, nightly in many cases, to ensure that analysis is performed against the most up-to-date set of data possible. As a business evolves, changes are inevitable in the source data so new or re-written ETL routines will be required from time to time.

Ultimately the data warehouse is a just complex database and the clues for the next job roles are right there. All databases need to be developed (so BI systems need DBDs) and, once they have been developed, databases need to be administered (DBAs).

OLAP developer

Another role is that of deploying BI solutions as OLAP (On-Line Analytical Processing) cubes. The data in OLAP cubes is not held in familiar relational tables but based on a way of structuring data that involves denormalised data held in dimensional tables which are grouped around a central fact table. The structure is often described as a star schema. The resulting cube of multi-dimensional data is ideal for analysis: it’s dedicated to servicing analysis and rapid response times make it possible for the user to follow ‘train of thought’ analysis paths. Data can be inspected from as many angles as there are dimensions: you might look at sales over time, or by geographical region, by product (now you know what one of those is) or outlet or sales personnel, or indeed all of those or any subset thereof.

Think about DBDs for a minute. They need to understand the relational structure, normalisation, indexing, referential integrity and so on, and also the language for manipulating relational data, SQL.

Developers who are designing and creating OLAP cubes will often need to extract data from relational tables, so they need to have all of the skills of a DBD. In addition they have to be able to think about, and manipulate data, as multi-dimensional sets.

The fundamental difference is that relational data is unordered while multi-dimensional data is highly ordered. This doesn't sound like much of a difference, but it is huge. Think about a very simple problem. A user asks to see the sales figures for two different product groups over a year but displayed as a monthly running total.

Graph showing results of MDX in action

A simple (and perfectly reasonable request) that happens to be very difficult to deliver using pure SQL because SQL knows nothing about the order of the data. When we put data into a multi-dimensional cube, it is perfectly ordered so the problem is trivial to solve, as long as we have a language (essentially an equivalent of SQL) that allows us to manipulate multi-dimensional data.

That language is MDX (Multi-Dimensional eXpressions) – and knowledge of MDX is a very saleable skill. With MDX you can manipulate data (for tasks such as calculating sums, averages and so on) and also query it with commands that ‘understand’ the hierarchically-ordered data in the cube. For example, the MDX to create the running total shown above is:

Sum(YTD(Time.CurrentMember), Measures.Sales)

So for developers, the business intelligence is suggesting that BI is not only here to stay but is also presenting a host of opportunities. Now is the time to take the plunge. ®

Website security in corporate America

More from The Register

next story
Phones 4u slips into administration after EE cuts ties with Brit mobe retailer
More than 5,500 jobs could be axed if rescue mission fails
Israeli spies rebel over mass-snooping on innocent Palestinians
'Disciplinary treatment will be sharp and clear' vow spy-chiefs
Apple CEO Tim Cook: TV is TERRIBLE and stuck in the 1970s
The iKing thinks telly is far too fiddly and ugly – basically, iTunes
Huawei ditches new Windows Phone mobe plans, blames poor sales
Giganto mobe firm slams door shut on Microsoft. OH DEAR
Phones 4u website DIES as wounded mobe retailer struggles to stay above water
Founder blames 'ruthless network partners' for implosion
Found inside ISIS terror chap's laptop: CELINE DION tunes
REPORT: Stash of terrorist material found in Syria Dell box
Show us your Five-Eyes SECRETS says Privacy International
Refusal to disclose GCHQ canteen menus and prices triggers Euro Human Rights Court action
prev story


Providing a secure and efficient Helpdesk
A single remote control platform for user support is be key to providing an efficient helpdesk. Retain full control over the way in which screen and keystroke data is transmitted.
Saudi Petroleum chooses Tegile storage solution
A storage solution that addresses company growth and performance for business-critical applications of caseware archive and search along with other key operational systems.
Security and trust: The backbone of doing business over the internet
Explores the current state of website security and the contributions Symantec is making to help organizations protect critical data and build trust with customers.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.