Sorting the ETL men from the boys

Diverging paths

Comment: The ETL (extract, transform and load) market, far from commoditising, is diverging. To begin with, ETL is no longer an appropriate term, both because operations are no longer limited to the order the name indicates and because the technology encompasses far more than just moving data into a warehouse. I don't much like alternatives such as "data movement" and "data transfer", though, and "data integration" is too broad, so I guess we are stuck with ETL. However, naming is by no means the only area of divergence.

Perhaps the most obvious change in the market is the growth in code-generating products: there is now a clear split between black-box solutions and code-generating approaches. While the former saw off the previous generation of code-based products a decade ago, it is by no means clear that they will do so again: SQL and Java are much more portable than the Cobol-based products of the early nineties.

Code-based approaches are also helped by the many ISVs that want the ability to embed specific ETL capabilities within their own products, and there are a number of newer ETL suppliers specifically targeting this market either directly or in a complementary fashion. For example, Baycastle focuses on doing things like moving data into contact management systems.

Another major change has been the advent of open source products (Clover and Kinetic Networks' KETL) and even shareware products (DB Software), which should help to drive user acceptance of the "don't hand code" message and which can only benefit everybody.

However, returning to the established players versus the new entrants discussion, the big advantage that the former have is that they provide lots of complementary functionality, notably with data quality, enterprise information and application integration and so on, though this is not limited to black-box solutions (witness Sunopsis).

Finally, the latest area of divergence is in the ability to support the extraction, transformation and loading of unstructured and semi-structured content. Of course, the concept of unstructured content is a nonsense: if it were really unstructured it would collapse into a heap. But for the purposes of this discussion I mean Word and PDF documents and the like on the one hand (unstructured) and HIPAA, EDIFACT, SWIFT and similar documents on the other (semi-structured).

Of course, this is not entirely new: Ascential has had abilities in the area of semi-structured data ever since it bought Mercator (now DataStage TX), while Hummingbird has offered the ability to extract unstructured content for some time, largely because it is the only ETL vendor that is also a major content/document management provider. However, Informatica has now added this capability as generic functionality and other vendors are likely to follow suit.

If building applications that combine content and data is to be the major growth area that many suspect it will be, then the ability to support ETL functions against content, as opposed to data, is likely to be a defining factor and will sort out the ETL men from the boys.

Copyright © 2005, IT-Analysis.com
