Teradata puts feisty start-ups on notice with appliance charge
Grand kit refresh and free 12.0 tester
Teradata, the big daddy of data warehousing, has finally responded to the appliance challenge by rolling out a new family of systems that cater to customers with various budgets.
Rather than shipping a single, beefy system, Teradata will now sell a range of gear that starts with the 550 SMP (symmetric multiprocessing) box on the low end at $67,000 per TB. Next, customers will find the mid-range 2500 at $125,000 per TB and then the high-end 5550 at $200,000 per TB. The company hopes this hardware will blunt the attack of data warehouse appliance makers such as Netezza and DATAllegro.
Along with the fresh hardware, Teradata has added another twist. Those tempted by the magic of the Teradata 12.0 database engine can now try out the software for free or pay $40,000 to run it on the Windows box of their choosing. Those interested in the Express Edition will find it here.
To date, Teradata has mostly argued that basing the older 5500 system on "industry standard" Xeon chips kept it in a solid position with regard to price performance. But this new, more diverse gear seems to show that Teradata felt it needed to do a bit more to protect the middle parts of its business.
So, customers can now take the Teradata 12.0 database engine – the company's secret sauce – and slap it on the 550. This gives smaller customers a chance to get into data warehousing and larger players a test and development box.
The 550 runs on a pair of dual-core Xeon 5100 (2.66GHz) chips and supports Suse Enterprise Linux 9 (Wot? No 10?) and Windows Server 2003. The system also supports up to 8GB of memory, four 73GB SAS drives and six PCI slots. You're then meant to combine this server with some additional storage in a shared cabinet.
For a starting price of $67,000, you might have expected Teradata to include some four-core chips and a much higher memory support ceiling in this box. But, hey, this is Teradata's main answer to the appliance market, and appliances are all about getting higher margins on hardware. The company has followed the lead of its rivals well.
Teradata's web site and supporting documentation fail to provide much advice about how customers should position these boxes against each other, but we understand that the 550 can reach up to 6TB.
The mid-range 2500 counts as a proper "integrated" cabinet design that can handle a 6TB data warehouse on its own or be linked with up to 24 of its peers to form a 146TB unit.
It again runs on dual-core 2.66GHz Xeons, but we're not told how many. In addition, the system supports Teradata's BYNET interconnect and ships with a number of other packages, including Parallel Transporter Load and Export Operators for data loading; Meta Data Services and Administrator for management; and SQL Assistant and Basic Teradata Query Utility for SQL generation.
Basically, this system is Teradata's entry to a serious data warehouse-in-a-can. The company boasts that just about anyone could turn this thing on and have a data warehouse running in minutes. So, for some fun around the office, order up one of these suckers, pound a keg and throw a configuration party. (Blows to the head and blindfolds are optional.)
When you want to teach your inventory database a lesson, you step up to the 5550, which Teradata says is twice as fast as its predecessor thanks to new Xeons. This system again ships in cabinet form, and you can fill the cabinet with three different kinds of boards. The entry level E board holds up to two Xeon chips and can be paired with another E board. The C "Coexistence" board holds just one Xeon but can be linked with up to 1,024 other boards, while the H "High Performance" board can hold up to two Xeons (four-core chips available here) and connect with 1,024 of its brethren through multiple cabinets. The 5550 is what you have come to expect from Teradata in that the box works as a hulking data warehouse beast with speedy BYNET interconnects, big-time memory support and fault tolerant parts. You'll find the full specifications here.
Having looked over the new hardware, Netezza executives have come out saying that they're glad that Teradata has finally recognized the success of the data warehouse appliance market by deciding to enter it. Such rhetoric is a bit rich since Teradata pioneered data warehousing in the first place.
Teradata's good name has afforded it the ability to charge top dollar for its sexy software. And customers looking to tackle the hardest data warehousing jobs don't mind paying for what they view as the best kit on the market.
Netezza and others have seized on the growth of data warehousing as a whole and tried to cater to those customers who want to explore the technology without paying for a Wal-Mart-capable system. Now it would seem that Teradata is more prepared to go after these same types of customers.
We still feel for those of you trying to get a handle on the state of the data warehouse market. Even the entry-level systems here start at $70,000, and that price results in a two-way server and some storage showing up at your shipping dock. Other entry-level units come in at about $250,000.
With IBM, HP, Oracle, Sun and a host of start-ups eyeing this market, you can expect those entry-level prices to just keep going down. So, ultimately, there's hope in sight that data warehousing technology will become more available and easier to grasp for the common man.
The Teradata press release is here. ®
Enterprises do pay a lot for hardware, there is no doubt about that. Warehousing is probably of most benefit to large corporations that have large independent systems servicing different requirements of their customer base: e.g. banks with separate personal loan and housing loan systems, or telcos with separate pre-paid and post-paid mobile provisioning and billing systems. The result is a need for a system that can load, store (and keep history of), and relate/match all of the data from these different systems.
My understanding of why a Teradata system can't be built in-house is that it includes proprietary hardware that balances the load sharing.
I think it's a different approach from MSSQL doling out work on an SMP machine to different CPUs. Teradata uses the proprietary hardware (and software) to break apart the query and distribute the data across all the CPUs in the machine, so that each CPU only looks at a subset of the overall join being performed at that point. This means that, for whatever fraction of a second it takes, each join is run by the whole machine. Then, for the next join, the data is transferred or redistributed across all the CPUs, and the process repeats.
Because Teradata redistributes the data around the machine, you don't have to be picky about creating indexes on tables (apart from initially specifying which column(s) are used as the basis for distributing the data across the nodes). This helps if you have a big database but can't predict how people are going to query it (i.e. which columns they are going to join tables on), or if less experienced analysts write queries that don't match expectations at design time.
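The redistribution described above can be sketched in a few lines of Python. This is a toy illustration of hash redistribution for a parallel join, not Teradata's actual implementation; the node count, table data, and function names are all made up for the example. The key idea is that both tables are hashed on the join column, so matching keys always land on the same node and each node can join just its own slice, with no global index.

```python
# Toy sketch of hash redistribution for a parallel join.
# NOT Teradata's implementation -- the node count and data are invented.

NUM_NODES = 4  # pretend we have four processing nodes

def node_for(key):
    # Every node applies the same hash function, so matching keys
    # from both tables always land on the same node.
    return hash(key) % NUM_NODES

def redistribute(rows, join_col):
    # Scatter rows to nodes based on the hash of the join column.
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[node_for(row[join_col])].append(row)
    return nodes

def parallel_join(left, right, join_col):
    # Redistribute both tables on the join column; each "node" then
    # joins only its own slice of the data.
    left_parts = redistribute(left, join_col)
    right_parts = redistribute(right, join_col)
    result = []
    for l_part, r_part in zip(left_parts, right_parts):
        lookup = {}
        for row in l_part:
            lookup.setdefault(row[join_col], []).append(row)
        for row in r_part:
            for match in lookup.get(row[join_col], []):
                result.append({**match, **row})
    return result

customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Bolt"}]
loans = [{"cust_id": 1, "amount": 5000}, {"cust_id": 2, "amount": 900},
         {"cust_id": 1, "amount": 120}]
joined = parallel_join(customers, loans, "cust_id")
```

In a real system each node would run the per-slice join concurrently on its own CPU and disk; the sketch just loops over the slices to show that the per-node work is independent once the rows have been redistributed.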
As for the processors being virtualised: I believe that has something to do with allowing the load sharing to be tuned, presumably between I/O and CPU.
Well, that's the enterprise hardware market for you. Large companies will joyfully spend like 10x (or more) the cost of building a system for... well, I don't know what for. Everything I've read about Teradata is good, but in general color me cynical about this stuff. It seems like a lot of these big companies will buy a pre-made enterprise product to avoid the costs of doing it from scratch, only to spend more time and effort "integrating" or "customizing" the product than they would just building from scratch. My suspicion is the way many companies are structured, the red tape simply makes it impossible to do this stuff in-house.
@What do they add...
I argued ("that's not fair") that you had to be adding more, hence my question. I don't think you've answered it, though. Step by step:
> Well, try an optimiser that almost always comes up with an excellent query plan, no matter how complex the SQL...that's MUCH harder than it sounds.
* I know how hard it is, and in general it's totally impossible. I can give you simple SQL that I'm sure you cannot automatically optimise (I discussed the example in email with Hugh Darwen and he agreed). And if by some magic you could, I guarantee I could find you one that you could not. And it would not be large, either.
But from experience I know MSSQL can produce excellent query plans for some of the most horrid SQL I've ever seen. So buy MSSQL and drop it onto your stock hardware. NB. I don't work for MS.
> Try unconditional intra-node parallelism - 10 or 12 virtual processors, each owning a virtual disk, running on a single SMP node to tackle each query in parallel.
* Why not use real processors? And why virtualise the disk? All you risk getting is simultaneous reads fighting each other for access to the real disk. At that price you could use real processors, each running a disk cluster. In other words, a roomful of bog-standard servers.
And MSSQL can run nicely on an SMP multiprocessor, doling out work to each core as necessary.
> Try automatic table space management, no matter how big the system.
* yeah right. Big DBs need big management. You may provide remote DBA time as part of the package, but that's not quite what you've described.
> Try linear scalability, certified to >1,000 SMP nodes.
* WTF are you doing with that much processing power? And how much would it cost? and how many big (yet bog-standard servers) could you buy and shove in a big room for the price you quote?
> Big banks, telcos and retailers have been using Teradata for decades because "it just works".
* hmmm. And again hmmmmm. Given how brilliantly bankers have recently proven to manage trillions of pounds of assets, says loads for their judgement.
But these days you can buy *big* stock hardware and run big DBs on it, with (what I understand to be) decent analysis tools. And analysis of data warehouses tends to be on large static snapshots, so it can be distributed freely, usually nightly after updating, and processed by multiple different boxes simultaneously. So what are you offering?
I'm afraid you haven't answered my question. And I'm not trying to be destructive, I'd really like to know.