Actian daubs go-faster stripes on cheapo database kit
Code crammed into nimble x86 processor caches
Comment Ingres descendant Actian says its Vectorwise analytics database tech doesn't need to rely on a flash memory boost: it uses multicore x86 features so well it's more than twice as fast as Oracle and SQL Server, and uses server, storage and networking hardware up to 40 times cheaper - or so we're told.
Actian is actually Ingres, a survivor from the Unix and minicomputer days when relational databases first sprang into prominence. Ingres was a major player back then, but sort of faded, outshone by Oracle's database, IBM's DB2 and Microsoft's SQL Server. The company was bought by ASK in the early 1990s; ASK was acquired by CA in 1994. A private equity group, Garnett and Helfrich, bought the Ingres assets from CA in 2005. It's been renamed and relaunched. Today there are about 10,000 Ingres deployments and the company is profitable, according to corporate marketing VP Kevin Cox.
Vectorwise general manager Fred Gallagher said Actian has used intellectual property obtained from the CWI research institute in Amsterdam to develop the new analytics database that executes TPC-H benchmark SQL queries as fast as the French took to the guillotine. It does this using ordinary x86 servers, and not the hyped-up rigs used by Oracle, Microsoft and others.
Gallagher's pitch is that a lot of modern CPU horsepower is simply wasted by mainstream RDBs and analytics databases. Where such software runs in servers using networked storage arrays, the code is slowed by a combination of wasted processor capability, network latency, and storage array latency. Certainly you can reduce or eliminate network and disk array latency by having a server-located flash memory store, but that doesn't fix the wasteful CPU problem.
Actian has focussed on processor throughput, and found that astonishing speed increases can be obtained by rewriting database code to use modern CPUs effectively. It reached this realisation by becoming aware of the CWI research work.
Gallagher says traditional relational databases, built on legacy code from the 1970s and 1980s, don't use modern multicore, multithreaded x86 silicon effectively. In particular, they stick to SISD (Single Instruction, Single Data) instructions instead of using SIMD (Single Instruction, Multiple Data) ones.
Actian has this to say about SIMD exploitation:
SIMD enables a single operation to be applied on a set of data at once. Vectorwise takes advantage of SIMD instructions by processing vectors of data through the Streaming SIMD Extensions instruction set. Because typical data analysis queries process large volumes of data, the use of SIMD may result in the average computation against a single data value taking less than a single CPU cycle.
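To make the idea of "processing vectors of data" concrete, here is a minimal sketch contrasting a value-at-a-time loop with a vector-at-a-time one. It is an illustration only, not Actian's code: the function names and the batch size of 1,024 are our own assumptions, and plain Python does not issue SSE instructions. What it shows is the shape of the work that a compiled engine could hand to SIMD hardware: one simple operation applied across a whole batch of column values.

```python
from array import array

VECTOR_SIZE = 1024  # hypothetical batch size, chosen only for illustration


def sum_discounted_value_at_a_time(prices, discounts):
    """Row-at-a-time style: every step handles exactly one value."""
    total = 0.0
    for price, discount in zip(prices, discounts):
        total += price * (1.0 - discount)
    return total


def one_minus(vec):
    """Primitive: compute 1 - x for every value in a vector."""
    return array("d", (1.0 - v for v in vec))


def multiply(left, right):
    """Primitive: element-wise product of two equal-length vectors."""
    return array("d", (a * b for a, b in zip(left, right)))


def sum_discounted_vector_at_a_time(prices, discounts, vector_size=VECTOR_SIZE):
    """Vector-at-a-time style: each primitive runs over a whole batch."""
    total = 0.0
    for start in range(0, len(prices), vector_size):
        price_vec = prices[start:start + vector_size]
        discount_vec = discounts[start:start + vector_size]
        total += sum(multiply(price_vec, one_minus(discount_vec)))
    return total


if __name__ == "__main__":
    prices = array("d", (float(i) for i in range(1, 5001)))
    discounts = array("d", [0.25] * 5000)
    print(sum_discounted_value_at_a_time(prices, discounts))
    print(sum_discounted_vector_at_a_time(prices, discounts))
```

Both functions compute the same total. The difference is that the second spends its time in tight, branch-free primitives over contiguous values, which is the pattern SIMD instruction sets are designed for.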
The traditional RDBs also tend not to take advantage of large on-chip caches, out-of-order execution and hardware-accelerated string-based operations. CWI wrote a C++ program that executes SQL queries using all the CPU features it says traditional RDBs ignore. Gallagher said it ran 100 times faster than a mainstream RDB executing the same queries against the same data. So Actian bought the technology and used it to develop Vectorwise.
Vectorwise clings to the super-fast processor caches, rather than a server's main memory, saying that a speed gap has opened up between CPUs and DRAM. While chips have got faster and faster, memory speed has not increased at the same rate. Vectorwise is thus a ground-up rewrite of database software which uses processor hardware features to better effect, and so sprints through SQL queries instead of lumbering through them like a hippo with a limp.
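As a rough back-of-the-envelope sketch of what staying in cache implies, the snippet below works out how many values a batch can hold if all the vectors in flight must fit in a cache together. Every number here is our assumption, not an Actian figure: a hypothetical 32 KiB cache, three live vectors (two inputs and one result) and 8-byte values. The helper names are made up for the example.

```python
def values_per_vector(cache_bytes, live_vectors, value_bytes=8):
    """Largest vector length such that all live vectors fit in the cache at once."""
    if cache_bytes <= 0 or live_vectors <= 0 or value_bytes <= 0:
        raise ValueError("all sizes must be positive")
    return cache_bytes // (live_vectors * value_bytes)


def round_down_to_power_of_two(n):
    """Round a positive integer down to the nearest power of two."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n.bit_length() - 1)


def vector_length(cache_bytes, live_vectors, value_bytes=8):
    """Cache-friendly batch size, rounded down to a power of two."""
    return round_down_to_power_of_two(
        values_per_vector(cache_bytes, live_vectors, value_bytes)
    )


if __name__ == "__main__":
    # Hypothetical 32 KiB cache, two input vectors plus one result vector.
    print(values_per_vector(32 * 1024, 3))  # 1365
    print(vector_length(32 * 1024, 3))      # 1024
```

The point is only the order of magnitude: batches of around a thousand values are small enough to sit in a fast cache, yet large enough to spread per-batch overhead across many values.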
The net of this is that the processors suck up data from the server's DRAM like a reverse waterfall.
The company has participated in TPC-H benchmarks and produced the chart below for our delectation. Two things stand out: first is Vectorwise's speed. At 445,529 QphH its score is more than twice that of the highest Oracle and SQL Server results. However, its price-performance ratio is even better when the server, storage and network hardware costs of the systems are taken into account. The highest-scoring Oracle system runs on hardware that costs 44 times more than the vanilla x86 server hardware Vectorwise runs on. The Oracle rig needs an awful lot of disk spindles, for a start.
OK - it's a benchmark, and these are list prices, but the differential is so great that it must surely be worth having a look at if you need a fast and affordable data analytics facility.
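For the arithmetically inclined, the two ratios quoted above compound. The sketch below uses only the figures in this article (445,529 QphH, "more than twice" the best Oracle score, and hardware costing 44 times more) and treats "more than twice" as exactly 2.0, which makes the results bounds rather than exact values. The function names are ours, and hardware cost per QphH is not the same thing as the official TPC-H price/performance metric, which prices the whole system.

```python
VECTORWISE_QPHH = 445_529    # score quoted in the article
MIN_SPEED_RATIO = 2.0        # "more than twice": treated as a lower bound
HARDWARE_COST_RATIO = 44.0   # Oracle hardware cost / Vectorwise hardware cost


def relative_cost_per_qphh(cost_ratio, speed_ratio):
    """Ratio of the rival's hardware cost per QphH to Vectorwise's.

    (cost_ratio * C / Q_rival) / (C / Q_vw) = cost_ratio * (Q_vw / Q_rival)
    """
    return cost_ratio * speed_ratio


def rival_qphh_upper_bound(vectorwise_qphh, min_speed_ratio):
    """Highest score the rival can have if Vectorwise is at least this much faster."""
    return vectorwise_qphh / min_speed_ratio


if __name__ == "__main__":
    print(rival_qphh_upper_bound(VECTORWISE_QPHH, MIN_SPEED_RATIO))      # 222764.5
    print(relative_cost_per_qphh(HARDWARE_COST_RATIO, MIN_SPEED_RATIO))  # 88.0
```

In other words, on these numbers the top Oracle configuration scores below roughly 222,765 QphH and spends at least 88 times as much on hardware per unit of throughput, list prices and benchmark caveats notwithstanding.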
Actian has a deal with Lenovo, and the two have produced the Vectorwise Data Mart Appliance: a pre-configured software and hardware bundle using Lenovo ThinkServer RD240 hardware. The two say it's compliant with industry standards for SQL, JDBC, ODBC, and .NET support, and operates seamlessly alongside legacy platforms.
There is a hosted Vectorwise service available through Rackspace and another server partner deal may be forthcoming with Huawei.
The Vectorwise analytics database is certified with commonly used business intelligence tools including IBM Cognos, MicroStrategy, Pentaho, SAP BusinessObjects, Tableau and Yellowfin. There are lots of customers, and they include Dixons in the UK, Xerox, ZOHO and Barclays bank. Competing products include IBM's Netezza and HP Vertica.
Gallagher says the software works with VMware, "but we prefer to be on the bare metal". You could certainly prototype Vectorwise in a virtual machine before buying dedicated hardware.
Suppose your organisation sets prices on a weekly or daily basis, based on an assessment of demand revealed by a weekly or daily analytics run. Suppose you could run the same analytics two or three times a day and better match demand to your sellable tickets or seats or whatever via faster pricing and discount decisions? That's what it's about, isn't it? Getting more business by analysing your customer data. Do it faster and you gain more business than your competitors who are doing it slower.
Actian accelerates analytics affordably; that's the El Reg storage desk's summing up, not a corporate Actian marketing slogan. Seems worth a look if you're in the market for that kind of thing. ®