
ParAccel flashes data warehouses

Thinking in columns


Redmondian roots

ParAccel got its start as an appliance maker, front-ending Microsoft's SQL Server database to speed up queries, and has gradually transformed itself into a seller of a free-standing database for analytics and data warehousing. The software runs on a stripped-down version of Red Hat Linux, from which ParAccel has cut out all the fat, leaving just the features needed to run the database.

The software is supported on just about any x64 server, and starting at the end of 2008, the company tapped EMC's Clariion CX4 arrays as the preferred SAN storage for companies that wanted to use a mix of local and SAN storage for their data warehouses. This combination was called the Scalable Analytic Appliance (SAA).

In May of this year, ParAccel announced the SAA II appliance, which is also built on the Clariion CX4 arrays. The base configuration comes with eight x64 server nodes and a CX4 model 240 array. (This is a soft bundle, meaning you have to buy the parts yourself, but they are certified to work together.) With today's announcement, customers can plug in any flash-based storage device that slots directly into the server's PCI-Express bus, since that is what the database and the operating system can see. You can't use a disk controller with lots of flash drives hanging off it, because the database doesn't know how to talk to the controllers; it wants to talk directly to flash.

In the sample rack configuration of the SAA II appliance, ParAccel has eight two-socket Dell PowerEdge 2U servers as compute nodes; each has 24 small form factor 500 GB SAS disks and either four Gigabit Ethernet or two 10 Gigabit Ethernet ports. The rack also includes one leader node for managing the database cluster nodes and a hot standby server, plus a CX4-240 or CX4-480 array that can house up to 60 2 TB disks. With compression on the data, this setup has an effective capacity north of 500 TB.
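As a rough check on those capacity figures, the sketch below adds up the raw disk in the sample rack. It assumes decimal units, fully populated bays, and no allowance for mirroring, hot spares, or the leader and standby nodes; the constant names are ours, not ParAccel's. Under those assumptions the rack holds about 216 TB of raw disk, so an effective 500 TB implies a compression ratio of at least 2.3:1, and higher once mirroring is counted.

```python
# Rough raw-capacity tally for the SAA II sample rack.
# Assumptions (ours, not ParAccel's): decimal units (1 TB = 1,000 GB), every
# bay populated, and no allowance for RAID mirroring, hot spares, or the
# leader and standby nodes.

COMPUTE_NODES = 8      # two-socket 2U PowerEdge compute nodes
DISKS_PER_NODE = 24    # small form factor SAS disks per node
NODE_DISK_GB = 500     # capacity of each local SAS disk, in GB
SAN_DISKS = 60         # 2 TB disks in the rack's CX4 array, as quoted
SAN_DISK_TB = 2        # capacity of each SAN disk, in TB
EFFECTIVE_TB = 500     # quoted effective capacity with compression


def raw_capacity_tb() -> tuple[float, float]:
    """Return (local TB, SAN TB) of raw disk in the rack."""
    local_tb = COMPUTE_NODES * DISKS_PER_NODE * NODE_DISK_GB / 1000
    san_tb = float(SAN_DISKS * SAN_DISK_TB)
    return local_tb, san_tb


local_tb, san_tb = raw_capacity_tb()
total_tb = local_tb + san_tb
# Raw capacity is an upper bound on usable space, so this ratio is a lower bound.
min_compression = EFFECTIVE_TB / total_tb

print(f"local {local_tb:.0f} TB + SAN {san_tb:.0f} TB = {total_tb:.0f} TB raw")
print(f"{EFFECTIVE_TB} TB effective implies at least {min_compression:.1f}:1 compression")
```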

This setup can deliver database scans on the order of 2,400 MB/sec per server, according to Stanick. A flash configuration that uses eighteen dual-socket Xeon 5500 1U PowerEdge servers, each with three SAS drives and two Fusion-io 640 GB flash drives, plus the hot spares and the CX4 SAN, also has an effective capacity of 500 TB (compressed). In that setup, each server node can deliver 2,800 MB/sec in database scans, and it takes up a smaller footprint and uses a lot less energy, too. Because the flash rack has more than twice as many nodes, each somewhat faster, switching to flash for some of the local storage gets you 2.6 times as much oomph chewing on that 500 TB of data.
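The 2.6 times figure follows from multiplying the per-server scan rates by the node counts. The snippet below does that arithmetic, assuming (our assumption, not a ParAccel claim) that per-node scan rates add up linearly with no network or SAN bottleneck. The `Config` class is a hypothetical helper for illustration.

```python
# Aggregate scan bandwidth of the two SAA II configurations.
# Assumption (ours): quoted per-server scan rates add up linearly across the
# compute nodes, with no network or SAN bottleneck.

from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    nodes: int
    mb_per_sec_per_node: int

    def aggregate_mb_per_sec(self) -> int:
        """Total scan rate across all compute nodes."""
        return self.nodes * self.mb_per_sec_per_node


disk = Config("disk (8 x 2U)", nodes=8, mb_per_sec_per_node=2_400)
flash = Config("flash (18 x 1U)", nodes=18, mb_per_sec_per_node=2_800)

speedup = flash.aggregate_mb_per_sec() / disk.aggregate_mb_per_sec()

for cfg in (disk, flash):
    print(f"{cfg.name}: {cfg.aggregate_mb_per_sec():,} MB/sec")
print(f"speedup: {speedup:.1f}x")
```

It prints 19,200 MB/sec for the disk rack and 50,400 MB/sec for the flash rack, a speedup of 2.6x (2.625 before rounding).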

In June, with the launch of PADB 2.0, ParAccel started shipping a feature called blended scan, which is one of the reasons why adding lots of flash doesn't boost performance on an individual server by much: blended scan is already boosting the performance of the disks. Here's how it works. In a typical server node in a data warehouse cluster, each disk is mirrored (RAID 1) so that a disk failure doesn't result in the loss of data. So if you have a typical four-node database cluster with eight drives per node, only half of the drives are doing useful work, yielding a scan rate of about 800 MB/sec.
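The mirrored case can be modeled by counting only the drives that serve reads. In the sketch below, which is our own simplification, every drive streams at the same rate and only one side of each RAID 1 pair is read; the 50 MB/sec per-drive figure is worked backwards from the 800 MB/sec quoted above and is not a published spec. `mirrored_scan_rate` is a hypothetical helper.

```python
# Scan rate of a conventional cluster whose local disks are mirrored (RAID 1).
# Assumptions (ours): every drive streams at the same rate, and only one side of
# each mirror pair is read during a scan. The per-drive rate is derived from
# the article's numbers, not a measured spec.


def mirrored_scan_rate(nodes: int, drives_per_node: int, mb_per_drive: float) -> float:
    """Aggregate scan MB/sec when half of the drives hold redundant copies."""
    total_drives = nodes * drives_per_node
    active_drives = total_drives // 2   # the other half only mirror the data
    return active_drives * mb_per_drive


# Work backwards from the quoted 800 MB/sec for four nodes of eight drives.
PER_DRIVE_MB = 800 / ((4 * 8) // 2)    # 16 active drives -> 50 MB/sec each
rate = mirrored_scan_rate(nodes=4, drives_per_node=8, mb_per_drive=PER_DRIVE_MB)
print(f"{PER_DRIVE_MB:.0f} MB/sec per active drive, {rate:.0f} MB/sec total")
```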

If you hook the four nodes up to a SAN that has 56 mirrored disks, you might see a scan rate of 1,200 MB/sec. With blended scanning, which ParAccel is trying to patent, you designate the disks out on the SAN as the authoritative copy of the data, do the mirroring there, and use the local disks on the server nodes as a cache for data. The scans run across a mix of the local and SAN disks, yielding a scan rate of 2,800 MB/sec: the nodes deliver twice their mirrored rate (1,600 MB/sec) because all of their disks are doing useful work, and the SAN adds its 1,200 MB/sec of bandwidth on top.
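A toy model makes the blended-scan arithmetic explicit. It is a sketch under our own assumptions, not ParAccel's patented method: the SAN holds the mirrored, authoritative copy; the local disks act as an unmirrored cache that is assumed to already hold every block being scanned, so all of them can serve reads; and a scan is split between the two tiers in proportion to their bandwidth. The 50 MB/sec per-drive figure is derived from the mirrored four-node example, and all function names are hypothetical.

```python
# Toy model of blended scan (our assumptions, not ParAccel's method):
# - the SAN holds the mirrored, authoritative copy of the data;
# - local disks are an unmirrored cache assumed to hold every scanned block;
# - a scan is split between tiers in proportion to their bandwidth.

PER_DRIVE_MB = 50.0     # derived: 800 MB/sec over 16 active mirrored drives
SAN_MB = 1_200.0        # quoted SAN scan rate with 56 mirrored disks


def local_rate(nodes: int, drives_per_node: int) -> float:
    """Unmirrored cache disks: every local drive serves reads."""
    return nodes * drives_per_node * PER_DRIVE_MB


def blended_scan_rate(nodes: int, drives_per_node: int) -> float:
    """Local cache bandwidth plus SAN bandwidth, read in parallel."""
    return local_rate(nodes, drives_per_node) + SAN_MB


def plan_blended_scan(blocks: list[int], nodes: int, drives_per_node: int) -> dict[str, list[int]]:
    """Hypothetical planner: split blocks between tiers in proportion to
    bandwidth, so both tiers finish their share at about the same time."""
    local_mb = local_rate(nodes, drives_per_node)
    cutoff = round(len(blocks) * local_mb / (local_mb + SAN_MB))
    return {"local": blocks[:cutoff], "san": blocks[cutoff:]}


rate = blended_scan_rate(nodes=4, drives_per_node=8)
plan = plan_blended_scan(list(range(28)), nodes=4, drives_per_node=8)
print(f"blended scan: {rate:,.0f} MB/sec")
print(f"{len(plan['local'])} blocks from local cache, {len(plan['san'])} from the SAN")
```

With four nodes of eight drives, the model gives 1,600 MB/sec from the local disks plus 1,200 MB/sec from the SAN, or 2,800 MB/sec, and sends 16 of 28 blocks to the local cache and 12 to the SAN. Splitting by bandwidth means both tiers finish at about the same time, which is the point of reading from both at once.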

The PADB 2.0 analytics database has a list price of $100,000 per TB, but discounts are available for volume purchases. ParAccel, which has several dozen paying customers (including PriceChopper, OfficeMax, Merkle, and Autometrics), also sells the software under a subscription model for $5,000 per TB per month.
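The two list prices imply a simple break-even point between buying and subscribing. The sketch below uses only the quoted prices and ignores volume discounts, maintenance fees, and the time value of money; those omissions are our simplifications, and the constant and function names are hypothetical.

```python
# Break-even between a perpetual PADB 2.0 license and a subscription,
# using only the quoted list prices.
# Assumptions (ours): no volume discount, no maintenance or support fees,
# and no cost of capital.

import math

LICENSE_PER_TB = 100_000      # one-time list price, $ per TB
SUBSCRIPTION_PER_TB = 5_000   # $ per TB per month


def breakeven_months() -> int:
    """Smallest whole number of months at which subscription spend reaches the license price."""
    return math.ceil(LICENSE_PER_TB / SUBSCRIPTION_PER_TB)


def subscription_cost(tb: float, months: int) -> float:
    """Total subscription spend for `tb` terabytes over `months` months."""
    return tb * SUBSCRIPTION_PER_TB * months


def license_cost(tb: float) -> float:
    """One-time license spend for `tb` terabytes."""
    return tb * LICENSE_PER_TB


months = breakeven_months()
print(f"subscription matches the license price after {months} months")
print(f"10 TB for 12 months: ${subscription_cost(10, 12):,.0f} vs ${license_cost(10):,.0f} to buy")
```

At list price, 20 months of subscription costs the same as a perpetual license, so on these simplified numbers a customer expecting to run the database for longer than that would come out ahead buying outright.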
