MemSQL makes it easier to hook up to Apache Spark

Spark Streamliner coming at you via GitHub

Thu 24 Sep 2015 // 16:09 UTC

Apache Spark may be the fastest data processing engine around for big data, but unless you are conversant in Scala or Java, this cluster computing framework can be a pain to set up and manage.

So here is some help from MemSQL, the in-memory database start-up: a way of letting organisations use Spark without writing code, the company says.

The company today released Spark Streamliner, described as a one-click deployment of integrated Apache Spark for fast installs, with a single web-based UI to manage multiple data pipelines. The software is open source and available via GitHub.

These days big data and analytics are all about processing data in real or near-real time, and Spark is the enabling tool to eliminate nowhere-near-real-time batch ETL, says chief marketing officer Gary Orenstein. His company is “backing Spark one hundred per cent”.

And the datasets are getting bigger by the day. The company cites the case of Pinterest, which is already using Spark Streamliner to process 72TB of data a day – or roughly 1GB/sec. Other users and use cases are not named, but they include an oil exploration company that is processing reams of sensor data to conduct predictive real-time analytics, according to Orenstein.
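For anyone wanting to sanity-check that conversion, here is a minimal sketch of the arithmetic. It assumes decimal units (1TB = 1,000GB) and a load spread evenly over 24 hours, neither of which the company specifies; the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tb_per_day_to_gb_per_sec(tb_per_day: float) -> float:
    """Convert a daily volume in terabytes to an average rate in gigabytes per second."""
    gb_per_day = tb_per_day * 1000  # assumption: decimal units, 1 TB = 1,000 GB
    return gb_per_day / SECONDS_PER_DAY


rate = tb_per_day_to_gb_per_sec(72)
print(f"{rate:.2f} GB/sec")  # prints "0.83 GB/sec"
```

On those assumptions the sustained average is a little over 0.8GB/sec, which rounds to the quoted 1GB/sec.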

MemSQL sits as the data store on top of Apache Spark and makes its contribution to speed by storing and serving data from memory, in contrast to traditional relational databases, which use slower disk storage. But in one key respect it is a traditional RDBMS: its use – as the company name suggests – of familiar, cosy, ubiquitous SQL.

Throw as much CPU horsepower at the data as they can

Founded in 2011, MemSQL is a 100-person company, venture-backed and shy about disclosing revenues. So it is both a stripling and a minnow compared with the traditional database kings, Oracle and IBM, as well as the enterprise giant SAP, which is chalking up big sales for the new-ish HANA analytics line.

Also, in the Forrester Wave report-cum-league table for in-memory database platforms, released August 2015, MemSQL is ranked at the head of the chasing pack behind SAP, Oracle, IBM, Teradata and Microsoft.

But the company thinks it has detected a soft underbelly in the market leaders – for starters, they can be extremely costly, and customers are “essentially handcuffed” to expensive SGI and Exadata machines in SAP HANA and Oracle installations, Eric Frenkiel, CEO and co-founder, says.

In contrast, MemSQL deploys a horizontal scale-out approach using commodity hardware and prices by the amount of DRAM used to store the data – a welcome shift in a market more used to paying by CPU core, according to Frenkiel. These two pillars encourage customers to “throw as much CPU horsepower at the data as they can,” he says. ®
