Startup Iguaz.io is creating real-time Big Data analytics storage
Surviving under the waterfall deluge of Big Data
One-year-old Iguaz.io, an Israeli Big Data startup, has just won a $15m A-round from Magma Venture Partners, JVP and large strategic investors. So what's the magic product that grabbed funding so early in the game?
It seems it's all about Big Data handling, which is criticised for being rigid and inflexible: full data sets are read over and over, and select/extract/load cycles are needed to get additional data sets into HDFS.
It all takes far too long. The alternative, in-memory processing with something like Spark to speed up the iterations, is expensive in servers and memory. The new upstart seems to be aiming to solve this problem.
The firm was co-founded by CEO Asaf Somekh and CTO Yaron Haviv in 2014.
A March 2015 blog post by Yaron Haviv provides some clues.
He writes: "The challenge with Spark is the need to store the entire dataset in memory, and run over all the data, as opposed to read and process only relevant data. This is a challenge since memory and additional servers are quite more expensive than disk or even flash. Spark also lacks key database semantics like record updates, indexing, and transactions, so it is mostly applicable to analyzing medium sized datasets as a whole, or iterative machine learning, not for incremental updates to stored data records or for processing data in the tens of terabytes or more."
Haviv says a better way to handle real-time Big Data analytics combines four elements:
- A scalable, high-speed messaging layer for ingestion (e.g. Kafka)
- A stream or in-memory processing layer (e.g. Spark)
- Batch processing for crunching and digesting large datasets (e.g. MapReduce)
- Interactive real-time analytics and SQL tools for presenting the data
A critical part of the solution, he argues, is a shared high-volume, high-velocity data repository that can store messages, files, objects and data records consistently across different memory and storage tiers, provide transaction semantics, and handle data security.
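The layering above can be sketched in a few lines of Python. This is a toy, single-process illustration built on assumptions of its own, not Iguaz.io's design or any product's API: the hypothetical `SharedRepository` class stands in for the shared repository, a `collections.deque` stands in for a Kafka-style messaging layer, and plain functions stand in for the stream, batch and SQL layers. What it tries to show is that all four layers work on the same records in place, with record-level updates and selective reads, instead of copying data sets between systems.

```python
from collections import deque
import threading


class SharedRepository:
    """Hypothetical shared store that every layer reads and writes in place."""

    def __init__(self):
        self.records = {}
        self._lock = threading.Lock()

    def upsert(self, key, update):
        # Record-level update applied under a lock: a crude stand-in for
        # the transaction semantics the article asks for.
        with self._lock:
            current = self.records.get(key, {"count": 0, "total": 0.0})
            self.records[key] = update(current)

    def scan(self, predicate=lambda key, rec: True):
        # Return copies of matching records only, rather than the whole set.
        with self._lock:
            return {k: dict(v) for k, v in self.records.items()
                    if predicate(k, v)}


def ingest(queue, events):
    """Messaging-layer stand-in (Kafka in the article): append events."""
    queue.extend(events)


def stream_process(queue, repo):
    """Stream-layer stand-in (Spark in the article): incremental updates."""
    while queue:
        sensor, value = queue.popleft()
        repo.upsert(sensor, lambda r: {"count": r["count"] + 1,
                                        "total": r["total"] + value})


def batch_average(repo):
    """Batch-layer stand-in (MapReduce in the article): one full pass."""
    return {k: r["total"] / r["count"] for k, r in repo.scan().items()}


def query_hot(repo, threshold):
    """Interactive-query stand-in, roughly 'SELECT key WHERE total > t'."""
    return sorted(repo.scan(lambda k, r: r["total"] > threshold))


repo = SharedRepository()
queue = deque()
ingest(queue, [("s1", 2.0), ("s2", 10.0), ("s1", 4.0)])
stream_process(queue, repo)
print(batch_average(repo))   # {'s1': 3.0, 's2': 10.0}
print(query_hot(repo, 8.0))  # ['s2']
```

Replacing each stand-in with a real system (Kafka, Spark, a batch engine, a SQL front end) is where the actual engineering lies; the sketch only shows the data flow around one shared store.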
It looks like Iguaz.io is setting itself the task of building such a repository.
Enterprise IT has to get faster and nimbler, and the public cloud style of delivering IT is a threat to traditional IT vendors, says the co-founder: "Today cloud providers deliver significantly better efficiency than what IT can offer to the business units. If that won’t change we will see more users flowing to the cloud, or clouds coming to the users. Amazon and Azure are talking to organizations about building on premise clouds inside the enterprise, basically out-sourcing the IT function altogether."
Haviv blogs about 3D XPoint memory, saying: "To take advantage of sub-microsecond storage we may need to bind applications to low-latency data access libraries which can abstract the new access models along with the traditional disks/file/memory interfaces, and we would need to rethink how we use CPU threads and avoid code blocking and locking. Well with such devices we may need to bypass OS kernel overhead altogether, like the high-performance networking and HPC guys."
He writes: "We need the equivalent of SPDK, DPDK, RDMA APIs (those OS bypass, lock-free hardware work queue based approaches) for higher level storage abstractions like file, object, key/value storage. This will enable applications (not just storage vendors) to take the full advantage of NVMe SSDs and NV-RAMs, and in some cases be 10-1,000 times faster."
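The work-queue model Haviv refers to can be illustrated with a toy Python sketch. It is not the SPDK, DPDK or RDMA API: the `SpscRing` class and `fake_device` function are hypothetical, and the "device" is simulated in the same process. The application posts commands to a submission ring, the device posts results to a completion ring, and the application polls for completions instead of blocking in a kernel call. Each ring has exactly one producer and one consumer, which is what lets real implementations avoid locks; Python's interpreter lock means this shows the shape of the technique, not its speed.

```python
class SpscRing:
    """Single-producer/single-consumer ring: the producer only advances
    `tail` and the consumer only advances `head`, so with proper atomics
    neither side needs a lock. In Python this only shows the layout."""

    def __init__(self, capacity):
        # One slot is kept empty to tell "full" apart from "empty".
        self.slots = [None] * (capacity + 1)
        self.head = 0  # next slot to read (consumer-owned)
        self.tail = 0  # next slot to write (producer-owned)

    def try_push(self, item):
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:
            return False  # full: caller retries rather than blocking
        self.slots[self.tail] = item
        self.tail = nxt
        return True

    def try_pop(self):
        if self.head == self.tail:
            return None  # empty: caller polls again rather than sleeping
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        return item


def fake_device(sq, cq, store):
    """Simulated device: drains the submission ring, posts completions.
    The completion ring is sized generously here, so pushes always fit."""
    while (cmd := sq.try_pop()) is not None:
        op, key, value = cmd
        if op == "put":
            store[key] = value
            cq.try_push((key, "ok"))
        else:
            cq.try_push((key, store.get(key)))


sq, cq, store = SpscRing(8), SpscRing(8), {}
for cmd in [("put", "a", 1), ("put", "b", 2), ("get", "a", None)]:
    if not sq.try_push(cmd):
        raise RuntimeError("submission ring full")
fake_device(sq, cq, store)

completions = []
while (done := cq.try_pop()) is not None:  # poll: no blocking call, no lock
    completions.append(done)
print(completions)  # [('a', 'ok'), ('b', 'ok'), ('a', 1)]
```

In a real kernel-bypass stack the device runs concurrently with the application and the ring indices are updated with atomic operations and memory barriers; the polling loop is what replaces the interrupt-and-wake-up path through the OS.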
A future product giving applications 10 to 1,000 times faster access to data sounds interesting. If Iguaz.io's engineers build a promising prototype, B-round funding could follow quite quickly. ®