Like an everflowing stream: New tech promises remote S3 nearline disk performance

Cool, but streaming doesn't mean screaming


Analysis You can't store files in Amazon's public cloud, access them on-premises, and expect local disk access performance.

You can store them in a sync-and-share facility like Box or Dropbox, but then they have to be downloaded completely before use. That is not so good for large files, large data sets and production environments.

You could also use a cloud storage gateway, like Nasuni or Panzura, which works fine but adds complexity and may not scale.

Startup LucidLink claims it uses local metadata caching, parallel TCP/IP streaming, pre-fetching and caching to make public cloud-stored files usable for on-premises primary data storage.

It was founded by two ex-DataCore people, CEO Peter Thompson and CTO George Dochev. That background is relevant because DataCore uses parallelised IO in its record-breaking SPC-1 v1 benchmark results.

Dochev was DataCore's director of software engineering until June 2015. LucidLink was founded in January 2016, took in $1.6m in seed funding in December that year and has just had a second seed round, $5.5m, this year.

What they appear to have come up with is a faster way of streaming files from remote object stores.

Let's start by having files stored as objects in S3 buckets, Amazon being their first supported cloud.

What gets in the way of using NFS or SMB (CIFS) to stream data from those buckets for on-premises use is latency and transfer time. File access over TCP/IP, for example, is chatty, with many metadata message sequences as well as data transfer sequences.

Specialist suppliers, such as Bridgeworks in the UK, speed things up by parallelising TCP/IP streams and so cut the transfer time. That's part of what LucidLink's technology does.

An architecture diagram shows a LucidLink store in Amazon S3 and a LucidLink app (or agent) in the customer's server. This stores synced metadata from the Amazon-resident LucidLink store and presents the LucidLink files as part of the local server's OS file system and folders/mount points.

[Diagram: LucidLink architecture]

If a user's application needs a file, it is streamed on demand from the AWS S3 store to the local server and is available for use as soon as the initial set of bytes has been received. Parallel TCP/IP streaming is used, metadata chatter is reduced, and pre-fetching and caching speed things up as well.
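To make the idea concrete, here is a minimal Python sketch of on-demand reads built from parallel ranged fetches, read-ahead and a local chunk cache. It is an illustration under stated assumptions, not LucidLink's implementation: `FakeObjectStore`, `StreamingReader`, the 4-byte `CHUNK` size and the `prefetch` depth are all hypothetical names and values, and an in-memory byte string stands in for the remote S3 bucket.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # hypothetical, tiny chunk size so the example is easy to follow


class FakeObjectStore:
    """In-memory stand-in for a remote bucket; counts ranged reads."""

    def __init__(self, data: bytes):
        self.data = data
        self.requests = 0
        self._lock = threading.Lock()

    def get_range(self, start: int, end: int) -> bytes:
        with self._lock:
            self.requests += 1
        return self.data[start:end]


class StreamingReader:
    """Serves byte ranges, fetching missing chunks in parallel."""

    def __init__(self, store, size, workers=4, prefetch=2):
        self.store = store
        self.size = size
        self.prefetch = prefetch  # extra chunks to read ahead
        self.cache = {}           # chunk index -> bytes (local cache)
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def _fetch(self, index):
        start = index * CHUNK
        end = min(start + CHUNK, self.size)
        return index, self.store.get_range(start, end)

    def read(self, offset, length):
        if length <= 0 or offset >= self.size:
            return b""
        end = min(offset + length, self.size)
        first = offset // CHUNK
        last = (end - 1) // CHUNK
        max_index = (self.size - 1) // CHUNK
        # Requested chunks plus read-ahead, clamped to the end of the object.
        wanted = range(first, min(last + self.prefetch, max_index) + 1)
        missing = [i for i in wanted if i not in self.cache]
        # Fetch all missing chunks concurrently.
        for index, data in self.pool.map(self._fetch, missing):
            self.cache[index] = data
        blob = b"".join(self.cache[i] for i in range(first, last + 1))
        skip = offset - first * CHUNK
        return blob[skip:skip + (end - offset)]


store = FakeObjectStore(b"abcdefghijklmnopqrstuvwxyz")
reader = StreamingReader(store, size=26)
first_read = reader.read(0, 4)    # fetches chunk 0 plus two read-ahead chunks
second_read = reader.read(4, 8)   # chunks 1 and 2 are already cached
```

In this toy model the second read only goes back to the store for the two new read-ahead chunks, which is the effect the caching and pre-fetching are meant to deliver; a real client would issue ranged GETs against S3 over several connections instead of slicing a byte string.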

A LucidLink demo shows a server booting a Hyper-V VM from an Amazon S3 object store 140km away.


It takes about a minute for the VM's login screen to appear. Once the VM is in the local cache then a subsequent boot takes about six seconds.

What we have here is a means of using cheap cloud object storage and accessing files there as if they were stored on a local disk, though a fairly slow one. We might suggest it can turn an S3 archive into a nearline file store.

The streaming tech is bi-directional, so LucidLink's agent could operate in a data-producing edge device which streams data up to the cloud, e.g. video surveillance data. It has a customer doing this in production using the AWS Government cloud.

LucidLink provides its product tech as a subscription-based, pay-as-you-go service. It provides some cost comparisons to justify the worth of its technology:

  • AWS Elastic Block Store (EBS) – $1,230 per TB per year
  • AWS Elastic File System (EFS) – $3,680 per TB per year
  • LucidLink + AWS S3 – $895 per TB per year
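A quick way to see what those list prices imply is to work out the percentage saving against each AWS option. The short Python sketch below uses only the three per-TB figures quoted above; the `saving_pct` helper and the dictionary keys are illustrative names, and the result says nothing about request, egress or other charges.

```python
# Per-TB-per-year prices in US dollars, as quoted by LucidLink.
costs = {"EBS": 1230, "EFS": 3680, "LucidLink + S3": 895}


def saving_pct(baseline: float, alternative: float) -> float:
    """Percentage saved by paying `alternative` instead of `baseline`."""
    return round(100 * (baseline - alternative) / baseline, 1)


lucid = costs["LucidLink + S3"]
savings = {
    name: saving_pct(price, lucid)
    for name, price in costs.items()
    if name != "LucidLink + S3"
}
print(savings)  # {'EBS': 27.2, 'EFS': 75.7}
```

On these numbers the claimed saving is roughly a quarter against EBS and about three quarters against EFS.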

As LucidLink uses S3, in principle any S3-compatible object store could be used for its repository. Azure support is coming, and GCP will be supported after Azure. Other potential targets are Backblaze B2, Cloudian, Scality and SwiftStack.

Its roadmap includes live data replication and migration between regions in a cloud and between clouds, snapshotting, audit capabilities, third-party software integrations and mobile support, in that order.

In theory, we think LucidLink could use its streaming file transfer technology to send data to/from public cloud file stores as well as object stores, if that became an economic thing to do. That way the currently necessary object-to-file translation process could be junked.

tl;dr

LucidLink is a software-based streaming and distributed cloud file access technology. It uses a public cloud S3 repository and provides nearline disk access speed to data in the cloud. ®





Biting the hand that feeds IT © 1998–2018