The hybrid cloud wants your data
Work in development, analytics, data science? – this is for you
Promo A coding competition is coming soon, brought to you by The Register in partnership with IBM. For now all we can tell you is that IBM's Cloudant platform will play a part in the proceedings.
So let's get the ball rolling with this call to developers, data science professionals and IT analytic architects to sign up now for a free IBM Cloudant fully managed cloud services trial.
You can find out more about IBM Cloudant here. But in the meantime here is a handy executive summary for you.
So what is Cloudant?
In 2016, it's difficult to have a conversation about data without discussing analytics. The pressure is on to massage data and ask it questions rather than simply using it for transactions. That’s all well and good, but many companies don’t have the tools or the computing muscle.
Compromise comes in the form of hybrid cloud infrastructure services. IBM is one firm working to solve the problem, creating environments that give customers the tools to work on their own data sets in the cloud while keeping them in sync with local versions.
The company has grown its Cloud Data Services portfolio through a mixture of organic development and acquisition. It relies heavily on open source systems such as Apache Spark, making it easier to integrate and transfer data sets and applications between 25 different tools. The online platform also provides a selection of post-relational NoSQL databases, designed to support the development of web-scale apps processing many different data types.
One of the linchpins of IBM's Cloud Data Services offering is Cloudant, a 'distributed database as a service' that IBM acquired in February 2014. The company, founded in 2009, based its NoSQL database on Apache CouchDB. It brings to bear a range of features designed to support customers working on hybrid cloud-based analytics and applications:
Cloudant gives developers the opportunity to replicate data from on-premise systems to IBM's Bluemix-based cloud data environment, and then to keep it synchronized. Developers can distribute read/write-capable copies of their data across multiple computing instances, and can rely on the technology to synchronize them over intermittent connections.
Changes to the data are synchronized behind the scenes so that all sites are kept up to date, opening up several use cases for developers and operations staff. Updates to a single instance of Cloudant's Apache CouchDB engine could be replicated to a database cluster, for example, or changes to a cluster could be replicated to a remote datacenter running an identical one.
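As a rough illustration of the replication described above, the sketch below builds the kind of JSON body that Apache CouchDB accepts on its `_replicate` endpoint. It assumes Cloudant's CouchDB-compatible API is in use; the helper name and both URLs are hypothetical, and nothing is sent over the network.

```python
import json


def build_replication_request(source, target, continuous=True, create_target=False):
    """Return a request body for a CouchDB-style POST to /_replicate.

    The helper name is an assumption made for this sketch; the field names
    (source, target, continuous, create_target) follow CouchDB's replication API.
    """
    return {
        "source": source,
        "target": target,
        "continuous": continuous,        # keep replicating as new changes arrive
        "create_target": create_target,  # create the target database if missing
    }


# Hypothetical endpoints, for illustration only.
body = build_replication_request(
    source="http://localhost:5984/orders",
    target="https://example-account.cloudant.com/orders",
)
print(json.dumps(body, indent=2))
```

Swapping the source and target values would describe replication in the opposite direction, which is how a two-way sync between a local copy and a cloud copy is usually arranged.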
Developers can design applications to manipulate data offline and then resync it automatically using IBM’s service, leading to what the firm has labelled 'offline-first' design. This makes it easier to manipulate data in the cloud using the tools available in IBM’s Cloud Data marketplace, without creating discrepancies with locally-stored data sets.
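The offline-first idea can be sketched with a toy in-memory store. This is not IBM's implementation: the class and method names are invented for illustration, a plain dictionary stands in for the cloud copy, and the revision tracking and conflict handling that real replication needs are deliberately left out.

```python
class OfflineFirstStore:
    """Toy local store that queues writes until a remote copy is reachable."""

    def __init__(self):
        self.local = {}    # documents the app can always read and write
        self.pending = []  # ids of documents changed since the last sync

    def put(self, doc_id, doc):
        """Write locally and remember that this document needs syncing."""
        self.local[doc_id] = dict(doc)
        if doc_id not in self.pending:
            self.pending.append(doc_id)

    def sync(self, remote):
        """Push queued changes to `remote`, a dict standing in for the cloud copy."""
        pushed = 0
        for doc_id in self.pending:
            remote[doc_id] = dict(self.local[doc_id])
            pushed += 1
        self.pending.clear()
        return pushed


store = OfflineFirstStore()
remote = {}

# The app keeps working while disconnected...
store.put("order-1", {"status": "new"})
store.put("order-1", {"status": "paid"})
store.put("order-2", {"status": "new"})

# ...and pushes only the latest state of each changed document on reconnect.
pushed = store.sync(remote)
print(pushed, remote)
```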
Native JSON storage
JSON is the data storage language of the web. It is a self-describing format, storing data elements in human-readable key-value pairs (eg ‘Name: Robin Birtstone’, ‘Publication: The Register’). These pairs are stored in a tree-like structure that allows elements to be nested.
The big benefit of JSON databases is their flexibility. A traditional RDBMS forces developers to specify the data storage schema up front, making it difficult to change. Adding fields to relational tables is a big deal. The links between the database schema and the application are brittle. It’s a high-maintenance scenario.
Conversely, JSON users can easily add data fields on the fly, making it easier to update their data structures to support new features in their applications with no heavy lifting.
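A small sketch of what that flexibility looks like in practice, using Python's standard `json` module and the example pairs from above. The `topics` and `contact` fields are hypothetical additions; the point is that they can be added, nested included, without declaring a schema first.

```python
import json

# A document with the two fields used as examples in the text.
doc = {"name": "Robin Birtstone", "publication": "The Register"}

# Adding new fields later needs no schema migration.
# Both fields below are hypothetical, chosen only for illustration.
doc["topics"] = ["cloud", "databases"]
doc["contact"] = {"city": "London"}  # elements can be nested

serialized = json.dumps(doc, sort_keys=True)
restored = json.loads(serialized)

print(serialized)
```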
Advanced search capabilities
Cloudant also gave IBM another key capability: advanced search. Traditional databases often require specialized add-on modules for the storage and searching of particular types of data, such as geospatial information. Cloudant stores geospatial data in GeoJSON, a dialect of JSON designed to handle geospatial properties such as co-ordinates, shapes, and geographical features. The platform includes indexing algorithms optimized for this data type.
Developers and data analysts can use search terms designed specifically for geospatial data types that you’d normally find in niche GIS software, including descriptions of shapes and geometric relationships. The results are viewable on interactive maps provided by Mapbox, which is a mapping service aimed at developers.
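To make the GeoJSON idea concrete, here is a minimal sketch of a point feature and a simple bounding-box test. It runs locally and only stands in for the kind of geometric relationship a geospatial index would evaluate on the server; the function names and sample co-ordinates are assumptions, not part of Cloudant's API. Note that GeoJSON orders co-ordinates as longitude, then latitude.

```python
def point_feature(lon, lat, properties):
    """Build a GeoJSON Feature holding a single Point geometry."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def in_bbox(feature, west, south, east, north):
    """Return True if the feature's point lies inside the bounding box."""
    lon, lat = feature["geometry"]["coordinates"]
    return west <= lon <= east and south <= lat <= north


# Illustrative location only.
office = point_feature(-0.1276, 51.5072, {"name": "London"})

print(in_bbox(office, -1.0, 51.0, 1.0, 52.0))  # box around London
print(in_bbox(office, 2.0, 48.0, 3.0, 49.0))   # box around Paris
```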
In addition to geospatial data, Cloudant also supports full-text searches using its Lucene-based indexing technology. Developers have traditionally stored full text separately from transactional data, often creating duplicates and imposing a synchronization overhead. Instead, Lucene lets them handle the full text natively, describing how they want to index text at a deep level, defining things like word break rules and enabling them to customize for specific languages. Full-text search types include phrase queries, fuzzy matching and proximity queries, and developers can access search capabilities via a RESTful API for simplicity.
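The three search types mentioned above can be sketched as Lucene-style query strings passed to a REST search endpoint. The example below only builds URLs and sends nothing. The account, database, design document and index names are hypothetical, as are the field names `title` and `body`, and the path layout is an assumption based on Cloudant's documented `_search` endpoint.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def search_url(base, database, design_doc, index, query, limit=10):
    """Build a URL for a Cloudant-style full-text search request."""
    path = "/".join([
        base.rstrip("/"),
        quote(database, safe=""),
        "_design",
        quote(design_doc, safe=""),
        "_search",
        quote(index, safe=""),
    ])
    return path + "?" + urlencode({"q": query, "limit": limit})


# Lucene-style query strings for the three search types in the text.
phrase = 'title:"hybrid cloud"'         # exact phrase
fuzzy = "title:cloudent~"               # tolerates small misspellings
proximity = 'body:"data analytics"~5'   # both words within five positions

url = search_url(
    "https://example-account.cloudant.com",  # hypothetical account
    "articles", "content", "fulltext", phrase,
)
print(url)

# Decoding the query string recovers the original Lucene expression.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```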
Hybrid cloud solutions need flexible deployment, so it is important for developers and operations staff to have a range of options. Cloudant was engineered to be deployed under a variety of models.
These models range from a fully managed service where engineers maintain the database for a customer offsite, through to on-premise deployments in the customer’s own data center, via the Cloudant Local option introduced in October 2014. This would be suitable for more sensitive data. There are also options to deploy native libraries for mobile users on either iOS or Android.
A platform for future data-driven development
The ability to deploy across different environments in different formats, all while keeping data synchronized, also creates a foundation for DevOps. The ability to change data schemas quickly and support new application features is a useful component in agile development, but it needs a stable architecture that takes care of basic synchronization tasks behind the scenes. Having these two things in place allows DevOps teams to focus on tasks such as provisioning, and the smooth introduction of frequent application updates.
We've come rather a long way from the cumbersome, on-premise business intelligence systems we used to grapple with. Now, customers need to work out how they can use the platform to create new kinds of app. Talking to your line of business managers is a good starting point, to see what their problems are – and where the potential opportunities lie.