Databricks wants one tool to rule all AI systems – coincidentally, its own MLflow tool

Turns out people are not that great at tracking thousands of variables

American upstart Databricks, established by the original authors of the Apache Spark framework, reckons its open-source machine-learning management engine MLflow is ready for prime time.

The newly released version 1.0 of the platform focuses on its core API components. It improves metrics handling and search, and adds support for the Hadoop Distributed File System (HDFS) as an artifact store, alongside the previously supported Amazon S3, Azure Blob Storage, Google Cloud Storage, SFTP, and NFS.

It also adds an experimental Open Neural Network Exchange (ONNX) model flavour, and a CLI command for building a Docker image capable of serving an MLflow model.

And finally, there’s Windows support for the MLflow client – in the unlikely event data scientists decide to opt for something other than Linux.

MLflow enables data scientists to track and distribute experiments, package and share models across frameworks, and deploy them, whether the target environment is a personal laptop or a cloud data centre.
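To make "tracking experiments" concrete, here is a toy, in-memory tracker in Python. It is not MLflow's API: the `ToyTracker` and `Run` classes and their methods are hypothetical, written only to show what an experiment-tracking service records (each run's parameters and metrics) and why search across runs matters once there are hundreds of them. MLflow's real tracking API offers calls such as `mlflow.log_param()` and `mlflow.log_metric()`, and persists runs to a tracking server or local store rather than a Python list.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    """One training run (hypothetical structure, not MLflow's Run class)."""
    run_id: int
    params: Dict[str, str] = field(default_factory=dict)
    # Each metric keeps its full history, e.g. one value per training epoch
    metrics: Dict[str, List[float]] = field(default_factory=dict)

    def latest(self, key: str) -> float:
        # Most recently logged value of a metric
        return self.metrics[key][-1]


class ToyTracker:
    """Hypothetical in-memory experiment tracker; not part of MLflow."""

    def __init__(self) -> None:
        self._runs: List[Run] = []
        self._ids = itertools.count(1)

    def start_run(self) -> Run:
        run = Run(run_id=next(self._ids))
        self._runs.append(run)
        return run

    def log_param(self, run: Run, key: str, value) -> None:
        # Store parameter values as strings so every run has a uniform record
        run.params[key] = str(value)

    def log_metric(self, run: Run, key: str, value: float) -> None:
        # Append rather than overwrite, preserving the metric's history
        run.metrics.setdefault(key, []).append(float(value))

    def search(self, metric: str, minimum: float) -> List[Run]:
        """Return runs whose latest `metric` value is >= minimum, best first."""
        hits = [r for r in self._runs
                if metric in r.metrics and r.latest(metric) >= minimum]
        return sorted(hits, key=lambda r: r.latest(metric), reverse=True)


if __name__ == "__main__":
    tracker = ToyTracker()
    for alpha, accuracy in [(0.1, 0.81), (0.5, 0.87), (1.0, 0.79)]:
        run = tracker.start_run()
        tracker.log_param(run, "alpha", alpha)
        tracker.log_metric(run, "accuracy", accuracy)
    best = tracker.search("accuracy", minimum=0.8)
    print([(r.params["alpha"], r.latest("accuracy")) for r in best])
    # prints [('0.5', 0.87), ('0.1', 0.81)]
```

The tracker keeps a history per metric instead of a single value because training code typically logs the same metric many times over a run; search then compares runs on the most recent value.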

The company launched the alpha version of the MLflow project last year at the Spark + AI Summit.

Multiple code approaches

The basic machine learning life cycle – taking raw data, preparing it, training your model and deploying it – is full of variables and fraught with complications. It can involve hundreds of different open source tools and frameworks, each with dozens of configurable parameters.

Facebook, Google and Uber have all built their own proprietary tools to deal with this complexity.

MLflow was designed to take some of the pain out of machine learning in organizations that don’t have the coding and engineering muscle of the hyperscalers. It works with every major ML library, algorithm, deployment tool and language.

One of the project’s goals is to improve collaboration between data scientists and the engineers who deploy their creations in production.

In true open-source fashion, MLflow users didn’t wait for a stable release to start experimenting: Databricks says the platform has already been deployed at thousands of organizations to manage their machine learning workloads, and the company also offers it as a managed service.

Group effort

Databricks might have started the project, but today the project has more than 100 contributors, including a few from Microsoft.

"People are excited about having an open-source project in this space," Matei Zaharia, co-founder and chief technologist of Databricks, told El Reg last year.

"They're excited about having an ML platform – it's something that resonates with them, and that many wanted to build already – and having one that is a community effort will be much better than what any company could build on its own."

The next major addition to MLflow will be a Model Registry that lets users manage the lifecycle of their ML models, from experimentation to deployment and monitoring.

You can find full release notes on GitHub, along with the project’s code base. ®

Biting the hand that feeds IT © 1998–2019