Linux Foundation wants to do to data what it's done for software
Penguins and machine learning. It could happen!
OS Summit The Linux Foundation has created one open-data licence framework to rule them all, allowing users to collaborate on data-driven projects.
Today at the Open Source Summit in Prague, executive director Jim Zemlin announced the Community Data License Agreement, which is designed for non-proprietary data.
The org says data producers can now share the goods "with greater clarity about what recipients may do with it".
The agreement comes in two versions. One "puts terms in place to ensure that downstream recipients can use and modify that data, and are also required to share their changes", while the other does not oblige users to share those changes.
Zemlin told the conference's crowd of open-source and enterprise IT pros that the rise of machine learning and AI algorithms, which need to be trained, has made data so important that this had to be done now.
There are quite a few licences data-wranglers use for open data right now, like the various Creative Commons options.
We spoke to Laurent Pinchart, a freelance Linux kernel developer, at the summit. He told The Register he was curious as to what the foundation thought was lacking from Creative Commons or the ODbL licence used by OpenStreetMap.
Some developers expect the new "data privacy agnostic" licence to gain adoption because of the Linux Foundation's visibility. Anyone working with the data will still have to deal with the various jurisdictional requirements and legal issues themselves, the Foundation has pointed out.
"Data is getting more and more relevant," Mark Jonas, a developer at Bosch, told The Register in Prague. "It could become standard if it's good."
Brian Exelbierd, the Fedora Community Manager at Red Hat, expressed enthusiasm but wondered why there were only two licences, although he expects them to be standards that will be iterated on.
Mike Turquette, CEO of embedded devices consultancy BayLibre, said: "The idea of putting out a licence is more important than the licence itself" – because it opens a discussion.
He cautioned that this would only be of use to firms willing to open up their data. In cases such as Google or Twitter, "you're not going to be keen on sharing data."
Eben Moglen, Professor of Law at Columbia Law School and founding director of the Software Freedom Law Center, said in a canned statement: "Clearly expressed, well-designed rules for 'share alike' treatment of collaboratively produced data will enable massive cooperation and help us resist over-concentrated ownership of the resource most crucial to 21st century social and economic development." ®