Fear of Staxit: What next for ASF's Cassandra as biggest donor cuts back
Fill the DataStax-sized hole with big ideas
I've been a user of Cassandra for quite a number of years. I've suggested fixes for Apache Cassandra and – I believe – was the first to build a small cluster on Raspberry Pi computers. This year I was lucky enough to be voted an Apache Cassandra MVP.
It's for these reasons that I've been saddened by this year's falling out between the Apache Software Foundation (ASF), which is home to the Cassandra Project, and DataStax, the primary contributor to ASF Cassandra since Cassandra shifted there from Facebook in 2009.
To me, it feels like a vibrant, responsive and welcoming community has been turned on its head, and in this case, by the very people who are about community building.
Apache Cassandra is governed by ASF's bylaws and procedures, rules that promote the foundation's principles of openness, innovation and community.
The terms of the Apache licence mean anyone can fork a project, develop it and use it commercially – as long as they don't breach the ASF licence and copyright. A number of ASF projects are marketed by commercial organisations, offering extended versions of the project, technical help services or add-on products that have been developed by the companies themselves.
Examples include Microsoft's HDInsight, which is described as "a managed Apache Hadoop, Spark, R, HBase and Storm cloud service made easy", with no end of companies offering managed Tomcat or Apache web server applications.
In the case of Cassandra, DataStax had its offering.
As an end user, I was pretty happy with the role played by DataStax – it seemed to me that DataStax were giving back a lot to the Cassandra community and making Cassandra a pretty kick-ass database. The project chair for Apache Cassandra, Jonathan Ellis, is also the chief technology officer and co-founder of DataStax, and he and the committers from DataStax were in my view responsible for much of the innovation Cassandra has seen over the past couple of years.
All was going swimmingly with the Apache Cassandra project. I never really used DataStax's Enterprise Edition – it was too big and had too many additional features for my use case – but for commercial end users it certainly is an easy way to get into distributed databases and analysis engines.
Issues between DataStax and the ASF started in June this year. One email asked a perfectly reasonable question about the ownership of the Java driver for Apache Cassandra. Other questions followed – the role of DataStax in providing cheap training for Apache Cassandra, the role of JIRA (a software development tool) in Apache Cassandra communications – culminating in a request from the ASF for a special report on the potential control by a single company (DataStax) of Apache Cassandra, Planet Cassandra, marketing material and the composition of the Apache Cassandra Project Management Committee.
At this point, as a developer and user of the software, I was beginning to feel a little nervous. Why had the ASF suddenly taken an interest in the management of the project? I don't know the answer to this question but by 19 August Ellis had announced he was stepping down from his role as PMC chair.
This was followed by two announcements from DataStax about its role and the future of the community portal Planet Cassandra. In the first, Ellis stated that DataStax would in future concentrate its effort on the enterprise edition of Cassandra.
Patrick McFadin, DataStax chief evangelist for Apache Cassandra, announced that one of the major community voices for Apache Cassandra – Planet Cassandra – was shutting down and that DataStax's Developer Relations team was shifting its attention to the DataStax Academy.
This felt like DataStax had left the building as far as Apache Cassandra is concerned and I'm sure it leaves end users uncertain about the future.
McFadin has been quick to point out that DataStax is not "abandoning" Cassandra. He also said it was the ASF that objected to DataStax's heavy involvement: "ASF was very clear. Single corporate control on a project is not OK. Look at all the new PMC and committers added lately."
Where does this leave us? The community is still very strong on the Apache Cassandra development mailing list, plans are still afoot for the release of version 4 with more committers added, and there has been increasing support from companies such as Apple and Instaclustr (which provides managed Cassandra clusters).
What isn't clear is the exact role of the Cassandra big boy, DataStax – the commercial entity with a vested interest whose dominance of the project alarmed ASF in the first place.
Take for instance the annual Apache Cassandra summit that was hosted by DataStax. Ellis has said DataStax will continue to provide sponsorship and meet-up support, but who is going to do the conference organising? As for support for Cassandra's software development, McFadin tweeted recently that DataStax would be still heavily involved, but there would be a shift "from pushing features quickly to OSS C*" – open-source Cassandra to the rest of us – which could signal a slowdown in new features for Apache Cassandra.
Is this in the end a good thing for Apache Cassandra? The heavy involvement of a single company meant that there were a lot of resources that could be drawn upon, meaning the development of Apache Cassandra over the past couple of years has been at a blistering pace, moving it from a niche NoSQL product to one of the go-to solutions for data at scale.
DataStax brought a lot of expertise. It moved Apache Cassandra from a single-model database to a multi-model one, and brought in its SQL-like language. Would this have happened without DataStax? Possibly, but certainly not at the speed it did with DataStax on board.
The pace of development for Apache Cassandra will continue owing to the fact such a community has sprung up over the years. It's just likely to be slower.
A rough roadmap already exists for version 4 but most of the proposed features look pretty technical or are updates to the protocols used (Thrift, for instance). I'd like, however, to see something bolder – perhaps extending the idea of a multi-model database or increasing support for JSON to take on other document databases.
The biggest challenge, though, is to get more community involvement. Without that, development could go from slow to slower and stop in the long run.
Was ASF right to step in? Certainly ASF is correct to protect its copyright and the principles of the foundation, but I can't help but think that ASF may have let dogma get in the way of a pragmatic approach to company involvement. The approach felt heavy-handed and accusatory rather than looking for the good side of the DataStax situation. I believe there were many advantages – and, rather than allowing for a graceful withdrawal from the project, this feels like a sharp laying down of ASF law that produced a rapid DataStax exit.
Overall, this should serve as a warning to other companies involved in ASF projects: be careful about your level of commitment and separate your commercial and non-commercial efforts. No matter how much good work you've done, you could get pushed out.
As for me? I'll continue to support the development of Apache Cassandra and use the project where I can. I also look forward to seeing where DataStax can take its Enterprise Edition. ®