Neo4j CEO: We're at 'a huge inflection point for graph databases'
Swede also speeks his branes on Saint Richard of Stallman
Interview Emil Eifrem, CEO and co-founder of Neo Technology, says the world is at “a huge inflection point for graph databases” as his company, which supports the open source Neo4j graph database management system, releases v3.0 of the software.
Ahead of releasing an architecturally overhauled v3.0 of the data management system, the co-author of O'Reilly's Graph Databases spoke to The Register about his focus on staying customer funded, on adapting developer pitches for managers, and on his company's involvement with the Panama Papers, and offered his thoughts on Richard Stallman.
38-year-old Eifrem, who is originally from Sweden, started programming “at a very young age, I did a lot of open source work in my teenage years, free software, open source, all that good stuff, in the nineties.”
Neo4j was initially released in 2007, and its community edition is available under the GNU General Public License v3. “Our summary is, whenever you can use MySQL free, you can use Neo4j community for free,” said Eifrem.
“For the longest time, I was a free software guy, not an open source guy,” he told us. “Richard Stallman has done some amazingly important work. I think the world needs idealists and he's definitely one of them. He is one of the purest idealists on the planet. I bought a higher percentage of his philosophy when I was twenty than when I'm forty.”
Running a business has driven Eifrem towards what he calls pragmatism: “RMS is many things,” he said, but “pragmatic is not one of them.”
Eifrem is now quite comfortable with being described as “an open source guy”. Alongside the community edition, the company also offers a commercial enterprise edition.
“The community edition is a fully-featured graph database, you can store, retrieve, you can do anything that you want with it. The enterprise edition has certain features that big companies really value. These are things like clustering, fail-over, and high availability.”
Unsurprisingly, Eifrem reckons we're at “a huge inflection point for graph databases. So there used to be only four databases at all, like, on the planet, and then – bam – there's like 400, for a number of different reasons.”
“Graph databases … in the first NoSQL explosion, with document databases, with key value stores, graph databases were quite prominent in that, but as kind of the smaller cousin. Now for the past two years graph databases have been the fastest growing category. If you look at a site like db-engines, in terms of buzz and awareness, it is the fastest growing category in databases, not in NoSQL, but the fastest growing category in databases. That, I think, is pretty extraordinary, and we're now getting to this inflection point where we see graph databases going mainstream.”
For Eifrem, graph databases take “relationships or connections in data, and put them front and centre.”
“If you're a developer and you're building a system which includes some kind of connections – let's say that the obvious one will be a social network, but it may also be fraud – where you want to take patterns amongst connected data elements,” said Eifrem.
“It may be a recommendation where you want to traverse the connections between consumers and the products they've purchased and the hierarchies that those products are in. Even a simple e-commerce application, if you have a shopping cart which includes order items, those order items refer back to a specific product, but that product is never alone, it actually belongs to a category of products.”
Making these connections requires “four or five hops already,” said Eifrem, “and if you try to squeeze that all into a data model that does not have relationships as a first class citizen – which is basically all the other popular data models suffer from this deficiency, relational, key value, document, etcetera – that is going to lead to two big problems:
“One that I call a compile-time problem, which is basically as you build your application you're going to have to artificially encode this using foreign keys or EmbeddedIDs, stuff like that. And the second one is a performance problem, where when you try to traverse these connections.”
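The shopping-cart example Eifrem describes can be pictured with a minimal Python sketch. It is an illustration only, not Neo4j's API or storage model: the `Graph` class, the node names and the relationship types (`CONTAINS`, `REFERS_TO`, `IN_CATEGORY`, `CHILD_OF`) are all hypothetical. The point is simply that when relationships are stored as first-class edges, a multi-hop question becomes a chain of "follow this relationship" steps rather than a series of foreign-key lookups.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory graph; relationships are stored as first-class edges.

    Hypothetical helper for illustration, not a real graph database client.
    """

    def __init__(self):
        # node -> list of (relationship type, target node)
        self._out = defaultdict(list)

    def relate(self, source, rel_type, target):
        """Record a directed, typed relationship from source to target."""
        self._out[source].append((rel_type, target))

    def follow(self, start_nodes, rel_type):
        """One hop: follow rel_type outwards from every start node."""
        return [t for n in start_nodes for (r, t) in self._out[n] if r == rel_type]

    def traverse(self, start, rel_types):
        """Multi-hop traversal: apply one hop per relationship type, in order."""
        nodes = [start]
        for rel_type in rel_types:
            nodes = self.follow(nodes, rel_type)
        return nodes


def build_shop():
    """Build the made-up cart -> item -> product -> category graph."""
    g = Graph()
    g.relate("cart:1", "CONTAINS", "item:1")
    g.relate("cart:1", "CONTAINS", "item:2")
    g.relate("item:1", "REFERS_TO", "product:kettle")
    g.relate("item:2", "REFERS_TO", "product:toaster")
    g.relate("product:kettle", "IN_CATEGORY", "category:kitchen")
    g.relate("product:toaster", "IN_CATEGORY", "category:kitchen")
    g.relate("category:kitchen", "CHILD_OF", "category:home")
    return g


def parent_categories_for_cart(g, cart):
    """Four hops from a cart to the parent categories of its products."""
    hops = ["CONTAINS", "REFERS_TO", "IN_CATEGORY", "CHILD_OF"]
    return sorted(set(g.traverse(cart, hops)))


if __name__ == "__main__":
    print(parent_categories_for_cart(build_shop(), "cart:1"))
```

In a relational schema the same question would typically need one join per hop, which is the "artificial encoding" the quote refers to. The sketch says nothing about the performance figures claimed in the interview.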
A graph database can be “sometimes a thousand times faster, sometimes a million times faster than a relational database or a document database,” Eifrem claimed, and as such “it's really going mainstream.”
“It used to be that it had classic early adopters, which is the Silicon Valley early adopter alpha geek crowd, that paid attention and knew about graph databases and used them. Now, we have 200 customers today, 40-50 per cent of which are the global 2000, and we're talking like financial services, we're talking telcos, four of the top ten retailers on the planet, the biggest 10 retailers on the planet, four are using Neo4j. It's really going mainstream, and I think that is a huge shift for us in the graph database space.”
Last December, GCHQ created a GitHub repository for its own graph database it called “Gaffer”, and billed it as “a framework that makes it easy to store large-scale graphs in which the nodes and edges have statistics such as counts, histograms, and sketches.”
“Of course they have a graph database,” said Eifrem. “What we've seen over the past fifteen years or so is that these huge web firms, the Facebooks and the Googles of the world, have built up this huge skillset around processing big data in real time. And the secret sauce to Google was a graph algorithm, the fact that they took a graph perspective on the documents. They called it PageRank, that's what allowed them to dominate the space compared to AltaVista and these other guys who did the same thing but without the graph stuff.”
“We've learned over the past few years, through Snowden and others, that the big government institutions have also built up this skillset. This may not be a surprise to some, it may be a surprise to others, but it's very clear that they have this skillset. I love that they open sourced it. I think it has zero uptake so far. I think that's going to remain that way, but I may be wrong, who knows.”
Eifrem romanticised Neo's role as democratising access to this technology, to “make that available to small independent journalistic organisations, like the ICIJ with the Panama Papers, and also to the next start-ups who are disrupting the Googles and the Facebooks and the Walmarts and whatever of the world.”
“I think that's coming back a little bit to where we started, to where my own kind of political ideological interest in open source and free software, which is personally very satisfying,” he said. “I'm very proud of the role we play in democratising access to this. I do generally think that it's super important in an open and free society that not only the big web firms and government has access to this technology.”
Moving the company from working primarily with an early adopter crowd to a mainstream crowd changes much, according to the CEO: “Things change in the product, like maybe things like integration with other adjacent technologies become more important for the mainstream audience, and a lot of stuff changes around your communication.”
“It used to be as long as you're hipster hacker lingo cool, you can get away with that,” but now Neo is finding it needs to be more professional in how it communicates the benefits of its system. “You need to be able to address stakeholders that aren't just developers. We have a developer-go-to-market strategy, which is why we're open source, so developers pick up the database, they choose it for their project because they love it, and then they have to communicate to their managers what the value is and why they want to use it.”
“With the early adopter crowd, typically we sell the tool to the developer, and that's mission accomplished. As we go mainstream, there's a procurement department, there's a line of business funding the project, there's an architect committee, there's all these – I think of this as a graph process – this is a graph that surrounds the developer node, and we need to arm our champions, the developers, with materials and talking points, and examples, and customer references, to help them communicate the value in a broader fashion, not just pure technical value.”
“We're in the fortunate situation where we've always grown the company in a more European way versus the Silicon Valley way, in the sense that we've made sure that we always have customer funding and revenue,” Eifrem said.
“And that means that we're now in a situation that, if we want to, we can get to profitability, and we don't need to raise any more money. We've raised a total of $50m to date, which, you know, from any kind of reality-check perspective is a lot of money.” Of course, this is away from the real world – “in Silicon Valley, it's not.”
“If you look at some of these other database companies, they raise hundreds, two hundred, three hundred, million dollars, which we just think is a ridiculous amount, and sometimes it makes sense, but also you take on this huge risk where you grow the organisation to be way bigger than your customer funding actually allows. We haven't found that, we're going to grow fast, we're going to raise money to have that as some buffer, but really we want to stay customer funded as much as possible.” ®