At 900k lines of code, ONOS is getting heavy. Can it go on a diet?
'Net greybeard Douglas Comer talks SDN with El Reg
Interview Software Defined Networking (SDN) has changed the landscape of networking, but along the way it has created its own problems. Doug Comer of Purdue University thinks disaggregating SDN controllers like the Open Source Network Operating System (ONOS) could be a way forward.
Comer's name should be familiar to most of The Register's readers: he has been around since the early days of the internet, has served on the Internet Architecture Board, is the author of a small library of books discussing networking, and created the Xinu embedded operating system.
And he's concerned that SDN might be getting too heavy to be scalable, which is why his name is on this paper at arXiv, as co-author with one of his PhD students at the comp sci department of Purdue University, Adib Rastegarnia.
The paper, "Externalization of Packet Processing in Software Defined Networking", proposes doing to SDN what SDN has done to the world of proprietary networking: disaggregating it.
In the hardware space, that has meant taking functions that once lived together in dedicated proprietary boxes, and turning those functions into software that can run on general-purpose computers.
However, in SDN, something has gone wrong: the controllers themselves have become huge projects with a plethora of contributors, and that turns the controller into a bottleneck that has to process every flow in its environment.
Speaking to The Register's networking desk, Comer says the "well over 100" functions that ONOS includes are burdensome. In the four-plus years since its launch, ONOS has attracted nearly 14,000 commits, and has close to 900,000 lines of code – it's not something you can recompile on the fly.
That lies behind his and Rastegarnia's proposal that some of those functions should be disaggregated from the controller.
The SDN architectural problems the paper highlights are:
- A monolithic controller isn't modular, can't be updated without disruption, and probably won't scale to the terabit networks of the future;
- The RESTful northbound APIs in SDN are poorly standardised, and depend on a single controller;
- Software modules aren't easily reusable across different controllers;
- Today's controllers assume a proactive approach, in which the programmer has to cover all possible cases in the flow rules they create. (Disaggregation, the paper argues, would allow programmers to create reactive management applications that can better respond to changes.)
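The proactive/reactive distinction in the last point can be illustrated with a toy flow table. This is a minimal sketch, not code from the paper or from ONOS: the `Switch` class, the `(source, destination)` match key, and the `allow_local` policy are all hypothetical names made up for the illustration. A proactive setup has to install every rule in advance, while a reactive one consults an application when a packet matches no rule and caches the answer.

```python
from typing import Callable, Dict, Optional, Tuple

FlowKey = Tuple[str, str]  # hypothetical simplified match: (source, destination)
MissHandler = Callable[[FlowKey], Optional[str]]


class Switch:
    """Toy switch with a flow table and an optional table-miss handler."""

    def __init__(self, on_miss: Optional[MissHandler] = None) -> None:
        self.flow_table: Dict[FlowKey, str] = {}
        self.on_miss = on_miss

    def install(self, key: FlowKey, action: str) -> None:
        self.flow_table[key] = action

    def handle(self, key: FlowKey) -> str:
        action = self.flow_table.get(key)
        if action is None and self.on_miss is not None:
            # Reactive path: ask the application, then cache its decision.
            action = self.on_miss(key)
            if action is not None:
                self.install(key, action)
        # Anything still unmatched is dropped.
        return action or "drop"


def allow_local(key: FlowKey) -> Optional[str]:
    """Hypothetical policy: forward traffic whose destination is a known host."""
    return "forward" if key[1].startswith("h") else None


# Proactive: only the pairs the programmer thought of are covered.
proactive = Switch()
proactive.install(("h1", "h2"), "forward")

# Reactive: rules are created on demand when a new flow appears.
reactive = Switch(on_miss=allow_local)
```

In this sketch `proactive.handle(("h1", "h3"))` falls through to the default action because nobody wrote a rule for it, whereas `reactive.handle(("h1", "h3"))` asks `allow_local` and installs the resulting rule for later packets.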
The idea, Comer tells The Register, is simple: do what ONOS does, but separate the functions from the controller "instead of putting everything inside and compiling one gigantic image".
In disaggregating the software, Comer says, "all we want to do is take services like setting up a route, or firewall setup, and move those services out of the controller software itself".
Why do that? Because that way, he explains, instead of routing the packets to a function, processing them, and sending them back out onto the network, the disaggregated approach makes it easier to move the functions to where the flows need to be.
There are, admittedly, risks to such an approach, chief among which is efficiency: compared to today's SDN environments, there's going to be an extra overhead both in processing and in the number of messages you have to move around.
"Every time you say distributed systems, someone says 'inefficient', which is true. How much does it cost, is there a technology that can do this?" Comer says.
As described in the paper, Comer and Rastegarnia settled on the Apache Kafka message-passing environment.
They decided to implement message routing as a component in the controller: it receives all the messages and decides which need to be forwarded, to which function, and how best to pass the messages around.
The message-passing architecture at the heart of the Comer/Rastegarnia paper works as follows:
- A Kafka message distribution application, in the SDN controller, provides copies of incoming packets to external processes. It listens to the incoming packets, and publishes them on a cluster where they're accessible to external management applications and processes;
- Applications and services use HTTP requests to subscribe to the Kafka message distribution app.
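The publish/subscribe pattern described above can be sketched in a few lines. This is an in-memory stand-in rather than Kafka itself, and it is not the authors' implementation: `PacketBus`, `DistributionApp`, the packet dictionaries, and the choice of the protocol name as the topic are all assumptions made for the illustration. Subscribers here register with a direct function call, where the paper's applications would use HTTP requests.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Packet = Dict[str, str]              # hypothetical packet record
Handler = Callable[[Packet], None]


class PacketBus:
    """In-memory stand-in for the message cluster (not Kafka)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, packet: Packet) -> int:
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(dict(packet))    # each subscriber receives its own copy
        return len(handlers)         # number of subscribers reached


class DistributionApp:
    """Controller-side component that publishes incoming packets."""

    def __init__(self, bus: PacketBus) -> None:
        self.bus = bus

    def on_packet_in(self, packet: Packet) -> int:
        # Assumption: packets are routed to topics by protocol name.
        topic = packet.get("protocol", "unknown").lower()
        return self.bus.publish(topic, packet)


# Two hypothetical external applications: a firewall and a monitor.
firewall_log: List[Packet] = []
monitor_log: List[Packet] = []

bus = PacketBus()
bus.subscribe("tcp", firewall_log.append)
bus.subscribe("tcp", monitor_log.append)
bus.subscribe("arp", monitor_log.append)

app = DistributionApp(bus)
app.on_packet_in({"protocol": "TCP", "src": "10.0.0.1", "dst": "10.0.0.2"})
app.on_packet_in({"protocol": "ARP", "src": "10.0.0.1"})
```

The point of the sketch is that the controller-side component only decides where a packet copy goes. The firewall and monitor logic live outside it and can be added, removed, or moved without rebuilding the controller.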
At this point, Comer says, it's vital to remember that the arXiv paper represents very embryonic work: it poses more questions than it answers, and message-passing is one of the most important questions.
However, he believes the paper demonstrates a core characteristic of the disaggregated approach: the controller's role is reduced to handling things like flow setup, route setup, and forwarding rules. He tells The Reg: "If you go that route, I think we've already demonstrated that it's going to be feasible."
Disaggregation means that, for example, in a 10Gbps network there's no longer a requirement to push every packet or every flow through the controller, provided non-core network functions are turned into microservices.
"I can put a Docker container anywhere and move it if I want. I can move the container to the best place at the current time," he says.
"So instead of routing the traffic to a physical server where the network function is running, you move the function nearer to the data flow." Something like a firewall, for example, can be replicated so that an instance is launched at every point in the network. ®