Original URL: http://www.theregister.co.uk/2011/05/05/google_go/
Google Go boldly goes where no code has gone before
How to build all the Google stuff Google won't talk about
Every Google data center has a Chubby, and Heroku wanted one too.
Like the rest of Google's much admired back-end infrastructure, Chubby is decidedly closed source, so Heroku men Keith Rarick and Blake Mizerany built their own. Although they didn't have source code, they did have one of those secret-spilling Google research papers. And they had Go, Google's open source programming language for building systems like Google's.
Running its myriad online services atop a famously distributed back-end, splitting tasks into tiny pieces and spreading them across a vast network of machines, Google needs a way of controlling access to those machines – and that's Chubby. According to a 2006 Google research paper (PDF), there's a least one Chubby instance in each of the company's data centers, coordinating server access for its GFS distributed file system, its BigTable database, and its epic number-crunching platform, MapReduce.
Heroku runs a similarly distributed infrastructure behind its eponymous web service – a means of building, hosting, and readily scaling Ruby on Rails applications – and this too requires the sort of server-juggling you get from Chubby. "When you have a lot of distributed processes – lots of things going on, a network spanning a lot of machines – coordinating things can be tricky," Heroku system architect Keith Rarick tells The Register.
"If a machine goes down or you're having network problems where some machines can't talk to each other or the network is slow or you're getting packet loss, there are so many ways things that can fail. It can be really, really hard to get all those things right if you're doing ad hoc coordination."
Only known image of Google Chubby
Using an algorithm called Paxos, Rarick says, Chubby almost magically simplifies this process. But first, you have to build the thing. And for that, Rarick turned to Go. Conceived by a trio of Google heavyweights – Unix founding father Ken Thompson, fellow Bell Labs Unix developer Rob Pike, and Robert Griesemer, who worked on the Java HotSpot compiler – Go is specifically designed for distributed systems.
"We realized that the kind of software we build at Google is not always served well by the languages we had available," Rob Pike tells us. "Robert Griesemer, Ken Thompson, and myself decided to make a language that would be very good for writing the kinds of programs we write at Google."
Released as an experimental language in late 2009 and now in production at Google, Go includes built-in mechanisms for running concurrent tasks. Using lightweight processes known as "goroutines", you can readily juggle multiple tasks within the same operating system thread or spread them across disparate threads, and all this is automatically scheduled by the Go runtime. The inherently concurrent setup was ideal for Heroku's Chubby mimic, an open source project known as Doozer.
"Go is very good at letting you have multiple points of control in a single program, and coordinating the synchronization and communication between those different points of control," Rarick explains. "That's something that's really useful in a project like [Doozer]. We have a bunch of different processes on different machines trying to talk to each other, and you have to make sure they all have a consistent view of the world."
Go not only provides the concurrency Doozer requires, it keeps memory usage to a minimum. It offers garbage collection, but it also frees you from thread overload. "We don't have to worry about how many threads we're running. We aren't constantly saying 'Are we running out of resources?'" Rarick explains. "We can have lots of goroutines instead, and that's built into the language."
Other languages provide somewhat similar mechanisms for concurrency – such as Erlang and Scala – but Go is designed to provide maximum efficiency and control, as well. "It comes from a line of systems programming language like C and C++, so it gives you the ability to really control the performance characteristics," Rarick says. "When it comes time to measure things and make sure they run fast, you have the flexibility to really get in there and do what you need. And when you figure out why your program is being slow, you really have the control you need to fix it."
It's a unique combination. "C gives you control, but it's not good for concurrency. It doesn't even give you garbage collection," he says. "Go gives you concurrency and garbage collection, but it still gives you control over memory layout and resource use."
Google won't say how Go is used inside its mystery data centers. But Doozer – officially released earlier this month and set to go live at Heroku – is a suitable poster child for the fledgling language. It's no coincidence that Robert Griesemer, one of the original Go architects, also worked on Chubby.
Beauty is in the eye of the coder
Go is rooted in a 1978 researcher paper from Tony Hoare, the British computer scientist who would win a Turing Award for his work on programming-language theory. While at Bell Labs in the late 1980s, Rob Pike drew on Hoare's paper – "Communicating Sequential Processes" – when creating a language known as Newsqueak, and when he sat down with Thompson and Griesemer to build Go, many of the same ideas resurfaced.
Like Newsqueak and Hoare's theoretical CSP before it, Go was fundamentally designed for concurrency. That's concurrency, not parallelism. Concurrent programs may or may not be parallel. "Parallelism is the ability to make things run quickly by using multiple processors," says Rob Pike, the man known as "Commander Pike" on the Go mailing list. "Concurrency is a completely different concept. It's the idea that sometimes a program can be decomposed into structural things that are easily solved independently, and the program itself gets prettier as a result."
But with Go, beauty wasn't a primary aim. Pike and crew had a far more concrete goal. They needed a better way of building the systems that drive Google's back end. "Web servers, but also storage systems and databases. That kind of thing," Pike says. "The world is concurrent and parallel. But the programming languages we use to interact with the outside world – through users on the web and mice and keyboards on a local machine – they don't tend to support that way of thinking about things very well. There was a lot of interesting theory work on this, but very few practical languages."
The idea was that Go would provide concurrency in way that was actually useful. It wouldn't be a programming language you merely gazed at. It would be a programming language you programmed with. Pike calls this a "compromise", betraying a bit of a soft spot for pure programming theory. "Go is a little bit of compromise because it offers very nice support for concurrency, but it's also intended to be very efficient," he says. "You can write realistic server software that can handle a very large load."
This means that Go would be a compiled language, like C. But at the same time, it would be far easier to use than C. Pike wanted something that struck the right balance between the low-level and the high. It would be a compiled language, but it would feel like an interpreted language. It would be statically typed, but it would provide the flexibility and simplicity you expect from a language that's dynamically typed.
The end result is a concurrent language that Google bills as a cross between C++ and Python. "For large programming - programming in the large, like we do at Google, using large systems with many programmers working on them - static [typing] is a huge safety net. It catches tons of stuff early that would not be caught with all-dynamic typing," Pike says. "Go is a real systems language, a compiled language. You can write really efficient code that runs closer to the metal. But you can use these higher-level ideas to build servers out of the pieces you put together."
If it walks like a Python and talks like a Python...
Yes, for the most part, variable types are strictly checked at compile time. But Go automatically manages memory with garbage collection, and using a new concept called an "interface" – similar to but a bit different from a Java interface – there are cases where you can pass data structures and variables without defining a strict type. Go interfaces provide something similar to "duck-typing" in Python and Ruby, letting you pass any variable as long as it exhibits a certain behavior. "There are properties of programs that work better when you have different implementations at runtime," Pike says.
For instance, he explains, one interface is provided by a library called a "writer". To use a writer, you don't need to know what's doing the writing. All you need to know is that it's a writer. "Lots of things can be writers – networks, files, pipes, buffers, http connections," he says. "That means that anything that knows how to use a writer can use those resources to present data." This is reminiscent, he adds, of a "behavior" in object-oriented programming.
For many, the language lives up to its billing. "I've been dreaming about this sort of type safety in a language for awhile, and they've pulled it off," says programming consultant Chip Camden. "There is type safety when you want it, but you can also use an interface, so anything that implements the interface can be passed. This is sort of similar to the Java concept, but Java – as usual – makes it a hard thing. Go makes it an easy thing.
"With Go, all you have to do is provide [something] that implements the interface. You don't have to say it implements the interface. It is, in a sense, like duck typing, but it's still verifiable by the compiler, so it catches problems for you before they start crashing your website."
Go is object-oriented, but it's not type-oriented. There's no hierarchy, and there's no sub-typing. The compiler can determine the relationship between types. "If you're in a programming language like Java and you want to declare some sort of object, there's all this ceremony around just saying what the variable is," says Charles Thompson, a longtime developer who's writing a book on the language. "Go doesn't make you do all that. It just says 'Oh, you're setting a variable equal to this object. I know that its that type of object.' The code is just so much simpler."
What's more, says Brian Ketelsen, who runs a consulting businesses dedicated to Go and Ruby and is also working on a Go book, the compiler is much faster than the norm, and this too adds to the dynamic feel. "The compiler is lightning-quick," he says, "so the turnaround time on a compile is less than it takes to spin up the virtual machine on an interpreted language like Ruby or Python."
On the goroutine
But the point is that Go gives you concurrency, a concurrency specifically suited to modern systems programming. It gives you concurrency that runs close the metal, but it also gives you a new breed of concurrency you won't find in other languages, including Erlang. And this comes from goroutines.
Goroutines aren't threads or lightweight threads. They aren't callbacks. They're processes within a single address space that can communicate with each other. Communication is provided by "channels" running between goroutines, and these channels can transmit multiple signals at once. You can use a channel to send any variable, including other channels.
These processes can run across multiple operating system threads, but crucially, they can also run within threads, letting you handle myriad tasks with a relatively small memory footprint. "The idea is that goroutines time-slice on OS threads, so you can have any number of goroutines being serviced by a smaller number of OS threads, and the Go runtime is smart enough to realize which of those goroutines is blocking something and go off and do something else," Ketelsen says.
"It's using the runtime to 'multitask' using fewer OS resources. Goroutines are much lighter than a thread. You can have many thousands of them running without taking a performance hit."
This is ideal, Rob Pike says, for something like a web server, something that talks to many thousands of clients. "Linux, for many years, had very bad thread support. It wasn't practical to use thread-per-request model for web servers. Threads are so heavy. They require so much space in memory. But it's practical to use a goroutine-per-request model," he explains.
"Goroutines use very small kilobytes of memory and yet they can represent the entire application stream of a client action inside the server. ... You can imagine tens of thousands of goroutines running in a server. We've run benchmarks with tens of thousands of goroutines, and they run very efficiently."
What's more, the channel setup is conducive to communication across a network. Pike points to a channel's ability to send a channel – something akin to, say, a phone call sending a phone call. This is a particularly nice way, Pike says, of building a multiplexer. "When I send a request to the service at the other end of the channel, I can include in that request a channel that only I know about, so the server has to respond only to me."
"That lets you halve the amount of muxing you need to do. You use a mux to get to the service, but then the service has a direct channel back to you. It just returns the answer back to you. You don't have to send that back through the mux."
Go's concurrency setup, Keith Rarick says, mapped perfectly to Heroku's Doozer project. Paxos, the algorithm at the heart of Chubby, operates using independent and concurrent processes that pass each other messages. With Doozer, those processes become goroutines, and messages are passed via channels. "These tools let us avoid complex bookkeeping and stay focused on the problem at hand," Keith Rarick and Blake Mizerany said in a recent blog post. "We are still amazed at how few lines of code it took to achieve something renowned for being difficult."
This is exactly the sort of thing Go was designed for. And presumably, Google is using the language for similar purposes. In May of 2010, Rob Pike announced that the company was using Go for "some production" stuff, but he declined to provide specifics. And he still declines to provide specifics. "Go is being used for lots of things," he tell us. Andrew Gerrand, another member of the Go team, says the language is being used on a "small number of Google systems", and in all likelihood, these systems play a role in the distributed infrastructure that spans Google's worldwide network of data centers.
The New Node?
Open sourced less than two years ago, Go isn't widely used. Besides Heroku – a Silicon Valley darling recently purchased by Salesforce.com – the only big name programming with the service, at least publicly, is Canonical. But given its characteristics – and its pedigree – Go has the potential to reach an audience beyond even the systems-programming set.
"[Node.js is] a single thread. In a very similar amount of code, you could write a goroutine-heavy server that could handle tens of thousands of requests and use all the cores on your machine, so that if the requests were expensive – if they were CPU-intensive – you'd have a chance of keeping up," Pike says.
"Node.js shows great numbers for heavy numbers of clients, and they've done a really good job. But if those clients are CPU-intensive, they've got no place to go. You can't get the parallelism you need. With Go, you get the best of both worlds: You get many clients easily handled, and if they're CPU intensive, you can imagine scaling to a much larger number of requests."
What's more, Gerrand argues, Go doesn't force developers to embrace the asynchronous ways of event-driven programming. "With goroutines and channels, you can say 'I'm going to send a message to another goroutine or wait for a message from another goroutine', but you don't have to actually leave the function you're in to do that," Gerrand says. "That lets you write asynchronous code in a synchronous style. As people, we're much better suited to writing about things in a synchronous style."
Gerrand argues there's a "huge market" for using Go as a back-end web language. "There's a lot of hype about Node.js, but I think that Go actually solves a lot of the problems that Node does, but in a more holistic sense."
There's a familiarity to Go as well. The syntax looks like C, so if you have history with C, or C++, or Java, you can pick up the basics rather quickly. Grasping the concurrency may take a little longer, even if you're experience with something like Erlang – "the concept's a bit difficult to grasp if you've never been exposed to message-passing inside of a program before," says Brian Ketelsen – but Gerrand says that during workshops he has seen programmers with no Go experience write concurrent programs in a matter of minutes.
Go Fibonacci Closure
According to many developers we spoke to, Go has that certain something that makes it suited to almost anything. Gerrand says Google employees are using it to simply grab information from servers. "Google has people who administer apps and services, and they need to, say, write tools that say scrape a few thousand machines statuses and aggregate the data," he says. "Previously, these operations people would write these in Python, but they're finding that Go is much faster in terms of performance and time to actually write the code."
Programmer Charles Thompson uses Go for scripting. And Michael Hoise, another independent developer, has built a Facebook application with the language. The app is called SmartTweets, and he built it just to explore the ins and outs of Go. But it now has over 120,000 users. "It's a stable language," he says. "It can handle the load."
Pike – and others – wouldn't mind seeing Go on Android. "It was built for servers and stuff like that, but one of the surprises was that the design of the language made it really nice for general purpose programming," Pike says. "It's been used for scripting, for user-interface code, for graphics – just about anything."
Programming purists take issue with the language. It's not as "beautiful" as they would like. It makes those "compromises" Pike speaks of. But some believe those compromises aren't compromises at all. "I like a lot of the design decisions they made in the language," says Martin Odersky, the creator of Scala, the language does concurrency atop the Java virtual machine and remade the Twitter back end. "Basically, I like all of them." ®