Original URL: http://www.theregister.co.uk/2009/11/20/pubsubhubbub/
Google touts real-time RSS transplant
What's all the PubSubHubbub?
Web 2.0 NY Google is trumpeting a new messaging protocol it insists on calling PubSubHubbub.
Brett Slatkin - a software engineer on Google’s App Engine team - demonstrated the protocol this week at the Web 2.0 Expo in New York. It aims to turn RSS and Atom into real-time content delivery mechanisms - and maybe even revamp Google search.
In a traditional RSS or Atom implementation, there's a publisher, and there's a subscriber. The subscriber polls the publisher for updates, and if the publisher has an update, it pushes it down the pipe. As Slatkin explains, this can be costly for organizations posting content. It results in heavy traffic from subscribers polling for content, and there's unnecessary bandwidth eaten up with each push. Entire feeds are sent to subscribers, not just the entries that have changed.
And since the subscriber has the responsibility of requesting new content, RSS feeds aren’t updated in real-time. New content is only available once it's been requested. Say, for instance, a site has an RSS feed embedded on the homepage. New posts to that feed won’t instantly appear on the site where the feed is embedded.
Simple in nature, PubSubHubbub relies on hubs and differentials to transform RSS and Atom feeds into real-time updates and significantly reduce the amount of bandwidth used.
A publisher signs up with a hub provider, and the feed sent to subscribers includes a declaration that points the user to the hub address, telling them the hub is a trusted entity. At that point, the subscriber has the option to subscribe with the hub for real-time delivery.
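For the curious, here is a minimal Python sketch of what the subscriber's side of that handshake might look like. It assumes the hub declaration is an Atom link element with rel="hub" and that a subscription request is a form-encoded POST carrying hub.mode, hub.topic and hub.callback, as the published PubSubHubbub spec describes. The feed, the hub address, the callback URL and the function names are all made up for illustration, and nothing here touches the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed: the URLs below are placeholders, not real endpoints.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <link rel="self" href="http://example.com/feed.atom"/>
  <link rel="hub" href="http://hub.example.com/"/>
  <entry>
    <id>tag:example.com,2009:post-1</id>
    <title>First post</title>
  </entry>
</feed>"""


def find_hub_urls(feed_xml: str) -> list:
    """Return the href of every feed-level <link rel="hub"> element."""
    root = ET.fromstring(feed_xml)
    return [
        link.get("href")
        for link in root.findall(ATOM_NS + "link")  # direct children of <feed>
        if link.get("rel") == "hub" and link.get("href")
    ]


def subscription_body(topic_url: str, callback_url: str) -> str:
    """Build the form-encoded body a subscriber would POST to the hub.

    Only the three core parameters are shown; a real hub may expect more.
    """
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed the subscriber wants
        "hub.callback": callback_url,  # where the hub should deliver updates
    })


if __name__ == "__main__":
    print(find_hub_urls(SAMPLE_FEED))
    print(subscription_body("http://example.com/feed.atom",
                            "http://subscriber.example.org/callback"))
```

A subscriber that finds no hub link simply carries on polling as before, which is why existing feeds keep working.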
Whenever the publisher adds new content to the feed, the feed is sent to the hub. The hub, in turn, looks for differences in the feed, removes the content that the subscriber has already received, and multicasts a partial feed that includes just the new content to subscribers.
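The diff-and-multicast step can be sketched in a few lines of Python. This is an in-memory toy under stated assumptions, not how Google's hub is built: entries are plain dicts with an "id" key, subscribers are ordinary callables rather than HTTP callbacks, and the hub tracks what it has already seen per feed URL rather than per subscriber. The Hub class and its method names are hypothetical.

```python
from collections import defaultdict


class Hub:
    """Toy hub: remembers which entry ids it has seen for each feed URL."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # feed URL -> list of callbacks
        self.seen = defaultdict(set)          # feed URL -> entry ids already sent

    def subscribe(self, topic_url, callback):
        """Register a callable to receive new entries for one feed URL."""
        self.subscribers[topic_url].append(callback)

    def publish(self, topic_url, feed_entries):
        """Take the full feed, keep only unseen entries, fan them out."""
        seen = self.seen[topic_url]
        new_entries = []
        for entry in feed_entries:
            if entry["id"] not in seen:
                seen.add(entry["id"])
                new_entries.append(entry)
        if new_entries:
            # Every subscriber gets the same partial feed.
            for callback in self.subscribers[topic_url]:
                callback(topic_url, new_entries)
        return new_entries


if __name__ == "__main__":
    hub = Hub()
    topic = "http://example.com/feed.atom"  # placeholder URL
    inbox = []
    hub.subscribe(topic, lambda url, entries: inbox.append([e["id"] for e in entries]))

    hub.publish(topic, [{"id": "a"}, {"id": "b"}])
    hub.publish(topic, [{"id": "a"}, {"id": "b"}, {"id": "c"}])  # only "c" is new
    print(inbox)
```

The publisher hands over the whole feed both times, but the second delivery carries only the one entry the subscribers have not yet seen.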
Because this is designed to work with RSS and Atom, publishers don’t have to implement new solutions. They can continue to use their existing feeds for instant syndication.
A publisher can also deploy its own hub. In this way, Slatkin explained, a publisher can create a system where it pushes out the content on its own in real-time to subscribers whenever content is published.
You can never have enough protocols
The project was started in the summer of 2008. “We wanted a server-to-server protocol for interoperable messaging. We wanted a way for servers to talk to servers,” Slatkin says.
He acknowledges there are already plenty of messaging protocols. But Google has, um, higher expectations than any of those protocols can meet. Slatkin and his partner, Brad Fitzpatrick (who is known for starting LiveJournal), wanted to create a protocol that is designed, among other things, for what Slatkin refers to as topic-based messaging. Rather than supporting filtering, PubSubHubbub bases each stream on a specific URL or address.
They also wanted something scalable for the future and something that would enable instant syndication of content. “RSS and Atom might be good for things today that are generally slow moving on the order of hundreds or thousands of updates per day,” says Slatkin. “We wanted to move in the direction of hundreds or thousands of updates per second.”
As examples, Slatkin listed a number of relatively tame scenarios where he thinks PubSubHubbub would be useful. He described “decentralized social networks” where, for example, people on Facebook can communicate with people on MySpace as if they are on a single social network. He hopes this will lead to federated messages, saying “think of e-mail, but better.” He also said PubSubHubbub could enable blog comments to be instantaneous and therefore more interactive, noting that on some sites, people spend more time reading and responding to comments than reading the actual blog entries.
In his discussion, Slatkin recognized that other messaging protocols, such as AMQP, have good ideas. But he claims they aren’t good at topic-based messaging and that they are very complicated to use. And, according to Slatkin, many just aren’t as scalable as Google would like.
PubSubHubbub, according to Slatkin, addresses all these issues. And he likens his creation to TCP, saying “it would be awesome if we had an application level protocol for streams of data that stands the test of time like TCP has.”
The protocol was made widely available in July of this year. While it hasn’t been widely discussed or publicized, PubSubHubbub is already in use on many blogs. Slatkin says more than 100 million blogs are PubSubHubbub-enabled, including all the Blogger blogs and FeedBurner feeds, and it’s used in apps like FriendFeed and TwitterFeed.
What's good for the web is, well, you know
Though Google has made little hoopla about Hubbub, the company has some big aspirations for the protocol.
When asked how this is good for Google, Slatkin gave the canned response: “What’s good for the web is good for Google. It’s in our best interest to get maximum engagement of people online. And the best way to get maximum engagement is with things that spur real-time conversation between individuals.”
But then he continued with another small aspiration. “Someday we would like to be able to turn off the crawl. So instead of having to go and find what’s new, people would just tell us. You can imagine a day where there’s some fat tube of the Internet, and you can just subscribe to everything that changes, as it changes. Hubbub is something that can actually enable that.”
He summed this up by saying: “What’s in it for Google is hopefully the ability to do real-time indexing of everything, all the time.” ®