Original URL: http://www.theregister.co.uk/2007/05/10/latent_humans/

Latent problems with people, not networks

Could I have a hot air detector with that, please

By Guy Kewney

Posted in Mobile, 10th May 2007 09:23 GMT

Column I have a useful hot air detector. It listens for two or three keywords and marks the output of anybody using them as suspect. For example, "video conferencing", or "artificial intelligence", or "cheap fusion power".

So when the new managing director of BT Global started his presentation Q&A last week by waving his arms about excitedly and predicting that video conferencing would be one of the breakthroughs of the next decade, the hot air detector rang all its alarm bells.

I'm not saying that video conferencing can't be done. It can. I've seen it done well, and it's not half bad. You have the chairman in Prague, the financial director in San Francisco, the sales team down the studio in London, and the studio manager attentively watching all the camera monitors and switching to people as they intervene in the discussion. It works.

The problem is, it's also very pricey.

To run a proper video conference system you need a skilled studio manager in an expensively equipped gallery, probably with at least one assistant. You need high-bandwidth full video links, good cameras, and people who know how to sit in front of them without walking out of frame.

If you want to see just how bad it can be, watch any TV newscast using a videophone link.

Typically, the reporter on the screen looks like a pixel storm, with the pixels the size of a face. It shifts vaguely about, sometimes showing two heads. With an effort of imagination, you can pretend to make out a darker area low down in the head area, which might be a sort of mouth; and by concentrating you can - almost - make yourself believe that it's moving in a way that implies speech.

What you can't do is relate that movement to the sound track. The frame rate is pitiful, and the background only gets refreshed once every two or three seconds. And, more importantly, at the end of a two-minute clip you haven't got the slightest idea what the reporter was saying.

I used my first video mobile phone four years ago. It was awful. I used a video mobile phone last week. Still awful. Why? Well, same reason - bandwidth and power.

The calculation isn't simple, but the rules are. You can get fast frame-rate with high resolution, but to get it, you need either high bandwidth comms or fast processing power.
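As a rough illustration of that trade-off, here is a minimal sketch of the uncompressed arithmetic. The function names, the 24 bits per pixel, and the example frame size and link speed are illustrative assumptions, not figures from any particular phone or network.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video bitrate in bits per second.

    bits_per_pixel=24 assumes plain 8-bit RGB; real systems usually
    subsample colour, so treat this as an upper-bound estimate.
    """
    return width * height * bits_per_pixel * fps


def required_compression_ratio(width, height, fps, link_bps, bits_per_pixel=24):
    """How many times the raw stream must shrink to fit a link of link_bps."""
    return raw_bitrate_bps(width, height, fps, bits_per_pixel) / link_bps


if __name__ == "__main__":
    # Hypothetical example: 640x480 at 25 frames per second over a 512 kbit/s link.
    raw = raw_bitrate_bps(640, 480, 25)
    ratio = required_compression_ratio(640, 480, 25, link_bps=512_000)
    print(f"raw stream: {raw / 1e6:.1f} Mbit/s, needs {ratio:.0f}:1 compression")
```

Whatever the link cannot carry has to be squeezed out by the compressor, which is where the processing power comes in. Halving the frame rate or the pixel count halves the raw figure.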

For human conversation you need low latency.

We are very, very good at reading each other's faces, and reacting accordingly. A twitch of the eyebrows, a flicker of the pupils to point away from you, a slight protrusion of the bottom lip, all can alert us - long before a word can be formed - to the fact that the person we're talking to isn't convinced. We adjust our delivery. Or we see that they're getting angry. Again, we adjust what we're saying.

It sometimes looks telepathic to an outsider. There's the classic story of two schoolboys:

Tom Brown: Here's one: a man gets a raincoat and...

Mike Smith: Well, his other leg...

Tom Brown: That's it!

They're telling jokes. And as with all jokes, it's all in the timing. It doesn't work in high-latency comms like letters or email. It doesn't even work in relatively low-latency instant messengers. We're talking about a human response time measured in milliseconds. If the delay goes beyond that, the exchange is more confusing than helpful.

For example, you may have noticed that the major TV stations have stopped accepting delayed-feed reports from outside broadcast reporters, because although the delay is only a second or so, the timing of the responses leaves you utterly bewildered.

Now, with a reporter in Baghdad sending a piece "to camera" it doesn't matter. The footage is shot. It gets compressed. It gets sent, packet by packet, and is re-assembled into a podcast format at this end, and the studio says: "Now it's over to Tom Smith in Baghdad" and presses "play" and you get full frame video.

With interactive video it's hopeless. To get the same frame rate, and a sufficiently short delay, even a satellite link is too slow in latency terms. And you need very high bandwidth.
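To put a number on the satellite point, here is a back-of-envelope sketch of the propagation delay alone. It assumes a geostationary satellite directly overhead, which the column does not specify, and it ignores compression, buffering and switching delays, all of which only add to the total. The function names are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
GEO_ALTITUDE_KM = 35_786            # geostationary orbit height above the equator


def one_hop_delay_ms(altitude_km=GEO_ALTITUDE_KM):
    """Minimum time for a signal to go ground -> satellite -> ground, in ms."""
    return 2 * altitude_km / SPEED_OF_LIGHT_KM_S * 1000


def conversational_round_trip_ms(altitude_km=GEO_ALTITUDE_KM):
    """Minimum time for a remark to arrive and the reaction to come back, in ms."""
    return 2 * one_hop_delay_ms(altitude_km)


if __name__ == "__main__":
    print(f"one hop:    {one_hop_delay_ms():.0f} ms")
    print(f"round trip: {conversational_round_trip_ms():.0f} ms")
```

Under these assumptions the physics alone costs roughly a quarter of a second each way, so nearly half a second passes before you can see the other person react to what you just said.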

You can get around the bandwidth. A really big video compression system (a top-of-the-range PC will do, mostly) can take full frame video and compress it in real time with an acceptable latency. It's not ideal, but if you're chatting to someone around the world it's not much worse. And so you can use an ordinary ADSL-speed comms link, squeeze the video in, and then expand it at the other end, and it's still within human response times.

But the world is going mobile: more than half of all phone calls now start, or end, at a mobile number. And the processing power of a mobile phone is perhaps far greater than you realise - but you can only boost the ARM chip up to its impressive fastest for short bursts. For streaming video, at high definition, that sort of workload will flatten the tiny phone battery in minutes. So you build in latency, and cut the frame rate, and reduce the definition - and that's why you don't make video calls on your mobile.

And if someone says "Oh, we'll have that sorted in five years", trigger your hot air detector. No, they won't. Even if batteries double in capacity and efficiency, even if mobile networks quadruple in bandwidth, the need of the human mind for low-latency, high-resolution, high-frame-rate video when interacting with other human minds is still not going to be satisfied. ®