Original URL: http://www.theregister.co.uk/2011/12/02/carrier_iq_interview/

Carrier IQ VP: App on millions of phones not a privacy risk

Like tiny fish through a net, key taps dropped from memory

By Dan Goodin

Posted in Security, 2nd December 2011 23:48 GMT

More than 48 hours after a software developer posted evidence Carrier IQ monitored the key taps on more than 141 million smartphones, a company official has come forward to rebut the disturbing allegations. And he's provided enough technical detail to convince The Register the diagnostics software doesn't represent a privacy threat to handset owners.

Yes, Carrier IQ is a vast digital fishing net that sees geographic locations and the contents of text messages and search queries swimming inside the phones the software monitors, the company's VP of marketing, Andrew Coward, said in an extensive interview. But except in rare circumstances, that data is dumped out of a phone's internal memory almost as quickly as it goes in. Only in cases of a phone crash or a dropped call is information transferred to servers under the control of the cellular carrier so engineers can troubleshoot bottlenecks and other glitches on their networks.

“To answer your point, we're on a fishing boat out at sea and we're catching fish that are too small and they go back in,” Coward explained. “And they go back in for two reasons: One, the holes in the net don't catch small fish, i.e. the filtering, and/or the fish is the wrong type and it gets thrown out of the boat, hopefully while it's still alive.”

The interview came as Carrier IQ faced four lawsuits and a request by a US lawmaker for an investigation by the Federal Trade Commission. US Senator Al Franken has already demanded the Mountain View, California-based company answer a battery of questions, including whether it violates federal wiretap statutes.

The reason the SMS contents and key taps are monitored at all is so they can be used to invoke Carrier IQ programming interfaces, Coward continued. Messages or key sequences that contain proprietary tags can be used to manually upload diagnostic information. Those that don't contain the special formatting (such as key taps shown in the developer's demo) dissolve into the ether as soon as they come in.

“The content of the SMS is never stored and never transmitted,” Coward said.

His account of the software has been confirmed by Dan Rosenberg, an Android security researcher who has reverse engineered Carrier IQ and examined the underlying machine language. He said he undertook the analysis after viewing a video demonstration posted on Monday that showed the software echoing the precise key taps developer Trevor Eckhart typed into his HTC EVO handset.

“What the video is depicting is the application printing out what are known as debugging logs,” he said. “It's a way that applications keep a temporary record of the things they were doing so if anything were to break, a developer could go and read that record and figure out what went wrong. That's very different from the application actually recording that information and sending it off to the carrier.”

What follows are highlights from The Register's interview with Coward:

Carrier IQ speaks

The Register: Explain what's happening in the video. It does appear that the exact sequence of key taps is monitored or logged.

Coward: Those are two different things. The sequence of keys pressed, the sequence of things that happen on a device, gets passed to the software, and we are passing that through a fine filter to figure out what's needed and what needs to be captured and turned into the analytics. This is incredibly nuanced. To our world, there's a vast difference between seeing activity on a device and taking that information and straight copying, versus doing things that are important for the function of the analytics we need to gather.

As far as I know you don't have any kind of privacy policy with any kind of end user, so if you do [collect personally identifiable information] I don't necessarily have any recourse against you anyway. Do you agree on that?

We're not collecting data on our own behalf, and that's really important. The data that's being gathered is commissioned by the operators to be gathered. It's under their control, albeit sometimes in our data center, sometimes in their data center. We have no rights to that data.

You certainly have the potential to see the precise sequence of key taps and you do have the potential to see each message sent, the phone numbers somebody dials, the location where they're typing, but you are choosing not to log it?

Correct, and to prove that's the case, we've brought in security consultants to take a look at our code and take a look at what we're doing and validate it.

One of the problems I had with what you said in the past was you said, we can't do it. I had a tough time reconciling your statement that we can't do it with a YouTube video showing a debugger being plugged in and exactly that thing happening. We can't and we don't are two entirely different things.

My point is that the software was never designed to gather and transmit that text. It was designed to filter all the information that comes through. We can't gather that information because the software isn't written that way. That's the point really, that to gather the analytics this information is constantly pumping through your phone, and tapping into that cycle is a function of what you code and what you're looking for. We're putting a huge filter on that to reduce what we're seeing to the essence of what's needed by our customers to solve the problems they have.

If we're looking at a debugger that is monitoring what goes into Carrier IQ and it is showing the precise sequences of key presses, how is it that the software isn't designed to capture the precise sequences of key presses? I'm still having a tough time reconciling what I see in the YouTube video with what you say.

What we're doing is looking at this vast stream of information that's coming in and we're filtering it to populate specific analytics that get transmitted back. Some of that information has been described as somewhat sensitive. If you have a dropped call, what was your location and so on.

In seeing all this information come through, the software does not have the ability, because of the way it's written, to say I need to capture your SMS message and transmit it up. What we want to know is how many SMS messages were sent and did it succeed or did it fail.

So the question is why are we even seeing some of this content? Why is it important to us? For example, why bother looking at SMS traffic if you're not going to actually capture the content that sits inside it? And there are very specific reasons.

With the SMS one, there are control messages that come to us through SMS. So if the back end system, for example, wants your device to check in or deliver an upload of the latest information, it has the ability to send an SMS message to a device to say please upload this information now. So we look at SMS messages that come in, and those that are tagged with our details we will look at and say OK, this one is for us, let's process it and follow the actions inside of it.

In other words, a phone with Carrier IQ on it may receive an SMS that has formatting in it that calls some sort of an API?

Right.
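As a rough illustration of the check Coward describes, the Python sketch below inspects each incoming message for a marker and drops anything that lacks it. Carrier IQ's actual tag format is proprietary and not described in the interview, so the `CONTROL_TAG` value, the command name and the function are invented here purely for illustration.

```python
from typing import Optional

# Hypothetical marker: the real tag format is proprietary and unknown.
CONTROL_TAG = "##DIAG##"


def handle_incoming_sms(body: str) -> Optional[str]:
    """Return the control command carried by a tagged message, else None.

    Untagged messages are not stored anywhere; the function simply
    returns and the text goes out of scope.
    """
    if not body.startswith(CONTROL_TAG):
        return None  # "Is it for us? No, discard."
    command = body[len(CONTROL_TAG):].strip()
    return command or None


if __name__ == "__main__":
    print(handle_incoming_sms("Lunch at noon?"))       # ordinary text, dropped
    print(handle_incoming_sms("##DIAG## UPLOAD_NOW"))  # tagged control message
```

In this sketch nothing about an untagged message survives the call, which is the behavior Coward claims; whether the shipped software behaves the same way is exactly what the reverse engineering was meant to establish.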

The metaphor that's coming to mind is you're a fisher with a big net and you're catching everything that's out there and then you're quickly deciding we're going to throw this out, we're going to throw this out, we're going to keep this. The concern that we're having is we're seeing you guys catch all this stuff and we're not necessarily seeing you throw everything out.

I like the fishing analogy. It's a good one. To answer your point, we're on a fishing boat out at sea and we're catching fish that are too small and they go back in, and they go back in for two reasons. One, the holes in the net don't catch small fish, i.e. the filtering, and/or the fish is the wrong type and it gets thrown out of the boat, hopefully while it's still alive.

And to extend this back to Carrier IQ you throw it out before anyone has seen it or before this information has been divulged to anyone?

Correct.

What's the reason for monitoring outgoing key taps, key taps that are typed into a Google search, for instance?

There is a sequence of key codes that can be typed by the user that cause the software to do things in the control center. For example, you can be on the phone with support and they'll say key dial this number and that will cause an upload to take place at that particular point in time.

It seems there must be some sort of buffer of received text messages, or a cache. Am I right?

We receive this information in real time, so a text message comes in, we'll look at it. Is it for us? No, discard. So within the software itself there's this kind of fast process. We shouldn't need to buffer this information.

You shouldn't need to? Does that mean you categorically don't do it?

I haven't had that question before. I can't think of a reason why we'd need to buffer it. Because we're operating in real time, we'll see the SMS come in. Is it for us? Yes, OK, let's deal with it. If not, discard. Just like letting the small fish go through the net, the same analogy applies.

Does that mean SMS messages are never logged?

The content of SMS messages is never logged. There are two things that happen when SMS messages are received. One is, obviously, we count them, the ones that succeed, the ones that fail. We do also record the telephone numbers the SMSs are from and to. So for example, if you send an SMS to me and it fails, you want to be able to work out did it not leave your phone, was it a communication problem with the tower, did it somehow not get to me in the last mile. This is a two-way conversation. You need to know both ends of the chain to understand.

The content of the SMS is never stored and never transmitted.
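To make the distinction concrete, here is a minimal Python sketch of a metrics record that keeps only what Coward says is kept: success and failure counts plus the sending and receiving numbers. The class and field names are hypothetical and are not drawn from Carrier IQ's code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SmsStats:
    """Hypothetical per-device SMS metrics: counts and endpoints only."""
    succeeded: int = 0
    failed: int = 0
    endpoints: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, sender: str, recipient: str, body: str, delivered: bool) -> None:
        # The body is accepted but deliberately never stored.
        if delivered:
            self.succeeded += 1
        else:
            self.failed += 1
        self.endpoints.append((sender, recipient))


if __name__ == "__main__":
    stats = SmsStats()
    stats.record("555-0100", "555-0101", "running late", delivered=False)
    print(stats)  # shows counts and numbers, but no message text
```

Even in this reduced form, the record of who messaged whom remains on file, which is why the custody question that follows matters.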

And who has custody of that information?

As with all the information, the information is not controlled by us. It's controlled by the operator. We have no rights to that data.

So what information gets gathered?

We have profiles and the profiles are designed by the operator, and that actually defines what is or is not gathered. We have customers who just collect failed calls with an upload that takes place once a week. We have others . . . where they get an upload once a day that will contain information about what applications you've been using.

How much data on the average phone running Carrier IQ is actually transmitted in a day, a week or a month?

This is a really important point because obviously the more that you take off a device the more processing power you'd need. If we were doing everything that was claimed, we'd be outstripping Google for requirements of architecture.

The typical upload for customer care information is about 200KB. That's about 200 times 1024 characters.

That's still a fair amount of information.

One of the reasons for that is there's a huge amount of radio information that gets transmitted. The radio conditions around the call are actually the kind of gold for the operator. Meaning what were the messages that were going between the handset and the tower at the time that caused the call to drop? There's a lot of detail in that radio to handset communication that gets captured.

So that would be part of the 200KB?

Correct.

Looking for my mantra

What percentage of that 200KB do you reckon is radio conditions? Would it be 80 percent, 20 percent?

It varies depending on the customer. It could be as much as 80 percent. Our advice to customers is to keep it within that 200KB framework. Just doubling it to 400KB or doubling it to twice a day obviously doubles the amount of processing power you need to deal with it.

Our mantra has always been to throw away as much information as early as possible. Throw away what you don't need on the handset first. Throw away what you don't need as you start bringing it into the cloud, into the data center, and go from there. Less is more in this case. Even at 200KB per day, if you start multiplying that out by thousands, ten thousands, hundreds of millions [of users], it ends up to be a lot of data.
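The scale Coward alludes to can be worked out with simple arithmetic. The Python sketch below multiplies the 200KB upload size by a handset count; it assumes, purely for illustration, that every one of the 141 million phones cited above uploads the full 200KB once a day, which the interview does not claim (some operators upload only weekly).

```python
BYTES_PER_KB = 1024  # Coward's own conversion: 200KB is 200 times 1024 characters


def daily_volume_bytes(phones: int, upload_kb: int = 200, uploads_per_day: int = 1) -> int:
    """Total bytes arriving at the data center per day under the stated assumptions."""
    return phones * upload_kb * BYTES_PER_KB * uploads_per_day


if __name__ == "__main__":
    one_phone = daily_volume_bytes(1)
    fleet = daily_volume_bytes(141_000_000)  # assumed: every phone uploads daily
    print(one_phone)                  # 204800 bytes per upload
    print(round(fleet / 1024**4, 1))  # roughly 26.3 TiB per day
    # Doubling the upload size or the frequency doubles the total.
    print(daily_volume_bytes(1, upload_kb=400) == 2 * one_phone)
```

Under those assumptions the daily intake would run to tens of terabytes, which helps explain the advice to operators to stay within the 200KB framework.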

What kind of legacy is there on handsets that run Carrier IQ for the collected data? Is it possible for a very determined individual to grab that phone and pull data off of it?

It's really a function of how often the information has left the phone because once the information has left the phone there's no reason to keep it on the phone. And let's just say you did get hold of that information we gathered with whatever tool you had, you'd still have to understand and decode that entire format and what we did. Unless you're going to guess what we did, you'd kind of have to use our tools to be able to do that, i.e. you'd have to do what essentially happens when that package gets to the data center.

But that's exactly what reverse engineers do.

Correct. But again, if customers are uploading once a day, you've got the last 24 hours [of data stored]. And if the uploads take place once a week, the level of information that's going to be recorded is going to be way less.

We all know that stuff is never really deleted unless it's specifically wiped, and that's very processor and battery intensive, so I'm guessing Carrier IQ isn't wiping this stuff clean.

We're operating in the RAM space.

Is it fair to say you can't rule out the possibility that a phone recovered by law enforcement or somebody else may be able to pull some of the data that was collected by Carrier IQ and glean information about key taps that were made, phone numbers that were called, etc.?

The key taps, remember, are being filtered and dropped so that's never making its way into any captured [data]. It's in and out in the same way as the fish net analogy with the little fish. It's a memory copy function, so I see this copy, does the pattern match? No, so discard, please.

The other thing to think about is that while you potentially jump through all these hoops, the operators themselves are going to have all this information one way or another. The operators themselves will comply with law enforcement. They will have a huge amount of information even without our technology.

®