I know what you downloaded from Freenet

Anonymous P2P network open to easy forensic attack

  • alert
  • submit to reddit

SANS - Survey on application security programs

Exclusive The Freenet Project has been around since 2000. It was designed as a stealthy P2P network (some have called it a "darknet") that distributes its content so broadly that it's impossible to censor.

There are a number of security features in Freenet that other P2P networks lack. Because data that the network's various nodes exchange is encrypted, it's difficult, though not impossible, for an outside observer to know what's being passed between two nodes. It is also nearly impossible to identify the author of a Freesite, or to identify the person responsible for inserting content into the network, unless they wish to be known. Most importantly, it's nearly impossible for an outside attacker to determine whether a given node is requesting the data being sent to it, or is merely relaying it to another node.

These layers of obscurity and limited anonymity are what enable Freenet participants to exchange information freely. Content that is illegal, whether rightly or wrongly, flows freely through the network, cached on thousands of computers worldwide.

Who knows where that stuff came from?

Each participant necessarily operates a Freenet node, which caches encrypted data that has either been requested by that node's owner, or requested by other Freenet nodes. That is, one's node will cache data that it is merely proxying for others. Caching enables the broad distribution of content that makes Freenet impossible to censor. It also introduces doubt about the origin of any data found in a given node's cache. It helps to provide deniability.

Of course, anyone can find out what data is in their cache by decrypting it. If one applies the correct Content Hash Key (CHK), the data will be revealed. But because it's encrypted, one can avoid knowing what's in their cache simply by neglecting to run a list of CHKs against it - hence deniability in case a forensic examiner should locate illegal files in one's Freenet cache. It is, or rather, ought to be, impossible to determine whether the owner of a particular machine requested the files in his cache, or if his node merely proxied and cached them for others.

Obviously, this works only so long as cached data that the node's owner has requested, and cached data that his node has proxied, are indistinguishable. Unfortunately, The Register has discovered that this is not the case for large files.

Behavioral differences

Let there be a file of, say, 700 MB - maybe a movie, maybe warez, and possibly illegal, that you wish to have. Your node will download portions of this "splitfile" from numerous other nodes, where they are distributed. To enable you to recover quickly from interruptions during the download, your node will cache all of the chunks it receives. Thus when you re-start the download after an interruption, you will download only those portions of the file that you haven't already received. When the download is complete, the various chunks will be decrypted and assembled, and the file will be saved in your ~/freenet-downloads directory.

If you destroy the file but leave your cache intact, you can request it again, and the file will appear almost instantly. And there's the problem.

Freenet distributes files in a way that tends to select for frequently-requested, or "popular" data. This is partly because the other nodes that one's requests pass through will also cache parts of any files one requests. The more often a file is requested, the more often it will be cached, and the more nodes it will appear on.

We tested this, and found that a 50 MB file took six hours to download the first time we tried. After we eliminated the contents of our own local cache, we requested the file again, and it took only two hours and 20 minutes. Clearly, our "neighborhood" nodes had been caching a good deal of it while we downloaded it the first time. That behavior is by design, and it's nothing to be concerned about. The difference in download times between files never downloaded before and ones cached nearby is not revealing, because anyone else nearby might have initiated the request.

However, it is quite easy to distinguish between a large file cached in nearby nodes and one cached locally. And that is a very big deal.

As we noted earlier, a large splitfile will be cached locally to enable quick recovery from download interruptions. The problem is, the entire file will be cached. This means that, when a file is downloaded once, so long as the local cache remains intact, it can be reconstructed wholly from the local cache in minutes, even when the computer is disconnected from the internet. And this holds even when the browser cache is eliminated as a factor.

We tested this by downloading the same 50 MB file and removing it from our ~/freenet-downloads directory, while leaving the local Freenet cache intact. On our second attempt, it "downloaded" in one minute, nine seconds.

We ran the test again after disconnecting our computer from the internet, with the Freenet application still running, and it "downloaded" in one minute, fifty seconds.

So, it took six hours initially; two hours, twenty minutes with neighboring nodes caching it thanks to our request; and less than two minutes with our local cache intact, even when disconnected from the net.

The difference in download time between a splitfile cached locally (seconds) and one cached nearby (hours) is so great that we can safely dismiss the possibility that any part of it is coming from nearby nodes, even under the best possible network conditions. It's absolutely clear that the entire file is being rebuilt from the local cache. Forensically speaking, that information is golden.

The attack

Exploiting that information would be trivial. Only a bit of statistical data, of the sort that any government agency in the world could easily afford to obtain, will be needed.

Here's what we need to know: how many chunks of a splitfile will appear on a node that only relays file requests after x amount of uptime. That's it. We already know that for nodes requesting a splitfile, the answer is 100% of the chunks in the amount of uptime needed to fetch them. By running several nodes and observing them, we can easily determine how long it will take to cache, by relaying alone, an entire file of x size.

Since Freenet logs uptime, a forensic examiner can easily learn how long your node has been alive, even if there have been interruptions. So it is quite possible to estimate how many intact files, and of what size, your node ought to have cached without your participation. If the examiner finds many more files, or many larger files, than predicted, and they are illegal, you are in trouble.

Using a tool called FUQID, which queues Freenet file requests, one could easily run a list of forbidden CHKs against a disk image. If the number/size of whole files containing naughty stuff is significantly higher than predicted by your node's uptime, you are in trouble.

A forensic attack can be made more damning if the examiner has statistical information about the density of certain types of files on the network overall, which, again, running several test nodes will reveal. If the density of intact naughty files in your cache doesn't mimic within reasonable tolerances the density of such files on the network, you are in trouble.

You might be smart enough to disable your browser cache and its downloads history, and smart enough to wipe properly or encrypt dangerous files you've downloaded, but your Freenet cache, over which you have little control, will still tell on you.

The fix

We ran these observations by Freenet founder Ian Clarke. He agreed that the caching behavior does reveal far too many clues. But the next major revision is expected to eliminate the problem. Sometime later this year, it is hoped, the Freeenet developers will release a version that employs premix routing.

According to current plans, requests will be relayed through at least three nodes before any caching is performed. The three nodes nearest the one making the request will therefore not be able to determine what has been requested. The requesting node will cache downloads temporarily, although the mechanism by which locally cached data will be purged or randomized once the download is complete is not clear to us. Perhaps it has yet to be established.

In a forthcoming article, we will consider Freenet more generally, and offer suggestions for using it with greater safety, such as it is. ®

Update Due to an ambiguous comment in e-mail exchanged between The Register and Freenet founder Ian Clarke, we reported above that the forthcoming premix routing version of Freenet will temporarily cache splitfiles locally. According to Clarke, no premix nodes (including the requesting node) will cache any content.

Combat fraud and increase customer satisfaction

More from The Register

next story
Parent gabfest Mumsnet hit by SSL bug: My heart bleeds, grins hacker
Natter-board tells middle-class Britain to purée its passwords
Samsung Galaxy S5 fingerprint scanner hacked in just 4 DAYS
Sammy's newbie cooked slower than iPhone, also costs more to build
Obama allows NSA to exploit 0-days: report
If the spooks say they need it, they get it
Web data BLEEDOUT: Users to feel the pain as Heartbleed bug revealed
Vendors and ISPs have work to do updating firmware - if it's possible to fix this
Snowden-inspired crypto-email service Lavaboom launches
German service pays tribute to Lavabit
One year on: diplomatic fail as Chinese APT gangs get back to work
Mandiant says past 12 months shows Beijing won't call off its hackers
Call of Duty 'fragged using OpenSSL's Heartbleed exploit'
So it begins ... or maybe not, says one analyst
prev story


Designing a defence for mobile apps
In this whitepaper learn the various considerations for defending mobile applications; from the mobile application architecture itself to the myriad testing technologies needed to properly assess mobile applications risk.
3 Big data security analytics techniques
Applying these Big Data security analytics techniques can help you make your business safer by detecting attacks early, before significant damage is done.
Five 3D headsets to be won!
We were so impressed by the Durovis Dive headset we’ve asked the company to give some away to Reg readers.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Securing web applications made simple and scalable
In this whitepaper learn how automated security testing can provide a simple and scalable way to protect your web applications.