Stay ahead of Web 2.0 worms
XSS marks the spot
Posted in Security, 7th January 2008 21:49 GMT
Free whitepaper – Optimizing the data center for cost and efficiency
Canonicalize: Attackers will use tricks to attempt to bypass your validation. By using various kinds of encoding, either alone or in combination, they hide code inside data. You have to decode this data to its simplest form - "canonicalize" it - before validating, or you may allow code to reach some backend system that decodes it and enables the attack. But canonicalizing is easier said than done. Take a string that's been doubly encoded using multiple different (possibly unknown) encoding schemes, add any number of backend parsers, consider the browser's permissive parsing, and you have a real mess - as seen here:
?input=test%2522%2Bonblur%253D%26quotalert&%23x28document.cookie%2529
Encode output: Encoding isn't all bad. You can use encoding to tell the browser that data shouldn't be run as code. "Whitelist output encoding" is simply replacing everything except a small list of safe characters (such as alphanumerics) with HTML entities before sending to the browser. The browser will render these entities instead of interpreting them, which defuses XSS attacks. Except that it doesn't always work. First, many frameworks and libraries use "blacklist" encoding where only a very small set of characters is encoded. Also, there are some places in HTML, such as href and src attributes, which allow HTML entities to be executed. So the following insane URL is actually executed by browsers:
![]()
Assign character set: Even if you're doing great input validation and output encoding, attackers may try to trick the browser into using the wrong character set to interpret the data. An innocuous string in one character set may be interpreted as a dangerous string in a different character set. Many browsers will attempt to guess the character set if one is not specified. So, the harmless-looking string +ADw- will be interpreted as a < character if the browser decides to use UTF-7. This is why you should always be careful to set a sane character set like UTF-8, as seen here:
Content-Type: text/html; charset=UTF-8
Free whitepaper – Power distribution systems for the Dell PowerEdge M1000e Modular Server Enclosure

Analyst Keynote: The Register Agile Data Center Summit
Analyst Keynote: The Register Agile Data Center Summit
Enabling the Agile Data Center
Breaching Fort Apache.org - What went wrong?
Snow Leopard security - The good, the bad and the missing
US Dems fill inboxes with 419 scams
BlockMaster SafeStick hardware-encrypted USB drive