Stay ahead of Web 2.0 worms
XSS marks the spot
Think you've protected your web applications from cross-site scripting (XSS) vulnerabilities? The odds are against you. Roughly 90 per cent of web applications have this problem, and it's getting worse as web applications and web services share more and more data.
Many frameworks and libraries are encoding, decoding, and re-encoding with all kinds of schemes and sending data through new protocols. Ajax and other "rich" applications are complicating this situation.
XSS happens any time your application uses input from an HTTP request directly in HTML output. This covers everything in the HTTP request, including the query string, form fields, hidden fields, pull-down menus, check boxes, cookies, and headers. And, it doesn't matter if you immediately send the input back to the user who sent it - you get something called "Reflected XSS" - or store it for a while and send it later to someone else - that's "Stored XSS".
Fortunately, there are steps application developers can take to protect applications:
Validate input: One way to keep code out of user input is "whitelist input validation." All you have to do is verify that each input matches a strict definition of what you expect, like a tight regular expression. Attempting to take a shortcut and apply a global filter for attacks is known as a "blacklist" approach and never works. Unfortunately, even the tightest input validation can't completely defeat XSS, as some input fields require the same characters that are significant in HTML - a you can see here:
?name=O'Connell&comment=I "like" your website; it's great!
Canonicalize: Attackers will use tricks to attempt to bypass your validation. By using various kinds of encoding, either alone or in combination, they hide code inside data. You have to decode this data to its simplest form - "canonicalize" it - before validating, or you may allow code to reach some backend system that decodes it and enables the attack. But canonicalizing is easier said than done. Take a string that's been doubly encoded using multiple different (possibly unknown) encoding schemes, add any number of backend parsers, consider the browser's permissive parsing, and you have a real mess - as seen here:
Encode output: Encoding isn't all bad. You can use encoding to tell the browser that data shouldn't be run as code. "Whitelist output encoding" is simply replacing everything except a small list of safe characters (such as alphanumerics) with HTML entities before sending to the browser. The browser will render these entities instead of interpreting them, which defuses XSS attacks. Except that it doesn't always work. First, many frameworks and libraries use "blacklist" encoding where only a very small set of characters is encoded. Also, there are some places in HTML, such as href and src attributes, which allow HTML entities to be executed. So the following insane URL is actually executed by browsers:
Assign character set: Even if you're doing great input validation and output encoding, attackers may try to trick the browser into using the wrong character set to interpret the data. An innocuous string in one character set may be interpreted as a dangerous string in a different character set. Many browsers will attempt to guess the character set if one is not specified. So, the harmless-looking string +ADw- will be interpreted as a < character if the browser decides to use UTF-7. This is why you should always be careful to set a sane character set like UTF-8, as seen here:
Content-Type: text/html; charset=UTF-8
As you can see from my examples you can - if you're careful - think through all the inputs to your application and prevent code from getting mixed into your data. But for real protection, you'll need to get organized and put a strategy in place.
You'll need to define some organization-wide security standards that cover XSS. You'll also need to make sure all developers know how to recognize XSS problems and implement protections. Don't reinvent your controls in every application. You'll want to adopt standard controls for canonicalization, validation, and encoding across your organization. The Enterprise Security API  from the organization I work with, the Open Web Application Security Project (OWASP), is one attempt to define a set of interfaces and a reference implementation.
You'll also need a good testing program, with both automated scanning and manual reviews. Automated tools can find simple XSS problems, but only manual code review and penetration testing will be able to diagnose the complex problems. Only manual review can verify that the standard controls work as intended and that developers are using them properly.
XSS isn't going to go away anytime soon - in fact the problem's going to become worse thanks to AJAX, web services and Web 2.0. The key to containing the problem, though, is to act tactically and strategically whilst building web applications.®