Original URL: https://www.theregister.com/2008/01/07/xss_tactics_strategy/

Stay ahead of Web 2.0 worms

XSS marks the spot

By Jeff Williams

Posted in Channel, 7th January 2008 21:49 GMT

Think you've protected your web applications from cross-site scripting (XSS) vulnerabilities? The odds are against you. Roughly 90 per cent of web applications have this problem, and it's getting worse as web applications and web services share more and more data.

Many frameworks and libraries are encoding, decoding, and re-encoding with all kinds of schemes and sending data through new protocols. Ajax and other "rich" applications are complicating this situation.

XSS happens any time your application uses input from an HTTP request directly in HTML output. This covers everything in the HTTP request, including the query string, form fields, hidden fields, pull-down menus, check boxes, cookies, and headers. And, it doesn't matter if you immediately send the input back to the user who sent it - you get something called "Reflected XSS" - or store it for a while and send it later to someone else - that's "Stored XSS".

XSS is a fairly serious vulnerability. By sending just the right input, typically a few special characters like " and > and then some JavaScript, an attacker can get a script running in the context of your web page. That script can disclose your application's cookies, rewrite your HTML, perform a phishing attack, or steal data from your forms. Attackers can even install an "XSS proxy" inside the victim's browser, allowing them to control your users' browsers remotely.

Security problems like XSS are inevitable when you don't keep code and data apart - and HTML is the worst mashup of code and data of all time. There's no way good way to keep JavaScript code and HTML data separate from each other. To prevent SQL injection, you can use a parameterized query to keep the data and code separate. But there's nothing equivalent for HTML. Just about every HTML element allows JavaScript code in attributes and event handlers, even CSS:

CSS meets XSS

Fortunately, there are steps application developers can take to protect applications:

Validate input: One way to keep code out of user input is "whitelist input validation." All you have to do is verify that each input matches a strict definition of what you expect, like a tight regular expression. Attempting to take a shortcut and apply a global filter for attacks is known as a "blacklist" approach and never works. Unfortunately, even the tightest input validation can't completely defeat XSS, as some input fields require the same characters that are significant in HTML - a you can see here:

?name=O'Connell&comment=I "like" your website; it's great!

Canonicalize: Attackers will use tricks to attempt to bypass your validation. By using various kinds of encoding, either alone or in combination, they hide code inside data. You have to decode this data to its simplest form - "canonicalize" it - before validating, or you may allow code to reach some backend system that decodes it and enables the attack. But canonicalizing is easier said than done. Take a string that's been doubly encoded using multiple different (possibly unknown) encoding schemes, add any number of backend parsers, consider the browser's permissive parsing, and you have a real mess - as seen here:

?input=test%2522%2Bonblur%253D%26quotalert&amp%23x28document.cookie%2529

Encode output: Encoding isn't all bad. You can use encoding to tell the browser that data shouldn't be run as code. "Whitelist output encoding" is simply replacing everything except a small list of safe characters (such as alphanumerics) with HTML entities before sending to the browser. The browser will render these entities instead of interpreting them, which defuses XSS attacks. Except that it doesn't always work. First, many frameworks and libraries use "blacklist" encoding where only a very small set of characters is encoded. Also, there are some places in HTML, such as href and src attributes, which allow HTML entities to be executed. So the following insane URL is actually executed by browsers:

URL test code

Assign character set: Even if you're doing great input validation and output encoding, attackers may try to trick the browser into using the wrong character set to interpret the data. An innocuous string in one character set may be interpreted as a dangerous string in a different character set. Many browsers will attempt to guess the character set if one is not specified. So, the harmless-looking string +ADw- will be interpreted as a < character if the browser decides to use UTF-7. This is why you should always be careful to set a sane character set like UTF-8, as seen here:

Content-Type: text/html; charset=UTF-8

As you can see from my examples you can - if you're careful - think through all the inputs to your application and prevent code from getting mixed into your data. But for real protection, you'll need to get organized and put a strategy in place.

You'll need to define some organization-wide security standards that cover XSS. You'll also need to make sure all developers know how to recognize XSS problems and implement protections. Don't reinvent your controls in every application. You'll want to adopt standard controls for canonicalization, validation, and encoding across your organization. The Enterprise Security API from the organization I work with, the Open Web Application Security Project (OWASP), is one attempt to define a set of interfaces and a reference implementation.

You'll also need a good testing program, with both automated scanning and manual reviews. Automated tools can find simple XSS problems, but only manual code review and penetration testing will be able to diagnose the complex problems. Only manual review can verify that the standard controls work as intended and that developers are using them properly.

XSS isn't going to go away anytime soon - in fact the problem's going to become worse thanks to AJAX, web services and Web 2.0. The key to containing the problem, though, is to act tactically and strategically whilst building web applications.®

Jeff Williams is chair of the OWASP and chief executive of AspectSecurity.