Original URL: http://www.theregister.co.uk/2012/12/18/tapad_aerospike/
Secrets of an ad broker: NoSQL, millisecond auctions and FLASH ARRAYS
Ad-slinger powers up with Big Data flash rocket fuel
Typically a web page featuring online ads is built with space set aside for the ad. A user clicks on the page's URL and it's presented to that user in real time and the spaces are populated with pre-built and stored ads. So far so ordinary, only with online ad-broker Tapad the ads are not pre-stored at all.
What happens is that a Tapad customer's web page is requested by a user and Tapad flogs a paid-for ad targeted at that specific user at that specific time by an auction process among a set of potential advertisers. The auction takes place in real time - actually in 100 milliseconds or less - and the ad is displayed as the rest of the web page is being loaded.
That way an ad for a fast coupe is presented to a 30- to 40-year-old male with a history of interest in fast cars, a location near the car dealership, and a disposable income in the right range, while a teenage college student with an interest in dancing, but who has not seen a supplier's dance wear ad recently, gets a dance wear ad, and this happens up to 150,000 times a second. How the hell does this get done?
What Tapad's computer system does is analyse the device from which the web page is requested. Once it identifies whether the device is a smartphone, tablet or desktop, it looks up the device's geographic location and the user's browsing patterns and click-through history, and accumulates metadata about that entity. Of course the amount it can glean depends on the user's settings...
An awful lot of data overall is being handled by a simple key:value NoSQL database to perform these tasks. The database is being used for relatively simple per-webpage analytics and also for the subsequent ad-bidding auction. Once the web page requester's metadata is gathered, it's matched against criteria for ad display on the requested webpage.
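The lookup-and-match step can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the device IDs, the profile fields, the campaign names and the `matching_campaigns` helper are hypothetical, and a plain Python dict stands in for the key:value store. It is not Tapad's schema or code.

```python
# Hypothetical stand-in for a key:value store: one metadata record per device ID.
PROFILES = {
    "device-123": {"device": "smartphone", "age": 35, "interests": {"fast cars"}},
    "device-456": {"device": "tablet", "age": 19, "interests": {"dancing"}},
}

# Hypothetical targeting criteria that advertisers might have registered.
CAMPAIGNS = {
    "coupe-dealer": {"min_age": 30, "max_age": 40, "interest": "fast cars"},
    "dance-wear": {"min_age": 16, "max_age": 25, "interest": "dancing"},
}


def matching_campaigns(device_id):
    """Return the names of campaigns whose criteria fit the requester's metadata."""
    profile = PROFILES.get(device_id)  # a single key lookup, no joins
    if profile is None:
        return []  # nothing known about this device, so nothing to match
    return sorted(
        name
        for name, criteria in CAMPAIGNS.items()
        if criteria["min_age"] <= profile["age"] <= criteria["max_age"]
        and criteria["interest"] in profile["interests"]
    )


print(matching_campaigns("device-123"))
```

The point of the sketch is that each page request needs only one read by key, which is the access pattern a key:value store is built for.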
Selling your eyeballs to the highest bidder
A group of potential advertisers will have said that, for example, when the web page is requested by users with certain demographic characteristics, they would be interested in bidding for their ads to be placed on that page. Tapad's system then contacts these potential advertisers' systems and says: "I have a potential spot for you with this requester's metadata. Do you want to make a bid?" The potential advertisers then match the potential spot to their needs.
If they have ad inventory and value that demographic then they make a bid. Tapad receives incoming bids, picks the highest one, and serves that bidder's ad to the webpage to be eyeballed by the user: the well-off early middle-age male sees the ad for the fast coupe, the female college student interested in dancing gets a dance-wear ad and, I don't know, a mature middle-aged couple with pets receive a pet-grooming aid advertisement.
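A rough sketch of the winner-picking step follows. The article gives only two facts, that the highest bid wins and that the auction finishes in 100 milliseconds or less, so the rest is assumed: the `Bid` record, the price units, and the rule that a bid arriving after the deadline is simply ignored are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

DEADLINE_MS = 100  # the auction budget quoted in the article


@dataclass
class Bid:
    bidder: str
    price: float       # hypothetical unit
    latency_ms: float  # how long the bidder took to answer


def pick_winner(bids: List[Bid], deadline_ms: float = DEADLINE_MS) -> Optional[Bid]:
    """Return the highest bid that arrived within the deadline, or None."""
    # Assumption: late bids are dropped rather than delaying the page.
    on_time = [b for b in bids if b.latency_ms <= deadline_ms]
    return max(on_time, key=lambda b: b.price, default=None)


bids = [
    Bid("coupe-dealer", 2.40, 35.0),
    Bid("pet-grooming", 1.10, 20.0),
    Bid("slow-bidder", 9.99, 180.0),  # highest price, but too late to count
]
winner = pick_winner(bids)
print(winner.bidder if winner else "no ad")
```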
This process is highly complex, dead reliable, extremely fast, great for web page ad sellers and good for advertisers too, as they get personalised ads delivered directly to hot prospects rather than paying for spray-and-pray billboard-type advertising.
Tapad co-founder and chief technology officer Dag Liodden says a relational database just couldn't keep up with the simultaneous analytics and transaction work required. It would need a hugely powerful and costly infrastructure behind it and would have been functional overkill. That's why a key:value (or distributed hash table) NoSQL database is used. But that's not all.
In a Wikibon Peer Incite session on Big Data, Liodden says that the system couldn't work if the database data was held on spinning disk. Access times to data would simply be too slow. It would work if the data was stored in memory, but then the system cost would outweigh its worth and there was the risk that DRAM would fill up and disk would have to be used as a fallback, slowing things down to the point users would think web pages were being displayed too slowly and clicking off. Also, as Tapad's system scaled, more servers with lots of DRAM would have to be added ... increasing the cost to unrealistic levels. An in-DRAM database was no-go.
Aerospike flash-based NoSQL
Tapad settled on flash storage instead, as it was both fast enough and affordable enough. In fact, the ad-serving firm bought an Aerospike NoSQL database running on SSDs with indices held in DRAM. Aerospike was previously known as Citrusleaf and is headquartered in Mountain View. The company claims that, with its in-SSD database, "it can get 500,000 transactions answered per second on a $2,000 server or 1 million transactions per second on a $5,000 machine."
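The idea of keeping indices in DRAM and records on flash can be shown with a toy store. This is only a sketch of the general layout, not Aerospike's implementation: the `FlashBackedStore` class is hypothetical, an ordinary file stands in for the SSD, and a Python dict stands in for the in-memory index.

```python
import os
import tempfile


class FlashBackedStore:
    """Toy store: the index lives in memory, the values live in a file."""

    def __init__(self, path):
        self._file = open(path, "w+b")  # stand-in for raw SSD space
        self._index = {}                # key -> (offset, length), held in DRAM

    def put(self, key, value: bytes):
        # Append the record and remember where it landed.
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(value)
        self._file.flush()
        self._index[key] = (offset, len(value))

    def get(self, key):
        entry = self._index.get(key)  # memory lookup, no storage access
        if entry is None:
            return None
        offset, length = entry
        self._file.seek(offset)       # one positioned read from "flash"
        return self._file.read(length)

    def close(self):
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    store = FlashBackedStore(os.path.join(tmp, "records.bin"))
    store.put("device-123", b"smartphone|fast cars")
    print(store.get("device-123"))
    store.close()
```

In this layout a read costs one in-memory lookup plus one storage read, so only the small index has to fit in DRAM while the bulk of the data sits on cheaper flash.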
Tapad runs a 5-node Aerospike cluster with each server node having six 120GB SSDs. Reads and writes are spread across these SSDs and not one has had to be replaced in 18 months of operation. The operational stats are impressive. It manages more than 150 billion ad impressions a month sent to 2 billion devices, with up to 50,000 queries/sec per server node, "reaching 150,000 ads per second during peak activity." The total data volume is 3.6TB and still growing. The system spreads work across nodes by monitoring their latency.
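A quick back-of-the-envelope check on those figures, using only the numbers quoted above plus one labelled assumption (a 30-day month):

```python
NODES = 5
SSDS_PER_NODE = 6
SSD_GB = 120

# Raw flash capacity across the cluster, in decimal terabytes.
raw_capacity_tb = NODES * SSDS_PER_NODE * SSD_GB / 1000

IMPRESSIONS_PER_MONTH = 150e9       # "more than 150 billion", so a lower bound
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month

# Average impressions per second implied by the monthly figure.
avg_per_second = IMPRESSIONS_PER_MONTH / SECONDS_PER_MONTH

QUERIES_PER_NODE = 50_000           # "up to" figure per server node
cluster_queries_per_second = NODES * QUERIES_PER_NODE

print(raw_capacity_tb)               # 3.6
print(round(avg_per_second))         # 57870
print(cluster_queries_per_second)    # 250000
```

The raw capacity of 30 SSDs comes to 3.6TB, the same number as the quoted data volume; the article does not say whether its figure refers to capacity or to data actually stored. The monthly total works out to roughly 58,000 impressions a second on average, comfortably below the 150,000-a-second peak.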
Tapad customers include Dell, Evidon and three of the top four telecom providers. They get better ad-display conversion rates through the precise targeting provided by Tapad, meaning better sales. This kind of combined fast, real-time analytics and transaction-based system is applicable to any application involving multiple input data sets that can be generated in real time. That includes transaction decision-support for the supply of millions, billions even, of items - such as an amount of electricity for a time interval, a bid for shares or other financial assets, and any online offer for things such as airline and railway seats, mobile phone contracts, or car or house insurance.
You may not like the applications, the idea that advertisers and online suppliers have a better idea of who you are and your behaviour, but the storage and processing systems behind Tapad's system are certainly impressive.
Disk drive arrays and RDBMSes are massive obstacles to the provision of such applications. Once your app uses data in flash it's very unlikely you'll want to go back to disk. Tapad certainly won't. ®