Secrets of an ad broker: NoSQL, millisecond auctions and FLASH ARRAYS

Ad-slinger powers up with Big Data flash rocket fuel

Typically a web page featuring online ads is built with space set aside for the ad. When a user clicks on the page's URL, the page is presented to that user in real time and the spaces are populated with pre-built, stored ads. So far so ordinary, but with online ad-broker Tapad the ads are not pre-stored at all.

What happens is that a user requests a Tapad customer's web page, and Tapad flogs a paid-for ad, targeted at that specific user at that specific time, through an auction among a set of potential advertisers. The auction takes place in real time (in 100 milliseconds or less) and the ad is displayed as the rest of the web page is being loaded.

That way an ad for a fast coupe is presented to a 30- to 40-year-old male with a history of interest in fast cars, a location near the car dealership and a disposable income in the right range. Meanwhile a teenage college student with an interest in dancing, who has not seen a supplier's dance-wear ad recently, gets a dance-wear ad. And this happens up to 150,000 times a second. How the hell does this get done?
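A rough way to see what those two figures imply together is Little's law, which relates throughput, latency and the number of requests in flight. Assuming, as an upper bound, that every auction uses its full 100 ms budget at the 150,000-per-second peak:

```latex
L = \lambda W = 150\,000\ \text{s}^{-1} \times 0.1\ \text{s} = 15\,000
```

In other words, the system may have on the order of 15,000 auctions open at any one instant.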

Tapad's system

What Tapad's computer system does is analyse the device from which the web page is requested. Once it has identified whether the device is a smartphone, tablet or desktop, it looks up the user's geographic location, browsing patterns and click-through history, and accumulates metadata about that entity. Of course, how much it can glean depends on the user's settings.
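To make the profiling step concrete, here is a minimal Python sketch. The user-agent heuristic, the `DeviceProfile` fields and the `tracking_allowed` flag are all hypothetical stand-ins; the article doesn't say how Tapad actually identifies devices or what it stores.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeviceProfile:
    """Hypothetical per-device metadata record."""
    device_id: str
    device_type: str = "unknown"
    location: Optional[str] = None
    pages_seen: List[str] = field(default_factory=list)
    clicks: int = 0


def classify_device(user_agent: str) -> str:
    """Crude user-agent heuristic (an assumption, not Tapad's method)."""
    ua = user_agent.lower()
    # Android tablets usually omit "Mobile" from their user-agent string.
    if "ipad" in ua or "tablet" in ua or ("android" in ua and "mobile" not in ua):
        return "tablet"
    if "mobile" in ua or "iphone" in ua:
        return "smartphone"
    return "desktop"


def enrich(profile, user_agent, location=None, page_url=None,
           clicked=False, tracking_allowed=True):
    """Accumulate metadata about a device, honouring the user's settings."""
    profile.device_type = classify_device(user_agent)
    if not tracking_allowed:
        return profile  # user has opted out: keep only the device type
    if location is not None:
        profile.location = location
    if page_url is not None:
        profile.pages_seen.append(page_url)
    if clicked:
        profile.clicks += 1
    return profile
```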

An awful lot of data is handled overall, and a simple key:value NoSQL database does the work. It is used both for relatively simple per-webpage analytics and for the subsequent ad-bidding auction. Once the web page requester's metadata has been gathered, it is matched against the criteria for ad display on the requested webpage.
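A minimal sketch of that access pattern follows, assuming a plain Python dict as a stand-in for the key:value database and a hypothetical `profile:<id>` / `views:<url>` key scheme. The point is that both the per-page analytics and the profile lookup are single-key operations, which a key:value store handles cheaply.

```python
class KVStore:
    """Toy in-process stand-in for a key:value NoSQL store (an assumption)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value

    def incr(self, key, by=1):
        # Atomic-increment style counter, as many key:value stores offer.
        self._data[key] = self._data.get(key, 0) + by
        return self._data[key]


def on_page_request(store, device_id, page_url):
    # Per-page analytics: one counter per URL.
    views = store.incr(f"views:{page_url}")
    # Single-key lookup of the requester's accumulated metadata.
    profile = store.get(f"profile:{device_id}", {})
    return profile, views
```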

Selling your eyeballs to the highest bidder

A group of potential advertisers will have said, for example, that when the web page is requested by users with certain demographic characteristics, they would be interested in bidding to have their ad placed on that page. Tapad's system then contacts these potential advertisers' systems and says: "I have a potential spot for you with this requester's metadata. Do you want to make a bid?" The potential advertisers then check the spot against their needs.
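The matching step can be sketched as a filter over advertisers' registered targeting criteria. The criteria format here (exact values, sets of allowed values, inclusive numeric ranges) and the example advertisers are hypothetical; the article doesn't describe how Tapad encodes them.

```python
from typing import Any, Dict, List


def matches(profile: Dict[str, Any], criteria: Dict[str, Any]) -> bool:
    """True if the requester's metadata satisfies every criterion."""
    for key, wanted in criteria.items():
        value = profile.get(key)
        if value is None:
            return False  # unknown attribute: cannot satisfy the criterion
        if isinstance(wanted, tuple):  # (low, high) inclusive numeric range
            low, high = wanted
            if not (low <= value <= high):
                return False
        elif isinstance(wanted, (set, frozenset)):  # any of several allowed values
            if value not in wanted:
                return False
        elif value != wanted:  # exact match
            return False
    return True


def advertisers_to_invite(profile: Dict[str, Any],
                          registrations: Dict[str, Dict[str, Any]]) -> List[str]:
    """Names of advertisers whose criteria this requester meets."""
    return [name for name, criteria in registrations.items() if matches(profile, criteria)]
```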

If they have ad inventory and value that demographic, they make a bid. Tapad receives the incoming bids, picks the highest and serves that bidder's ad to the webpage to be eyeballed by the user: the well-off, early-middle-aged male sees the ad for the fast coupe, the female college student interested in dancing gets a dance-wear ad and, I don't know, a mature middle-aged couple with pets receive a pet-grooming aid advertisement.
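Here is a minimal asyncio sketch of a deadline-bound, highest-bid-wins auction, assuming each advertiser is reachable through a hypothetical async `bid` callable that returns a price, or `None` to decline. Bids that miss the 100 ms budget are simply dropped. Real ad exchanges may add reserve prices, second-price rules and network calls, none of which the article details.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple

# Hypothetical bidder interface: given the requester's metadata, return a
# bid price, or None to decline.
BidFn = Callable[[dict], Awaitable[Optional[float]]]


async def run_auction(profile: dict, bidders: Dict[str, BidFn],
                      deadline_s: float = 0.1) -> Optional[Tuple[str, float]]:
    """Ask every bidder at once and award the spot to the highest bid
    received before the deadline. Returns (winner, price) or None."""
    tasks = {asyncio.create_task(bid(profile)): name for name, bid in bidders.items()}
    if not tasks:
        return None
    done, pending = await asyncio.wait(set(tasks), timeout=deadline_s)
    for task in pending:
        task.cancel()  # late bidders miss this auction
    bids = []
    for task in done:
        if task.cancelled() or task.exception() is not None:
            continue  # a failing bidder is treated as no bid
        price = task.result()
        if price is not None:
            bids.append((price, tasks[task]))
    if not bids:
        return None
    price, winner = max(bids)
    return winner, price
```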

This process is highly complex, dead reliable and extremely fast. It is great for web page ad sellers and good for advertisers too, as they get personalised ads delivered directly to hot prospects rather than paying for spray-and-pray, billboard-type advertising.

Tapad co-founder and chief technology officer Dag Liodden says a relational database just couldn't keep up with the simultaneous analytics and transaction work required. It would need a hugely powerful and costly infrastructure behind it and would be functional overkill. That's why a key:value (or distributed hash table) NoSQL database is used. But that's not all.
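The article doesn't say how the key space is spread across servers. One common way a distributed hash table does it is consistent hashing, sketched below with virtual nodes and hypothetical node names; Aerospike's own partitioning scheme may well differ.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # First 8 bytes of MD5 as an integer position on the ring.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed at `vnodes` points to even out the load.
        self._ring = sorted((_hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first node position after the key's hash,
        # wrapping round to the start of the ring.
        i = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

The design choice matters as a cluster grows: adding a sixth node moves only roughly a sixth of the keys, all of them to the new node, rather than reshuffling everything.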

In a Wikibon Peer Incite session on Big Data, Liodden says the system couldn't work if the database data were held on spinning disk: access times would simply be too slow. It would work if the data were held in memory, but then the system would cost more than it was worth. There was also the risk that DRAM would fill up and disk would have to be used as a fallback, slowing things down to the point where users would think pages were loading too slowly and click away. And as Tapad's system scaled, more servers with lots of DRAM would have to be added, pushing the cost to unrealistic levels. An in-DRAM database was a no-go.

Aerospike flash-based NoSQL

Tapad settled on flash storage instead, as it was both fast enough and affordable enough. The ad-serving firm bought an Aerospike NoSQL database that runs on SSDs with its indices held in DRAM. Aerospike, previously known as Citrusleaf, is headquartered in Mountain View. The company claims that with its in-SSD database "it can get 500,000 transactions answered per second on a $2,000 server or 1 million transactions per second on a $5,000 machine."
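The split described here, data on SSD with the index in DRAM, can be illustrated with a toy log-structured store: values are appended to a file (standing in for the SSD) and an in-memory dict maps each key to its offset and length, so a read costs at most one device access and a miss costs none. This is a sketch of the general idea under those assumptions, not Aerospike's storage engine, and it does no garbage collection of overwritten records.

```python
import json
import os


class FlashBackedStore:
    """Toy store: index in DRAM, values in an append-only file."""

    def __init__(self, path):
        self._index = {}                # in DRAM: key -> (offset, length)
        self._f = open(path, "a+b")     # stand-in for an SSD-resident data file

    def put(self, key, value):
        record = json.dumps(value).encode()
        self._f.seek(0, os.SEEK_END)
        offset = self._f.tell()
        self._f.write(record)
        self._f.flush()
        # A newer write simply shadows the old record in the index.
        self._index[key] = (offset, len(record))

    def get(self, key):
        loc = self._index.get(key)
        if loc is None:
            return None                 # miss answered from DRAM, no device read
        offset, length = loc
        self._f.seek(offset)            # exactly one read from "flash"
        return json.loads(self._f.read(length))

    def close(self):
        self._f.close()
```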

Tapad runs a five-node Aerospike cluster, with each server node holding six 120GB SSDs. Reads and writes are spread across these SSDs, and not one has had to be replaced in 18 months of operation. The operational stats are impressive: the system handles more than 150 billion ad impressions a month, sent to 2 billion devices, with up to 50,000 queries/sec per server node, "reaching 150,000 ads per second during peak activity." The total data volume is 3.6TB and still growing. The system spreads work across nodes by monitoring their latency.
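One simple way to spread work by monitoring latency is to keep an exponentially weighted moving average (EWMA) of each node's response time and route to the currently fastest one. The sketch below assumes the candidate nodes can all serve the request (for example, replicas holding the same data); the class name and the smoothing factor `alpha` are hypothetical, and the article doesn't say which mechanism Tapad's cluster actually uses.

```python
from typing import Dict, Iterable, Optional


class LatencyAwareRouter:
    """Route each request to the node with the lowest smoothed latency (EWMA)."""

    def __init__(self, nodes: Iterable[str], alpha: float = 0.2):
        self.alpha = alpha  # weight given to the newest measurement
        self.ewma: Dict[str, Optional[float]] = {n: None for n in nodes}

    def pick(self) -> str:
        # Unmeasured nodes (None) sort first, so each one gets probed at least once.
        return min(self.ewma, key=lambda n: (self.ewma[n] is not None, self.ewma[n] or 0.0))

    def record(self, node: str, latency_s: float) -> None:
        prev = self.ewma[node]
        self.ewma[node] = (latency_s if prev is None
                           else (1 - self.alpha) * prev + self.alpha * latency_s)
```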

Tapad customers include Dell, Evidon and three of the top four telecom providers. They get better ad-display conversion rates, and so better sales, through the precise targeting Tapad provides. This kind of combined real-time analytics and transaction system is applicable to any application that draws on multiple input data sets generated in real time. That includes transaction decision-support for supplying millions, even billions, of items: an amount of electricity for a time interval, a bid for shares or other financial assets, or any online offer for airline and railway seats, mobile phone contracts, or car or house insurance.

You may not like these applications, or the idea that advertisers and online suppliers have a better idea of who you are and how you behave, but the storage and processing systems behind Tapad's service are certainly impressive.

Disk drive arrays and RDBMSes are massive obstacles to providing such applications. Once your app uses data in flash, it's very unlikely you'll want to go back to disk. Tapad certainly won't. ®
