Feeds

What your database needs is a good thermometer

Data blows hot and cold

Build a business case: developing custom apps

In the very early days of databases, when they contained relatively small quantities of information, the distinction between frequencies of access was pretty immaterial.

Now that we collect and store data in massive quantities, however, the distinction is becoming increasingly important. And apposite terms have evolved to describe this facet of data: "Hot" data is accessed frequently, and "cold" is accessed less frequently.

In almost any business system this year's data will be considered hot, because it's accessed frequently and at the transactional level. People, for example, will want to know things like "What did we sell to Customer X last week?" and "Have we fulfilled Customer Y's order placed yesterday?"

Much of the data held in a warehouse is cold - the bulk of its content is likely to be the transactional data from, say, the previous four years - with a much smaller quantity of very hot data from the current year.

Broadly speaking, there are three classes of disk that serve us, which you can see in the table below. The price of one of them - Solid State Drives - is, as I wrote here, plummeting. In that earlier article, I also discussed the use of SSDs in databases and, as may already be apparent, the concept of hot and cold data is particularly applicable to data warehouses.

Disk Type Power consumption Capacity Speed Cost
Slow hard disks (5K rpm) 6 – 8 watts High Slow Cheap
Fast hard disks (15K rpm) 16 watts Medium Fast Expensive
SSD (Solid State Disks) 150mW Low Blistering Painfully expensive

But surely any DBA with any sense will already be partitioning tables, putting frequently accessed data onto fast media and less frequently accessed data onto slow and so on. In fact, we've been handling data in this way for many years, archiving older material onto tape and so on. So, why the sudden interest in data temperature?

The answer is that these two factors - the increasing diversity in the heat of the data, combined with the changing storage media - mean that the management of the data is becoming more complex. Slightly more accurately, it means that we can gain significant performance advantages if we are prepared to manage the data in more complex ways.

For example, the obvious approach is to put the cold data on slow cheap disks and, ignoring cost implications, the hot on SSDs. What may be less obvious is a counter-intuitive argument for putting some hot data on slow disks. A slow disk holding mainly cold data should, by definition, spend much of its time in an underworked state.

If you add a little hot data to the disk the chances are that when a request for it arrives, the disk won't be busy doing anything else and the hot data will be delivered rapidly and, if not piping hot, at least moderately warm. Neat.

Essentially it is a matter of bottlenecking, and the bottleneck on a slow disk for archived data may sometimes be zero. You are simply balancing probabilities to take advantage of this fact: It won't work every time but in practice it is found to improve performance rather than degrade it.

And once you have decided to put hot data on a slower disk, it makes sense to put that data onto the outer sectors of the disk because data there can often be read off the disk faster. And if you are, by now, thinking "Whoa, this may be theoretically correct but it's way too difficult to manage in practice," you're right.

By understanding that data has temperature and disks vary in performance, we can do a great deal manually but in practice the problem is that the temperature of the data changes over time and the fine-tuning is so complex that it is unlikely to be done in practice.

This is exactly the kind of situation that is far better managed under algorithmic control, which can determine the positioning of data depending on its actual usage. In fact, this way of handling hot and cold data sits extremely well with handling data as Massively Parallel Processing (MPP).

And this is already being done. Teradata, for example, is way ahead of the curve in this area: The data is monitored for temperature and automatically moved to the optimal storage location. Teradata has also been interested in SSDs for years.

Teradata's director or platform and solutions marketing for EMEA Martin Wilcox said: "Not all database engines are created equal when it comes to data access. In contrast to most other engines, Teradata has always been optimized for random I/O patterns, rather than assuming sequential access, and is therefore inherently capable of immediately exploiting all of the advantages that SSDs have to offer."

Teradata has pushed the boundaries of SSD use even further. At its conference a year ago the company demonstrated a data warehouse system running entirely on SSDs and was using it to evaluate performance and cost benefits - and the price of SSDs has done nothing but plummet since then.

Thinking about data temperature can help us to tune our databases better, and badgering your database engine supplier to think about adding it as a feature is even better. ®

Build a business case: developing custom apps

More from The Register

next story
NO MORE ALL CAPS and other pleasures of Visual Studio 14
Unpicking a packed preview that breaks down ASP.NET
KDE releases ice-cream coloured Plasma 5 just in time for summer
Melty but refreshing - popular rival to Mint's Cinnamon's still a work in progress
Leaked Windows Phone 8.1 Update specs tease details of Nokia's next mobes
New screen sizes, dual SIMs, voice over LTE, and more
Secure microkernel that uses maths to be 'bug free' goes open source
Hacker-repelling, drone-protecting code will soon be yours to tweak as you see fit
Mozilla keeps its Beard, hopes anti-gay marriage troubles are now over
Plenty on new CEO's todo list – starting with Firefox's slipping grasp
Apple: We'll unleash OS X Yosemite beta on the MASSES on 24 July
Starting today, regular fanbois will be guinea pigs, it tells Reg
HIDDEN packet sniffer spy tech in MILLIONS of iPhones, iPads – expert
Don't panic though – Apple's backdoor is not wide open to all, guru tells us
prev story

Whitepapers

Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Consolidation: The Foundation for IT Business Transformation
In this whitepaper learn how effective consolidation of IT and business resources can enable multiple, meaningful business benefits.
Backing up Big Data
Solving backup challenges and “protect everything from everywhere,” as we move into the era of big data management and the adoption of BYOD.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.
Why and how to choose the right cloud vendor
The benefits of cloud-based storage in your processes. Eliminate onsite, disk-based backup and archiving in favor of cloud-based data protection.