SQL Server 2008 - from semi-relational to sublime
Inside Microsoft's R2 preview
Master Data Services
Following the acquisition of Stratature, SQL Server 2008 R2 introduces Master Data Services (MDS), which allows the consolidation of data from multiple sources. Imagine, for example, three source systems that each hold a customer list, none of which is complete. MDS makes it much easier to consolidate the three lists into one complete, accurate list of customers.
The complete list can, of course, be fed into a data warehouse but the main reason for using MDS is that it allows the data to be synchronised across the three sources so that each has the same version of the data. If it sounds like your thing, it's there for experimentation in the CTP.
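To make the consolidation idea concrete, here is a minimal sketch in Python. It is not the MDS API: the function name `consolidate`, the field names, and the "first non-empty value wins" rule are all assumptions chosen for illustration, and a real master data tool would add matching, validation, and stewardship on top.

```python
def consolidate(*sources):
    """Merge customer records from several sources into one list keyed by id.

    Hypothetical rule: the first non-empty value seen for a field wins.
    """
    master = {}
    for source in sources:
        for record in source:
            merged = master.setdefault(record["id"], {})
            for field, value in record.items():
                # Skip blanks so a later source can fill the gap.
                if value not in (None, "") and field not in merged:
                    merged[field] = value
    return [master[key] for key in sorted(master)]


# Three illustrative source systems, none holding a complete set.
crm = [
    {"id": 1, "name": "Acme Ltd", "email": ""},
    {"id": 2, "name": "Bolt plc", "email": "info@bolt.example"},
]
billing = [
    {"id": 1, "name": "Acme Ltd", "email": "accounts@acme.example"},
    {"id": 3, "name": "Crane & Co", "email": None},
]
support = [
    {"id": 3, "name": "Crane & Co", "email": "help@crane.example"},
]

customers = consolidate(crm, billing, support)
for customer in customers:
    print(customer)
```

The consolidated list could then be pushed back to each source, which is the synchronisation step described above.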
Part of the previous CTP - but with further enhancements in this release - is StreamInsight, Microsoft's technology for handling continuously streaming data. The number of organizations dealing with this type of data is growing as the use of web logs, RFID tags, telemetry, and other streaming data sources increases. The stock market is one scenario where the technology is being used; another is the monitoring of wind-farm data.
StreamInsight is a data engine that sits in front of SQL Server and can handle incoming transactions at the rate of 15,000 per second. Transactions can be averaged and/or aggregated before being written to SQL Server, say, every five seconds, as a much smaller number of rows.
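The aggregation step can be pictured as a fixed, non-overlapping (tumbling) window. The sketch below is plain Python, not StreamInsight's own .NET interface; the function name `tumbling_average` and the `(timestamp, value)` event shape are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_average(events, window_seconds=5):
    """Collapse (timestamp_seconds, value) events into one row per window.

    Returns a list of (window_start, event_count, average_value) tuples.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Each event falls into exactly one window, e.g. [0, 5), [5, 10).
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return [
        (start, len(values), sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


# Five incoming readings become two rows to write to the database.
events = [(0.2, 10.0), (1.7, 20.0), (4.9, 30.0), (5.0, 40.0), (9.5, 60.0)]
rows = tumbling_average(events)
print(rows)  # [(0, 3, 20.0), (5, 2, 50.0)]
```

At 15,000 events per second, a five-second window would see roughly 75,000 events, so writing one summary row per window cuts the stored volume dramatically.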
Complex conditions can be applied that control which data is written to SQL Server. Developers can create Complex Event Processing solutions that monitor and mine incoming data to derive information from the patterns within that data. Existing tools such as Microsoft's Visual Studio and .NET can be used for application development, and the platform includes a range of management features, among them a management interface and debugging and diagnostic tools.
Not part of the CTP are two new editions of SQL Server - Datacenter and Parallel Data Warehouse - which are worth a quick look in relation to Enterprise.
Datacenter will offer very high scalability and is obviously aimed at heavily loaded applications. It will feature virtualization, consolidation, and infrastructure management options. It will also support more than eight and up to 256 logical processors, and as much memory as the operating system can handle.
The Parallel Data Warehouse edition - formerly known as Madison - is a marriage of SQL Server with the technology that underpinned the DATAllegro appliance. This essentially transforms SQL Server into a massively parallel processing (MPP) relational engine. You wouldn't want to run the finance application for a small to medium sized enterprise on this because of the cost and complexity involved. However, if your SQL Server-based data warehouse application has been suffering from poor performance, this should be the version for you once it's released.
Loads and nodes
This solution will comprise multiple physical nodes, each with its own storage, CPU, and memory. Each node will run a SQL Server instance, a configuration that Microsoft is calling Ultra Shared Nothing. Performance is maintained by balancing the load across the nodes, and redundancy is provided by mirroring all server and storage components.
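One common way a shared-nothing system spreads load is to hash a key column and send each row to the node the hash selects. The sketch below illustrates that general technique only; whether the Parallel Data Warehouse edition distributes data this way is an assumption, and `node_for`, `distribute`, and the `order_id` column are hypothetical names.

```python
import zlib


def node_for(key, node_count):
    """Pick a node for a key using a stable hash (CRC32 of its text form)."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def distribute(rows, key, node_count):
    """Split rows into per-node lists; each row lands on exactly one node."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row[key], node_count)].append(row)
    return nodes


rows = [{"order_id": i, "amount": i * 10} for i in range(1000)]
nodes = distribute(rows, "order_id", 4)

# Each node can total its own slice; the partial sums are then combined.
partial_totals = [sum(row["amount"] for row in node) for node in nodes]
grand_total = sum(partial_totals)
print([len(node) for node in nodes], grand_total)
```

Because every node works only on its own slice, a query such as a grand total can run on all nodes at once and merge the partial results at the end.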
The Warehouse edition works with SQL Server Integration Services, SQL Server Reporting Services, and SQL Server Analysis Services for integration, reporting, and analysis. It supports star join queries and change data capture, both desirable warehouse features, and scales to a capacity in the tens or hundreds of terabytes. The Warehouse edition will come only on hardware from partners Bull, Dell, EMC, Hewlett-Packard, and IBM.
SQL Server has grown up over the years. From the flaky, semi-relational engine that was SQL Server 6.5 it has progressed to a world-class engine that incorporates excellent BI tools and is increasingly wrapped around with extra features such as MPP, MDM, and reporting services that put it in a class of its own. Not only that, the CTP is finally easy to install. If you are interested in the processing of data, go for it. ®