Master Data Services
Following the acquisition of Stratature, SQL Server 2008 R2 introduces Master Data Services (MDS), which allows the consolidation of data from multiple sources. Imagine, for example, three source systems holding customer lists but none of them has a complete set. MDS makes it much easier to consolidate the three lists into one complete, accurate list of customers.
The complete list can, of course, be fed into a data warehouse but the main reason for using MDS is that it allows the data to be synchronised across the three sources so that each has the same version of the data. If it sounds like your thing, it's there for experimentation in the CTP.
Part of the previous CTP - but with further enhancements in this release - is StreamInsight, Microsoft's technology for handling continuously streaming data. The number of organizations dealing with this type of data is increasing as the use of web logs, RFID tags, telemetry, and other streaming data sources increases. The stock market is one scenario where the technology is being used; another is the monitoring of wind-farm data.
StreamInsight is a data engine that sits in front of SQL Server and can handle incoming transactions at the rate of 15,000 per second. Transactions can be averaged and/or aggregated before being written to SQL Server, say, every five seconds, as a much smaller number of rows.
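The pre-aggregation idea can be illustrated with a short sketch. This is not StreamInsight's actual API (which is .NET-based); it is a minimal Python illustration of the technique the paragraph describes: grouping raw events into tumbling time windows and emitting one aggregated row per key per window, so the database receives a few summary rows instead of thousands of raw transactions. The event tuples and window size are hypothetical.

```python
from collections import defaultdict

def aggregate_window(events, window_seconds=5):
    """Group raw (timestamp, key, value) events into tumbling windows
    and emit one aggregated row (count + average) per key per window.
    `events` is a stand-in for the incoming transaction stream."""
    windows = defaultdict(list)
    for ts, key, value in events:
        bucket = int(ts // window_seconds)  # tumbling-window id
        windows[(bucket, key)].append(value)

    # One small summary row per (window, key) instead of every raw event
    return [
        (bucket * window_seconds, key, len(vals), sum(vals) / len(vals))
        for (bucket, key), vals in sorted(windows.items())
    ]

rows = aggregate_window([
    (0.5, "sensorA", 10.0),
    (1.2, "sensorA", 20.0),
    (6.1, "sensorA", 30.0),
])
# Two windows: [0,5) averages 15.0 over 2 events; [5,10) averages 30.0 over 1
```

Three raw events collapse to two rows here; at 15,000 events per second the reduction is what makes the downstream writes manageable.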
Complex conditions can be applied that control which data is written to SQL Server. Developers can create Complex Event Processing solutions that monitor and mine incoming data to derive information from the patterns within that data. Existing tools such as Microsoft's Visual Studio and .NET can be used for application development and included with the platform is a range of management features, including a management interface, debugging and diagnostic tools.
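The essence of Complex Event Processing is detecting conditions that no single event reveals on its own. A toy sketch in Python (again, not the StreamInsight API - the threshold and run length are invented for illustration):

```python
def detect_spikes(stream, threshold=100.0, run_length=3):
    """Toy complex-event detector: emit an alert whenever `run_length`
    consecutive readings exceed `threshold` -- a pattern that only
    emerges across multiple events."""
    streak = 0
    alerts = []
    for i, value in enumerate(stream):
        streak = streak + 1 if value > threshold else 0
        if streak == run_length:
            alerts.append(i)  # index where the pattern completed
            streak = 0
    return alerts

print(detect_spikes([90, 120, 130, 150, 80, 110]))  # [3]
```

A real CEP engine would run rules like this continuously over the stream, with windowing and correlation across multiple sources, but the pattern-over-events principle is the same.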
Not part of the CTP are two new editions of SQL Server - Datacenter and Parallel Data Warehouse - both worth a quick look in relation to the Enterprise edition.
Datacenter will offer very high scalability and is aimed, obviously, at heavily loaded applications. It will feature virtualization, consolidation, and infrastructure management options. It will also support more than eight (and up to 256) logical processors, and as much memory as the operating system can handle.
The Parallel Data Warehouse edition - formerly known as Madison - is a marriage of SQL Server with the technology that underpinned the DATAllegro appliance. This essentially transforms SQL Server into a massively parallel processing (MPP) relational engine. You wouldn't want to run the finance application for a small-to-medium-sized enterprise on this because of the cost and complexity involved. However, if your SQL Server-based data warehouse application has been suffering from poor performance, this should be the version for you once it's released.
Loads and nodes
This solution will comprise multiple physical nodes, each with its own storage, CPU, and memory. Each node will run a SQL Server instance, a configuration that Microsoft is calling Ultra Shared Nothing. Performance is maintained by balancing the load across the nodes and redundancy by mirroring all server and storage components.
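The routing principle behind a shared-nothing design can be sketched briefly. This is not Microsoft's implementation - the node names and hashing scheme below are invented for illustration - but it shows the core idea: each row is deterministically assigned to one node by hashing its key, so data and query work spread evenly across nodes without any shared storage.

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]  # hypothetical node names

def route(key: str, nodes=NODES) -> str:
    """Pick the node that owns `key` by hashing it -- the essence of
    distributing rows across shared-nothing nodes so each node holds
    its own slice of the data."""
    digest = hashlib.sha256(key.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# Every row for the same key always lands on the same node, so a node
# can answer queries over its slice without cross-node traffic.
assert route("customer-42") == route("customer-42")
```

Redundancy, as the article notes, then comes from mirroring each node's server and storage rather than from sharing them.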
The Warehouse edition works with SSRS, SQL Server Integration Services, and SQL Server Analysis Services for integration, reporting, and analysis. It supports star join queries and change data capture, both desirable warehouse features, and scales to a capacity in the tens or hundreds of terabytes. Warehouse edition will only come on hardware from partners Bull, Dell, EMC2, Hewlett-Packard, and IBM.
SQL Server has grown up over the years. From the flaky, semi-relational engine that was SQL Server 6.5 it has progressed to a world-class engine that incorporates excellent BI tools and is increasingly wrapped around with extra features such as MPP, MDM, and reporting services that put it in a class of its own. Not only that, the CTP is finally easy to install. If you are interested in the processing of data, go for it. ®
Your post is a great critique of SQL Server and I honestly stand behind you on everything you say - I must have encountered most of the things you pick out at the most inconvenient moments myself.
However, will that stop me betting my mortgage (in a very real sense) on the continuing success of the Microsoft product? Not a chance. Long live the marketing dollar.
"The last time I messed with MS-SQL it went into a 'log-file full' mode and eventually refused to work."
Given that there are a lot of MS SQL DBs out there that work fine, Occam's Razor tells me that the fault isn't with MS SQL there; it's with you.
This is a content free marketing rehash of no use to the serious practitioner.
SQLServer still has a clinically dead optimizer, aimed at trivial database sizes. It relies blindly on the internal statistics, despite the fact that statistics collection is based on a terminally broken algorithm conceived by someone (probably decades ago) and never adjusted (and not user-adjustable). SQLServer will happily decide to read 1 billion rows (or a trillion) rather than do a direct index key lookup to retrieve a trivial result set when all values required are specified in the predicate. It takes a seriously warped engineering attitude to think that this is appropriate under any circumstances.
SQLServer is also a failure with respect to adherence to standards, never mind what might be considered useful. The inability to describe complex constraints declaratively springs to mind, and the necessity to drop computed columns in order to change underlying functions is another example of what a dirt-cheap implementation SQLServer actually is. Domains, I hear you ask? Nope, never heard of them, and don't try using user-defined types, or you will discover that they don't really work, and woe betide the poor fool who might want to alter a column so constrained.
The number of View declarations and complex select statements that cannot be correctly resolved, or simply return the wrong answer is horrifying. Nothing has improved in the SQLServer2000 to SQLServer2008sp1 timeframe in this area.
I could fill up a book with all the things that are fundamentally broken with SQLServer. However, MS is not and has rarely been interested in building a solid underlying SQL engine. They are interested in seducing incompetent journalists, fanbois, and diverse lackeys with an alphabet soup of this month's flavoured functionality. And today I read yet another version of how successful MS are at actually getting their message out.
Now, I know that there are some great engineers in the SQLServer team, like whoever did the page compression code, which is an outstanding piece of engineering with which I have yet to find fault. But the bottom line is that MS is still more concerned with the alphabet soup, the flavour of the month, than with core engine functionality - the stuff that makes DBAs' lives somewhat tolerable is sadly left far behind. Gee, if only the basic functionality did what it says on the box. Don't get me started on replication - another disaster on wheels that MS still cannot get right. SSMS is also seriously creaky and did not improve in the last iteration - actually, now I think of it, it got provably worse, as they removed functionality that worked and replaced it with functionality that didn't. Multiple bugfixes and the issue remains - but hey, no one uses schemas, do they? So why bother fixing it?
Enough of this rant. What should be obvious is that the issues raised above are probably not even on the radar of MS, the MS marketing machine, the reviewer, or the rest of the SQLServer fanboi crowd, almost none of whom could manage a piss-up in a brewery if the chatter on the internet by self-styled gurus and experts is anything to gauge by. And that fact really says everything about what MS is as a corporation and has always been.
SQLServer2008R2 is just the next iteration of more new features at the cost of base functionality that actually does what it is supposed to.