Back to basics for SQL Server 2008
Project Watch: Microsoft 2008
When I asked: "How do we convert more than 12,000 location items - by hand?" we had almost completed the process as part of our move to Microsoft's upcoming SQL Server 2008. The question was, in fact, rhetorical. Nevertheless, we received a lot of advice and suggestions from Reg Dev readers. This, for example, from AlanGriffiths:
"Almost (sic) address validation software should get you most of the way, the postal address file the rest. (And there are plenty of companies that will do the job for you at a reasonable price....)"
This was a complex problem. The bulk of the original data was easy to convert by automated process, so we did. Post-coded data, within reason, is straightforward. The problem lay, as it usually does, with the exceptions. Location data included:
- Streatley Hill, Berkshire
- Clayhithe, Cambridgeshire
- Yalta, Crimea
- Causey Pike Gill, Cumberland
There was simply no list of these place names with their co-ordinates, so we ultimately solved the problem by throwing human intelligence at it. One of our intelligent human converters explains how she worked:
"Online gazetteers were very helpful, especially the Ordnance Survey one, as were those for Welsh and Scottish place names. Where places remained elusive, searching the web often provided clues in such varied places as a list of repairs to railway bridges, hill walkers' blogs and even an illustrated mythical story. With the location pinned down, the latitude and longitude could be read from MapPoint or Google Earth, both of which use the all-important WGS84 datum. At the time of writing, some still elude us - like Old Park Pool on Anglesey and Pleasby Wood in Nottinghamshire."
If anyone is familiar with either Old Park Pool or Pleasby Wood, please do let me know.
Happily this is a one-time conversion, and subsequent data is likely to be algorithmically convertible.
Another point readers picked up on last time was that, during our move to SQL Server 2008, we were using text indicators (N, S, E, W) rather than signed decimals for the spatial data.
However, there is an important distinction between data collection and storage. Tools like MapPoint return values with N, S, E, W notation, so it was easier for our human converters if we collected the data in that format. The subsequent conversion to the signed format that SQL Server requires is, of course, trivial:
SELECT foo,
       (CASE WHEN LatNS = 'S' THEN -1 ELSE 1 END) * Lat AS Latitude,    -- south becomes negative
       (CASE WHEN LongEW = 'W' THEN -1 ELSE 1 END) * Long AS Longitude  -- west becomes negative
FROM penguin;
Many people felt that we should write our own spatial data types. Good old Anonymous Coward, for example, felt moved to say:
"I am quite frankly incredulous that you have to wait for a new OS and new database design to do this, why not just define a special data type: latitude, longitude, date. Any competent final year computing science graduate could do as much."
We can clearly store spatial data as a pair of signed values (-1.222, 53.327), so why not roll our own data types? One answer is that spatial data isn't just about storing points in isolation. We store it so that we can answer questions like: "Are these points inside this polygon?" We'd need to store the polygon, and the obvious place to put it is inside the data type, which inevitably makes the internal structure of the data type more complex. This internal structure, combined with a complex model of the Earth's shape, makes indexing essential but non-trivial.
Another answer lies in the functions/methods associated with the data type. Most engines that support spatial data types provide a wide range of these - something in the order of 70. They provide the facility to, for example, translate from Well Known Text (WKT), return the distance between two points, return true if a point is inside a polygon and so on.
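To give a flavour of what those methods look like in use, here is a minimal sketch rather than anything from our production code. It assumes a release build of SQL Server 2008, in which geography::Point takes latitude, then longitude, then a spatial reference ID, and in which Well Known Text for the geography type lists longitude first. The second point and the rectangular region are invented purely for illustration, and 4326 is simply the spatial reference ID this project uses.

```sql
-- Sketch only: assumes a release build of SQL Server 2008.
-- geography::Point takes (latitude, longitude, SRID).
DECLARE @here  geography = geography::Point(53.327, -1.222, 4326);
DECLARE @there geography = geography::Point(52.205, 0.119, 4326);  -- hypothetical second point

-- Hypothetical region. In geography WKT (release builds) each vertex is
-- "longitude latitude", and the exterior ring runs counter-clockwise.
DECLARE @region geography = geography::STGeomFromText(
    'POLYGON((-2 53, 0 53, 0 54, -2 54, -2 53))', 4326);

SELECT @here.STAsText()            AS PointAsWkt,       -- translate back to WKT
       @here.STDistance(@there)    AS DistanceInMetres, -- distance between two points
       @region.STIntersects(@here) AS IsInsideRegion;   -- 1 if the point lies in (or on the edge of) the region
```

Each of these calls hides the ellipsoidal arithmetic that a home-grown latitude/longitude type would have to implement for itself.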
So, can you write your own spatial data types, complete with non-spherical modeling, indexing and an appropriate set of methods? Of course. You can write your own RDBMS as well... it's just a matter of how much time you have and what you charge per hour for your time. For most people, us included, it is not a realistic option unless the spatial requirements are relatively simple.
Which brings us very neatly to the next question: how do we import data that is stored as signed values into the spatial data type?
In SQL Server 2008 the answer is that we use one of those built-in methods.
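-- Build a WKT point from the signed values and parse it with SRID 4326.
-- Note: this statement puts latitude first; release builds of SQL Server 2008
-- expect longitude first in geography WKT, so swap the two values there.
UPDATE tblSpatialData
SET SpatialLocation = geography::STGeomFromText(
        'POINT (' + CONVERT(nvarchar, Latitude) + ' ' + CONVERT(nvarchar, Longitude) + ')', 4326)
WHERE Latitude IS NOT NULL
  AND Longitude IS NOT NULL;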
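Because the order of the two values inside the WKT string matters, it is worth checking what actually landed in the column. The query below is a hedged sketch of one way to do that: it reuses the table and column names from the statement above and assumes a release build of SQL Server 2008, where a geography point exposes Lat and Long properties. The output column aliases are made up for the example.

```sql
-- Sketch: compare the source columns with what the geography value stored.
-- If StoredLatitude matches Longitude rather than Latitude, the WKT
-- coordinate order was the wrong way round for this build.
SELECT TOP (10)
       Latitude,
       Longitude,
       SpatialLocation.Lat        AS StoredLatitude,
       SpatialLocation.Long       AS StoredLongitude,
       SpatialLocation.STAsText() AS StoredAsWkt
FROM tblSpatialData
WHERE SpatialLocation IS NOT NULL;
```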
So the next (rhetorical) question is, how does it work? What is that 4326 doing?®
Follow Register Developer regular Mark Whitehorn next time on Project Watch: Microsoft 2008 as he continues to roll out a spanking-new 1TB database for several thousand users on Microsoft’s SQL Server 2008, Visual Studio 2008 and Windows Server 2008.