How the UK's national memory lives in a ROBOT in Kew
El Reg visits the National Archives
Keep stuffing the repositories
As for the rooms themselves, don’t be thinking Raiders of the Lost Ark. The repositories are large, though not cavernous, rooms: the ceilings need only be high enough to accommodate top shelves within arm's reach. Warning posters detail exactly how to protect both the nation’s history and your back when handling the bigger box files. The repositories in the older part of the complex have narrow windows to afford the bookworms access to daylight, while ledges and skirting boards are brightly painted, all part of the old Ministry of Works strategy to keep the workforce happy. Temperature and humidity are carefully managed across the repositories, to keep mould and other threats at bay.
Scroll on a roll
We saw a pile of blackened rolled-up manuscripts on a trolley, like props from a Harry Potter film, en route to or from some historical researcher. Picking a box at random off a shelf in one of the storage rooms, we found field survey books for a proposed early 20th century land tax. It never happened. WWI intervened, but the neat handwritten notes, still piled up down in Kew, give a copperplate, ground-level snapshot of England just before that particular catastrophe.
Some plan drawers yielded a hand-drawn view of the 18th century harbour at Guadeloupe. This was in French: how did this French map come into British hands? Other drawers revealed sheet after sheet of tobacco and whisky advertising posters from the 19th century, a legacy of the Worshipful Company of Stationers’ erstwhile role in registering copyright.
Other shelves also hold regimental records, civil service papers, cabinet minutes, various PMs’ correspondence, railway plans. Anything that is needed to track the progress and development of a country and much more besides.
History, boxed and filed
Of course, you need some way of navigating this pile of history. Until the mid-1990s, this was via the vast paper-based catalogue. Even with this, tracking a soldier’s career via his paybooks and regimental records, or tracking the policy turns that drew the UK into WWI, meant a researcher needed to know, almost feel, their way around the collection - even if only PRO employees were allowed to enter the repositories to retrieve the documents.
If only there was a machine that could pinpoint exactly the document you need and tell you where it is. Or even deliver you a copy instantly. Even faster than the microfiche machines in the reading rooms...
When the PRO opened, back in the 1970s, it had one computer: a DEC-based docket ordering system used to manage requests to retrieve an item from the archive.
As David Thomas, CIO and a 40-year veteran of the PRO, explains, in the mid-1980s a few PCs started to appear on the desks of key members of staff. However, these were not networked until some time in the mid-1990s. Around the same time, the PRO took its first steps onto the web - though there is no evidence for that, as that first site was not actually archived.
If mid-'90s surfers found this system less than interactive, things were not much better for the PRO staff administering the site. Thomas says the process for updating the site involved sending a floppy disk to the government’s Central Computer and Telecommunications Agency in Norwich, once a month.
The web operation was brought in-house in 1996 - though it has fluctuated between in-house and contracted out ever since.
The catalogue itself began to move online in 1998, making the PRO the first national archive in the world to do so. Someone presumably had to key it in - a gargantuan task. The catalogue currently stands at 21 million plus entries.
Who do you think we was?
The closest thing to a big bang for the archive was the release of the 1901 census online in 2002.
The names on this census were people within living memory of 21st century Britons with access to computers and the net. The stage was set for a massive explosion in interest in genealogy - or a complete disaster. In the event it was both, with Thomas confirming the system - built by sometime defence contractor Qinetiq - was completely “overwhelmed” initially as contemporary Britons opened their shiny laptops, fired up their new internet connections, and found out just how grindingly poor their great-great-grandparents were.
It got over it though. Last year, the website clocked 13 million visits from 230 countries. The UK accounted for 63.6 per cent. Interestingly, “wills” was consistently the top search term last year, except for one month when it was ousted by UFO.
Surprisingly perhaps, the website, and in fact all the Archive’s IT, is handled on-site. There are 210 servers, all Xeon-based HPs. A dozen of these are used to host a further 150 virtual servers. The site currently has 316TB of storage on tap.
The catalogue too is an in-house development. The “new” catalogue, Discovery, was launched in 2011. It is based on MongoDB, and subject to regular updates. At just over 100TB when first deployed in 2011, it is expected to run into PBs by 2014.
The archive itself is about to be subjected to a tsunami of data for two reasons.
Firstly, you might not have noticed, but the UK government has shifted from the Grigg report’s 30-year rule for shifting documents from Whitehall closets to public archives, to a 20-year rule. This is why documents covering the Falklands War began going public last year. Serving politicians and civil servants’ early career screw-ups are now likely to come into public view mid-career.