Johns Hopkins and VMware forge medical records mega-cloud
The tech is ready this time
Medical and IT researchers at Johns Hopkins University, healthcare application software supplier Harris Corp, and virtualization juggernaut VMware have teamed up to create a medical imaging cloud that they hope will become the central, secure repository for US citizens and the doctors who care for them.
The desire to provide doctors in America with electronic access to medical records is as old as computing itself and sits right alongside the holy grails of the paperless office and fifth-generation programming languages. But Jim Philbin, who is co-director for the Center for Biomedical and Imaging Informatics at Johns Hopkins, tells El Reg that the combination of broadband networks, virtual desktop infrastructure, and cloudy infrastructure is making a medical imaging cloud a technical and economic possibility. Philbin is also CTO at Peake Healthcare Innovations – a partnership between the university hospital system and Harris Corp that was formed two years ago to tackle the medical imaging issue – and he says that Peake intends to build and operate just such a cloud.
Harris Corporation is a $5bn company with 16,000 employees – 7,000 of them are scientists and engineers – and is no slouch when it comes to either high tech or medical systems. The company builds radio, broadcast, and satellite communications systems for militaries and public agencies; has a healthcare application and systems software business that has been around forever; and also provides IT services. But rather than take on the medical imaging cloud all by its lonesome, Harris decided to partner with Johns Hopkins and use the university hospital system, which operates six hospitals around Baltimore, Maryland, as a testbed for the cloud before rolling it out nationally. Harris has the tech and Hopkins has the clinical and medical research expertise.
VMware and Intel are also partners in the effort.
Given the sensitive nature of the data and the rigorous requirements of the Health Insurance Portability and Accountability Act (HIPAA) passed in 1996 by the US Congress, CT scans, X-rays, and MRI scans are not the kinds of documents that doctors or patients are comfortable plunking out there on public clouds like Amazon's EC2. (Although that doesn't mean that Amazon won't offer a more secure cloud at some point in the future, as it has done for federal government agencies in a portion of its newest data center in Oregon, dubbed GovCloud. There's no reason there can't be a MedCloud.)
The Peake medical records cloud, called PeakeSecure, is back-ended by x86 servers, as you might imagine, which are running VMware's ESXi hypervisor and vSphere server virtualization management stack. Philbin says that the test cloud (which is running in a co-location center in the Washington DC area) is small enough right now that Peake doesn't have to resort to using the vCloud Director cloud controller, but as the cloud builds up that will be necessary.
The PeakeSecure cloud uses standard x86 servers and rather than use RAID 5 or 6 storage on the server nodes, Peake has chosen Caringo's CAStor content-addressed storage. Because RAID striping across disks gives a huge performance hit when a disk drive fails – and in a cloud that is storing the medical records for 330 million people, this will always be happening – the Peake cloud instead stores three copies of each medical image in the cluster. (Hadoop does the same thing for data sets for data protection and access reasons, by the way. The power of three...)
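Here is a minimal Python sketch of the "three copies instead of RAID" idea: a content-addressed store that names each object by a hash of its bytes and writes it to three distinct nodes. It is only an illustration. The class and function names are hypothetical, and the node-selection scheme (rendezvous hashing) is a swapped-in technique, not a description of how Caringo's CAStor actually places replicas.

```python
import hashlib

COPIES = 3  # the article says Peake keeps three copies of each image


def content_address(data: bytes) -> str:
    """Name an object by the SHA-256 digest of its bytes (content addressing)."""
    return hashlib.sha256(data).hexdigest()


def place_replicas(address: str, nodes: list[str], copies: int = COPIES) -> list[str]:
    """Choose `copies` distinct nodes via rendezvous (highest-random-weight) hashing."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")

    def weight(node: str) -> int:
        return int(hashlib.sha256(f"{node}:{address}".encode()).hexdigest(), 16)

    return sorted(nodes, key=weight, reverse=True)[:copies]


class ReplicatedStore:
    """Hypothetical in-memory cluster: each node is a dict of address -> bytes."""

    def __init__(self, nodes: list[str]) -> None:
        self.nodes: dict[str, dict[str, bytes]] = {n: {} for n in nodes}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        for node in place_replicas(address, list(self.nodes)):
            self.nodes[node][address] = data
        return address

    def get(self, address: str, failed: frozenset[str] = frozenset()) -> bytes:
        # Any surviving replica will do; no parity rebuild is needed to serve a read.
        for node in place_replicas(address, list(self.nodes)):
            if node not in failed and address in self.nodes[node]:
                return self.nodes[node][address]
        raise KeyError(address)
```

Because a read needs only one healthy copy, a dead disk costs nothing on the read path; in a real system the lost replica would be re-copied in the background (not shown here).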
The Caringo arrays were also chosen because disks inside the arrays can spin up and spin down as access patterns to the data change. In the States, you have to keep seven years' worth of medical records for adults; for children, pediatricians have to keep all of their records until the patient becomes an adult, plus another seven years. This can be a huge amount of data, which translates into big gobs of money for disks, power, and cooling.
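The retention rule above can be sketched as a small date calculation. This is a hedged illustration, not legal guidance: it reads the rule exactly as described in the article and assumes an age of majority of 18, which varies by US state, so treat `AGE_OF_MAJORITY` (and the function names) as assumptions.

```python
from datetime import date

RETENTION_YEARS = 7   # from the article
AGE_OF_MAJORITY = 18  # assumption: the age of majority varies by US state


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retain_until(record_date: date, birth_date: date) -> date:
    """Earliest date a record may be discarded under the rule as described."""
    adult_on = add_years(birth_date, AGE_OF_MAJORITY)
    if record_date >= adult_on:
        # Adult record: seven years from the date of the record.
        return add_years(record_date, RETENTION_YEARS)
    # Pediatric record: keep until the patient is an adult, then seven more years.
    return add_years(adult_on, RETENTION_YEARS)
```

Under these assumptions, a scan of a six-year-old taken in June 2011 (born March 2005) has to survive until March 2030, which is why cheap, spun-down capacity matters.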
The good news, says Philbin, is that the access patterns to old data decay exponentially. "The sicker the patient, the more likely an image is viewed," Philbin explains. "But for the average patient, an image is not viewed much after the first year."
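Exponentially decaying access is what makes spinning disks down worthwhile. The sketch below models expected views per month with a half-life and moves an image to spun-down disks once that rate drops below a threshold. The half-life, initial view rate, and threshold are hypothetical tuning knobs, not figures from Peake or Caringo.

```python
import math


def expected_views(months_old: float, initial_rate: float, half_life_months: float) -> float:
    """Expected views per month, decaying exponentially with the given half-life."""
    return initial_rate * 0.5 ** (months_old / half_life_months)


def months_until_idle(initial_rate: float, half_life_months: float, threshold: float) -> float:
    """Solve initial_rate * 0.5**(t / h) == threshold for t (0 if already below)."""
    if initial_rate <= threshold:
        return 0.0
    return half_life_months * math.log2(initial_rate / threshold)


def storage_tier(months_old: float, initial_rate: float,
                 half_life_months: float, threshold: float = 0.05) -> str:
    """Keep hot images on spinning disks; let cold ones live on spun-down disks."""
    rate = expected_views(months_old, initial_rate, half_life_months)
    return "spinning" if rate >= threshold else "spun-down"
```

A sicker patient would simply get a higher `initial_rate` or a longer half-life, which fits Philbin's point that their images keep being viewed.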
A medical imaging cloud is more than some virtual servers with clever object storage, of course. Peake has chosen a package called dcm4che, which is an open-source clinical image and object management system.
Because the United States does not have a national identification number for its citizens, Peake had to create some sophisticated record management systems that layer on top of dcm4che – allowing records from multiple sources and using multiple medical record numbers to be aggregated and stored for one patient. This is called MPI, short for master person index, and is not to be confused with Message Passing Interface, the clustering framework that supercomputer simulations use.
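One common way to structure a master person index is a union-find (disjoint-set) over identifiers, so that every (source, medical record number) pair known to belong to the same person ends up in one group. The sketch below assumes that structure; how Peake actually decides that two identifiers match is not described, so here the links are simply asserted by the caller, and all names are hypothetical.

```python
Identifier = tuple[str, str]  # (source system, medical record number)


class MasterPersonIndex:
    """Hypothetical MPI: union-find over (source, MRN) identifiers, plus their records."""

    def __init__(self) -> None:
        self.parent: dict[Identifier, Identifier] = {}
        self.records: dict[Identifier, list[str]] = {}

    def _find(self, ident: Identifier) -> Identifier:
        self.parent.setdefault(ident, ident)
        root = ident
        while self.parent[root] != root:
            root = self.parent[root]
        while ident != root:  # path compression
            nxt = self.parent[ident]
            self.parent[ident] = root
            ident = nxt
        return root

    def add_record(self, ident: Identifier, record: str) -> None:
        self._find(ident)
        self.records.setdefault(ident, []).append(record)

    def link(self, a: Identifier, b: Identifier) -> None:
        """Declare that two identifiers belong to the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_person(self, a: Identifier, b: Identifier) -> bool:
        return self._find(a) == self._find(b)

    def records_for(self, ident: Identifier) -> list[str]:
        """All records, from every source, for the person behind `ident`."""
        root = self._find(ident)
        return [r for i in list(self.parent) if self._find(i) == root
                for r in self.records.get(i, [])]
```

For example, once a hospital MRN and a clinic MRN are linked, asking for either one returns the records filed under both.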
The idea behind PeakeSecure is to do the rendering of medical images back on the server farm and to keep the images secure on that cloud. If the image never leaves, then you don't have to worry about where it has been sent. While Philbin says that it will support multiple ways of streaming CT and MRI scans and X-rays down to doctors working in the field, the company has chosen VMware's View 5 virtual desktop infrastructure to stream down to thick clients and thin clients. (View 5 can even push an image down to a smartphone or an iPad if you want that.) The back-end rendering and streaming of images makes use of Teradici's PCoIP protocol. By doing it this way, doctors can use the computing they have on site rather than having to buy a $15,000 or $20,000 high-end workstation to render images. And they don't have to actually move the images over the wire to view them.
The medical record and image data is ginormous in the States, with Philbin citing statistics from IBM that say about 30 per cent of the entire disk capacity in use in servers in America is dedicated to medical records and images. With the private cloud version of PeakeSecure that Peake built for two Johns Hopkins hospitals to test out the code, medical images range from 200MB to a couple of gigabytes, depending on the type and resolution of the scan. Moving such files from a cloud to a doctor's PC is just not practical, and they would need a pretty hefty box to render the image locally. But Philbin says he has done cross-country tests of the private cloud and that as long as he can get around 6Mbit/sec of broadband bandwidth and keep the latencies under 25 milliseconds, the Peake cloud can render and stream the images down to a client in such a way that they are perfectly usable by doctors.
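A quick back-of-the-envelope calculation shows why shipping the raw files is impractical at the bandwidth Philbin mentions. It assumes decimal units (1MB = 10^6 bytes), takes "a couple of gigabytes" as 2GB, and ignores protocol overhead and latency, so real transfers would be slower still.

```python
def transfer_seconds(size_bytes: float, link_bits_per_sec: float) -> float:
    """Ideal transfer time: bytes * 8 bits / link rate (no overhead, no latency)."""
    return size_bytes * 8 / link_bits_per_sec


LINK_BPS = 6e6  # the ~6Mbit/sec Philbin cites
for label, size in [("200MB", 200e6), ("2GB", 2e9)]:  # decimal units assumed
    minutes = transfer_seconds(size, LINK_BPS) / 60
    print(f"{label}: {minutes:.1f} minutes")
```

This prints `200MB: 4.4 minutes` and `2GB: 44.4 minutes`. A remote-display protocol, by contrast, only has to send updated screen pixels, which is why the same link can be usable for viewing.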
The full private cloud version of PeakeSecure will be deployed at Johns Hopkins in March, and the public cloud version will be done by the end of the second quarter or early in the third quarter, according to Philbin.
Peake will be co-locating its iron in managed data centers and says it can serve the East Coast with only three data centers, although it will eventually add three more to cover the rest of the country. Presumably there will be algorithms across these data centers that move each patient's data to the data center closest to where they live as they move around.
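Peake has not described such an algorithm, so the following is purely a hypothetical sketch of the simplest version: pick the data center nearest a patient's current location by great-circle distance, and flag a migration when that is not where their images live now. The data center names and coordinates are invented for illustration.

```python
import math
from typing import Optional

EARTH_RADIUS_KM = 6371.0

# Hypothetical sites, for illustration only.
DATA_CENTERS = {
    "dc-east-a": (38.9, -77.0),   # Washington DC area
    "dc-east-b": (40.7, -74.0),   # New York area
    "dc-west-a": (37.4, -122.0),  # San Francisco Bay area
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def home_data_center(location: tuple[float, float],
                     data_centers: dict[str, tuple[float, float]]) -> str:
    """Name of the data center nearest to the patient's current location."""
    return min(data_centers, key=lambda name: haversine_km(location, data_centers[name]))


def migration_target(current_dc: str, location: tuple[float, float],
                     data_centers: dict[str, tuple[float, float]]) -> Optional[str]:
    """Where to move a patient's images, or None if they are already closest."""
    target = home_data_center(location, data_centers)
    return None if target == current_dc else target
```

A patient whose images sit in `dc-east-a` and who moves from Baltimore to Los Angeles would get `dc-west-a` back as the migration target.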
Pricing for PeakeSecure has not been set, but the idea is for it to become a central repository for various medical applications on the market (some of them available from Harris). Peake will sell storage on its cloud by capacity used, much like Amazon does with its S3 storage cloud, and will also charge for access on a per-image-accessed basis. The idea is to come up with utility pricing that makes it attractive to small medical offices as well as giant hospital systems, with everyone paying for what they use and not having to shell out huge capital expenses to play. ®
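Since pricing has not been set, any numbers are invented. The sketch below only shows the shape of the utility model described: a charge for capacity used plus a charge per image accessed. The rates and names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class UsageRates:
    per_gb_month: Decimal      # hypothetical: Peake has not set prices
    per_image_access: Decimal  # hypothetical


def monthly_bill(gb_stored: Decimal, images_accessed: int, rates: UsageRates) -> Decimal:
    """Utility-style bill: capacity used plus images accessed, rounded to cents."""
    total = gb_stored * rates.per_gb_month + images_accessed * rates.per_image_access
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

With made-up rates of $0.10 per GB-month and $0.05 per image, a small office storing 500GB and pulling up 200 images would owe $60.00, while a clinic that stores nothing and views nothing owes nothing. `Decimal` is used instead of `float` so that many small per-image charges add up without rounding drift.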
Employees vs government vs hackers
Hacking is so last year. Why not just pay an employee with direct access to the database for details of medical records? I'm sure there are plenty of low-paid healthcare employees who would not be against taking a few bucks in return for doing an illegal data search for the police / your prospective employer / the press etc.
Anyways, the real problem with the security of a medical mega-database is who the government legally allows access. Expect upcoming data trawls for DNA matching, for meta scans for medical insurance fraud, etc etc etc. And what about giving Google "limited, anonymous" access for marketing purposes?
Data is power, and the more easily a government can tie all our data up together the more power they have over us.
The reply from Sporkinum confuses the compression applied to the image on disk (which may or may not be lossy - I don't know) with the compression used to stream it to the screen.
Anonymous Coward asked about PCoIP compression, which is the compression used by the remote desktop protocol. Read up about PCoIP - it is pretty smart. It will 'build to lossless', i.e. you might see some loss while an image is rotating, depending on how much bandwidth you have.
HOWEVER, when the image is stationary, as an X-ray is, you will have a perfect image.
Speaking as someone who worked on the first PACS system in the UK, this is really interesting stuff and a good use for PCoIP.
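The "build to lossless" behaviour the commenter describes can be illustrated with progressive bit-plane refinement. This is a generic technique chosen for illustration; Teradici's actual PCoIP encoder works differently and is far more sophisticated. Each update sends one more bit-plane of the pixels, most significant first; if the image changes, refinement restarts, and if it holds still, the view converges to an exact copy. `planes_per_update` is a hypothetical stand-in for available bandwidth.

```python
from typing import Iterable, Iterator

BITS = 8  # assume 8-bit grayscale pixels


def refine(shown: list[int], source: list[int], plane: int) -> list[int]:
    """Copy bit-plane `plane` of the source pixels into the client's current view."""
    mask = 1 << plane
    return [p | (s & mask) for p, s in zip(shown, source)]


def build_to_lossless(frames: Iterable[list[int]],
                      planes_per_update: int = 1) -> Iterator[list[int]]:
    """Yield what the client displays after each update.

    While the image keeps changing, only the coarsest planes get sent; once it
    holds still, each update adds detail until every bit matches (lossless).
    """
    shown: list[int] = []
    last = None
    plane = BITS - 1
    for frame in frames:
        if frame != last:  # content changed: start again from the most significant bit
            shown, last, plane = [0] * len(frame), frame, BITS - 1
        for _ in range(planes_per_update):
            if plane < 0:
                break
            shown = refine(shown, frame, plane)
            plane -= 1
        yield list(shown)
```

Feeding the same 8-bit image eight times produces a coarse first view and an exact copy on the last update; with `planes_per_update=8` the first view is already exact.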
In the US, even though the Social Security Number (SSN) is not supposed to be used as either a general or national ID, in practice it is. Of course, this leaves yet more room for miscreants to get the 'crown jewels' of identity theft.
On the cloud front, the more sensitive data that's out there, the greater the vulnerability. History seems to prove that internet security is a game of leapfrog.