Mapping the universe at 30 Terabytes a night

Jeff Kantor on building and managing a 150 Petabyte database

Interview: It makes for one heck of a project mission statement. Explore the nature of dark matter, chart the Solar System in exhaustive detail, discover and analyze rare objects such as neutron stars and black hole binaries, and map out the structure of the Galaxy.

The Large Synoptic Survey Telescope (LSST) is, in the words of Jeff Kantor, LSST data management project manager, "a proposed ground-based 6.7 meter effective diameter (8.4 meter primary mirror), 10 square-degree-field telescope that will provide digital imaging of faint astronomical objects across the entire sky, night after night." Phew.

When it's fully operational in 2016, the LSST will: "Open a movie-like window on objects that change or move on rapid timescales: exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects.

"The superb images from the LSST will also be used to trace billions of remote galaxies and measure the distortions in their shapes produced by lumps of Dark Matter, providing multiple tests of the mysterious Dark Energy."

In its planned 10-year run, the LSST will capture, process and store more than 30 Terabytes (TB) of image data each night, yielding a 150 Petabyte (PB) database. Talking to The Reg, Kantor called this the largest non-proprietary dataset in the world.
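For a sense of scale, the nightly figure can be multiplied out over the survey. The Python sketch below is a back-of-the-envelope check, not anything from the LSST project: it assumes observing on all 365 nights of every year, exactly 30 TB per night (the article says "more than"), and decimal units (1 PB = 1,000 TB). The function and constant names are made up for illustration.

```python
# Back-of-the-envelope estimate of raw image volume over the survey.
# Assumptions (not from the article): 365 observing nights per year,
# exactly 30 TB per night, and decimal units (1 PB = 1000 TB).

TB_PER_NIGHT = 30        # from the article ("more than 30 Terabytes")
NIGHTS_PER_YEAR = 365    # assumption: no nights lost to weather or maintenance
SURVEY_YEARS = 10        # from the article
TB_PER_PB = 1000         # decimal petabytes


def raw_image_volume_pb(tb_per_night=TB_PER_NIGHT,
                        nights_per_year=NIGHTS_PER_YEAR,
                        years=SURVEY_YEARS):
    """Return the accumulated image volume in petabytes."""
    return tb_per_night * nights_per_year * years / TB_PER_PB


if __name__ == "__main__":
    total = raw_image_volume_pb()
    print(f"Image data over {SURVEY_YEARS} years: about {total:.1f} PB")
```

Under those assumptions the image data alone comes to roughly 110 PB, the same order of magnitude as the 150 PB database Kantor describes. The article does not break down how the remainder is made up.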

Data management is one of the most challenging aspects of the LSST. Every pair of 6.4GB images must be processed within 60 seconds in order to provide astronomical transient alerts to the community. To meet that deadline, the Data Management System is built from four key elements:

  • the Mountain/Base facility, which does initial data reduction and alert generation on a 25 TFLOPS Linux cluster with 60PB of storage (in year 10 of the survey)
  • a 2.5 Gbps network that transfers the data from Chile (where the telescope itself will be based) to the US, and within the US
  • the Archive Center, which re-reduces the data and produces annual data releases on a 250 TFLOPS Linux cluster with 60PB of storage (in year 10 of the survey)
  • the Data Access Centers, which provide access to all of the data products as well as 45 TFLOPS and 12 Petabytes of end user available computing and storage.
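The 60-second deadline implies a sustained data rate that the alert pipeline has to keep up with. The Python sketch below works it out under stated assumptions: that 6.4GB is the size of each image (so a pair is 12.8GB), that a new pair arrives as soon as the previous window closes, and that gigabytes are decimal. The helper names are hypothetical and not part of any LSST software.

```python
# Rough sustained data rate implied by the 60-second alert deadline.
# Assumptions (not from the article): 6.4 GB is the size of EACH image,
# image pairs arrive back to back, and units are decimal (1 GB = 8 Gb).

IMAGE_GB = 6.4         # from the article
IMAGES_PER_PAIR = 2    # "every pair of 6.4GB images"
DEADLINE_S = 60        # from the article
BITS_PER_BYTE = 8


def required_rate_gb_per_s(image_gb=IMAGE_GB,
                           images=IMAGES_PER_PAIR,
                           window_s=DEADLINE_S):
    """Gigabytes per second that must be processed to meet the deadline."""
    return image_gb * images / window_s


def to_gbps(gb_per_s):
    """Convert gigabytes per second to gigabits per second."""
    return gb_per_s * BITS_PER_BYTE


if __name__ == "__main__":
    rate = required_rate_gb_per_s()
    print(f"Sustained rate: {rate:.3f} GB/s ({to_gbps(rate):.2f} Gbps)")
```

On those assumptions the pipeline has to digest a little over 0.2 GB every second, or about 1.7 Gbps. Since initial data reduction and alert generation happen at the Mountain/Base facility, that load falls on the local cluster; the 2.5 Gbps long-haul link is mentioned here only as a point of comparison for the size of the number.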

So what's a time-critical system of this magnitude written in?

The data reduction pipelines are developed in C++ and Python. They rely on approximately 30 off-the-shelf middleware packages/libraries for parallel processing, data persistence and retrieval, data transfer, visualization, operations management and control, and security. The current design is based on MySQL layered on a parallel, fault-tolerant file system.
