
Mapping the universe at 30 Terabytes a night

Jeff Kantor, on building and managing a 150 Petabyte database



Interview It makes for one heck of a project mission statement. Explore the nature of dark matter, chart the Solar System in exhaustive detail, discover and analyze rare objects such as neutron stars and black hole binaries, and map out the structure of the Galaxy.

The Large Synoptic Survey Telescope (LSST) is, in the words of Jeff Kantor, LSST data management project manager, "a proposed ground-based 6.7 meter effective diameter (8.4 meter primary mirror), 10 square-degree-field telescope that will provide digital imaging of faint astronomical objects across the entire sky, night after night." Phew.

When it's fully operational in 2016, the LSST will: "Open a movie-like window on objects that change or move on rapid timescales: exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects.

"The superb images from the LSST will also be used to trace billions of remote galaxies and measure the distortions in their shapes produced by lumps of Dark Matter, providing multiple tests of the mysterious Dark Energy."

In its planned 10-year run, the LSST will capture, process and store more than 30 Terabytes (TB) of image data each night, yielding a 150 Petabyte (PB) database. Talking to The Reg, Kantor called this the largest non-proprietary dataset in the world.
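
As a rough sanity check on those figures, the sketch below totals the nightly image volume over the survey. It assumes decimal units (1PB = 1,000TB) and uses a hypothetical `nights_per_year` parameter, since the article does not say how many nights a year the telescope will observe; neither assumption comes from Kantor.

```python
# Back-of-the-envelope total of nightly image data over the survey.
# Assumptions (not from the article): decimal units, and the number of
# observing nights per year, which is a hypothetical parameter.

TB_PER_NIGHT = 30   # "more than 30 Terabytes" each night, per the article
SURVEY_YEARS = 10   # planned run, per the article
TB_PER_PB = 1000    # decimal convention (assumption)


def raw_image_petabytes(nights_per_year: int) -> float:
    """Return the total nightly image data, in PB, for the whole survey."""
    return TB_PER_NIGHT * nights_per_year * SURVEY_YEARS / TB_PER_PB


if __name__ == "__main__":
    for nights in (300, 365):
        print(f"{nights} nights/year -> {raw_image_petabytes(nights):.1f} PB")
```

On these assumptions the nightly images alone come in below 150PB, which suggests the database figure also counts derived data. The article does not break the total down.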

Data management is one of the most challenging aspects of the LSST. Every pair of 6.4GB images must be processed within 60 seconds in order to provide astronomical transient alerts to the community. To meet that deadline, the Data Management System is composed of a number of key elements. These are:

  • the Mountain/Base facility, which does initial data reduction and alert generation on a 25 TFLOPS Linux cluster with 60PB of storage (in year 10 of the survey)
  • a 2.5 Gbps network that transfers the data from Chile (where the telescope itself will be based) to the US and within the US
  • the Archive Center, which re-reduces the data and produces annual data releases on a 250 TFLOPS Linux cluster with 60PB of storage (in year 10 of the survey)
  • the Data Access Centers, which provide access to all of the data products as well as 45 TFLOPS and 12PB of end-user-available computing and storage.
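
The 60-second alert deadline and the link speed quoted above imply some concrete data rates. The following is a minimal sketch of that arithmetic, assuming decimal units (1GB = 10^9 bytes, 1 Gbps = 10^9 bits per second) and ignoring compression and protocol overhead, none of which the article specifies. The function names are illustrative only.

```python
# Data rates implied by the figures quoted in the article.
# Assumptions (not from the article): decimal units, no compression,
# no protocol overhead, and the link running at its full line rate.

IMAGE_GB = 6.4        # size of one image, per the article
IMAGES_PER_PAIR = 2   # images are processed in pairs
DEADLINE_S = 60       # alert deadline in seconds, per the article
LINK_GBPS = 2.5       # Chile-to-US network speed, per the article


def pair_rate_mb_per_s() -> float:
    """Sustained rate (MB/s) needed to work through one pair per deadline."""
    return IMAGE_GB * IMAGES_PER_PAIR * 1000 / DEADLINE_S


def pair_transfer_seconds() -> float:
    """Seconds for one image pair to cross the link at full line rate."""
    gigabits = IMAGE_GB * IMAGES_PER_PAIR * 8
    return gigabits / LINK_GBPS


if __name__ == "__main__":
    print(f"Processing rate: {pair_rate_mb_per_s():.1f} MB/s")
    print(f"Link time per pair: {pair_transfer_seconds():.2f} s")
```

The article does not say whether the long-haul transfer has to fit inside the 60-second window. Alert generation is listed under the Mountain/Base facility, so the second figure is offered only for comparison.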

So what's a time-critical system of this magnitude written in?

The data reduction pipelines are developed in C++ and Python. They rely on approximately 30 off-the-shelf middleware packages/libraries for parallel processing, data persistence and retrieval, data transfer, visualization, operations management and control, and security. The current design is based on MySQL layered on a parallel, fault-tolerant file system.

