HPE's Eng Lim Goh on spaceborne computers, NASA medals – and AI at the final frontier

Never mind the edge, try running a super 'puter up there

Interview Though HPE's Spaceborne Computer is still fresh from its jaunt to the International Space Station, veep and CTO for HPC and AI Dr Eng Lim Goh is pondering a return visit and outfitting missions to Mars with the company's kit.

The Register met Dr Goh after he'd been reassuring attendees at the Sibos 2019 event that AIs were worse than children at spotting giraffes without considerable training. UK prime minister Boris Johnson's vision of "pink-eyed terminators" is perhaps a while away.

More imminent, however, is NASA's use of off-the-shelf supercomputers in space as Goh showed off medals received from the US space agency after the successful conclusion of the first Spaceborne mission.

HPE's NASA Exceptional Technology Achievement Medal, 2019

The Exceptional Technology Achievement Medal was awarded "for successfully demonstrating the first commercial supercomputing platforms on the ISS, capable of executing over one trillion calculations per second for a year without requiring a reset".

The goal of the mission was, according to Goh, to show NASA that off-the-shelf hardware could be reliable in space, rather than custom chippery with its lengthy gestation period.

To be fair, the ISS is festooned with laptops (and, of course, some ageing Raspberry Pi hardware), but getting a supercomputer into orbit required a change of thinking at the agency, and some challenges for HPE.

"Just before launch we just picked the latest 1U server and plugged it into a locker. The only issue was that a 1U server is quite deep and the Express racking [in the Destiny Lab] is quite shallow... so we turned it around and used two slots."

The desire to keep the hardware as stock as possible meant that the kit needed AC power. "However," said Goh, "the space station uses solar panel DC power. So NASA supplied us with inverters to convert DC to AC so that we can plug ourselves right in.

"So of the four power supplies in the two servers, one of them did fail during the 1.6 years. However, they are all redundant anyway – so it didn't interrupt operations or applications."

Certainly, anyone who has spent time in the company of servers will recognise the foibles of power supplies. Goh told us: "First and foremost, lesson learned, maybe we need a triple-redundant power supply..."

Of course, NASA occasionally had to shut down the power to the racks, which allowed a swift replacement of the failed supply. As it transpired, the system ended up being rebooted four times during its 1.6 years of running on the ISS due to "various reasons on the station".

Running shrinkwrapped Red Hat Linux and software to harden the system against environmental factors such as cosmic radiation (rather than the hefty and expensive physical hardening usually used), the Apollo Spaceborne Computer still suffered its fair share of problems. "Nine of the 20 SSDs failed," remarked Goh, but redundancy ensured the thing kept ticking over. And the lengthy period running on orbit means that lessons can be learned.

Now, back at the factory following a SpaceX splashdown, "it booted up fine, even after the harsh landing". And those SSDs? "We are suspecting that it could be more the controllers because during the four reboots in space, some of the SSDs came back."

Good to know that the old BOFH standby of turning it off and on again can work just as well in orbit.

Of course, the goal was to stop already busy astronauts going anywhere near the device to fix problems. "What we did was develop three circles of software: the outermost circle supervising the second level, and the second level supervising the core, and for it to also sense correctable errors and, in the future, be able to sense inputs from the station saying there's a storm coming, then respond appropriately."
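HPE hasn't published the internals, but the layered idea translates readily into code. Here is a minimal, hypothetical Python sketch of the three circles, with every name and stub invented for illustration: an outer ring supervising a middle ring, which runs the core workload and watches for correctable errors.

    import time

    def core_step():
        """Innermost circle: one unit of the real workload."""
        pass  # e.g. an iteration of a benchmark or application

    def poll_correctable_errors():
        """Stub for a hardware health feed (ECC counters and the like)."""
        return []  # would yield correctable-error events

    def storm_incoming():
        """Stub for the future station input Goh describes; always quiet here."""
        return False

    def middle_ring():
        """Second circle: run the core and sense its health."""
        core_step()
        for _event in poll_correctable_errors():
            pass  # count it; the mitigation is sketched further down

    def outer_ring(iterations=3):
        """Outermost circle: supervise the middle ring and external signals."""
        for _ in range(iterations):
            if storm_incoming():
                continue  # e.g. checkpoint and idle until the storm passes
            try:
                middle_ring()
            except Exception:
                pass  # restart or reinitialise the middle ring
            time.sleep(0.1)

    outer_ring()

The appeal of the nesting, on Goh's description, is that each ring only has to know how to watch and restart the one inside it.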

The version that ran for 1.6 years on the ISS "had the ability to sense correctable errors". Goh explained the system dealt with these problems, but there was a danger that "correctable errors might accelerate to a point that it hits a threshold and it becomes uncorrectable".

"Which," he understated, "would be a problem for applications." A bad day in space indeed.

"We decided that after a certain threshold of correctable errors, let's be on the conservative side, after the next correction, retire that page. We can't retire a bit, but we can retire the page around that bit. So these are some of the mitigating things trying to keep the system going."

As for what the computer actually did, Goh told us the gang thrashed it with benchmarks including HPCG and Linpack, as well as some of NASA's own. The poor thing was tortured in the CPU, memory and storage departments, running continuously aside from those reboots. And the performance decline? "Minimal," according to Goh.

Back to orbit, the Moon and beyond

Goh plans to send another computer to the ISS in the coming years, again pulled from what HPE is selling at the time, but this time the machine won't just be running benchmarking software. "Now we know its limits, we can run typical applications in space."

Unsurprisingly, because it is a heck of a lot more efficient to process data at the source than to transmit it back to the ground for crunching by earthbound hardware, "NASA has strong interest in running the applications."

And, of course, HPE would also like its computers on NASA's upcoming Moon missions "because that's the penultimate step before Mars," explained Goh. "The station is still in a somewhat protected orbit [from radiation]." As such, seeing how the stock supercomputing hardware performs in deep space is a precursor to more ambitious applications.

And Goh has high hopes that supercomputing hardware could be used on space telescopes and probes, as well as reducing crew workload. While spacecraft sensors are becoming ever more sensitive, getting the data back to Earth for processing is forever constrained by bandwidth and latency. "It's getting difficult," said Goh, "to keep up with the amount of data as it is coming through the sensors."

While NASA has always shovelled reprogrammable computing power into its probes (the software running on its Voyager probes is quite different from what left the launch pad more than 40 years ago), Goh reckons that the boost from supercomputing hardware could see tools like machine learning pushed out to actual spacecraft. The probes, he said, could then "learn locally... without having to send all the data back to Earth, which would be impractical anyway."
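As a toy illustration of why shipping everything home is impractical, consider a probe that filters its sensor stream locally and downlinks only the outliers. The data, thresholds and numbers here are invented for the example:

    import random

    random.seed(42)
    # 100,000 invented sensor readings around a nominal value of 10.0
    raw_stream = [random.gauss(10.0, 1.0) for _ in range(100_000)]

    def is_anomalous(reading, mean=10.0, sigma=1.0, k=4.0):
        """Flag readings more than k standard deviations from nominal."""
        return abs(reading - mean) > k * sigma

    downlink = [r for r in raw_stream if is_anomalous(r)]
    print(f"raw readings: {len(raw_stream)}")  # 100000
    print(f"downlinked:   {len(downlink)}")    # a handful of outliers

Swap the hard-coded threshold for a model trained on board and you have the "learn locally" picture Goh is gesturing at.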

The ultimate edge application indeed. ®
