The Register® — Biting the hand that feeds IT

This supercomputing board can be yours for $99. Here's how

Adapteva's parallel dash for community cash

Feature Adapteva, an upstart RISC processor and co-processor designer that has tried to break into the big-time with its Epiphany chips for the past several years, is sick and tired of the old way of trying to get design wins to fund its future development.

So it has started up a community project called Parallella that seeks to get users to pay for development directly through crowdfunding via Kickstarter.

"We're going to build a community around parallel computing," Andreas Olofsson, CEO and co-founder of Adapteva, tells El Reg. "It will be kind of like the Raspberry Pi, but with real performance."

He is quick to add that he has nothing against Raspberry Pi, but rather that a hybrid architecture, marrying ARM processors and Epiphany massively parallel RISC coprocessors, is the way to go.

A community of enthusiasts monkeying around with hardware might be able to sustain the development of current and future Epiphany chips if the Kickstarter plan pans out. The initial target price to get a board is $99 compared to the $35 for a Raspberry Pi board, which is a little high but not unreasonably so.

Vibrant communities sprang up around Beagle boards and Arduino kits, too, so there is some precedent for this. Perhaps more significantly, the Parallella community approach is as reasonable and sensible as begging for money from venture capitalists and trying to go up against Intel with its Xeon processors and Xeon Phi coprocessors or AMD with its FirePro GPU coprocessors or Nvidia with its Tesla GPU coprocessors. And it is less demeaning, too. Provided it actually raises the necessary funds.

Risc-y business

Adapteva was founded in February 2008, and as Olofsson, who designed digital signal processors at Analog Devices for a decade, jokingly explained to El Reg, "I got the RISC memo from 1980 and I paid attention."

The idea with RISC was to have chips with relatively simple instructions and to do complex things by combining operations in quick succession; the theory was that a simple RISC chip could get more work done than a CISC processor, and there are not enough pixels in the world to settle that argument in this story.

Suffice it to say that Olofsson and his compatriots at Adapteva – Roman Trogan, director of hardware development, Oleg Raikhman, director of software development, and Yaniv Sapir, director of application development, who all hail from Analog Devices and worked on the TigerSHARC DSPs – believe that for parallel computing to take off, devices have to be simple, cheap, and accessible. And so they designed the Epiphany line of processors to work as coprocessors as well, which makes them more accessible and therefore more useful.

Adapteva's Epiphany-IV chip

But the problem that Adapteva is chasing is much larger than providing cheap parallel computing for hobbyists. The company wants to be at the forefront of exascale computing, and to do so by providing the cheapest and most energy-efficient floating point operations on the planet.

"We have been out there for four years now, and we see that the pickup for parallel processing is too slow," says Olofsson. "There are too many gatekeepers, and too many people can't afford the $10,000 startup fee for a reference board to run tests and do development."

The Parallella Kickstarter funding program is about changing that, with users being given an older generation Epiphany board if they help fund the development of the future ones.

Before we get into that program, we need to talk about the Epiphany chips. They have their own instruction set, although Olofsson says he was inspired by MIPS and ARM RISC processors as well as the DSPs that he and his co-founders know so well. And like the massively multicore processors from Tilera, the idea behind the Adapteva chips is to take hordes of very modest RISC processors and lash them together with an on-chip interconnect.

Olofsson took this approach not because he loves minimalist core designs out of a three-decade-old textbook, but because this is the only approach that will fit into the thermal envelope that will limit exascale-class systems.

Adapteva sees the parallel computing challenge as its opportunity

Like most people in the processor and coprocessor chip rackets, Adapteva thinks the future of computing is both parallel and heterogeneous. To be even more specific, the company believes that you need a clean slate approach on the coprocessors because this is the only way to get the coprocessors, which will do most of the heavy lifting on compute, to be much more efficient than the usual suspect processors we are used to on our desktops, inside our handhelds, and in our data centers.

The Epiphany core has a mere 35 instructions – yup, that is RISC alright – and the current Epiphany-IV has a dual-issue core with 64 registers and delivers 50 gigaflops per watt. It has one arithmetic logic unit (ALU) and one floating point unit and a 32KB static RAM on the other side of those registers.

Each core also has a router that has four ports that can be extended out to a 64x64 array of cores for a total of 4,096 cores. The currently shipping Epiphany-III chip is implemented in a 65 nanometer process and sports 16 cores, and the Epiphany-IV is implemented in a 28 nanometer process and offers 64 cores.
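For readers who want to see what a four-port router per core implies, here is a minimal sketch of a two-dimensional grid of cores. It is an illustration only, not Adapteva's router logic: the function names are hypothetical, and the hop count assumes that messages travel along rows and columns and never diagonally, which the article does not spell out.

```python
# Illustrative model of a 2D grid of cores, each with four router ports.
# Assumption: one port per compass direction (north, south, west, east).
MESH_DIM = 64  # the maximum array size quoted above: 64x64


def neighbors(row, col, dim=MESH_DIM):
    """Return the (row, col) positions directly reachable from one core."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Cores on the edge of the array have fewer on-chip neighbors.
    return [(r, c) for r, c in candidates if 0 <= r < dim and 0 <= c < dim]


def hop_count(src, dst):
    """Router-to-router hops between two cores (assumed row/column travel only)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


total_cores = MESH_DIM * MESH_DIM  # 64 * 64 cores in a fully built-out array

if __name__ == "__main__":
    print(total_cores)
    print(neighbors(0, 0))
    print(hop_count((0, 0), (MESH_DIM - 1, MESH_DIM - 1)))
```

Under these assumptions, a corner core has two on-chip neighbors, an interior core has four, and the longest trip in a full 64x64 array is corner to corner.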

Block diagram of the Epiphany chip

The secret sauce in the Epiphany design is the memory architecture, which allows any core to access the SRAM of any other core on the die. This SRAM is mapped as a single address space across the cores, greatly simplifying memory management. Each core has a direct memory access (DMA) unit that can prefetch data from external flash memory.

The initial design didn't even have main memory or external peripherals, if you can believe it, and used an LVDS I/O port with 8GB/sec of bandwidth to move data on and off the chip from processors. The 32-bit address space is broken into 4,096 1MB chunks, one potentially for each core that could in theory be crammed onto a single die if process shrinking continues.
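The arithmetic behind that split is easy to check: 4,096 chunks need 12 bits of address, 1MB needs 20 bits, and 12 plus 20 makes 32. The sketch below shows how a flat 32-bit address could be carved up on that basis. Treating the top 12 bits as a core number follows from the figures above, but dividing those 12 bits into a 6-bit row and a 6-bit column to match the 64x64 array is an assumption made here for illustration, as are all the function names.

```python
# Sketch of a 32-bit address split into 4,096 chunks of 1MB each.
CORE_BITS = 12    # 2**12 = 4,096 chunks, one per potential core
OFFSET_BITS = 20  # 2**20 bytes = 1MB inside each chunk
COORD_BITS = 6    # assumption: 6 bits of row + 6 bits of column (64x64)


def split_address(addr):
    """Split a 32-bit address into (core_id, offset within that core's chunk)."""
    core_id = (addr >> OFFSET_BITS) & ((1 << CORE_BITS) - 1)
    offset = addr & ((1 << OFFSET_BITS) - 1)
    return core_id, offset


def core_coords(core_id):
    """Map a core number to (row, col) under the assumed 6+6 bit layout."""
    return core_id >> COORD_BITS, core_id & ((1 << COORD_BITS) - 1)


def global_address(row, col, offset):
    """Build the flat address of a byte in the chunk owned by core (row, col)."""
    core_id = (row << COORD_BITS) | col
    return (core_id << OFFSET_BITS) | offset


if __name__ == "__main__":
    addr = global_address(32, 8, 0x12345)
    print(hex(addr))
    print(split_address(addr))
```

The point of such a layout is that a core needs nothing more exotic than an ordinary load or store to reach another core's SRAM: the upper bits of the address say whose memory it is.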

Missed point.

Err... you may have missed the point.

1) They're suggesting that the power consumption is a barrier to wider adoption.

2) Also, what the Reg didn't cover was the other barrier to adoption: parallel computing suffers from a lack of skilled programmers. The first computing revolution was powered by self-taught hobbyist programmers on single-processor boards. The developers believe that this has created a generation of single-processor-centric programmers without the skills for parallel work. They want to create a hobbyist scene for parallel processing and foment a skills revolution in the parallel computing sphere, which will then (hopefully) allow genuine parallel processing to become part of mainstream computing, as opposed to the minimalist OS-managed parallelism of current-gen multicore processors.

Cynic's viewpoint: what we have is a bunch of clever blokes who developed a clever processor and found that the people who could use it don't want it, and those who might want it couldn't use it, so they're repositioning it as a hobbyist teaching toy.

Optimist's viewpoint: a bunch of clever blokes developed a clever processor that solves a clever problem, and finding that the market couldn't take advantage of it, they decided to try to develop the market by themselves.

Anonymous Coward

It's the Transputer all over again.

Re: Intel & IBM & MS

Occam was "mathematically provably correct" at the cost of being very static. This made it OK for fixed embedded tasks like radar processing, but very difficult for anything more general purpose, unless you built your own dynamic memory management on top of it, like our project did, or used the C compiler and libraries.

By the time they ironed out the H1 (or T-9000, or whatever the next big thing after the T-800 was), Intel x86 performance had left it in the dust, and has done until now, when the mainstream is running out of ways to improve it economically and is being forced back into considering these old ideas.

The article is right. What is holding back progress is ultimately software to take advantage of all that horsepower. Put enough cheap hardware into the hands of the hobbyist masses, and we should see some interesting things come out of it.
