Oracle tunes Solaris for Intel's big Xeons
Eight-socket box catching up with Sparc
Word is trickling out of Oracle that its recently acquired Solaris has been heavily tuned to support Intel's new eight-core Nehalem-EX Xeon 7500 beasties.
Pity, then, that Oracle does not yet seem to have a Nehalem-EX box in the field. But the indications are that it is working on an eight-socket machine.
First, said eight-socket box. In a briefing with El Reg, Shannon Poulin, director of Xeon platform marketing at Intel, flashed a foil that showed all the vendors making Nehalem-EX servers by socket count and form factor. Oracle was not among the group making two-socket blade and rack servers using the Xeon 7500 and its HPC variant, the Xeon 6500 - which is only available for two-socket machines. Nor was Oracle among another group of server peers who admit they are making four-socket machines using the Xeon 7500.
Oracle was, though, among the vendors named in the presentation that were working on machines with eight or more sockets.
And then there is the photographic evidence below - compliments of our very own Rik Myslewski, who spotted an Oracle Nehalem-EX in the wild at the launch event Intel hosted in San Francisco.
This doesn't look like a four-socket box, does it?
If that is indeed an eight-socket Sun Fire server, then it looks like Oracle is taking the uniboard design it used in its Sparc servers, and later applied to Opteron machines, and reusing it for Nehalem-EX boxes.
The server sure looks like eight separate system boards that slide into the chassis horizontally, with eight disk drives underneath and four power supplies off to the side. If that is not a Nehalem-EX machine, it could be an 8U chassis that is cramming in eight two-socket servers on cookie sheet trays. Intel did not bring out any machinery as part of its dog and pony, so I could not peek into this box and see what it had inside.
In the absence of shipping Nehalem-EX iron, Oracle is talking up how OpenSolaris, the development version of Solaris, has been tweaked so it can support some of the new features in the latest Xeon chips from Intel. Scott Davenport, one of the developers responsible for OpenSolaris at Oracle, said in a blog that he has been working since last summer with Intel's engineers to align OpenSolaris and Xeon chips, and that the hot plug CPU and memory features of the Unix variant now work with the Xeon 7500s.
The current Fujitsu-designed, Sparc64-based Sparc Enterprise M servers already support hot plugging of CPU and memory cards under Solaris 10, and Sun itself has supported hot plugging of these components since the UltraSparc-III systems nearly a decade ago. Support could be even older than that, going back to the Starfire E10000 high-end servers. If you know, add it to the comments at the end.
Just as the Nehalem-EX chips were being readied for market, Oracle put out a whitepaper outlining the optimizations that the OpenSolaris and Solaris team have made for the Westmere-EP Xeon 5600s and the Xeon 7500s.
Oracle didn't say anything precise about its own Xeon 7500 iron in that report, but did say that on a Java server-side benchmark test, it was able to show near linear scalability in moving from a Xeon 7500 server with four sockets to one with eight sockets. Ditto for an unnamed CPU-intensive workload.
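Oracle didn't publish the raw figures, but Amdahl's law gives a feel for what "near linear" scaling from four sockets to eight implies about a workload. A quick illustrative sketch (the parallel fractions below are assumptions for the sake of the maths, not Oracle's measurements):

```python
# Amdahl's law: what does "near linear" 4-to-8-socket scaling imply
# about the serial fraction of a workload? (Illustrative figures only.)

def speedup(n: int, parallel_fraction: float) -> float:
    """Amdahl's law speedup on n sockets for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

for p in (0.99, 0.95, 0.90):
    rel = speedup(8, p) / speedup(4, p)   # gain from doubling the sockets
    print(f"parallel fraction {p:.0%}: 8-socket is {rel:.2f}x the 4-socket")
```

At 99 per cent parallel the doubling buys roughly a 1.9x gain, which is what "near linear" looks like in practice; at 90 per cent it drops to about 1.5x.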
Oracle added that the four-socket Xeon 7500 machines showed about twice the performance of the current four-socket Xeon 7400 machines, and offered up to four times the memory bandwidth, thanks to the shift from the old frontside bus used in the 7400s to QuickPath Interconnect.
Oracle and Intel have also worked on Solaris Fault Manager, which diagnoses hardware and software errors and offlines faulty components. Solaris FM can now function atop Xeon 5600 and 7500 processors, and the utility embedded in Solaris can also tap into the Machine Check Architecture (MCA) recovery feature added to the Xeon 7500s to recover from double-bit memory errors.
This was one of the big features Intel was going on about at the Nehalem-EX launch at the end of March. ®
Actually, the x86 requires you to support over 1,000 instructions in its instruction set. It is really buggy and bloated and old. It takes many millions of transistors to decode and figure out where the next instruction is, and many more transistors to support old instructions that no one uses. x86 is buggy and bloated.
But Intel and AMD have poured lots of money into it, so it is getting helluva fast! Nehalem-EX is extremely fast, at a cheap price. Soon the 32nm version will come, with even higher performance. Soon x86 will be the fastest commodity CPUs on earth. The pace is extreme, much higher than for any other CPU architecture.
If half of that money went into a clean architecture, such as SPARC, we would have faster CPUs for much less power. It is only a matter of resources. If someone developed a badly designed OS that crashed all the time, but poured huge amounts of money into it, then that OS would take over the world with a 90 per cent market share. Even though that OS would suck big time.
It is only a matter of resources. Right now, x86 has the most resources, so it will crush everything else soon.
Regarding the 8-socket Nehalem-EX and Solaris: that is a match made in heaven. Solaris has a long reputation for scaling well, with excellent performance and stability. I expect this machine to be quite cheap, with a high efficiency ratio. It will surely best other, more expensive Unix machines.
Now Starfire, that was a cool name for a server. The E10K rocked.
x86 hoovers ...
... them all up. If that's what you mean by "sucks", I fully agree.
Instruction sets these days are much less of a distinguisher than they used to be. You say "transistor count" - well, add two million or so additional transistors to the next-gen x86 for improvements on the CISC decoder stages, i.e. roughly quadruple them - that's far less than 0.1% of the total additional transistor budget for a new-gen CPU (remind me, how many billions of transistors does an 8-core chip have?). In the big picture of things, whether 1 million or 10 million transistors do instruction predecoding, so what? There are orders of magnitude more in caches, buffers, interlinks and other glue these days.
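The back-of-the-envelope maths is easy to check. Nehalem-EX was reported at roughly 2.3 billion transistors, so the commenter's two-million-transistor decoder allowance really is a rounding error (the figures here are ballpark assumptions, not Intel's official breakdown):

```python
# Ballpark figures - illustrative assumptions, not official Intel numbers.
total_transistors = 2_300_000_000   # reported count for 8-core Nehalem-EX
decoder_budget = 2_000_000          # the commenter's extra decoder allowance

share = decoder_budget / total_transistors
print(f"decoder share of die: {share:.3%}")   # → decoder share of die: 0.087%
```

Well under the 0.1 per cent claimed above, which is the point: decode logic is noise next to caches and interconnect.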
If instruction efficiency were really key to the success of anything, then we'd never have seen Java get to where it is - its bytecode, being a stack machine, is just about as unfriendly to scalable hardware implementation as could be. Who (apart from JVM/JIT implementors, who have both my pity and admiration) cares?
Also, wrt what current x86/x64 actually are:
First, both Intel and AMD have used RISC engines in their cores for many years (variously called µ-ops, micro-ops, R-Ops or some such), including superscalar/VLIW-style instruction bundling. The x86 instruction set compatibility is only a shim layer.
Second, current x86 optimization guides from both Intel and AMD actually state (if phrased less in-your-face): if you want this thing fast, use simple instructions and order them for no or little interdependency - in short, use RISC; the x86/x64 instructions that map 1:1 to what the low-level engine does are by far the best for you.
Third, agreed that x86/x64 may have a few thousand instructions (the instruction set reference manuals run to 1600+ pages these days). But have you ever checked how many of them are actually used in the executable binary code on your systems? When teaching low-level debugging and x86 assembly language, one of my favourite exercises was to have students write a little perl script that found all ELF i386/x86-64 executables, disassembled the code, sorted by instruction and then created an instruction set histogram. Invariably, one found that 99.9% of the code was made up from a set of no more than about 50 opcodes. Programming a CPU using only ~50 instructions - where have I heard that before? Ah, yes, I think it's called "RISC".
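The commenter's classroom exercise is easy to reproduce. A minimal Python sketch of the same idea - a hard-coded snippet of disassembly stands in for real input here, but in practice you would feed it the output of `objdump -d` over your ELF binaries (the pipeline and sample below are assumptions, not the commenter's actual perl script):

```python
from collections import Counter

# In real use, replace this string with the output of e.g. `objdump -d /bin/ls`.
# A few hard-coded objdump-style lines stand in for it here.
sample_disasm = """\
  401000:\t55                   \tpush   %rbp
  401001:\t48 89 e5             \tmov    %rsp,%rbp
  401004:\t8b 45 fc             \tmov    -0x4(%rbp),%eax
  401007:\t83 c0 01             \tadd    $0x1,%eax
  40100a:\t5d                   \tpop    %rbp
  40100b:\tc3                   \tret
"""

def opcode_histogram(disasm: str) -> Counter:
    """Count instruction mnemonics in objdump-style disassembly lines."""
    counts = Counter()
    for line in disasm.splitlines():
        parts = line.split("\t")      # address, hex bytes, mnemonic + operands
        if len(parts) < 3:            # skip labels, headers, blank lines
            continue
        mnemonic = parts[2].split()[0]
        counts[mnemonic] += 1
    return counts

hist = opcode_histogram(sample_disasm)
for op, n in hist.most_common():
    print(op, n)
# Even in this toy sample, mov dominates - just as it does in real binaries.
```

Run over a whole /usr/bin, the histogram's long tail is exactly the point being made: a handful of opcodes do almost all the work.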
That's not to say splashing resources and giant transistor counts all over it is the only way to get decently performing CPUs. It's just the one that, by experiment, has proven over the last 20 years to create the best-performing server/workstation-class CPUs. But look beyond that space - what do you see? Set-top boxes, mobile phones, tablet devices, consoles: the entire "embedded" space uses MIPS, ARM, PPC and other architectures forgotten or abandoned elsewhere. And Intel's attempt to push x86 into that space isn't proving anywhere near as simple as Intel dreamt it would be.
To sum this up:
x86/x64, in the high-end, "stationary" systems space, has proven to be just about efficient and performant enough to beat everything else. There, it sucks ... up all the competitors.
x86/x64, in the mobile/embedded space, never made inroads. There, SoC solutions and tiny power/thermal/form-factor budgets are mandatory, and implementors want to license modular designs to combine cores, graphics, comms etc. into a single package. Off-the-shelf components are frowned upon there. Here, x86 sucks.
Gosh, you're right after all. No matter how you look at x86/x64, it always sucks !