Sun cranks clocks on Sparc T2 and T2+
Revs LDom hypervisor
The executives at server and operating system maker Sun Microsystems have been uncharacteristically quiet since the $5.6bn Oracle deal was announced back in April, and they have stayed silent since Sun's shareholders approved the deal last Thursday. This, from one of the most aggressive, PR-driven firms on the planet, is a bit disturbing. But Oracle is calling the shots, which is why the IT trade press had to figure out for itself that Sun has actually done a good thing and boosted the clock speeds on its 'Niagara' family of Sparc T2 and T2+ processors.
The Sparc T2 chips, known as 'Niagara-2' internally at Sun, are used in two-socket boxes. The Sparc T2+ chips are used in four-socket machines and are known as 'Victoria Falls.' The Sparc T2 chips came out in August 2007, and the T2+ chips made their debut in October 2008.
Sun has positioned the Sparc T series of chips as leaders in performance per watt, saying they offer better bang for the buck than RISC or Itanium alternatives running Unix. For customers with Sparc-Solaris workloads, the Niagara servers offer binary compatibility with prior Sun UltraSparc and Fujitsu Sparc64 chips, which means customers can get a competitive entry-level or midrange box without recompiling their code for the x64 variant of Solaris 10.
Both Sun and Fujitsu have been reselling Sparc T-based machines for the past two years, just as they both sell bigger Sparc Enterprise machines based on the quad-core Sparc64 VII processors made by Fujitsu. The T2 and T2+ chips have eight cores and eight threads per core, making them the most highly cored and threaded chips in commercial data centers today.
What Sun has not been able to do easily is get the clock speed of the chips up, and that's because it is hitting the same thermal ceiling as other chip makers. According to Sun, a 1.4 GHz Sparc T2 chip with all eight cores being stressed by an application can hit as high as 123 watts, and even during normal loading it hits 95 watts. That's about what a quad-core 'Nehalem EP' Xeon does.
The move from the eight-core, four-thread Sparc T1 chips to the eight-core, eight-thread T2 chips did not change clock speeds much, although the T2 did have twice as many threads and could be used in two-way machines, which gave systems about twice the oomph on workloads. Specifically, the Sparc T1 topped out at 1.2 GHz and had a 1 GHz variant. The T2 chips had a top speed of 1.4 GHz, with a 1.2 GHz variant for customers who wanted lower thermals and a 900 MHz experimental chip for even lower thermals (such as in blade servers).
Starting today, Sun and its fab partner, Texas Instruments, can deliver Sparc T2 and T2+ chips running at 1.6 GHz. Representatives from Sun were not available as we went to press with this story, so it is unclear whether Sun has talked TI into doing some sort of process shrink to get the extra 14.3 per cent of clock speed. Considering Sun's financial shape, it is far more likely that TI is just doing deep bin sorts on Sparc T chips to find parts that can run at the higher clock speed. Hopefully, those parts can do so at a slightly lower voltage than the standard Sparc T2 and T2+ chips and therefore stay within the power budget.
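Why a lower voltage matters can be sketched with the standard first-order model for CMOS dynamic power, P ≈ αCV²f (switching activity α, switched capacitance C, supply voltage V, clock frequency f). The snippet below is a rough illustration under that model only: it ignores leakage power and the fact that higher clocks usually need more voltage, not less, and the voltage figure it produces is arithmetic, not anything Sun or TI has published.

```python
import math

# First-order CMOS dynamic power model: P ~ alpha * C * V**2 * f.
# Assumption: activity (alpha) and capacitance (C) are unchanged by a bin sort,
# and leakage power is ignored entirely.

def dynamic_power_ratio(f_ratio: float, v_ratio: float) -> float:
    """New/old dynamic power when frequency and voltage scale by the given ratios."""
    return v_ratio ** 2 * f_ratio

def voltage_ratio_for_same_power(f_ratio: float) -> float:
    """Voltage scaling that holds V**2 * f constant for a given frequency ratio."""
    return 1.0 / math.sqrt(f_ratio)

f_ratio = 1.6 / 1.4  # 1.4 GHz -> 1.6 GHz
v_ratio = voltage_ratio_for_same_power(f_ratio)

print(f"clock increase:              {(f_ratio - 1) * 100:.1f}%")
print(f"dynamic power, same voltage: +{(dynamic_power_ratio(f_ratio, 1.0) - 1) * 100:.1f}%")
print(f"voltage cut to hold power:   {(1 - v_ratio) * 100:.1f}%")
```

On this model, running the same silicon 14.3 per cent faster at the same voltage raises dynamic power by the same 14.3 per cent, and holding it flat needs a supply voltage roughly 6.5 per cent lower. That is the sense in which a lower-voltage bin could keep the 1.6 GHz parts within the power budget.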
It looks like Sun is also supporting 800 MHz DDR2 main memory in the Sparc T2 and T2+ servers. Prior machines used 667 MHz DDR2 main memory.
Sun is charging a pretty big premium for the extra Sparc T speed bump. A T5440 server with four 1.4 GHz T2+ chips and all 256 threads activated in the four-socket box, plus 128 GB of memory and two 146 GB disks, has a list price of $89,895. Jacking that machine up to four 1.6 GHz T2+ chips with otherwise identical hardware boosts the price to $115,695. That's a 28.7 per cent price hike for 14.3 per cent more clocks. On a two-socket T5240 machine using the 1.4 GHz T2 chips, a configuration with 128 threads, 64 GB of memory and two 146 GB disks costs $45,495, but jumping up to the 1.6 GHz chips raises the price by 24.2 per cent to $56,495.
On the single-socket T5220 server, a machine with 64 threads running at 1.4 GHz, 32 GB of memory and two 146 GB disks costs $27,895; Sun boosts the configuration to 64 GB of memory with the 1.6 GHz version of the T2 chip and raises the price to $45,895. It is not clear what Sun is charging for 800 MHz DDR2 memory, but 667 MHz memory for the T5220 goes for around $100 per GB on the street. That means Sun's list price might be as high as $150 to $200 per GB, plus another premium for the higher memory speed. Call it around $8,000 for the incremental 32 GB in the fatter 1.6 GHz configuration of the T5220. That would put the price premium for the 1.6 GHz chip in this single-socket box at around 36 per cent, not counting the cost of the extra memory.
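The premiums quoted above are simple percentage arithmetic, reproduced here as a quick check. The list prices come from the article; the $8,000 memory adjustment for the T5220 is the article's own rough estimate, not a Sun price.

```python
from dataclasses import dataclass

@dataclass
class Config:
    name: str
    price_14ghz: int            # list price with 1.4 GHz chips, USD
    price_16ghz: int            # list price with 1.6 GHz chips, USD
    extra_memory_cost: int = 0  # estimated cost of extra memory in the 1.6 GHz config, USD

    def premium_pct(self) -> float:
        """Price premium of the 1.6 GHz config, net of any extra memory, in per cent."""
        adjusted = self.price_16ghz - self.extra_memory_cost
        return (adjusted - self.price_14ghz) / self.price_14ghz * 100

CLOCK_GAIN_PCT = (1.6 / 1.4 - 1) * 100

configs = [
    Config("T5440 (4 sockets)", 89_895, 115_695),
    Config("T5240 (2 sockets)", 45_495, 56_495),
    # Assumption taken from the article: ~$8,000 for the extra 32 GB in the 1.6 GHz T5220.
    Config("T5220 (1 socket)", 27_895, 45_895, extra_memory_cost=8_000),
]

for c in configs:
    print(f"{c.name}: {c.premium_pct():.1f}% more money for {CLOCK_GAIN_PCT:.1f}% more clock")
```

The T5220 result, about 35.8 per cent, is where the "around 36 per cent" figure comes from; changing `extra_memory_cost` shows how sensitive that figure is to the memory estimate.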
This is a lot to pay for extra performance. But all chip makers charge a premium for their top bins.
On Tuesday, Sun also updated its Logical Domain (LDom) partitioning technology with release 1.2. The updated LDom software can power down Sparc T cores that are not being used and has a new set of built-in configuration tools that make it easier to create and deploy LDoms on the Sparc T machines. (You obviously don't need the faster processors to get the new LDom software, and Sun distributes this LDom code as a free patch for the Sparc T systems.)
The virtual networking in LDoms now supports jumbo frames, which make big file transfers go faster and reduce CPU overhead. Sun has also added domain mobility with LDom 1.2; presumably, this means that domains can be live migrated between two physical Sparc T boxes. Sun has rolled a physical-to-virtual converter into LDom 1.2 as well, which speeds up the conversion of applications running on legacy Sparc platforms into virtual machines that can be deployed on the Niagara family of servers.
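The CPU saving from jumbo frames comes down to per-frame costs: each frame needs a roughly fixed amount of protocol and driver work, whatever its size. The sketch below counts frames for a 1 GiB transfer. It assumes a 9,000-byte jumbo MTU (a common setting, not a figure from Sun's LDom 1.2 documentation), IPv4 and TCP headers without options, and standard Ethernet framing overhead on a physical link; it is arithmetic, not a benchmark.

```python
# Per-frame byte overheads (assumptions: IPv4 + TCP with no options,
# standard Ethernet framing on a physical link).
IP_TCP_HEADERS = 20 + 20        # IPv4 header + TCP header, carried inside the MTU
ETH_OVERHEAD = 14 + 4 + 8 + 12  # Ethernet header + FCS + preamble/SFD + inter-frame gap

def frames_needed(transfer_bytes: int, mtu: int) -> int:
    """Number of full-size frames needed to carry transfer_bytes of TCP payload."""
    payload = mtu - IP_TCP_HEADERS
    return (transfer_bytes + payload - 1) // payload  # ceiling division

def wire_efficiency(mtu: int) -> float:
    """Fraction of bytes on a physical Ethernet link that are application payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)

transfer = 1 << 30  # 1 GiB
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_needed(transfer, mtu):,} frames, "
          f"{wire_efficiency(mtu) * 100:.1f}% payload on the wire")
```

About six times fewer frames means about six times fewer trips through the per-packet code path, which is where the CPU saving comes from; the better payload efficiency on a physical link is a smaller effect.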
The LDom 1.1 update came out in November 2008. It included performance enhancements, virtual I/O dynamic reconfiguration, hybrid I/O for network interfaces (allowing a physical NIC to be tied to a virtual machine to boost performance), and virtual disk failover.
LDom is software that all Sparc machines have needed; it should be running on all Sparc servers and should have been on them five years ago. It will be interesting to see if LDoms survive the Oracle ax. ®
COMMENTS
Dear Mattie Pattie Laddie
"I'm scanning through the rest of your diatribe looking for something relevant, but it's just more insults and repeated waffling."
"Well, I did post a rebuttal to your previous piece of clueless waffle, but I fear Ms Bee decided it was simply too cruel to post. I'll try and remember the main points and put them down below, hopefully without upsetting Ms Bee!"
I wonder who is insulting whom here. I am not the one who gets his posts blocked because of foul language.
----------------------------------
"In reality, since not many applications fit this model, what happens is the first thread stalls and the core can't kick off a second as there is no second thread to start, or if there is a second thread then there is no third. This is why Niagara sucks so badly when it comes to the current crop of enterprise applications."
Could you please explain this again? What do you mean by "not many applications fit this model"? Maybe you haven't heard about client-server, where one server serves many clients. It is not about whether your application has to be parallelized. Each client will be served by one thread. Client-server software is naturally parallelized. You don't have to rearchitect your client-server software.
If Niagara code really had to be rearchitected, then Niagara would suck big time, both in theory and in practice when you ran benchmarks, because then cache misses would stall all the threads. But what do the results say? Who wins the benchmarks? Niagara or the slooooow Power6?
As David Halko writes:
"The concept that environments with common applications would not benefit from highly threaded hardware is really a myth propagated by DoS trained folks.
With DNS cache lookups, async I/O, file system syncs, multi-threaded NIC cards, VPN encryption, HTTPS encryption, compressed file systems, web browsers, background processes, software update downloads, virus checking, signature-checking downloads, de-duplication, backups, RSS feeds, internet radio, MP3s playing, etc., the common user benefits tremendously from a generally more responsive platform as hardware becomes more highly threaded.
Even my Windows XP desktop has 74 processes running, never mind the thousands of threads!"
Maybe I misunderstood you. Maybe you didn't say that applications must be rearchitected. In that case, could you explain again why Niagara is slow, yet wins all the benchmarks that have been posted?
--------------------------------
"Adding more cache would help Niagara, but a proper cache design and a larger cache would help it a lot more."
Maybe you don't know that a server CPU cannot keep all the different data in its cache? So what makes you believe a large cache would help a server CPU? Could you explain this point?
------------------------------
"Stating that Niagara would perfrom just as well with no cache is frankly the type of statement that could only be uttered by someone with their head firmly in the sand. "
I didn't say that. Why are you lying again? Read my post again. What I tried to say was that a server CPU such as Niagara doesn't need a large cache, because a server workload can never fit in a cache. And therefore, maybe Niagara could perform almost as well without a cache. But I pointed out that this was only far-fetched speculation, and that I had not seen data on this.
-------------------------------------
"The same type of person that just cannot see that a vendor's benchmark is liable to have little bearing on how a server will perform in the real World. "
I've posted information about a company that got 50 times more throughput with a Niagara T1 than with an AMD machine. Didn't you read my post, or are you deliberately lying?
-------------------------------------
"Try again, newbie!"
Who is the newbie? Someone with a Masters in Comp Sci and a Masters in Math, or some business person who knows nothing about CPUs? Someone who believes that a server CPU is able to hold all the different data in a cache? And these insults all the time. Why? Can't we talk like educated grown-ups, without insulting each other? Is it difficult to stop?
Dear Chihuahua abuser....
Well, I did post a rebuttal to your previous piece of clueless waffle, but I fear Ms Bee decided it was simply too cruel to post. I'll try and remember the main points and put them down below, hopefully without upsetting Ms Bee!
Firstly, on the cache point. What happens with Niagara is that it starts a new thread whenever the current thread is stalled waiting on data. When the second thread stalls, it kicks off a third. I'm sure even Kebabbert will agree with that part at least. In the Sunshine fantasyland, this allows the CPU cores to keep spinning and deliver high throughput. In reality, since not many applications fit this model, what happens is the first thread stalls and the core can't kick off a second as there is no second thread to start, or if there is a second thread then there is no third. This is why Niagara sucks so badly when it comes to the current crop of enterprise applications. So the core is stuck waiting for that first thread to come back, which is when you are really hoping for a cache hit. The only problem is that a hit is unlikely, given both the small size of the cache available to each thread and the poor caching techniques used. Remember, even with T2+ we only have 4MB shared out between sixteen "cores" and possibly up to 264 threads, and that's before we consider what else the cache has to hold. Odds on, you will get a cache miss and have to go off to RAM or disk. Even in the Sunshine fantasyland scenario this is an issue, as the delay means your response time has just gone through the roof.
Now, Sun knew this when they designed T2+, and they talk a load of hooey about how they can keep all the cores spinning, as if this is somehow what the customer needs. As Kebabbert recommends, they will try to have you restructure your whole testing to keep those wiener cores spinning, even if this in no way reflects what is actually happening in your environment. What the customer actually wants is for his usually single-threaded application to respond as fast as possible. Sun did try to up the amount of cache on T2+ compared to the earlier designs, but in order to keep it anywhere near price-competitive and to avoid pushing the power requirements up, T2+ is capped at 4MB of L2 cache, with a poor means of using even that amount. In the webserving niche, Niagara is not too badly handicapped here, as most people will expect a delay and attribute it to Internet latency or just the page's graphics loading; but in a business scenario where one system is talking to another, the delay in handling single-threaded or heavy-threaded apps is just not acceptable when the competition will smoke through the task a lot quicker.
Adding more cache would help Niagara, but a proper cache design and a larger cache would help it a lot more. Stating that Niagara would perform just as well with no cache is frankly the type of statement that could only be uttered by someone with their head firmly in the sand. The same type of person that just cannot see that a vendor's benchmark is liable to have little bearing on how a server will perform in the real world. Try again, newbie!
/SP&L
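Both commenters are arguing about one trade-off, and a common back-of-envelope model of a multithreaded core makes it concrete. Assume each software thread alternates `compute` cycles of useful work with a `stall`-cycle wait on memory, and that the core can issue from any thread that is not stalled. The cycle counts below are hypothetical, chosen only for illustration, and the model ignores cache contention between threads, which is exactly the effect under dispute.

```python
def core_utilisation(n_threads: int, compute: int, stall: int) -> float:
    """Idealised upper bound on the fraction of cycles a multithreaded core does useful work.

    Each thread runs `compute` cycles, then waits `stall` cycles on memory; other
    threads' work is assumed to overlap those stalls perfectly.
    """
    return min(1.0, n_threads * compute / (compute + stall))

def threads_to_saturate(compute: int, stall: int) -> int:
    """Smallest thread count for which the idealised core never idles."""
    return -(-(compute + stall) // compute)  # ceiling division

COMPUTE, STALL = 20, 100  # hypothetical cycle counts, not measured Niagara figures

for n in (1, 2, 4, 8):
    print(f"{n} thread(s): {core_utilisation(n, COMPUTE, STALL):.0%} busy")
print("threads needed to saturate:", threads_to_saturate(COMPUTE, STALL))
```

With one thread the core sits idle most of the time, and that thread's response time is the full compute-plus-stall cycle however many hardware threads are free: the single-threaded enterprise case. With enough independent client requests, the client-server case, the stalls are hidden and the core stays busy. The model cannot say which case matches a given workload, and a larger or better cache would shrink `stall` in both.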
Mattie Pattie Laddie
Well, that is not true. Kebabbert comes from Kebab-Bert. The Bert is taken from Dil-bert, in homage. Kebab is the same thing as kabob, or however you spell it in the UK.
Regarding my manliness, at least I don't go around spreading lies and FUD as you do. Therefore I consider myself more manly than you. A fair fight with no lies is manly, yes?
Anyway, I think that maybe you should stop claiming that Niagara is slow because it suffers from a small cache. This is simply not true, as benchmarks and many testimonies show (if you google a bit). I've tried to explain why Niagara doesn't need a large cache; it only took 10 posts or so, reiterating the same thing over and over again, for you to understand. Can you understand that people may find you annoying? Especially when you claim things about which you have no clue?
If you were right, if Niagara actually were slower than Power6, then I would say nothing. But you claim that Niagara is slower, despite all the benchmarks showing the opposite. That is just weird of you. I really don't understand how you reason. The proof shows one thing, and you claim the opposite. That is not really sound logic, is it? It is like when it is raining hard outside and you claim "no, it isn't raining", but if you look out you DO see the rain. Or when a parrot is dead and you claim "no, it isn't dead, the parrot is only sleeping". That is just strange reasoning.
By the way....
One of the guys here has just told us that a kebabbert is a slang term for a "typical Chihuahua owner", that is, a rather less-than-manly fashion victim. Can't say I'm surprised!
Liar Matt Bryant, where art thou?
I want to see how you are going to wriggle yourself out of this.
Liar Mattie, what is the reason for using a cache? Do you know? Let me tell you: the reason for a cache is to be able to access data quickly. This requires the data to already be in the cache, either
1) from an earlier access, or
2) because prefetch logic has prefetched new data that is expected to be used soon.
Now, the cache is small compared to all the data that gets accessed all the time (thousands of clients' user data, AIX kernel data, Oracle kernel data, etc.; a server is not likely to run any GUI, so no GUI code will be cached). If the cache is too small, it gets emptied and refilled all the time. The CPU spends time filling the cache with data that will soon be evicted when a new client is to be served. The data gets evicted all the time, and it changes all the time. It is not a small data set that hardly changes.
How can a CPU use a cache effectively under these circumstances, with ever-changing data? It cannot. Do you still not understand that this is why a desktop CPU performs very badly on server workloads?
Liar Mattie Laddie, how about you take some basic computer architecture courses, instead of me spending my precious time lecturing you? I have no problem with lecturing others, but when the student doesn't understand despite several explanations, you start to wonder. Don't you agree? You explain once, twice, thrice, etc., and still he doesn't understand basic concepts. What would you think about such a student?
No, I have a better suggestion: instead of you doing several basic courses, why not do the same course several times? I doubt you will understand after only one explanation.
Mattie Laddie, what do you say?
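The cache argument in this exchange can be illustrated with a toy least-recently-used (LRU) cache. This is a software model, not Niagara's actual L2 (real CPU caches are set-associative and use hardware replacement policies), and its access pattern, a loop over a fixed working set, is a deliberately simple assumption. It shows both halves of the dispute: when the working set fits, nearly every access hits; once it exceeds the cache, this pattern makes LRU evict each line just before it is needed again.

```python
from collections import OrderedDict

def lru_hit_rate(accesses: list[int], capacity: int) -> float:
    """Fraction of accesses that hit in an LRU cache holding `capacity` items."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = None              # miss: fill the line
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(accesses)

CAPACITY = 1_000  # hypothetical cache size, in lines

for working_set in (500, 1_000, 1_500, 4_000):
    # Sweep repeatedly over the working set, 20 passes in total.
    accesses = [i % working_set for i in range(20 * working_set)]
    print(f"working set {working_set:>5}: hit rate {lru_hit_rate(accesses, CAPACITY):.0%}")
```

The cliff at the cache size is why both sides can point at evidence: a bigger cache helps a lot if it lifts the hot working set under the line, and very little if the working set stays far above it. Real workloads fall somewhere between these extremes, and random rather than looping access degrades more gradually.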
