Sun delivers Unix shocker with DTrace
It slices, it dices, it spins, it whirls
Analysis Try to imagine a geeky version of famed salesman Ron Popeil. Keep Popeil's exuberance, keep his pitchman savvy and keep his verbal overflow. Then erase his age, sturdy frame and Ronco Food Dehydrator and replace all this with a young, lanky kernel engineer hawking something called DTrace, and you have Bryan Cantrill.
Cantrill is one of three Sun Microsystems Solaris engineers who developed DTrace over the course of several years. As far as we can tell, he is the most energetic member of the bunch. It would be hard to be more energetic. While in China recently, we witnessed what you might call a "Cantrill explosion" take place in front of about 40 Asian server administrators. Using a style that combined vigorous moves between laptop and projection screen with a manic delivery, Cantrill managed to extol the virtues of DTrace - a tool which has revolutionized system instrumentation, so we are told.
"It is not the technology that Sun claims," said Jarod Jenson, chief systems architect for Aeysis in Houston. "(Sun) is being far too modest. ... DTrace has completely changed the way I do business."
DTrace is one of the key additions being made to Sun's flagship operating system in the upcoming release of Solaris 10. And it's one of those rare items that appears to work as billed. The software gives administrators thousands upon thousands of ways to check on a system's performance and then tweak the box while it's still running. What's unique about DTrace, beyond its ability to be used on production boxes with minimal system impact, is that it can help fix problems from the kernel level on up to the user level.
"With the exception of system calls, the tools - such as they exist at all - are ad hoc, and at best designed for developer use," Cantrill said. "For example, there is no tool anywhere that allows for arbitrary dynamic instrumentation of a production operating system kernel.
"And the tools that allow for arbitrary user-level instrumentation are largely research systems. And there is nothing - absolutely nothing, research or otherwise - that ties together user-level and kernel-level instrumentation. This is part of the reason that the reaction is so strong among the users you interviewed - DTrace is a quantum leap over previous tools."
We interviewed Cantrill and fellow DTrace designer Adam Leventhal. A lack of modesty about their invention is a shared trait, but their attitude comes off more as Popeil-like pride than ego rub. Mike Shapiro is the third brain behind DTrace, but we've yet to have the pleasure of meeting him. (In photo, Shapiro is on the far left, Cantrill in the middle and Leventhal on the right.)
The three Solaris engineers are enthused for obvious reasons. By most user accounts, DTrace reduces the time performance analysis takes from days down to hours. It also gives Sun customers a way to hold software vendors accountable for underperforming code. When the Oracle and BEA reps throw their hands in the air, the BoFH can step up with pinpoint performance data, allowing blame to be placed where it belongs.
"I really think it is amazing," said Vlad Grama, a sys admin and student at the University Politehnica Bucuresti. "Basically, if you know the OS enough, you can do with DTrace what all other Solaris tools (vmstat, truss, sar, lsof, process accounting) do and much more. You can get data at high-precision intervals and monitor kstats, system calls and better yet functions in user-processes. Plus you can monitor only the processes or calls you're interested in so that the monitoring impact is insignificant."
Or perhaps you prefer a real world confession from Brendan Gregg, a Unix developer in Sydney.
I first used DTrace to examine disk I/O, in realtime, in detail. Beforehand, I was using kernel debugging via prex to do this, which was both ugly and had limitations. I was solving the long standing problem of identifying disk I/O by process - we can all spot a process hogging the CPU, but there is no easy way to spot a process hogging the disks. My first programs with DTrace provided a %I/O column to "ps -ef", and a snoop style command for disk I/O events.
I've also used DTrace to record a memory event for a program I was developing that had memory usage issues. Beforehand, I had been adding breakpoints to pause the program to examine memory. It was time consuming, and I was worried my breakpoints were not in the best places. Instead, I've used DTrace to print out the memory usage profile, which I then graphed. Rather than a lot of effort to generate a graph of three points, DTrace has given me a graph of a hundred points that leaves nothing to the imagination. It did more than just help my program, it helped me understand memory allocation so that I can become a better programmer.
DTrace's inventors say admins need "to have a good relationship with their brains" to use the software best. The users, though, tended to say DTrace can work well for just about any administrator. This is, in part, because DTrace combines old instrumentation tools with newly invented ones and puts them all together in a single package. Users have more places to instrument a box and an easier way to do so.
"Actually, I think the learning curve for DTrace is much nicer than for other tools," said Thomas Nau, director of the infrastructure department at the University of Ulm in Germany. "This is first because it hides a number of details if you don't request them and immediately creates high quality reports (including averages, quantization, ...). The second reason is that from its programming point it's very C'ish. Obviously, of course, this also means that it helps quite a lot to know C."
Sun sees DTrace as a big advantage for Solaris over other versions of Unix and Linux. But a recent uptick in open source Solaris talk makes one wonder how long Sun will keep a tight hold on its Solaris IP. Sun's President Jonathan Schwartz offered an answer on this to The Reg.
"We think the early success of DTrace is yet more evidence that customers care about innovation - and that's why we continue to invest in Solaris, on both Intel/AMD and SPARC/SPARC64," he said.
"The real question for us isn't whether we'll continue investing - that should be obvious. The question is whether we'll keep innovations like DTrace wed to Solaris, or move the industry forward, as we have with Project Looking Glass, with a more contribution-minded approach."
Right. That's the question.
"I can't tell you where we'll end up - we're in the midst of working that out with the community."
Oh, come on, tell us.
"But I can assure you the entirety of Sun Microsystems is returning to its roots, toward an aggressive engagement with the entirety of the open community. Those of us investing in innovation have nothing to fear - those that are simply resellers or repackagers are going to have a hard time keeping up."
So there you have it. DTrace may or may not end up in the public domain. Glad that's settled.
Sun is backing up the Solaris engineers' promise that DTrace will not take down or hurt a production system in any way. This is good news for customers who might be looking to save costs on test environments. DTrace should cut down on the need to waste time and money creating copies of production systems and trying to force problems on the kit for performance analysis.
In total, the software takes customers one step closer to the fabled utility computing. A modern performance analysis tool, working fast on running systems to increase overall compute capacity. Simple. In addition, it helps save on costs by cutting down on test hardware and by making it possible to get ISVs to fix problems with their code.
Or as Cantrill would say, It slices, it dices, it spins, it whirls.
DTrace is currently available via Sun's Solaris Express early access program. More information is also available here. The tool only works with Solaris 10, which becomes generally available in January of next year. ®
Here are a couple of last DTrace uses from Aeysis' Jenson.
I looked at one customer's application that was absolutely dependent on getting the best performance possible. Many people for many years had looked at the app using traditional tools. There was one particular function that was very "hot" - meaning that it was called several million times per second. Of course, everyone knew that being able to inline this function would help, but it was so complex that the compilers would refuse to inline.
Using DTrace, I instrumented every single assembly instruction in the function. What we found is that 5492 times to 1, there was a short circuit code path that was taken. We created a version of the function that had the short circuit case and then called the "real" function for other cases. This was completely inlinable and resulted in a 47 per cent performance gain.
Certainly, one could argue that if you used a debugger or analyzer you may have been able to come to the same conclusion in time. But who would want to sit and step through a function instruction by instruction 5493 times? With DTrace, this took literally a ten second DTrace invocation, 2 minutes to craft the test case function, and 3 minutes to test. So in slightly over 5 minutes we had a 47 per cent increase in performance.
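The restructuring Jenson describes, a small inlinable wrapper that handles the common short-circuit case and defers everything else to the original function, might look roughly like the C sketch below. It is an illustration only: the article does not say what the customer's function did or what its short-circuit condition was, so `struct entry`, `entry_value`, `entry_value_slow` and the "already cached" test are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type; the article does not describe the real data. */
struct entry {
    int           cached;  /* nonzero once value has been computed */
    unsigned long value;   /* result of the expensive computation */
    unsigned long seed;    /* input to the expensive computation */
};

/*
 * Stand-in for the original complex function. It stays out of line,
 * just as the customer's "real" function did.
 */
unsigned long entry_value_slow(struct entry *e)
{
    assert(e != NULL);
    unsigned long v = e->seed;
    for (unsigned long i = 0; i < 1000; i++)  /* placeholder for the heavy work */
        v = v * 31UL + i;
    e->value = v;
    e->cached = 1;
    return v;
}

/*
 * Thin wrapper: the (assumed) common short-circuit case is a single test
 * and a load, small enough for a compiler to inline at every call site.
 * All other cases fall through to the original function.
 */
static inline unsigned long entry_value(struct entry *e)
{
    if (e->cached)
        return e->value;
    return entry_value_slow(e);
}
```

Callers use `entry_value()` everywhere they previously called the large function. The gain depends entirely on how lopsided the two paths are, which is the figure the DTrace run supplied in Jenson's case.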
Another case was one in which we were able to observe a high cross call rate as the result of running a particular application. Cross calls are essentially one CPU asking another to do something. They may or may not be an issue, but previously it was next to impossible (okay, really impossible) to determine their effects with anything other than a debug version of the kernel. Being able to correlate the cross call directly to the application was even more complex. If you had a room full of kernel engineers, each would have theories and plausible explanations, but no hard quantifiable data on what to do and what the impact to performance would be.
Enter DTrace.... With an exceedingly simple command line invocation of DTrace, we were able to quickly identify the line of code, the reason for the cross calls, and the impact on performance. The basic issue was that a very small region of a file was being mmap(2)'d, modified, msync(3C)'d, and then munmap(2)'d. This was basically being done to guarantee that the modified region was sync'd to disk.
The munmap(2) was the reason for the cross call and the application could get the same semantics by merely opening the file with O_DSYNC. This change was made and performance increased by almost double (not all from the cross calls, but they were the "footprint" that led us down this path). So we went from an observable anomaly that previously had no means of analysis to a cause and remediation in less than 10 minutes.
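For readers who want to picture the replacement, here is a minimal C sketch of the O_DSYNC approach: open the file once with O_DSYNC, then overwrite the small region with pwrite(2), which returns only after the data has reached stable storage. The function names `open_for_sync_updates` and `update_region` are hypothetical, and the customer's actual code is not shown in the article, so treat this as an assumption about how such a change could be written rather than a record of what was done.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open the file once for synchronous data writes. With O_DSYNC, each
 * write returns only after the written data is on stable storage.
 * Returns a file descriptor, or -1 on failure.
 */
int open_for_sync_updates(const char *path)
{
    return open(path, O_WRONLY | O_DSYNC);
}

/*
 * Overwrite len bytes at offset off. This stands in for the earlier
 * mmap / modify / msync / munmap sequence, with no mapping to tear down.
 * Returns 0 on success, -1 on failure.
 */
int update_region(int fd, off_t off, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0)
            return -1;          /* error, or no progress made */
        p   += n;               /* continue after a short write */
        off += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Keeping the descriptor open across updates means each change costs a single system call, and because no mapping is created or destroyed, there is no munmap(2) to trigger the cross calls Jenson observed.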