Talking pizza and packets with Samba's Tridge
Interview Much in the same way that Cisco founders Sandy Lerner and Leonard Bosack invented the router so they could send emails to each other across the Stanford University campus, Andrew Tridgell just wanted the three computers on his home network to talk to each other. The three computers, a PC running DOS, a Sun workstation, and a DECstation 3100 running Digital Unix, needed a common protocol that all could understand. Hacking on what he thought was a proprietary protocol of a DOS-Unix program called Pathworks, Tridge (as he's known) accidentally found himself reverse-engineering the heart of Microsoft's networking, the SMB protocol.
The Server Message Block protocol, more commonly known on most versions of Windows as Client for Microsoft Networks, worked hand in hand with NetBIOS naming to make up the cornerstone of Microsoft's networking, most notably the browsing through Network Neighborhood or Entire Network folders. Enabling Linux, and subsequently other flavors of Unix, to speak SMB has been integral in getting Linux into many companies. Since the 2.0 release of Samba (as this open source project came to be known), Linux machines can even function as a Primary Domain Controller or Backup Domain Controller in a Windows NT Domain.
Microsoft has since moved on -- its networking infrastructure since the release of Windows 2000 Server has been built on Active Directory, a cocktail of different protocols that use SMB and NetBIOS primarily for backward compatibility. Samba has evolved too. I caught up with Tridge at his home base in Canberra, Australia, for a virtual chat to talk about Samba, Linux, and international phone calls.
Q: Since Samba is still probably the best known pizzaware application, for the record, how much pizza did the Samba team end up accumulating? Are there still new pies rolling in?
The whole pizzaware thing started as a joke, but soon took on a life of its own. Many years ago I included the following paragraph in the README for Samba:
You could also send hardware/software/money/jewelry or pizza vouchers directly to Andrew. The pizza vouchers would be especially welcome.
I never expected anyone to actually do it! Soon after that an enterprising person managed to convince Pizza Hut in Australia to issue some pizza vouchers (which they didn't do at the time). I used the first vouchers to feed the assembled hordes at a Canberra Linux User Group meeting.
After that people found all sorts of ways to send pizza. I received cans of pizza makings, pizza vouchers, a 'pizza account' at a local store, GIFs of pizzas via email and even a nice little origami pizza. I'm not sure how many I received in total, but it was quite a few!
The best lot was when someone rang up from Germany and quoted his credit card number to a local pizza place for $300 of pizza. That's a lot of pizza in Australia, and it took my wife and me many months to get through it. At the time we were both students with an almost zero combined income so it was very welcome!
It's dropped off a lot in the last couple of years and I haven't received any vouchers at all for months. That's probably a good thing as my wife is now sick of pizza and I need to lose some weight! I'm also now employed in a good job so I can afford to buy my own pizza if I feel a pizza craving.
Q: According to the Samba web site, the Samba Team currently is looking for donations. Where does the bulk of the funding for Samba currently come from?
The team doesn't need a lot of funding itself, as our only significant expenses are travel expenses for team members (especially students) who can't afford to pay their own way to the two yearly conferences, and the hosting expenses for the main samba.org server.
Luckily IBM very kindly offered to pay for all our hosting expenses as of last October, so the only remaining expenses are travel. That is what we use the donations for.
We received a US$25,000 prize at the LinuxWorld conference a couple of years ago and that helped a lot in giving us a bit of travel money. Apart from that and the donations the team has no other source of revenue.
On the other hand, the biggest (and most appreciated) contribution isn't money, it's time! Quite a few companies employ Samba developers, documenters, and testers and allow them to contribute to Samba on company time. That makes a huge difference to the amount of progress we can make.
Q: How many people currently work on Samba?
That's quite hard to say. There are currently 27 people on the 'Samba Team,' which means there are that number of people with direct CVS commit access to the code and documentation. Not all of those people are active developers, but there are also lots of other people who work on Samba in some way but aren't team members.
I'd guess there are probably about 30 to 50 people who have written some code that has gone into Samba in the last year, and maybe 10 to 15 people who commit something regularly.
Q: At this point does the SMB protocol still hold any surprises? Or have you documented it enough that you primarily work on other issues surrounding it?
Oh yes, we discover new things about SMB all the time. We mostly just commit the necessary change or addition to the Samba source code rather than maintaining formal documentation, although we often also add a new test to the Samba test suite, so we can be sure that future versions of Samba also get that corner of the protocol right.
Q: On the face of it, making sense of Active Directory seems like it should be easier than Microsoft's earlier networking, since it's made up of modified versions of documented protocols like DNS, LDAP, and Kerberos. What have been the challenges in making Samba compatible with Active Directory and how close is Samba to functioning as an Active Directory Domain Server?
You are right that Active Directory is better designed than the earlier NT4 domain infrastructure. There are still quite a few interesting challenges however.
One thing that surprised us a little is how much Active Directory still relies on the old Microsoft Remote Procedure Call (MSRPC) transport (the one used in NT4) rather than doing everything with LDAP. When I first started investigating Active Directory I assumed that Microsoft would have moved as many of its calls as possible to LDAP, as LDAP is just so much more flexible. We're finding that Microsoft still makes extensive use of MSRPC in its clients, even though there is usually an equivalent way of making the same query via LDAP.
One of the biggest advantages of Active Directory for Samba has been the increased flexibility of LDAP calls in the 'winbindd' daemon. In the old MSRPC-based NT4 domains there was a fixed set of remote queries that you could make to the domain controller. While this set of queries is quite large, we sometimes found that it didn't include exactly the query we needed, forcing us to use a nasty compromise.
A good example is when 'winbindd' needs to fetch a list of users from the domain controller along with each user's primary group, which is needed for the very common getpwent() POSIX call. To do this properly with an NT4 domain we need to make a separate query to the domain controller for each user to fetch the primary group ID, which means a huge amount of network traffic and latency cost. With LDAP the client gets to choose what fields are returned in a query, so we can just send a single query that says 'please send me a list of all users and their primary group IDs.' That is vastly more efficient.
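The difference in round trips can be sketched with a toy model. This is only an illustration of the request counts described above, not the MSRPC or LDAP wire protocols and not winbindd's code; the class `FakeDomainController` and all of its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    primary_gid: int


class FakeDomainController:
    """Toy directory (hypothetical) that counts how many requests it serves."""

    def __init__(self, users):
        self._users = {u.name: u for u in users}
        self.requests = 0

    # Fixed-call style: one call lists names, another looks up a single user.
    def list_user_names(self):
        self.requests += 1
        return list(self._users)

    def get_primary_gid(self, name):
        self.requests += 1
        return self._users[name].primary_gid

    # Search style: the caller chooses which attributes come back.
    def search_users(self, attributes):
        self.requests += 1
        return [{a: getattr(u, a) for a in attributes}
                for u in self._users.values()]


def enumerate_fixed_calls(dc):
    """One request for the list, then one more request per user."""
    return [(name, dc.get_primary_gid(name)) for name in dc.list_user_names()]


def enumerate_single_search(dc):
    """One request that asks for exactly the two fields needed."""
    rows = dc.search_users(["name", "primary_gid"])
    return [(row["name"], row["primary_gid"]) for row in rows]
```

With N users the first function costs N + 1 requests and the second costs one, which is the traffic and latency gap described above.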
Another challenge we are facing with the advent of Active Directory in Samba is a greatly increased tie-in with other free software projects. Previously Samba was fairly independent, and didn't rely on many non-core operating system services to function in a reasonable fashion. With Active Directory that all changes, as we are now very dependent on the exact functioning of a Kerberos, LDAP, and DNS library. That gives the potential for problems of matching Samba releases to the releases of other packages, which can be quite painful.
Despite these challenges we are pleased to say that the Active Directory support in Samba 3.0 is coming along very nicely. The alpha releases of Samba 3.0 are already being used in production in quite a number of companies as an Active Directory domain member, and two vendors that I know of have shipped the alpha releases in their appliance products. The reports back from users have been very positive.
The work to make Samba an Active Directory domain controller is in a much earlier stage of development. A few developers (in particular Jim McDonough and Anthony Liguori) have been doing a lot of work on this and while their progress is encouraging, I don't think you'll see something that could be used in production for a few months at least.
Q: One of the concerns in implementing support for Active Directory must have been the potential of corrupting the Active Directory database, since each Domain Server has a complete copy. Has this been a problem, or is it even an issue?
This hasn't really been a problem so far. While developing the 'ADS domain member' code we're just doing simple LDAP queries to an existing Microsoft ADS domain controller, and those queries are not likely to corrupt the LDAP database.
The effort to develop an ADS domain controller has concentrated on creating a standalone domain controller at this stage, rather than mixing a Samba domain controller and a Microsoft domain controller for the one domain. That means the potential for corruption really is minimised.
Once we have mixed Samba and Microsoft domain controllers on the one domain this could well be something we need to watch out for, but that isn't a top priority just yet.
Q: I'm not sure about Australia, but in the US more than 50% of businesses are defined as small businesses, many of which have little to no IT administration and primarily use their LANs for simple file and print sharing. While SMB and NetBIOS are less than optimal for many uses, for a small peer network where the goal is user-friendly LAN browsing and peer resource sharing, this combo has worked pretty well. Seeing as nobody really wants SMB or NetBIOS anymore, what do you think can replace this combo in the future?
First off I should clarify a bit what you mean by SMB and NetBIOS. The terminology in this space can be rather confusing!
Some people think of NetBIOS as a transport, at about the same level as TCP and UDP. I prefer to call that NetBEUI, as it better distinguishes it from 'NetBIOS over TCP/IP' or NBT, which is what Samba implements.
A few years ago, a fairly large proportion of people who were doing Windows file sharing on their home or small office networks were using SMB over NetBEUI. As you have pointed out, this was great for small LANs because of the minimal administration needed -- you didn't need to assign each computer an IP address. A lot of people used SMB over IPX/SPX for the same reason.
It seems that nearly everyone uses SMB over TCP/IP these days, partly because of things like DHCP that make the administration of an IP network much simpler. With the recent advent of zero-configuration systems and cheap, fully automated home gateways that get rid of even the small amount of administration needed for DHCP, I think that the last reasons to use NetBEUI or IPX/SPX are disappearing.
Some people are also a little confused about comparisons between SMB and CIFS. In recent years Microsoft has been referring to the file sharing protocol that it uses as CIFS, and talks about SMB as though it is an older protocol.
The basic fact is that SMB and CIFS are the same protocol. The name change was largely a marketing move to include the word 'Internet' in the protocol. Since the name change there have been a few minor changes in the protocol, but nothing really major. The changes between versions of the older SMB protocol were larger.
Just to add more confusion, Microsoft has recently started to use a new nomenclature. They now seem to use three terms:
- They use SMB for the core pieces of the original protocol.
- They use CIFS for the protocol with minor changes.
- They use Microsoft SMB for the union of the core pieces of the protocol with some other bits that they consider somehow to be their sole domain.
Of course, Samba implements all of the above. Our aim is to provide seamless interoperability with all Microsoft clients, and in order to do that we can't restrict ourselves to some artificial subset of the protocol.
As to your final question as to what might replace SMB and CIFS eventually, that is very hard to say. I really expect SMB and CIFS to be around for a long time to come, although it would not be surprising for Microsoft to start pushing some alternative protocol at some stage. A couple of possibilities have cropped up over the years (such as WebDAV), but none of them have turned into serious contenders for the Windows LAN file sharing market.
Q: OK, to clarify my last question regarding LAN browsing, we'll take the example of a small LAN of four computers with no real IT administration. As you pointed out, DHCP and inexpensive routers make it easy to set up a LAN like this with little to no configuration. The missing gap in this situation, as I see it, is a protocol that allows computers to have names instead of IP addresses and can announce/respond to local network broadcasts so that users can easily browse local resources. If at some future date there were a desktop version of Linux, say for lack of a better example Lindows, and this network were all Linux, would Samba be the best solution?
Samba (and more specifically the NBT protocol portion of Samba) would be one solution, but there are others. One method is to have a smart DHCP server that talks to a DNS server to automatically register names of clients in DNS when they grab an address with DHCP. This is quite a common setup, and while it can be a bit fiddly to configure this using standard tools I imagine that home gateways might start to support this kind of feature soon.
Another alternative is the Service Location Protocol, as implemented at OpenSLP. It's hard to know if this will take off or not.
There are problems with using Samba and NBT for this sort of name resolution that make me a bit reluctant to say that it is the right solution for non-Windows LANs. The limitation to 15-byte names, the poor handling of non-ASCII characters, and the global flat name space are all significant issues with NBT. It does have the short-term advantage of being relatively simple and already having a stable free implementation, but I'm not sure if that is enough to recommend it as a longer-term solution.
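The 15-byte limit comes from the fixed 16-byte NetBIOS name field: 15 bytes of name plus one suffix byte. A minimal sketch of the first-level name encoding described in RFC 1001 follows. The function name is hypothetical, the default suffix 0x20 is the one commonly used for the file server service, and restricting input to ASCII is a simplifying assumption of this sketch rather than a rule of the protocol.

```python
def nbt_encode_name(name: str, suffix: int = 0x20) -> str:
    """First-level encode a NetBIOS name (sketch; ASCII-only by assumption).

    The name is upper-cased, space-padded to 15 bytes, and a one-byte
    suffix is appended. Each of the 16 bytes is then split into two
    4-bit halves, and each half is written as the letter 'A' plus its value.
    """
    raw = name.upper().encode("ascii")  # raises UnicodeEncodeError for non-ASCII
    if len(raw) > 15:
        raise ValueError(f"NetBIOS names are limited to 15 bytes, got {len(raw)}")
    padded = raw.ljust(15, b" ") + bytes([suffix])
    return "".join(
        chr(ord("A") + (b >> 4)) + chr(ord("A") + (b & 0x0F)) for b in padded
    )
```

For example, `nbt_encode_name("FRED")` yields the 32-character string `EGFCEFEECACACACACACACACACACACACA`, and any name longer than 15 bytes is rejected outright.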
Q: What's the overhead of SMB compared to similar protocols?
SMB is a very large protocol and because of this there is considerable overhead in just fitting all the code needed to implement it into memory. That means that embedded SMB servers tend to need more memory than would be needed for simpler protocols, but with memory being so cheap these days that is not much of a problem.
For straight file transfers, the performance overhead of SMB is quite low on a typical small server on a LAN. Certainly it would be possible to design a lower overhead protocol but the difference would be quite small in typical usage.
For more specialised setups, such as when two machines have high-end fancy Gigabit Ethernet cards or are connected by some sort of high performance network setup like is used in expensive supercomputer clusters, the overhead of SMB would be much more noticeable. This isn't really a problem though, as in those cases you would not expect to be using a general purpose file sharing protocol anyway; you would use something tailored to the network architecture of the system.
Perhaps the most important overhead in SMB as it is implemented in Samba isn't really inherent in the protocol itself, but is instead a result of the fact that SMB semantics are not a good match for POSIX semantics, so Samba spends a lot of time mapping between POSIX and SMB. A good example is the work that Samba has to do to provide case-insensitive file names to Windows clients. That is really expensive to do on POSIX systems and is one of the biggest overheads that Samba has to deal with. Luckily there are some moves towards adding support for non-traditional semantics in Linux that will really help to lower this overhead in the future.
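To see why case-insensitive names are costly on a case-sensitive file system, consider this minimal sketch. It is not Samba's implementation, and the function name is hypothetical; it only shows that a failed exact lookup forces a scan of every entry in the directory.

```python
import os


def find_case_insensitive(directory: str, wanted: str):
    """Return the on-disk name matching `wanted` ignoring case, or None.

    A case-sensitive file system can only answer exact-name lookups
    directly, so when the exact name is missing every directory entry
    has to be examined.
    """
    exact = os.path.join(directory, wanted)
    if os.path.lexists(exact):            # fast path: exact match exists
        return wanted
    target = wanted.casefold()
    for entry in os.listdir(directory):   # slow path: full directory scan
        if entry.casefold() == target:
            return entry
    return None
```

The slow path grows with the size of the directory, and it is also the path taken whenever a client asks about a file that does not exist at all, so the worst case is hit routinely.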
Q: You've often advocated NFS over SMB in the past. Is that still your position? And in what instances do you think NFS should be used as a substitute for SMB?
I have advocated NFS over SMB for specific applications, not as a general solution. For example, I recommend that people use NFS between two Linux systems rather than using the smbfs client and Samba. I definitely don't recommend NFS for anything that involves Microsoft clients or servers. The tight coupling of SMB into all the Microsoft operating systems really makes it the only sane choice in those cases.
The really nice thing about NFS is its simplicity. It is a tiny protocol, and works really rather well between Unix systems. It does have some very poor points though, in particular the fact that current versions of NFS play dangerous games with coherence semantics.
Users of NFS servers just put up with the fact that they save a file, type make and nothing happens. They then take a sip of coffee, type make again and a build starts happening. This is a symptom of NFS using something called attribute cache times, which is basically a kind way of saying that NFS thinks that any information less than a few seconds old must still be correct. If you think about what can happen in a few seconds on modern machines then you will soon realise how foolish this idea is. You are really playing rather dangerous games with your data.
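The make symptom can be reproduced with a toy time-based attribute cache. This is a sketch of the general idea only, not an NFS client; the class name and the 3-second `ttl` are assumptions chosen for illustration, and the clock is passed in explicitly so the behaviour is deterministic.

```python
class AttributeCachingClient:
    """Toy client (hypothetical) that trusts a cached mtime for `ttl` seconds."""

    def __init__(self, server_mtimes, ttl=3.0):
        self._server = server_mtimes   # dict: path -> mtime on the "server"
        self._ttl = ttl
        self._cache = {}               # path -> (mtime, time it was fetched)

    def getattr_mtime(self, path, now):
        cached = self._cache.get(path)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]           # answered from cache, possibly stale
        mtime = self._server[path]     # stands in for a round trip to the server
        self._cache[path] = (mtime, now)
        return mtime


server = {"main.c": 100}
client = AttributeCachingClient(server)

first = client.getattr_mtime("main.c", now=0.0)   # fetched from the server
server["main.c"] = 101                            # the file is saved elsewhere
stale = client.getattr_mtime("main.c", now=1.0)   # still inside the cache window
fresh = client.getattr_mtime("main.c", now=4.0)   # cache entry has expired
```

Here `stale` is still 100 even though the server already holds 101, which is the first make that does nothing; `fresh` is 101, which is the second make after the sip of coffee.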
That will all change with NFSv4, which is a radical overhaul of the NFS protocol. In many ways NFSv4 is much closer to the SMB protocol than it is to NFSv2 and NFSv3. Unfortunately that means that NFSv4 is no longer a small protocol, but at least it is more cleanly designed than SMB.
When the NFSv4 protocol spec first came out I thought that it stood a good chance of being the main file sharing protocol of the future, but now I'm not so sure. A core problem seems to be that so much of the protocol is optional that you can have two implementations of NFSv4 that can't talk to each other in any reasonable fashion as each implements a different optional subset of the protocol. That makes it difficult for the protocol to really take off.
Q: Your résumé reads like a mini-history of Linux, starting in academia, moving to revenue-challenged Linux startups (Linuxcare and VA Linux), and now working at an established company (Quantum) presumably working on Linux enterprise software. While you have many other qualifications, have you been hired to work on Samba, more or less professionally, for the past couple of years?
It's even worse than you think -- I've changed company again! I left Quantum when they shut down their NAS division in October last year. This was rather similar to what happened a year before when VA shut down its NAS division and I was laid off along with everyone else. I seem to have a knack for picking jobs that don't survive long, or at least it feels that way.
I recently joined IBM in the Almaden research labs, working remotely from the OzLabs lab in Canberra. I'm pretty sure that this time around I'll be there for a lot longer than one year! I'm having lots of fun doing NAS work, particularly in a research environment.
You are right that working on Samba has been an important part of my professional life for the last couple of years, but it hasn't been the only thing I've worked on. At each of the companies I have joined I have tended to do whatever needed to be done, from Linux kernel drivers to building systems and even writing a few bits of GUI code. I usually manage to spin bits of these off as new pieces of open source software, which is why I've released things like ccache, tserver and trd while working for these companies.
Each of these packages arose from problems that needed to be solved while helping to build products for the various companies I have been employed by. It is really nice to see that these are now being used by other people as well. I get a nice warm feeling knowing that I am able to help other people in the free software community by releasing these little bits of code.
Q: The Samba project seems like a good candidate for open source/private enterprise symbiosis, where if a company needs to interact with a Windows network they can hire a programmer to work on the Samba project and ideally get a little added expertise in implementing a Samba-based solution. Has the project actually worked this way?
In the early days of the project this didn't happen much, with almost all the development effort being put in by people on a purely hobby basis. Over the last few years that has changed a lot, with many Samba developers now employed by companies that are using Samba in a product or deriving revenue in some other way from Samba. The relationship between these companies and the Samba Team is quite informal, but extremely productive. The individual people tend to be hired to work on some particular aspect of Samba that the company cares about, but because each company has a different set of needs the overall effect is quite a well-balanced development team.
Q: Is Samba 100% compatible, no exceptions, with Windows NT 4.0 domains at this point? Can it be used interchangeably with Windows NT 4.0 Server as far as network interoperability is concerned?
Nothing is 100% compatible! Even different releases of Windows, or different service packs, are not 100% compatible, as any long-term Windows administrator knows only too well.
I can say that Samba is a very good SMB file and print server. It provides excellent compatibility with Windows and is used by a huge number of sites to provide highly reliable file and print services to Windows clients.
There are of course more things to do, and we are constantly improving Samba to provide a wider range of features or to improve performance. A good example is the fact that right now you can't mix Samba domain controllers and Microsoft domain controllers on the same domain. Most people don't want to do that, so it isn't a major problem, but it is something that we would like to be able to offer in the future and for some sites it is an important enough feature that they can't use Samba without it.
Q: Samba went through a bit of a security scare recently. Should computers connected directly to the Internet, without a firewall, be running Samba?
As with any network service you should only expose it to the Internet if you need to. There is nothing unique to Samba in this regard. You might think that because Samba is quite complex it poses a higher risk than other protocol servers, but this is offset by the fact that lots of people have inspected Samba for security holes, just as did the SuSE person who found the recent bug.
As part of the announcement of the recent security hole I wrote a small document on how to reduce your exposure by telling Samba to only allow connections from inside your network. If you follow those instructions then you will definitely reduce your risk.
Q: Windows, justified or not, has the reputation of being insecure. Does Samba inadvertently open *nix systems to the vulnerabilities that Windows systems have become notorious for?
No, there is nothing inherently insecure about a Samba installation. It is certainly true that the protocols that Samba implements don't have the level of paranoia built in that is inherent in protocols like SSH, but that doesn't imply that breaking into Samba is easy. There is also no reason to think that if Windows has a security hole, Samba will necessarily have the same hole. In fact, I can't think of a single case where a security hole in Windows has also been found in Samba. You tend to get holes in two programs at once when they share some source code, and of course Samba shares no source code with Windows so that just isn't an issue.
Q: Given your job history, have you gained any insight or opinions on how open source projects can secure funding?
I'm always rather skeptical when open source projects emphasize their funding rather than the technology. Most projects prosper with little or no funding, and I have noticed that projects that start off with funding as one of their core priorities in the early days of the project tend to wither and die quite quickly. The free software community thrives on the enthusiasm of individual contributors rather than on any particular funding model.
That doesn't mean that some funding isn't useful, and may even be necessary for the survival of some projects. The trick is to find companies or organisations that will benefit from the project, and approach them with a reasonable proposal that has a clear business case from their point of view. For example, that might mean that they employ someone who is working on the project, or fund someone to attend a conference or pay for some service such as Web hosting. All of these can be very useful contributions, but be careful you don't get into the position where the search for funding takes up more time than you spend on other parts of the project.
Q: How closely does the Samba project work with other Windows interoperability projects, such as Wine and Mono?
Not very closely at all. There are occasional discussions between members of the projects, but it tends to be very ad hoc. It's possible that it might increase in the future, especially if we start sharing a common piece of code. One candidate for that is the WIDL compiler in the Wine tree, which has interesting possibilities for the Samba MSRPC code. If we started using WIDL then I imagine we would start talking to the Wine developers a lot more.
Q: Having managed one of the better known open source projects for quite a while now, what advice can you offer aspiring project managers on how to supervise open source projects?
I have a very hands-off approach to project management, partly because I'm lazy and partly because I find that it works well. Managing an open source project is quite different from management in a traditional business because you can only ever ask people nicely to do things. You can't force anyone to do anything they don't want to do, as they are almost always working on the project voluntarily.
In fact, it would be hard for me to point at any particular action I take on a regular basis that could be called 'management.' All I do is participate in technical discussions and write code. My position as the original author of the project tends to give me a little more authority in those discussions than other people, but if I abused that position by forcing things in an unreasonable way then that authority would very quickly be eroded.
Some other free software project leaders tend to be much more proactive in how they manage the projects, and that's fine if it works for them. It's just not something that appeals to me.
Q: In an era where most people won't even put email addresses on Web pages, you list your home phone number. Why is this, and what kind of calls have you gotten?
There seems to be a strange reluctance to make international phone calls, despite the fact that they are really quite cheap for most people. When I first put my phone number up on my home page I thought that perhaps I'd get inundated with calls about Samba, just like I get huge numbers of Samba emails. In practice I tend to get one call every couple of weeks, and usually from people who really do need help urgently and have a good reason to call. Some of my most interesting discussions about Samba use in businesses have come from these phone calls, so I've left my phone number up. If it ever becomes a problem I can always remove it.
Q: Is there one Linux distribution or configuration you use more than others?
I only use two Linux distributions regularly, Debian and Red Hat. I prefer Debian, and run Debian 'unstable' on my development machine, updating every couple of weeks.
On some other machines I run Red Hat, usually because I need to share the administration with someone else who is more familiar with Red Hat. A good example is the main samba.org server, which runs Red Hat, but using the APT for RPM add-on to allow sane remote updates using the Debian apt system. I chose Red Hat for samba.org because it is located in a data center on the other side of the world, and if anything ever goes wrong I want to be running a distribution that the data center engineers are familiar with.
Q: While Linux is definitely succeeding in many areas, with every success the bar is raised higher, and the challenges facing Linux as a whole become more complex and require more coordination and cooperation. What's your opinion on the health of the Linux community and do you have any concerns about where Linux is headed?
I think that the Linux community is thriving, and will continue to thrive in all the ways that really matter. It's less certain that the companies trying to make a commercial success from Linux-based businesses will succeed, but luckily their success or failure will not prevent the parts of the Linux community that I really care about from continuing. From that point of view I think that the future of Linux is very bright indeed.
Q: What is the state of Linux in Australia?
We haven't quite got to the point of declaring ourselves to be a separate state, but apart from that Linux is doing extremely well in Australia. The Linux user groups are still growing well, and there are more and more indications of Linux taking off as an accepted platform in government departments and businesses.
The recent conference in Perth was an enormous success and was particularly enjoyable as it is one of the few conferences to have resisted the temptation to emphasise sponsors and corporate involvement over technical content. That allowed the conference to attract a really great set of speakers which made for very interesting technical discussions, which is what a Linux conference should be about.
Q: You've said in the past that you expect one of your other projects, rsync, to outlast and eclipse Samba in the long run. Is rsync on pace to make this true?
My comments were really about the rsync algorithm and the ideas behind it rather than the tool itself. I still think it's true, because rsync and delta compression in general is an emerging technology which has enormous potential to impact on such a wide range of computing tasks. In this way it is quite different from Samba, because Samba only exists to interoperate with a very specific set of clients, and Samba doesn't really contain anything that is fundamentally new or exciting in computer science terms. This limits the long-term prospects of Samba quite a lot.
Q: It's taken a couple of years or so for Active Directory support to make it into Samba. Over the course of the project how would you rate Samba's ability to adapt to major changes in Windows' networking? Is it driven more by resources or by demand? And, lastly, have you learned enough that the compatibility gap between native Windows and Samba will shrink going forward?
The gap is definitely shrinking and I am quite sure that trend will continue, especially given the plans we currently have for Samba4.
As for Active Directory member support, that actually got done in about six months' work; it's just that we didn't start on it until well after Windows 2000 was released. You are absolutely right that the delay in starting on the ADS work was driven by resources and demand, the problem being that the NT4 domain support was good enough for most people, so adding direct ADS member support didn't really become a priority until we started hearing from people who were running networks that had all NT4 domain support disabled. At that stage ADS support became a much higher priority and it got done pretty quickly.
Q: In your recent interview with DeveloperWorks you joked that you hoped the need for Samba would someday go away. While this is more up to Microsoft than the Samba team, do you think this will eventually happen?
Yes, I think it will. The protocol won't last forever, although it is hard to say when it will come to an end. The use of SMB/CIFS is still expanding at the moment, but I fully expect that someday Microsoft will switch to something different for its primary file sharing protocol.
The other way that Samba could come to an end is for people to stop installing Microsoft Windows on their desktops. That would really be the best result, but I'm not holding my breath.