Feeds

Study: Most projects on GitHub not open source licensed

Kids these days, they just don't care

The Power of One Brief: Top reasons to choose HP BladeSystem

Code-sharing website GitHub has grown so popular that it and open source are practically synonymous for many developers. But new research shows that most of the projects now on GitHub are released under license terms that are unclear, inconsistent, or nonexistent, leaving their legal status as open source software uncertain.

That's according to Aaron Williamson, senior staff counsel at the Software Freedom Law Center, who presented some of his findings on the matter at the Linux Collaboration Summit in San Francisco on Wednesday.

Williamson first became interested in software licensing trends on GitHub after reading a somewhat profane Twitter post from Redmonk's James Governor in 2012, to the effect that today's young developers can't be bothered to deal with the complexities of open source licensing and governance.

At the time, Governor's post inspired much debate on Twitter and beyond. But was what he said true? Are younger developers on GitHub really less likely to specify clear licensing for their projects than earlier generations of coders? Williamson decided to find out.

To that end, he wrote a Python script that continuously polled GitHub, looking for license files. He then ran those files through FOSSology, a tool developed by HP and some others that can identify software licenses by the specific language and phrases contained in them.

Williamson was quick to point out that his study was by no means scientific, nor was his data set complete. GitHub's APIs throttles the number of requests you can make per hour, so Williamson was unable to poll the entire archive – in fact, he only made it through the oldest 28 per cent of the repositories. He's also fairly certain that he missed some licenses and that there were some errors and duplications in the data. Still, his results are eye opening.

Licenses? Bah, who needs 'em

According to Williamson, out of the 1,692,135 code repositories he scanned, just 219,326 of them – 14.9 percent – had a file in their top-level directories that identified any kind of license at all. Of those, 28 per cent only announced their licenses in a README file, as opposed to recommended filenames such as LICENSE or COPYING.

Equally interesting, Williamson found that developers with projects on GitHub tend to shun so-called copyleft licenses such as the Gnu General Public License (GPL) – which require modified versions of the software to be released under the same license as the original – in favor of more permissive alternatives.

  Chart showing license use among GitHub projects  

Most developers on GitHub seem to prefer permissive licenses to the GPL (Source: Aaron Williamson)

Naturally, the GPL was still well represented. Williamson found some 61,000 projects that were licensed under some version of the GPL or Lesser GPL. But his scans turned up fully twice as many projects that were released under either the MIT, BSD, or Apache licenses, none of which are copyleft licenses.

Williamson added that although his data was just a snapshot, and therefore couldn't be used to establish any trends, data gathered by Redmonk does indicate an overall trend toward permissive licensing for projects written in many different languages.

Just why that is wasn't clear. But Luis Villa, deputy general counsel at the Wikimedia Foundation, has suggested that younger developers may be choosing more permissive licenses as a way of pushing back against what they see as a "permission culture." They prefer to let other developers just do whatever they want with their code – and, rightly or wrongly, this might be a reason why many projects are released with no license whatsoever.

Even when GitHub repositories included licenses, however, Williamson also found a lot of projects where the licensing was unclear. For example, many projects claimed to be licensed under "the Ruby license," but Ruby's licensing has changed over time, making it difficult to figure out just what the terms are for any specific project if they aren't stated explicitly.

Still other projects offered terms that were inconsistent; for example, a program that claimed to be licensed under the GPL but "for non-commercial use only," which contradicts the GPL's terms.

Seven Steps to Software Security

More from The Register

next story
KDE releases ice-cream coloured Plasma 5 just in time for summer
Melty but refreshing - popular rival to Mint's Cinnamon's still a work in progress
NO MORE ALL CAPS and other pleasures of Visual Studio 14
Unpicking a packed preview that breaks down ASP.NET
Secure microkernel that uses maths to be 'bug free' goes open source
Hacker-repelling, drone-protecting code will soon be yours to tweak as you see fit
Cheer up, Nokia fans. It can start making mobes again in 18 months
The real winner of the Nokia sale is *drumroll* ... Nokia
Put down that Oracle database patch: It could cost $23,000 per CPU
On-by-default INMEMORY tech a boon for developers ... as long as they can afford it
Another day, another Firefox: Version 31 is upon us ALREADY
Web devs, Mozilla really wants you to like this one
Google shows off new Chrome OS look
Athena springs full-grown from Chromium project's head
prev story

Whitepapers

Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Consolidation: The Foundation for IT Business Transformation
In this whitepaper learn how effective consolidation of IT and business resources can enable multiple, meaningful business benefits.
Application security programs and practises
Follow a few strategies and your organization can gain the full benefits of open source and the cloud without compromising the security of your applications.
How modern custom applications can spur business growth
Learn how to create, deploy and manage custom applications without consuming or expanding the need for scarce, expensive IT resources.
Securing Web Applications Made Simple and Scalable
Learn how automated security testing can provide a simple and scalable way to protect your web applications.