Study: Most projects on GitHub not open source licensed

Kids these days, they just don't care

Code-sharing website GitHub has grown so popular that it and open source are practically synonymous for many developers. But new research shows that most of the projects now on GitHub are released under license terms that are unclear, inconsistent, or nonexistent, leaving their legal status as open source software uncertain.

That's according to Aaron Williamson, senior staff counsel at the Software Freedom Law Center, who presented some of his findings on the matter at the Linux Collaboration Summit in San Francisco on Wednesday.

Williamson first became interested in software licensing trends on GitHub after reading a somewhat profane Twitter post from RedMonk's James Governor in 2012, to the effect that today's young developers can't be bothered to deal with the complexities of open source licensing and governance.

At the time, Governor's post inspired much debate on Twitter and beyond. But was what he said true? Are younger developers on GitHub really less likely to specify clear licensing for their projects than earlier generations of coders? Williamson decided to find out.

To that end, he wrote a Python script that continuously polled GitHub, looking for license files. He then ran those files through FOSSology, a tool originally developed by HP, with contributions from others, that identifies software licenses by the specific language and phrases they contain.
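To give a feel for what "identifying a license by its language" means, here is a deliberately naive sketch that looks for a few well-known phrases in a license file. It is not how FOSSology works internally (its scanners are far more thorough); the phrase table and the name `identify_licenses` are assumptions made up for illustration.

```python
# Illustrative phrase table: an assumption, not FOSSology's actual rule set.
LICENSE_PHRASES = {
    "GPL": ["GNU GENERAL PUBLIC LICENSE"],
    "LGPL": ["GNU LESSER GENERAL PUBLIC LICENSE", "GNU LIBRARY GENERAL PUBLIC LICENSE"],
    "Apache": ["APACHE LICENSE"],
    "MIT": ["PERMISSION IS HEREBY GRANTED, FREE OF CHARGE"],
    "BSD": ["REDISTRIBUTIONS OF SOURCE CODE MUST RETAIN"],
}


def identify_licenses(text: str) -> set[str]:
    """Return the license names whose key phrases appear in a license file."""
    # Upper-case and collapse runs of whitespace so hard-wrapped lines still match.
    normalized = " ".join(text.upper().split())
    found = {
        name
        for name, phrases in LICENSE_PHRASES.items()
        if any(phrase in normalized for phrase in phrases)
    }
    # LGPL texts also name the GPL, so a file matching both is treated as LGPL
    # (a rough heuristic that would misfile genuinely dual-licensed projects).
    if "LGPL" in found:
        found.discard("GPL")
    return found
```

Exact-phrase matching like this misses reworded or partial license texts, which is one reason a study of this kind leans on a dedicated tool instead.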

Williamson was quick to point out that his study was by no means scientific, nor was his data set complete. GitHub's API throttles the number of requests you can make per hour, so Williamson was unable to poll the entire archive – in fact, he only made it through the oldest 28 per cent of the repositories. He's also fairly certain that he missed some licenses and that there were some errors and duplications in the data. Still, his results are eye-opening.
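Throttling of this kind is what a long-running scraper has to work around. The GitHub REST API reports the remaining quota and the reset time (Unix seconds) in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers; the sketch below, with a hypothetical function name, turns those into a pause length. It is an illustration only, not Williamson's script.

```python
import time
from typing import Mapping, Optional


def seconds_until_allowed(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next GitHub API request.

    Reads the X-RateLimit-Remaining and X-RateLimit-Reset headers that the
    GitHub REST API sends back with each response.
    """
    if now is None:
        now = time.time()
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left in the current window: no need to wait
    reset_at = headers.get("X-RateLimit-Reset")
    if reset_at is None:
        return 0.0  # no reset time reported; leave retry policy to the caller
    return max(0.0, float(reset_at) - now)  # wait until the window resets
```

A polling loop would call `time.sleep(seconds_until_allowed(response.headers))` between requests. Note that a plain `dict` is case-sensitive, whereas the header objects of most HTTP client libraries are not.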

Licenses? Bah, who needs 'em

According to Williamson, out of the 1,692,135 code repositories he scanned, just 219,326 of them – 14.9 per cent – had a file in their top-level directories that identified any kind of license at all. Of those, 28 per cent only announced their licenses in a README file, rather than in a dedicated file with a conventional name such as LICENSE or COPYING.
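The distinction between a dedicated license file and a README-only mention can be sketched as a simple classifier over a repository's top-level files. The filename prefixes, the crude "does the README mention a license" test and the name `license_location` are all assumptions for illustration, not the study's actual rules.

```python
# Filename prefixes treated as dedicated license files (an assumption;
# matching is case-insensitive, so "copying.txt" and "LICENSE.md" both count).
LICENSE_PREFIXES = ("LICENSE", "LICENCE", "COPYING")


def license_location(files: dict[str, str]) -> str:
    """Classify a repository by where its top-level files declare a license.

    `files` maps each top-level filename to its text. Returns "license-file",
    "readme-only" or "none".
    """
    if any(name.upper().startswith(LICENSE_PREFIXES) for name in files):
        return "license-file"
    # Crude check: any README that so much as mentions a license counts.
    for name, text in files.items():
        lowered = text.lower()
        if name.upper().startswith("README") and ("license" in lowered or "licence" in lowered):
            return "readme-only"
    return "none"
```

A real scan would still need to identify *which* license the README names, which is where a tool like FOSSology comes in.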

Equally interesting, Williamson found that developers with projects on GitHub tend to shun so-called copyleft licenses such as the GNU General Public License (GPL) – which require modified versions of the software to be released under the same license as the original – in favor of more permissive alternatives.

Chart: License use among GitHub projects. Most developers on GitHub seem to prefer permissive licenses to the GPL (Source: Aaron Williamson)

Naturally, the GPL was still well represented. Williamson found some 61,000 projects that were licensed under some version of the GPL or Lesser GPL. But his scans turned up fully twice as many projects released under the MIT, BSD, or Apache licenses, none of which are copyleft licenses.
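The comparison above boils down to grouping licenses into families and counting. A minimal sketch, assuming license names have already been identified and using the article's own grouping (GPL and Lesser GPL as copyleft; MIT, BSD and Apache as permissive); the function name and the sample input are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable

# The article's grouping; anything else falls into "other".
FAMILY = {
    "GPL": "copyleft",
    "LGPL": "copyleft",
    "MIT": "permissive",
    "BSD": "permissive",
    "Apache": "permissive",
}


def tally_by_family(licenses: Iterable[str]) -> Counter:
    """Count identified licenses by family (copyleft, permissive, other)."""
    return Counter(FAMILY.get(name, "other") for name in licenses)


# Made-up input, not data from the study.
sample = ["MIT", "GPL", "Apache", "BSD", "LGPL", "WTFPL"]
print(tally_by_family(sample))
```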

Williamson added that although his data was just a snapshot, and therefore couldn't be used to establish any trends, data gathered by RedMonk does indicate an overall trend toward permissive licensing for projects written in many different languages.

Just why that is wasn't clear. But Luis Villa, deputy general counsel at the Wikimedia Foundation, has suggested that younger developers may be choosing more permissive licenses as a way of pushing back against what they see as a "permission culture." They prefer to let other developers do whatever they want with their code, and, rightly or wrongly, that attitude might also explain why many projects are released with no license whatsoever.

Even when GitHub repositories included licenses, Williamson found a lot of projects where the licensing was unclear. For example, many projects claimed to be licensed under "the Ruby license," but Ruby's licensing has changed over time, making it difficult to figure out just what the terms are for any specific project if they aren't stated explicitly.

Still other projects offered terms that were inconsistent. For example, one program claimed to be licensed under the GPL but "for non-commercial use only," a restriction that contradicts the GPL's terms.
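Both kinds of problem, an unversioned "Ruby license" and a GPL declaration carrying a non-commercial clause, are simple enough to flag automatically. The sketch below is a hypothetical lint with rough regular-expression heuristics; it is not part of Williamson's study or of any existing tool.

```python
import re


def licensing_problems(declaration: str) -> list[str]:
    """Flag ambiguous or self-contradictory license declarations."""
    text = declaration.lower()
    problems = []
    # Ruby's licensing changed over time, so "the Ruby license" is ambiguous
    # unless a Ruby version is named (the version pattern is a rough heuristic).
    if "ruby license" in text and not re.search(r"ruby\s+\d+(\.\d+)*", text):
        problems.append("ambiguous: Ruby license without a Ruby version")
    # The GPL forbids imposing further restrictions, such as a
    # non-commercial clause, on the rights it grants.
    if re.search(r"\bgpl|general public license", text) and re.search(r"non[- ]?commercial", text):
        problems.append("inconsistent: GPL combined with a non-commercial restriction")
    return problems
```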
