Study: Most projects on GitHub not open source licensed

Kids these days, they just don't care

Code-sharing website GitHub has grown so popular that it and open source are practically synonymous for many developers. But new research shows that most of the projects now on GitHub are released under license terms that are unclear, inconsistent, or nonexistent, leaving their legal status as open source software uncertain.

That's according to Aaron Williamson, senior staff counsel at the Software Freedom Law Center, who presented some of his findings on the matter at the Linux Collaboration Summit in San Francisco on Wednesday.

Williamson first became interested in software licensing trends on GitHub after reading a somewhat profane Twitter post from RedMonk's James Governor in 2012, to the effect that today's young developers can't be bothered to deal with the complexities of open source licensing and governance.

At the time, Governor's post inspired much debate on Twitter and beyond. But was what he said true? Are younger developers on GitHub really less likely to specify clear licensing for their projects than earlier generations of coders? Williamson decided to find out.

To that end, he wrote a Python script that continuously polled GitHub, looking for license files. He then ran those files through FOSSology, a tool developed by HP and others that identifies software licenses by the specific language and phrases they contain.
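The phrase-matching idea can be illustrated with a minimal sketch. This is not FOSSology's actual algorithm or signature set; the `LICENSE_PHRASES` table and the `identify_licenses` function are hypothetical names made up for illustration, and the phrases are a small hand-picked sample.

```python
# Hypothetical phrase table: illustrative snippets only, not FOSSology's signatures.
LICENSE_PHRASES = {
    "GPL": ["gnu general public license"],
    "LGPL": ["gnu lesser general public license"],
    "MIT": ["permission is hereby granted, free of charge"],
    "Apache": ["apache license"],
    "BSD": ["redistribution and use in source and binary forms"],
}


def identify_licenses(text):
    """Return the sorted names of licenses whose phrases appear in text."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    normalized = " ".join(text.lower().split())
    return sorted(
        name
        for name, phrases in LICENSE_PHRASES.items()
        if any(phrase in normalized for phrase in phrases)
    )
```

A file whose text matches nothing in the table yields an empty list, which roughly corresponds to the "no identifiable license" case in the study.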

Williamson was quick to point out that his study was by no means scientific, nor was his data set complete. GitHub's API throttles the number of requests you can make per hour, so Williamson was unable to poll the entire archive – in fact, he only made it through the oldest 28 per cent of the repositories. He's also fairly certain that he missed some licenses and that there were some errors and duplications in the data. Still, his results are eye-opening.

Licenses? Bah, who needs 'em

According to Williamson, out of the 1,692,135 code repositories he scanned, just 219,326 of them – 14.9 per cent – had a file in their top-level directories that identified any kind of license at all. Of those, 28 per cent only announced their licenses in a README file, as opposed to recommended filenames such as LICENSE or COPYING.
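The distinction between a dedicated license file and a README-only notice might be sketched as below. The function name `license_location`, the result labels, and the prefix-matching rule are assumptions for illustration; the article does not describe how Williamson's script actually made this distinction.

```python
import os

# Filenames the article cites as recommended; matching by prefix is an assumption.
RECOMMENDED_PREFIXES = ("license", "copying")


def license_location(flagged_files):
    """Classify where a repository announces its license.

    flagged_files: names of top-level files in which license text was detected.
    Returns "dedicated", "readme", "other", or "none".
    """
    # Drop the extension and case so LICENSE.txt and license.md compare equal.
    stems = [os.path.splitext(name)[0].lower() for name in flagged_files]
    if any(stem.startswith(RECOMMENDED_PREFIXES) for stem in stems):
        return "dedicated"
    if any(stem.startswith("readme") for stem in stems):
        return "readme"
    return "other" if stems else "none"
```

Under this sketch, a repository counts as README-only when a README is flagged and no LICENSE or COPYING file is.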

Equally interesting, Williamson found that developers with projects on GitHub tend to shun so-called copyleft licenses such as the GNU General Public License (GPL) – which require modified versions of the software to be released under the same license as the original – in favor of more permissive alternatives.

[Chart: license use among GitHub projects. Most developers on GitHub seem to prefer permissive licenses to the GPL (Source: Aaron Williamson)]

Naturally, the GPL was still well represented. Williamson found some 61,000 projects that were licensed under some version of the GPL or Lesser GPL. But his scans turned up fully twice as many projects that were released under either the MIT, BSD, or Apache licenses, none of which are copyleft licenses.

Williamson added that although his data was just a snapshot, and therefore couldn't be used to establish any trends, data gathered by RedMonk does indicate an overall trend toward permissive licensing for projects written in many different languages.

Just why that is wasn't clear. But Luis Villa, deputy general counsel at the Wikimedia Foundation, has suggested that younger developers may be choosing more permissive licenses as a way of pushing back against what they see as a "permission culture." They prefer to let other developers just do whatever they want with their code – and, rightly or wrongly, this might be a reason why many projects are released with no license whatsoever.

Even when GitHub repositories included licenses, Williamson found many projects where the licensing was unclear. For example, many projects claimed to be licensed under "the Ruby license," but Ruby's licensing has changed over time, making it difficult to figure out just what the terms are for any specific project if they aren't stated explicitly.

Still other projects offered terms that were inconsistent. For example, one program claimed to be licensed under the GPL but "for non-commercial use only," which contradicts the GPL's terms.
