Feeds

Study: Most projects on GitHub not open source licensed

Kids these days, they just don't care

Top 5 reasons to deploy VMware with Tegile

Code-sharing website GitHub has grown so popular that it and open source are practically synonymous for many developers. But new research shows that most of the projects now on GitHub are released under license terms that are unclear, inconsistent, or nonexistent, leaving their legal status as open source software uncertain.

That's according to Aaron Williamson, senior staff counsel at the Software Freedom Law Center, who presented some of his findings on the matter at the Linux Collaboration Summit in San Francisco on Wednesday.

Williamson first became interested in software licensing trends on GitHub after reading a somewhat profane Twitter post from Redmonk's James Governor in 2012, to the effect that today's young developers can't be bothered to deal with the complexities of open source licensing and governance.

At the time, Governor's post inspired much debate on Twitter and beyond. But was what he said true? Are younger developers on GitHub really less likely to specify clear licensing for their projects than earlier generations of coders? Williamson decided to find out.

To that end, he wrote a Python script that continuously polled GitHub, looking for license files. He then ran those files through FOSSology, a tool developed by HP and some others that can identify software licenses by the specific language and phrases contained in them.

Williamson was quick to point out that his study was by no means scientific, nor was his data set complete. GitHub's APIs throttles the number of requests you can make per hour, so Williamson was unable to poll the entire archive – in fact, he only made it through the oldest 28 per cent of the repositories. He's also fairly certain that he missed some licenses and that there were some errors and duplications in the data. Still, his results are eye opening.

Licenses? Bah, who needs 'em

According to Williamson, out of the 1,692,135 code repositories he scanned, just 219,326 of them – 14.9 percent – had a file in their top-level directories that identified any kind of license at all. Of those, 28 per cent only announced their licenses in a README file, as opposed to recommended filenames such as LICENSE or COPYING.

Equally interesting, Williamson found that developers with projects on GitHub tend to shun so-called copyleft licenses such as the Gnu General Public License (GPL) – which require modified versions of the software to be released under the same license as the original – in favor of more permissive alternatives.

  Chart showing license use among GitHub projects  

Most developers on GitHub seem to prefer permissive licenses to the GPL (Source: Aaron Williamson)

Naturally, the GPL was still well represented. Williamson found some 61,000 projects that were licensed under some version of the GPL or Lesser GPL. But his scans turned up fully twice as many projects that were released under either the MIT, BSD, or Apache licenses, none of which are copyleft licenses.

Williamson added that although his data was just a snapshot, and therefore couldn't be used to establish any trends, data gathered by Redmonk does indicate an overall trend toward permissive licensing for projects written in many different languages.

Just why that is wasn't clear. But Luis Villa, deputy general counsel at the Wikimedia Foundation, has suggested that younger developers may be choosing more permissive licenses as a way of pushing back against what they see as a "permission culture." They prefer to let other developers just do whatever they want with their code – and, rightly or wrongly, this might be a reason why many projects are released with no license whatsoever.

Even when GitHub repositories included licenses, however, Williamson also found a lot of projects where the licensing was unclear. For example, many projects claimed to be licensed under "the Ruby license," but Ruby's licensing has changed over time, making it difficult to figure out just what the terms are for any specific project if they aren't stated explicitly.

Still other projects offered terms that were inconsistent; for example, a program that claimed to be licensed under the GPL but "for non-commercial use only," which contradicts the GPL's terms.

Remote control for virtualized desktops

More from The Register

next story
PEAK APPLE: iOS 8 is least popular Cupertino mobile OS in all of HUMAN HISTORY
'Nerd release' finally staggers past 50 per cent adoption
Microsoft to bake Skype into IE, without plugins
Redmond thinks the Object Real-Time Communications API for WebRTC is ready to roll
Microsoft promises Windows 10 will mean two-factor auth for all
Sneak peek at security features Redmond's baking into new OS
Mozilla: Spidermonkey ATE Apple's JavaScriptCore, THRASHED Google V8
Moz man claims the win on rivals' own benchmarks
Yes, Virginia, there IS a W3C HTML5 standard – as of now, that is
You asked for it! You begged for it! Then you gave up! And now it's HERE!
FTDI yanks chip-bricking driver from Windows Update, vows to fight on
Next driver to battle fake chips with 'non-invasive' methods
DEATH by PowerPoint: Microsoft warns of 0-day attack hidden in slides
Might put out patch in update, might chuck it out sooner
Ubuntu 14.10 tries pulling a Steve Ballmer on cloudy offerings
Oi, Windows, centOS and openSUSE – behave, we're all friends here
prev story

Whitepapers

Cloud and hybrid-cloud data protection for VMware
Learn how quick and easy it is to configure backups and perform restores for VMware environments.
Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
High Performance for All
While HPC is not new, it has traditionally been seen as a specialist area – is it now geared up to meet more mainstream requirements?
Intelligent flash storage arrays
Tegile Intelligent Storage Arrays with IntelliFlash helps IT boost storage utilization and effciency while delivering unmatched storage savings and performance.
Security and trust: The backbone of doing business over the internet
Explores the current state of website security and the contributions Symantec is making to help organizations protect critical data and build trust with customers.