Study: Most projects on GitHub not open source licensed

Kids these days, they just don't care

Code-sharing website GitHub has grown so popular that it and open source are practically synonymous for many developers. But new research shows that most of the projects now on GitHub are released under license terms that are unclear, inconsistent, or nonexistent, leaving their legal status as open source software uncertain.

That's according to Aaron Williamson, senior staff counsel at the Software Freedom Law Center, who presented some of his findings on the matter at the Linux Foundation Collaboration Summit in San Francisco on Wednesday.

Williamson first became interested in software licensing trends on GitHub after reading a somewhat profane Twitter post from RedMonk's James Governor in 2012, to the effect that today's young developers can't be bothered to deal with the complexities of open source licensing and governance.

At the time, Governor's post inspired much debate on Twitter and beyond. But was what he said true? Are younger developers on GitHub really less likely to specify clear licensing for their projects than earlier generations of coders? Williamson decided to find out.

To that end, he wrote a Python script that continuously polled GitHub, looking for license files. He then ran those files through FOSSology, a tool developed by HP and others that can identify software licenses by the specific language and phrases contained in them.
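As a rough illustration of the phrase-matching idea, the sketch below tags a license text by looking for characteristic wording. It is not FOSSology's actual algorithm, nor Williamson's script; the phrase table and the `identify_license` helper are assumptions made up for this example.

```python
# Hypothetical phrase table: one characteristic snippet per license family.
# Real scanners such as FOSSology rely on far larger, more careful rule sets.
SIGNATURE_PHRASES = {
    "GPL": ["gnu general public license"],
    "LGPL": ["gnu lesser general public license"],
    "MIT": ["permission is hereby granted, free of charge"],
    "Apache": ["apache license"],
    "BSD": ["redistribution and use in source and binary forms"],
}


def identify_license(text):
    """Return the sorted license families whose phrases appear in text."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    normalized = " ".join(text.lower().split())
    return sorted(
        name
        for name, phrases in SIGNATURE_PHRASES.items()
        if any(phrase in normalized for phrase in phrases)
    )
```

A file that matches nothing returns an empty list, which a scan like this would presumably have to count as "no identifiable license."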

Williamson was quick to point out that his study was by no means scientific, nor was his data set complete. GitHub's API throttles the number of requests you can make per hour, so Williamson was unable to poll the entire archive – in fact, he only made it through the oldest 28 per cent of the repositories. He's also fairly certain that he missed some licenses and that there were some errors and duplications in the data. Still, his results are eye-opening.
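To get a feel for why an hourly request cap makes a full scan slow, here is a back-of-the-envelope sketch. The repository count is the one Williamson reported; the quota of 5,000 requests per hour and the cost of one request per repository are assumptions for illustration, not figures from his talk.

```python
import math


def hours_to_scan(repo_count, requests_per_repo, requests_per_hour):
    """Whole hours of polling needed when the API caps requests per hour."""
    total_requests = repo_count * requests_per_repo
    return math.ceil(total_requests / requests_per_hour)


# 1,692,135 repositories scanned (from the article); the other two
# arguments are assumed values, so treat the result as an order of magnitude.
estimated_hours = hours_to_scan(1_692_135, 1, 5_000)
print(f"{estimated_hours} hours, or about {estimated_hours / 24:.0f} days")
```

If each repository needs more than one request, say one to list the top-level directory and another to fetch the license file, the estimate grows in proportion.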

Licenses? Bah, who needs 'em

According to Williamson, out of the 1,692,135 code repositories he scanned, just 219,326 of them – 14.9 per cent – had a file in their top-level directories that identified any kind of license at all. Of those, 28 per cent only announced their licenses in a README file, as opposed to recommended filenames such as LICENSE or COPYING.
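The distinction between a dedicated license file and a README-only notice can be sketched as a simple classification over a repository's top-level files. This is an illustrative guess at the kind of bucketing involved, not Williamson's code; the `license_location` helper and its input format are hypothetical.

```python
# Recommended filenames mentioned in the article, compared case-insensitively.
DEDICATED_NAMES = {"license", "copying"}


def license_location(top_level_files):
    """Classify where a repository declares its license.

    top_level_files maps each top-level filename to True if that file's
    text names a license (for example, as judged by a license scanner).
    """
    has_dedicated = False
    has_readme = False
    for filename, names_license in top_level_files.items():
        if not names_license:
            continue
        # Drop any extension so LICENSE.txt and README.md are recognized.
        stem = filename.lower().split(".")[0]
        if stem in DEDICATED_NAMES:
            has_dedicated = True
        elif stem == "readme":
            has_readme = True
    if has_dedicated:
        return "dedicated file"
    if has_readme:
        return "README only"
    return "none found"
```

For example, `license_location({"README.md": True, "setup.py": False})` falls into the "README only" bucket, the group that made up 28 per cent of the licensed repositories in Williamson's sample.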

Equally interesting, Williamson found that developers with projects on GitHub tend to shun so-called copyleft licenses such as the GNU General Public License (GPL) – which require modified versions of the software to be released under the same license as the original – in favor of more permissive alternatives.

Chart: license use among GitHub projects

Most developers on GitHub seem to prefer permissive licenses to the GPL (Source: Aaron Williamson)

Naturally, the GPL was still well represented. Williamson found some 61,000 projects that were licensed under some version of the GPL or Lesser GPL. But his scans turned up fully twice as many projects that were released under either the MIT, BSD, or Apache licenses, none of which are copyleft licenses.

Williamson added that although his data was just a snapshot, and therefore couldn't be used to establish any trends, data gathered by RedMonk does indicate an overall trend toward permissive licensing for projects written in many different languages.

Just why that is wasn't clear. But Luis Villa, deputy general counsel at the Wikimedia Foundation, has suggested that younger developers may be choosing more permissive licenses as a way of pushing back against what they see as a "permission culture." They prefer to let other developers just do whatever they want with their code – and, rightly or wrongly, this might be a reason why many projects are released with no license whatsoever.

Even when GitHub repositories included licenses, Williamson found many projects where the licensing was unclear. For example, many projects claimed to be licensed under "the Ruby license," but Ruby's licensing has changed over time, making it difficult to figure out just what the terms are for any specific project if they aren't stated explicitly.

Still other projects offered terms that were inconsistent. One program, for example, claimed to be licensed under the GPL but "for non-commercial use only," a restriction that contradicts the GPL's terms.
