Is Google legal?

Big in Belgium


Analysis: A Belgian court ruled against Google’s use of newspaper stories in early September. If you believe Google, it did nothing wrong and failed to defend itself because it was unaware of the publishers’ lawsuit. If you believe the publishers, Google is lying and infringes copyright on a colossal scale. The parties return to court on 23rd November in a case that leaves legal uncertainty looming over the world’s leading search engines.

The case focused on Google’s news aggregation service, which automatically scans the websites of newspapers, extracting headlines and snippets of text from each story. These are displayed at Google News and the headlines link users to the full stories on the source sites. Newspaper group Copiepresse, which represents leading Belgian, French and German publications, said this amounted to copyright infringement and a breach of database rules because its members had not been asked for permission.

Copiepresse could have stopped Google without going to court but chose not to. Instead, it wants Google to continue directing traffic to its sites – and it wants Google to pay for the privilege.

The court also ruled that Google’s cache, which is not part of Google News, infringed copyright.

When a person performs a search at Google, results are displayed with a link to the page on the third party site and also a link to a ‘cached’ copy of the same page stored at Google’s own site. The newspapers say this copy undermines their sale of archive stories. Why buy an archived story if you can find it in Google’s cache? Again, newspapers could have stopped their pages being cached.

Margaret Boribon, Secretary General of Copiepresse, told OUT-LAW that Google’s behaviour is “totally illegal” because it does not seek permission before extracting content for Google News or copying pages to its cache. Google disagrees.

Understanding Google’s position within the law means understanding how the search engine works.

Google uses an automated program, known as the Googlebot, to crawl across the internet. It locates billions of pages and copies each one to its index. In doing so it breaks the page into tiny pieces, analysing and cross-referencing every element. That index is what Google interrogates to return search results for users. When the Googlebot visits a page, it also takes a snapshot that is stored in Google’s cache, a separate archive that lets users see how a page looked the last time the Googlebot visited.
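The distinction between index and cache can be illustrated with a toy example. The Python sketch below is not Google’s code and makes no claim about how Google actually stores pages; the class name ToySearchEngine, its methods and the example.be addresses are all invented for illustration. It assumes the simplest possible design: the index maps each word to the pages that contain it, while the cache keeps a verbatim snapshot of each page as it looked on the last visit.

```python
import re
from collections import defaultdict


class ToySearchEngine:
    """Hypothetical, heavily simplified crawler store: an index plus a cache."""

    def __init__(self):
        # Index: each word maps to the set of page addresses containing it.
        self.index = defaultdict(set)
        # Cache: each page address maps to the last snapshot taken of it.
        self.cache = {}

    @staticmethod
    def _words(text):
        # Break text into lowercase alphanumeric tokens.
        return re.findall(r"[a-z0-9]+", text.lower())

    def crawl(self, url, page_text):
        """Record one visit: snapshot the page and index its words.

        This sketch never removes stale index entries when a page changes.
        """
        self.cache[url] = page_text
        for word in self._words(page_text):
            self.index[word].add(url)

    def search(self, query):
        """Return the pages that contain every word of the query."""
        words = self._words(query)
        if not words:
            return set()
        results = set(self.index.get(words[0], set()))
        for word in words[1:]:
            results &= self.index.get(word, set())
        return results
```

In this sketch a search consults only the index, which holds words and addresses rather than readable pages. The cache is the part that holds a full copy, which is why the two raise different questions in the Belgian case.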

It is easy for a website to keep Googlebot or other search engine robots away from all or particular pages. A standard for doing so, the robots exclusion standard, has existed since 1994.

Add ‘/robots.txt’ to the end of any site’s root web address and you’ll find that site’s instructions for search engines, if it has published any. Google also offers a simple way to prevent a page being cached: include the instruction ‘NOARCHIVE’ in a robots meta tag in the code of the page.
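Both opt-outs can be sketched in a few lines of Python using only the standard library. Everything specific here is an assumption made for illustration: the robots.txt rules, the example.be addresses and the helper names may_fetch and caching_allowed are hypothetical, not taken from any Copiepresse member site. The first helper asks whether a given robot may fetch a given address under a robots.txt file; the second looks for a NOARCHIVE instruction in a page’s robots meta tag.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary newspaper site: Googlebot is kept
# out of the paid archive, and every other robot may fetch everything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow:
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in robots (or googlebot) meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def caching_allowed(html):
    """Return False if the page carries a NOARCHIVE instruction."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


# Hypothetical page that may be indexed but asks not to be cached.
PAGE = '<html><head><meta name="robots" content="NOARCHIVE"></head></html>'
```

With these sample rules, may_fetch(ROBOTS_TXT, "Googlebot", "https://news.example.be/archive/story.html") returns False while the same call for a page outside /archive/ returns True, and caching_allowed(PAGE) returns False. Note that both mechanisms are requests that a well-behaved robot chooses to honour, not technical barriers.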

When asked why her members’ news sites didn’t follow these steps to exclude Google, Boribon replied, “then you admit that their reasoning is correct”. She said all search engines should obtain permission before indexing pages that carry copyright notices.

But the real reason for not opting out with a robots.txt file or blocking the cache is that Belgium’s newspapers want to be indexed by Google. “Yes, we have a problem with Google, but we don’t want to be out of Google,” Boribon said. “We want Google to respect the rules. If Google wanted to index us, they need to ask.”

Copiepresse also wants Google to pay for indexing sites. Boribon declined to discuss how or how much. “That has to be negotiated,” she said.

The argument is not unique. The World Association of Newspapers (WAN), which represents 18,000 newspapers in 102 countries, said in January it would “explore ways to challenge the exploitation of content by search engines without fair compensation to copyright owners.”

At that time, WAN did not have a strategy for such a challenge. Copiepresse did. It took direct action and convinced the Brussels Court of First Instance to order Google to withdraw from its sites all the articles and photographs of Copiepresse member sites. Google was given 10 days to comply, under threat of a €1 million fine for each day of delay.
