UPDATED | Google at 20: Going back to its roots as clouds gather

UPDATED 12 SEPTEMBER As Google turns 20 with a new search engine launch, Chris Middleton looks at what it has been up to lately – and at the storm clouds gathering in the US and Europe for it and other large technology platforms.

Internet of Business says

Google geared up for its twentieth birthday this month by going back to its search roots. The company launched Dataset Search on 5 September, a new search engine that enables academics, researchers, and policymakers to look for scientific and other large datasets online.

According to a post by Serpninja, Dataset Search is live now in beta, in multiple languages, indexing datasets in areas such as the environment, social sciences, and government. But the aim is to embrace all academic resources over time, as institutions worldwide scrabble to tag their open datasets for indexing on the service (see below).

Google already runs Google Scholar, a popular search engine for academic research.

The new search page will doubtless remind some middle-aged researchers of the day, 20 years ago, when they first clicked on the Google homepage and found themselves looking at an acre of white space – and a keyhole into a world of information.

Since then, Google’s commitment to making information easier to find has persuaded some to game its algorithms, distorting the nature of information itself (hello, SEO agencies). Others never look past page one in their searches, suggesting that as the data landscape becomes ever more vast, many of us are surveying it through smaller and smaller apertures – such as the Google search bar.

Imagine a courier carrying a vast parcel of knowledge that he wants to deliver to someone, who has fallen asleep on the floor by his locked front door, waiting for it to arrive. The huge parcel can’t fit through the tiny letterbox, so the courier leaves it outside and posts a piece of junk mail through the hole instead. The recipient wakes up, picks up the flyer, orders the product, and never sees or opens the huge parcel waiting on his doorstep.

In many ways, that’s the world that Google has created over the last two decades – whether by accident in a sincere attempt to put information at everyone’s fingertips, or by design to sell advertising.

Either way, it’s made many of us lazy. We’ve stopped looking for information ourselves, and many of us wait for someone to deliver an easy answer through Google’s letterbox and into our laps. If the information isn’t there in a split second, the implication is that it can’t be worth waiting for – let alone looking for.

Might this same problem afflict academic data in future? In a blog post explaining the company’s latest launch, Google AI research scientist Natasha Noy wrote, “To create Dataset Search, we developed guidelines for dataset providers to describe their data in a way that Google (and other search engines) can better understand the content of their pages.

“These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyse where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset.

“Our approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way. We encourage dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem.”
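As a rough illustration of what such a description might look like, the sketch below builds a schema.org Dataset record as JSON-LD, covering the items Noy lists (creator, publication date, collection method, and terms of use). The dataset, organisation name, and URLs are hypothetical placeholders, and the choice of properties is an assumption about a reasonable minimum, not Google's own template.

```python
import json


def describe_dataset(name, description, url, creator, date_published,
                     collection_method, license_url):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Who created the dataset.
        "creator": {"@type": "Organization", "name": creator},
        # When it was published (ISO 8601 date).
        "datePublished": date_published,
        # How the data was collected.
        "measurementTechnique": collection_method,
        # The terms for using the data.
        "license": license_url,
    }


# Hypothetical example values, for illustration only.
record = describe_dataset(
    name="Example river temperature readings",
    description="Daily water temperature readings from a fictional river.",
    url="https://example.org/datasets/river-temperature",
    creator="Example Hydrology Institute",
    date_published="2018-09-05",
    collection_method="Automated sensor readings taken once a day",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

# Serialise to the JSON-LD text a publisher would embed in the dataset's page.
jsonld = json.dumps(record, indent=2)
print(jsonld)
```

A publisher would typically embed the printed JSON-LD in the dataset's web page so that crawlers can read it.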

Nevertheless – if the past 20 years are anything to go by – it’s conceivable that academics may start gaming search engines too, changing the content of their research to make it easier for Google to rank. Perhaps they’ve been doing it for years… or perhaps Google may simply encourage them to write in plain English. Who knows?

Jeni Tennison, CEO of the London-based Open Data Institute (ODI), co-founded in 2012 by Sir Tim Berners-Lee, said, “Dataset search has always been a difficult thing to support, and I’m hopeful that Google stepping in will make it easier.

“Simply understanding how people search is important… what kind of terms they use, and how they express them. If we want to get to grips with how people search for data and make it more accessible, it would be great if Google opened up its own data on this.”

Indeed – and perhaps the company will.

In recent days, Google has reinforced its commitment to openness by, among other things, handing over full operational management of its Kubernetes container orchestration tool to the open source community, via the Cloud Native Computing Foundation, and making the Ethereum blockchain dataset available in its BigQuery analytics data warehouse.

But in other ways, it has been rather less open.

Chroming the Web

The latest update to Google’s Chrome browser this week strips the ‘www.’ and ‘m.’ from URLs in its address bar – characters that the development team considers trivial.

However, some have pointed out that ‘www.’ and ‘m.’ aren’t trivial in some Web addresses, while others have suggested that the move will hide whether sites are being served by Google’s AMP tool, allowing it to grab those sites’ traffic for itself, invisibly.
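The critics' first point can be sketched in a few lines. The rule below is a deliberately naive assumption about how a browser might shorten a hostname for display, not Chrome's actual code, and the hostnames use the reserved example.com domain. It shows three different hosts, which could in principle serve different content, collapsing into one displayed address.

```python
# Prefixes treated as "trivial" in this hypothetical display rule.
TRIVIAL_PREFIXES = ("www.", "m.")


def display_host(host: str) -> str:
    """Drop one leading 'www.' or 'm.' label, as a naive address bar might."""
    for prefix in TRIVIAL_PREFIXES:
        if host.startswith(prefix):
            return host[len(prefix):]
    return host


# Three distinct hostnames that may point at different servers.
hosts = ["www.example.com", "m.example.com", "example.com"]
shown = {host: display_host(host) for host in hosts}

for host, displayed in shown.items():
    print(f"{host} is displayed as {displayed}")
```

Under this rule a user cannot tell from the address bar alone which of the three hosts they are actually visiting.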

And political openness is also causing the search, advertising, and cloud giant problems.

While Twitter CEO Jack Dorsey and Facebook COO Sheryl Sandberg have been among those finding their platforms picked apart in the glare of contemporary US politics this week, Google CEO Sundar Pichai has not appeared before the Senate to answer questions about foreign interference in elections, nor at other hearings on alleged political bias and ‘fake news’.

US president Donald Trump has ramped up the rhetoric in recent days, suggesting that a number of companies, including Google and Facebook, are biased against conservative causes and opinions – a charge that the companies have denied.

Google responded: “Search is not used to set a political agenda and we don’t bias our results toward any political ideology. Every year, we issue hundreds of improvements to our algorithms to ensure they surface high-quality content in response to users’ queries. We continually work to improve Google Search and we never rank search results to manipulate political sentiment.”

In a blog post, Federal Communications Commission chairman Ajit Pai called for new laws requiring companies like Facebook, Google, and Twitter to disclose how they decide on bans and other policy decisions. Meanwhile, US attorney general Jeff Sessions is considering launching an investigation of social media companies.

However, not all of the criticisms have come from Trump allies or appointees. “The era of the Wild West in social media is coming to an end,” said Democrat senator Mark Warner, vice-chairman of the Senate intelligence committee, adding, “Where we go from here now is an open question.”

Warner said that tech companies and social platforms were not doing enough to stop foreign influence in domestic matters – arguably overlooking the fact that Twitter, Facebook, and the rest are global platforms, not US mouthpieces.

However, it is a matter of public record that fake accounts, troll farms, and Russian-backed social engineering programmes have been fanning the flames of any cause within the West that serves Russian interests – Zuckerberg and others have admitted as much.

Rock vs. hard place

These controversies leave companies such as Google between a rock and a hard place. On the one hand, they oppose tighter regulations of any kind, such as California’s new GDPR-like data privacy act (which comes into force in 2020). Google and others have been lobbying the government for a watered-down federal solution that serves their commercial interests.

But on the other, seeing this administration as an ally – especially against their home state of California – is a high-stakes gamble indeed, when the president regards them as biased against his own political beliefs.

There is certainly a global sense that these discussions, when set alongside the Facebook/Cambridge Analytica scandal, represent a watershed moment for the industry – one way or another.

The idea that some companies, such as Google and Facebook, are now too big, too complex, and too powerful for their own or others’ good is widely shared in Europe. Indeed, it was one of the spurs for the introduction of GDPR.

Watchdogs threaten to bite

Germany is among those threatening to rein in the giants, by bolstering the powers of its competition watchdog.

The idea is to prevent future platforms from becoming monopolies, and to stop companies like Google from simply buying up reams of smaller players – a policy that will rattle any startups whose goal is being acquired.

This week, German economy minister Peter Altmaier said it was essential to strike the right balance “between the growth chances of German and European platforms and preventing the abuse of market power”.

He published a 173-page report on proposals to give Germany’s antitrust regulator powers to act before any company reaches a tipping point in market influence – a process that can happen at high speed in the connected age, due to the network effect. The rapid rise of Facebook, Twitter, and others demonstrates this.

The report says [translated from the German by Internet of Business], “Developments in the digital economy – in particular the growing importance of (a) data as critical input resource in production and distribution processes, and (b) digital platforms in some highly concentrated markets – raise the question of whether current antitrust laws are adequate to deal effectively and rapidly with new competition risks.

“Specifically, the question arises as to whether the threshold for antitrust laws’ ability to intervene and control abuse – generally or in certain cases – is currently set too high and prevents timely intervention.

“Against this background, this study analyses whether the rules to protect against abuse of economic power in (as yet) un-dominated markets are sufficiently clear and effective.”

Tipping backwards?

Markets with strong network effects can quickly tip over into a monopoly, confirms the report. However, this process is often not “natural”, but can be “favoured or even induced by certain practices of individual actors. These practices include unilateral behaviours, such as targeted obstruction of multihoming [use of more than one network].”

Since this tipping point can’t be reversed after the fact, the report recommends giving antitrust regulators the power to intervene before it is reached.

It also recommends giving regulators the power to prevent a merger or acquisition, even if the deal does not create a monopoly but is merely “an expression of an overall strategy” to buy up early-stage companies before they become a threat, setting the purchaser on the path towards monopoly.

Were such a law to be adopted in Germany – and in Europe as a whole – it would challenge the core business model of large sections of the US technology industry, and any other company that has grown by aggressive acquisition.

The report also proposes a new ‘data for all’ law that would require dominant platforms, such as Google, to share the data that feeds them, allowing smaller competitors to train their own algorithms to a similar standard.

That would really challenge just how far companies such as Google are prepared to be open.

‘Link taxes’ and privacy laws

And there are yet more problems for Google and other cloud platforms in Europe.

France’s data regulator, the Commission Nationale de l’Informatique et des Libertés, is seeking to extend the so-called Right to be Forgotten globally, arguing that any Europe-only removal of data is meaningless on a global platform in an age of IP cloaking.

Google and others oppose the plan – just as they oppose all GDPR-style rules that threaten their advertising-based businesses, such as California’s incoming privacy regulations.

And on 12 September, the European Parliament voted to amend two articles in the EU copyright directive.

The amended Article 11 would force news aggregation and search sites, such as Google, to pay publishers for showing news snippets and links.

The move is supported by many content owners, who believe that Google is syphoning off their revenues, but opposed by tech luminaries, such as Sir Tim Berners-Lee, as a ‘link tax’ that threatens the very concept of hypertext and the World Wide Web.

Meanwhile, Article 13 would force platforms such as YouTube to seek licences for content from copyright owners – a move supported by many musicians, for example, who believe that, while Google gives them a global profile, it denies them fair or commensurate revenue streams for their work.

So happy 20th birthday, Google. We all use that letterbox-shaped search window, but have somehow learned to ignore the giant logo that stands above it, branding every piece of information in the world.
