Google’s DeepMind AI is learning to navigate cities without a map

The machine learning capabilities of Google’s DeepMind AI are now being applied to help robots and AIs navigate using visual cues, rather than maps and location-based services, says Andrew Hobbs.

Autonomous robots and vehicles currently navigate using GPS and built-in maps, but new advances in the use of deep reinforcement learning to navigate through mazes have paved the way to a more human-like approach.

Google-owned DeepMind has applied its pioneering AI technology in numerous fields, from eye diagnostics to power efficiency. Now the artificial intelligence leader is using the worldwide coverage of Google’s Street View images to enable an “end-to-end deep reinforcement learning approach” that can be applied at city scale.

In short, DeepMind is training its neural network to navigate cities in Google Street View using only landmarks and visual cues.

What makes the system impressive is that the AI agent uses only Street View images to establish its position, rather than the underlying map or location coordinates. Using these cues, it can navigate towards its destination based on learned landmark locations, in much the same way as a human being might when exploring a real town or city.

AI navigation without maps

Navigating through unstructured environments is a basic capability of intelligent creatures, and so is of fundamental interest in the study and development of artificial intelligence. When we move around areas we know, we do so by landmarks of various scales – and at various distances – such as buildings, shops, and fountains, relating them to images in our minds and choosing our direction accordingly.

DeepMind’s research, titled ‘Learning to Navigate in Cities Without a Map’, highlights just what this new approach to autonomous pathfinding hopes to achieve:

Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation (‘I am here’) and a representation of the goal (‘I am going there’).

This dual pathway architecture (which uses both location-specific features and general policies) allows the technology to be easily applied to multiple cities – as the video below demonstrates.
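To make the dual pathway idea concrete, here is a toy sketch in Python. It is not DeepMind's implementation: the class names, the action list, and the untrained random weights are all assumptions made for illustration. It shows only the structural point, namely that each city gets its own location-specific module while a single general policy is shared between cities.

```python
import random
from dataclasses import dataclass

# Hypothetical action set; the real agent's actions are not given in this article.
ACTIONS = ["forward", "turn_left", "turn_right"]


def dot(weights, values):
    """Plain dot product of two equal-length lists."""
    return sum(w * v for w, v in zip(weights, values))


@dataclass
class LocalePathway:
    """City-specific pathway: turns image features plus a goal into an embedding."""
    weights: list  # embed_dim rows, each of length feature_dim + goal_dim

    def encode(self, features, goal):
        inputs = list(features) + list(goal)
        # ReLU-style activation keeps the toy embedding non-negative.
        return [max(0.0, dot(row, inputs)) for row in self.weights]


@dataclass
class GeneralPolicy:
    """Pathway shared by every city: maps an embedding to an action."""
    weights: list  # one row per action, each of length embed_dim

    def act(self, embedding):
        scores = [dot(row, embedding) for row in self.weights]
        return ACTIONS[scores.index(max(scores))]


class DualPathwayAgent:
    """Toy agent with one shared policy and one locale pathway per city."""

    def __init__(self, feature_dim, goal_dim, embed_dim, seed=0):
        self._rng = random.Random(seed)
        self._input_dim = feature_dim + goal_dim
        self._embed_dim = embed_dim
        self.policy = GeneralPolicy(self._matrix(len(ACTIONS), embed_dim))
        self.locales = {}

    def _matrix(self, rows, cols):
        # Random, untrained weights: a stand-in for parameters learned by RL.
        return [[self._rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    def add_city(self, name):
        """Adding a city creates a new locale pathway; the policy is reused."""
        self.locales[name] = LocalePathway(self._matrix(self._embed_dim, self._input_dim))

    def step(self, city, features, goal):
        embedding = self.locales[city].encode(features, goal)
        return self.policy.act(embedding)


agent = DualPathwayAgent(feature_dim=4, goal_dim=2, embed_dim=3)
agent.add_city("london")
agent.add_city("paris")
action = agent.step("london", [0.1, 0.5, 0.2, 0.9], [1.0, 0.0])
```

In this sketch, supporting a new city means calling `add_city` and training only that city's pathway, which is one plausible reading of why such a split would transfer between cities.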

This kind of capability has implications across various fields of technology and research, with potential long-term applications in industry and transportation. The DeepMind paper explains:

The subject of navigation is attractive to various research disciplines and technology domains alike, being at once a subject of inquiry, from the point of view of neuroscientists wishing to crack the code of grid and place cells, as well as a fundamental aspect of robotics research, wishing to build mobile robots that can reach a given destination.

Figure: Diagram showing how DeepMind’s map-less navigation works

Internet of Business says

While the research is still in its infancy (it has yet to be applied to real-world navigation, which involves elements such as traffic modelling and vehicle handling), it is leading the way towards more advanced methods of autonomous navigation.

Maps can quickly become out of date, whereas landmark-based navigation may make autonomous agents more reliable by giving them a more human, multi-faceted approach to getting from A to B.

This research may lead to the creation of AIs that have a sense of self-location, allowing them to perceive where they are based on what they can see, rather than on map data and coordinates. That ability may be vital for navigating unstructured environments over long distances.

If DeepMind can make the technology work in a real-world practical setting, it will be a significant achievement, because it will involve AIs being able to recognise both objects and object types, and relate them to physical environments, at various scales and distances.

For example, to a computer, a car or a statue is merely a set of pixels, as is a motorway or a railway line. AI systems need to be trained to recognise the critical differences between, say, a statue and a person, or between a road, a path, and a river, or between a street in New York and a street in Tokyo. To human beings, the visual cues may be obvious, but artificial intelligence needs to be taught to recognise them at every scale and distance.