
DeepMind Releases Self-Supervised Learning Algorithm BYOL-Explore

June 27, 2022

BYOL-Explore was evaluated on the ten hardest exploration games in Atari to demonstrate the generalisability of the method

DeepMind researchers recently introduced BYOL-Explore, a curiosity-driven exploration algorithm. It builds on Bootstrap Your Own Latent (BYOL), a self-supervised latent-predictive method in which a network learns by predicting the output of an older, slowly updated copy of its own latent representation. BYOL-Explore tackles two problems at once: it learns a world-model representation with a self-supervised prediction loss, and it trains a curiosity-driven policy using that same loss as an intrinsic reward.
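
As a rough illustration of this mechanism, here is a minimal sketch in plain Python. It assumes a one-step, linear world model: an online encoder embeds the current observation, a predictor maps that embedding plus a one-hot action to a guess of the next latent, and the target is the next observation's embedding under a target encoder kept as an exponential moving average (EMA) of the online encoder, as in the original BYOL. The loss is a BYOL-style normalised squared error, and the same number is returned as the intrinsic reward. All function names, the linear layers and the EMA rate tau=0.99 are hypothetical choices for illustration; the real BYOL-Explore world model is recurrent, predicts several steps ahead and is trained by gradient descent, none of which is shown here.

```python
import math
import random

# Hypothetical, simplified sketch: a linear one-step world model, not DeepMind's implementation.


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def normalize(v, eps=1e-8):
    # L2-normalise a vector; eps guards against division by zero
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]


def byol_loss(pred, target):
    # Normalised squared error; equals 2 - 2 * cos(pred, target) up to eps, so it lies in [0, 4]
    p, t = normalize(pred), normalize(target)
    return sum((a - b) ** 2 for a, b in zip(p, t))


def ema_update(target_W, online_W, tau=0.99):
    # The target encoder slowly tracks the online encoder, i.e. it is an older version of it
    return [
        [tau * t + (1.0 - tau) * o for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target_W, online_W)
    ]


def intrinsic_reward(online_enc, target_enc, predictor, obs, action, next_obs, n_actions):
    # Online embedding of the current observation, concatenated with a one-hot action
    z = matvec(online_enc, obs)
    one_hot = [1.0 if i == action else 0.0 for i in range(n_actions)]
    pred = matvec(predictor, z + one_hot)
    # Prediction target: the target encoder's embedding of the next observation
    target = matvec(target_enc, next_obs)
    # The world model's prediction loss doubles as the curiosity reward
    return byol_loss(pred, target)


def random_matrix(rows, cols):
    # Gaussian-initialised weights for the illustrative linear layers
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    obs_dim, latent_dim, n_actions = 4, 3, 2
    online_enc = random_matrix(latent_dim, obs_dim)
    predictor = random_matrix(latent_dim, latent_dim + n_actions)
    target_enc = [row[:] for row in online_enc]  # the target starts as a copy of the online encoder
    obs, next_obs = [1.0, 0.0, 0.5, -0.5], [0.0, 1.0, 0.5, 0.5]
    r = intrinsic_reward(online_enc, target_enc, predictor, obs, 1, next_obs, n_actions)
    print(f"intrinsic reward: {r:.3f}")
    # After each (omitted) gradient step on online_enc and predictor, refresh the target:
    target_enc = ema_update(target_enc, online_enc)
```

In this sketch, a transition the model already predicts well yields a reward near 0, while a badly predicted one yields a reward of up to 4, which is what pushes the agent toward poorly modelled situations.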

An RL agent that maximises this intrinsic reward steers itself toward situations where the world model's predictions are unreliable or poor, and the data it gathers there gives the world model new material to learn from. In other words, the world model shapes the exploration policy, and the policy in turn improves the world model by collecting new data. It may therefore be important to treat learning the world model and learning the exploration policy as a single joint problem rather than as two separate tasks.
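
The feedback loop described above can be illustrated with a toy example, under heavy simplifying assumptions: the "environment" is a hypothetical list of regions, each returning a fixed scalar observation; the world model is one scalar prediction per region; and the policy greedily picks the region with the highest estimated intrinsic reward, using the most recent squared prediction error seen there. The function name `curiosity_explore` and the learning rate `lr` are illustrative inventions, and none of this is BYOL-Explore's actual architecture. It only shows how maximising prediction error steers the agent toward poorly modelled regions, and how the data it collects then lowers that reward.

```python
# Hypothetical toy loop, not BYOL-Explore itself: scalar world model, greedy curiosity policy.


def curiosity_explore(true_next, steps, lr=0.5):
    """true_next[s] is a hypothetical scalar observation returned by region s.

    Returns (visit counts per region, final world-model predictions).
    """
    n = len(true_next)
    model = [0.0] * n            # world-model prediction per region (uninformed start)
    value = [float("inf")] * n   # policy's estimate of intrinsic reward (optimistic start)
    visits = [0] * n
    for _ in range(steps):
        # Policy: go where the estimated intrinsic reward is highest (ties -> lowest index)
        s = max(range(n), key=lambda i: value[i])
        # Intrinsic reward: squared prediction error of the world model on the new data
        reward = (true_next[s] - model[s]) ** 2
        # World model: move its prediction toward the observed outcome
        model[s] += lr * (true_next[s] - model[s])
        # Policy estimate: remember the most recent reward seen in this region
        value[s] = reward
        visits[s] += 1
    return visits, model


if __name__ == "__main__":
    visits, model = curiosity_explore([4.0, 1.0], steps=6)
    print(visits)  # [4, 2]
```

The region whose observation the model initially mispredicts more badly is revisited until its error falls to the level of the other region, after which the agent moves on. Using the last observed reward as the policy's estimate is a deliberate simplification.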

This bootstrapping approach has already been used successfully in computer vision, graph representation learning and representation learning for RL. BYOL-Explore goes one step further: it not only learns a flexible world model but also uses the world model's loss to drive exploration.

BYOL-Explore was tested on DM-HARD-8, a suite of eight challenging first-person 3-D tasks with sparse rewards. These tasks demand efficient exploration because solving them requires completing a precise sequence of interactions with objects in the environment, which is unlikely to happen under vanilla random exploration.

BYOL-Explore was also evaluated on the ten hardest exploration games in Atari to demonstrate the generalisability of the method. In both domains, it surpasses well-known curiosity-driven exploration techniques, including Random Network Distillation (RND) and the Intrinsic Curiosity Module (ICM). On DM-HARD-8, BYOL-Explore reaches human-level performance on most tasks using only extrinsic rewards augmented with its intrinsic reward, whereas earlier substantial progress on these tasks required human demonstrations.

Surprisingly, BYOL-Explore achieves this performance with a single world model and a single policy network trained simultaneously across all tasks. Finally, as further evidence of its generalisability, BYOL-Explore outperforms rival agents such as Agent57 and Go-Explore on the ten hardest exploration games in Atari while having a simpler architecture. BYOL-Explore opens new avenues of research into exploration algorithms that can handle 2-D or 3-D, single- or multi-task, and fully or partially observable environments.
