Motivation

ContactExplorer is an exploration method for general-purpose dexterous manipulation that encourages diverse finger-object contact patterns by tracking state-conditioned contact coverage.

Simulation Tasks

Cluttered Object Singulation
Constrained Object Retrieval
In-Hand Reorientation (Elephant)
In-Hand Reorientation (Mug)
Bimanual Manipulation (Waffle)
Bimanual Manipulation (Box)
In-Hand Reorientation (Cube)
In-Hand Reorientation (Bunny)

Real-World Results

We distill the oracle policy trained in simulation into a vision-based policy and evaluate real-world object singulation with randomized target objects. Videos are shown at 1x speed.


Abstract

Deep Reinforcement Learning (DRL) has achieved remarkable success in domains with well-defined reward structures, such as Atari games and locomotion. In contrast, dexterous manipulation lacks general-purpose reward formulations and typically depends on task-specific, handcrafted priors to guide hand-object interactions. We propose ContactExplorer, a general exploration method designed for general-purpose dexterous manipulation tasks. ContactExplorer represents contact state as the intersection between object surface points and predefined hand keypoints, encouraging dexterous hands to discover diverse and novel contact patterns, namely which fingers contact which object regions. It maintains a contact counter conditioned on discretized object states obtained via learned hash codes, capturing how frequently each finger interacts with different object regions. This counter is leveraged in two complementary ways: (1) to assign a count-based contact coverage reward that promotes exploration of novel contact patterns, and (2) to provide an energy-based reaching reward that guides the agent toward under-explored contact regions. We evaluate ContactExplorer on a diverse set of dexterous manipulation tasks, including cluttered object singulation, constrained object retrieval, in-hand reorientation, and bimanual manipulation. Experimental results show that ContactExplorer substantially improves training efficiency and success rates over existing exploration methods, and that the contact patterns learned with ContactExplorer transfer robustly to real-world robotic systems.

Method

ContactExplorer is a contact-coverage-guided exploration method that explicitly models hand-object interactions. It consists of three key components: a learned state hashing module that discretizes continuous object states into compact state clusters, a contact coverage counter that records state-conditioned finger-region interactions, and a structured exploration reward.
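As a minimal sketch of the first two components, the continuous object state can be mapped to a discrete code and used to index a visitation counter. All names below are illustrative; the paper's hashing module is learned, whereas this sketch stands in a fixed random-projection hash for brevity:

```python
from collections import defaultdict
import numpy as np

# Hypothetical stand-in for the learned hashing module:
# a random projection followed by sign gives a 16-bit binary code.
rng = np.random.default_rng(0)
PROJ = rng.standard_normal((16, 13))  # 13-D object state -> 16-bit code

def hash_state(obj_state):
    """Discretize a continuous object state into a compact cluster id."""
    bits = (PROJ @ np.asarray(obj_state) > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# Contact coverage counter: counts[(state_code, finger, region)] = #visits.
counts = defaultdict(int)

def update_counter(obj_state, contacts):
    """contacts: iterable of (finger_idx, region_idx) pairs in contact."""
    code = hash_state(obj_state)
    for finger, region in contacts:
        counts[(code, finger, region)] += 1
    return code
```

The counter thus records, per discretized object state, how often each finger has touched each object surface region.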

The exploration reward is further decomposed into a contact coverage reward, which encourages exploration of under-explored contact regions after contact occurs, and an energy-based reaching reward, which guides the policy to unexplored object regions, facilitating efficient contact discovery before interaction occurs. The current object state and the goal state are visualized as colored point clouds, with colors indicating different object surface regions.

Figure. Overview of ContactExplorer.

Experiment

Figure. Training curves across tasks.

We compare ContactExplorer against baselines on a range of challenging manipulation tasks. ContactExplorer shows (i) more stable training across different random seeds, (ii) improved sample efficiency, and (iii) higher overall task performance.