Object Pose Estimation through Dexterous Touch

Amir-Hossein Shahidzadeh*¹, Jiyue Zhu*², Kezhou Chen², Sha Yi²
Cornelia Fermüller¹, Yiannis Aloimonos¹, Xiaolong Wang²

¹ University of Maryland, ² UC San Diego


Abstract

Robust object pose estimation is essential for manipulation and interaction tasks in robotics, particularly in scenarios where visual data is limited or sensitive to lighting, occlusion, and appearance. Tactile sensors, however, often provide only limited, local contact information, which makes it challenging to reconstruct an object's pose from partial data. Our approach uses sensorimotor exploration to actively control a robot hand as it interacts with the object. We train a policy with Reinforcement Learning (RL) to explore the object and collect tactile data. The collected 3D point clouds are then used to iteratively refine the object's shape and pose. In our setup, one hand holds the object steady while the other performs active exploration. We show that our method can actively explore an object's surface to identify critical pose features without prior knowledge of the object's geometry.

With pose reward

The pose reward drives the agent to explore the object in ways that improve pose estimation accuracy, measured by the area under the curve (AUC) of ADD-S. As shown in the video, this helps the agent efficiently sample key features such as edges and corners, reaching a higher AUC of 96.5% in only 11 steps while quickly targeting the pre-defined boundaries.
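The page reports AUC of ADD-S without defining it. The sketch below assumes the definition commonly used with the YCB-Video benchmark: ADD-S averages, over the object's model points, the distance from each point under the ground-truth pose to the closest point under the estimated pose, and the AUC integrates the fraction of estimates whose ADD-S falls below a threshold as that threshold sweeps from 0 to 10 cm. The function names, the 0.1 m cutoff, and the brute-force nearest-neighbor search are illustrative assumptions, not the authors' code.

```python
import numpy as np


def transform(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply a rigid transform (3x3 rotation R, translation t) to (N, 3) points."""
    return points @ R.T + t


def add_s(model_points: np.ndarray, R_gt, t_gt, R_est, t_est) -> float:
    """Symmetry-aware ADD-S (assumed standard definition): mean distance from
    each ground-truth-posed model point to its closest estimated-posed point."""
    gt = transform(model_points, R_gt, t_gt)
    est = transform(model_points, R_est, t_est)
    # Brute-force pairwise distances (N x N); fine for a few thousand points.
    dists = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=-1)
    return float(dists.min(axis=1).mean())


def auc_add_s(errors, max_threshold: float = 0.1, num_thresholds: int = 1000) -> float:
    """Area under the accuracy-vs-threshold curve, normalized to [0, 1].
    Accuracy at threshold tau is the fraction of ADD-S errors below tau.
    The 0.1 m cutoff is an assumption borrowed from common YCB-Video practice."""
    errors = np.asarray(errors, dtype=float)
    thresholds = np.linspace(0.0, max_threshold, num_thresholds)
    accuracy = (errors[None, :] < thresholds[:, None]).mean(axis=1)
    # Trapezoidal rule written out explicitly (np.trapz is deprecated in
    # NumPy 2.0 in favor of np.trapezoid).
    dt = thresholds[1] - thresholds[0]
    area = np.sum((accuracy[1:] + accuracy[:-1]) / 2.0) * dt
    return float(area / max_threshold)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-0.05, 0.05, size=(500, 3))  # synthetic "object" points
    I = np.eye(3)
    err = add_s(pts, I, np.zeros(3), I, np.array([0.005, 0.0, 0.0]))  # 5 mm offset
    print(f"ADD-S: {err:.4f} m, AUC: {auc_add_s([err]):.3f}")
```

With a single estimate, this AUC reduces to roughly 1 - ADD-S / 0.1 m, so it rewards pose errors well below the cutoff rather than merely under it.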

Without pose reward

Without the pose reward, the agent explores the object solely to maximize IoU. As shown in the video, the agent samples points across a broad region of the object but fails to target critical pose-related features. As a result, it takes 20 steps and reaches an AUC of only 89.2%.
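The page does not say how IoU is computed. One plausible reading, assumed here, is a coverage score: voxelize both the contact points gathered so far and a reference point cloud of the object, then divide the number of shared occupied voxels by the number of voxels occupied by either. The function names and the 5 mm default voxel size are hypothetical.

```python
import numpy as np


def voxelize(points: np.ndarray, voxel_size: float) -> set:
    """Return the set of integer voxel indices occupied by an (N, 3) point cloud."""
    idx = np.floor(np.asarray(points, dtype=float) / voxel_size).astype(np.int64)
    return {tuple(int(v) for v in row) for row in idx}


def voxel_iou(explored: np.ndarray, reference: np.ndarray, voxel_size: float = 0.005) -> float:
    """Hypothetical coverage IoU: occupied voxels shared by the explored contact
    points and a reference object point cloud, divided by their union.
    The 5 mm voxel size is an illustrative assumption."""
    a = voxelize(explored, voxel_size)
    b = voxelize(reference, voxel_size)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Ten reference voxels along x; the agent has touched the first five.
    ref = np.array([[i * 0.01 + 0.005, 0.005, 0.005] for i in range(10)])
    print(voxel_iou(ref[:5], ref, voxel_size=0.01))
```

Under this reading, IoU rewards how much of the surface is covered but not which parts are covered, which may help explain why an IoU-only agent spreads its samples instead of seeking edges and corners.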

Holding and Exploration of Unseen Objects

Because the pose estimation metric (AUC) is computationally expensive, it is evaluated only every 20 steps.

Duplo

Speaker

Sugar Box

Mug

Bleach Cleanser

Chips Can

Epson

Wood Block

Nesquik

Holding and Exploration of Training Objects

Cuboid

Long Cuboid

Rectangle Box

Long Cylinder

Cylinder

Training curves (Reward Ablation)

The figure above shows training curves of AUC of ADD-S versus training steps for policies trained with different reward ablations; each policy is trained for 300k steps with three seeds. T denotes the Tactile reward, M the short Memory reward, B the curiosity Bonus reward, and P the Pose estimation reward. TMBP outperforms the other policies and has lower variance in AUC of ADD-S, highlighting the benefit of combining all rewards. Compared with TMB, TMBP achieves higher accuracy with less variance, demonstrating the value of the pose estimation reward. Similarly, TMBP surpasses TMP because the curiosity bonus makes exploration more efficient, and it outperforms TBP because the short memory reward discourages revisiting poses.
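The ablation labels name which reward terms are summed. The sketch below illustrates that bookkeeping only; every term's formula is a hypothetical stand-in, since the page does not define them: the short memory term M penalizes a pose seen among the last few steps, the curiosity bonus B is a count-based bonus of the form scale / sqrt(N(state)), and the pose term P is the change in AUC of ADD-S. The tactile term T is treated as a given number.

```python
from collections import Counter, deque
from math import sqrt


class ShortMemory:
    """Hypothetical M term: penalize revisiting a (discretized) hand pose that
    appears among the last `size` poses."""

    def __init__(self, size: int = 10, penalty: float = -1.0):
        self.recent = deque(maxlen=size)
        self.penalty = penalty

    def step(self, pose_key) -> float:
        reward = self.penalty if pose_key in self.recent else 0.0
        self.recent.append(pose_key)
        return reward


class CountBonus:
    """Hypothetical B term: count-based curiosity bonus scale / sqrt(N(state))."""

    def __init__(self, scale: float = 0.1):
        self.counts = Counter()
        self.scale = scale

    def step(self, state_key) -> float:
        self.counts[state_key] += 1
        return self.scale / sqrt(self.counts[state_key])


def pose_reward(prev_auc: float, new_auc: float, scale: float = 1.0) -> float:
    """Hypothetical P term: improvement in AUC of ADD-S since the last evaluation."""
    return scale * (new_auc - prev_auc)


def combine(terms: dict, enabled: str = "TMBP") -> float:
    """Sum only the terms named in the ablation label, e.g. 'TMB' drops P."""
    return sum(terms[k] for k in enabled)


if __name__ == "__main__":
    memory, bonus = ShortMemory(size=5), CountBonus(scale=0.1)
    terms = {
        "T": 1.0,  # hypothetical tactile reward for a new contact
        "M": memory.step(("x0", "y0")),
        "B": bonus.step(("x0", "y0")),
        "P": pose_reward(prev_auc=0.80, new_auc=0.85),
    }
    print(combine(terms, "TMBP"), combine(terms, "TMB"))
```

Keeping each term as a separate component makes an ablation such as TMB or TBP a one-string change rather than a code change.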

Training curves (State Representation Ablation)

The figure above shows training curves of IoU versus training steps for policies with different state representations. The unsmoothed data, shown in the background, is noisy because IoU is recorded once per episode. BFTRM denotes a state representation that includes Boundary distance, Finger joints, Touch state, hand Rotation, and local contact Memory. BFTRM consistently outperforms the other policies after 50k steps, demonstrating the effectiveness of combining all state components. Comparing BFTRM with FTRM and BFTR highlights the importance of boundary distance and local contact memory: boundary distance helps the agent localize its position, while local contact memory provides information about the currently sampled point cloud.
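A policy network typically receives these components as a single flat vector. The sketch below shows, under assumed component sizes (all hypothetical, not taken from the page), how an ablated representation such as FTRM or BFTR differs from BFTRM only in which blocks are concatenated.

```python
import numpy as np

# Hypothetical sizes for each state component; the page does not specify them.
COMPONENT_DIMS = {
    "B": 3,   # boundary distance (assumed: one distance per axis)
    "F": 16,  # finger joint positions
    "T": 4,   # binary touch state per fingertip
    "R": 4,   # hand rotation as a quaternion
    "M": 30,  # local contact memory (assumed: last 10 contact points x 3 coords)
}


def build_state(components: dict, enabled: str = "BFTRM") -> np.ndarray:
    """Concatenate the enabled components into one flat observation vector.
    Ablations such as FTRM (no boundary distance) or BFTR (no local contact
    memory) simply omit the corresponding block."""
    parts = []
    for key in enabled:
        vec = np.asarray(components[key], dtype=np.float32).ravel()
        if vec.size != COMPONENT_DIMS[key]:
            raise ValueError(f"{key}: expected {COMPONENT_DIMS[key]} values, got {vec.size}")
        parts.append(vec)
    return np.concatenate(parts)


if __name__ == "__main__":
    components = {key: np.zeros(dim) for key, dim in COMPONENT_DIMS.items()}
    for label in ("BFTRM", "FTRM", "BFTR"):
        print(label, build_state(components, label).shape)
```

Validating each block's size catches a silently mis-shaped component, which would otherwise shift every later entry of the observation vector.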