The figure above shows training curves of the area under the curve (AUC) of ADD-S versus training steps for policies trained with different reward ablations, each for 300k steps with three seeds. T denotes the tactile reward, M the short-memory reward, B the curiosity bonus reward, and P the pose estimation reward.
As shown, TMBP outperforms the other policies and has lower variance in AUC of ADD-S, highlighting the effectiveness of combining all rewards. Compared to TMB, TMBP achieves higher accuracy with less variance, demonstrating the value of the pose estimation reward. Similarly, TMBP surpasses TMP by exploring objects more efficiently with the curiosity bonus, and outperforms TBP by using the short-memory reward to avoid revisiting poses.
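To make the plotted metric concrete, the sketch below computes ADD-S and its AUC. It assumes the common convention from the pose estimation literature: ADD-S is the mean distance from each model point under the ground-truth pose to the closest model point under the predicted pose, and the AUC is the normalized area under the accuracy-versus-threshold curve for thresholds from 0 up to a maximum (taken here as 0.10 m). The maximum threshold, the number of threshold steps, and the helper names (`transform`, `add_s`, `auc_add_s`) are illustrative assumptions, not details confirmed by this section.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform R @ p + t to a 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_s(model_points: Sequence[Vec3], R_gt: Mat3, t_gt: Vec3,
          R_pred: Mat3, t_pred: Vec3) -> float:
    """ADD-S: mean closest-point distance between the model under the
    ground-truth pose and the model under the predicted pose."""
    gt = [transform(R_gt, t_gt, p) for p in model_points]
    pred = [transform(R_pred, t_pred, p) for p in model_points]
    # For each ground-truth point, take the nearest predicted point, so
    # symmetric objects are not penalized for equivalent poses.
    return sum(min(math.dist(g, q) for q in pred) for g in gt) / len(gt)


def auc_add_s(errors: Sequence[float], max_threshold: float = 0.10,
              steps: int = 1000) -> float:
    """Normalized area under the accuracy-threshold curve, in [0, 1].
    ASSUMPTION: thresholds span [0, max_threshold] in meters."""
    n = len(errors)
    thresholds = [max_threshold * k / steps for k in range(steps + 1)]
    # Accuracy at each threshold: fraction of errors strictly below it.
    acc = [sum(e < th for e in errors) / n for th in thresholds]
    # Trapezoidal integration, then normalize by the threshold range.
    area = sum((acc[k] + acc[k + 1]) / 2 * (thresholds[k + 1] - thresholds[k])
               for k in range(steps))
    return area / max_threshold


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
```

For example, a two-point model `[(1, 0, 0), (-1, 0, 0)]` rotated 180° about the z-axis maps onto itself, so `add_s` returns 0 for that prediction even though the rotation differs; this closest-point matching is what makes ADD-S suitable for symmetric objects. Per-episode ADD-S values can then be passed to `auc_add_s`, where a higher AUC means more predictions fall under small error thresholds.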