Affordances are central to robotic manipulation, where most tasks can be reduced to interactions with task-specific regions on objects. By focusing on these key regions, we can abstract away task-irrelevant information, simplifying the learning process and enhancing generalisation. In this paper, we propose an affordance-centric policy-learning approach that centres and appropriately orients a task frame on these affordance regions, allowing us to achieve both intra-category invariance, where policies generalise across different instances within the same object category, and spatial invariance, which enables consistent performance regardless of object placement in the environment. We propose a method that leverages existing large, generalist vision models to extract and track these affordance frames, and demonstrate that our approach can learn manipulation tasks via behaviour cloning from as few as 10 demonstrations, with generalisation equivalent to an image-based policy trained on 305 demonstrations.
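To make the task-frame idea concrete, below is a minimal sketch (our own illustration, not the authors' released code) of re-expressing the robot's end-effector pose in the affordance frame; the function name and the NumPy implementation are assumptions, but the geometry is the standard relative-pose transform.

```python
import numpy as np

def to_task_frame(T_world_aff: np.ndarray, T_world_ee: np.ndarray) -> np.ndarray:
    """Re-express the end-effector pose relative to the affordance frame.

    T_aff_ee = (T_world_aff)^-1 @ T_world_ee. Because the policy only ever
    sees this relative pose, the same observation arises wherever the object
    is placed (spatial invariance) and wherever the affordance region sits
    on a new instance of the category (intra-category invariance).
    """
    return np.linalg.inv(T_world_aff) @ T_world_ee

if __name__ == "__main__":
    # Shifting the whole scene leaves the task-frame observation unchanged.
    shift = np.eye(4)
    shift[:3, 3] = [0.5, -0.2, 0.0]          # hypothetical object relocation
    T_aff, T_ee = np.eye(4), np.eye(4)
    T_ee[:3, 3] = [0.0, 0.0, 0.1]
    assert np.allclose(to_task_frame(T_aff, T_ee),
                       to_task_frame(shift @ T_aff, shift @ T_ee))
```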
Affordance-Centric Policy Learning. Affordance Detection: We propose a framework that detects affordance frames using pre-trained large vision models. Affordance Tracking: Once the frame is detected, we use FoundationPose to track it continuously in real time as the robot interacts with the object. Policy Learning: At the start of each episode, we orient the frame appropriately towards the robot's tool frame and train a state-based diffusion policy that operates with this frame as its task frame; a pipeline sketch follows below.
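The sketch below outlines how the three stages could fit together at deployment time. It is illustrative only: `camera`, `robot`, `tracker`, `detect_affordance_frame`, and `diffusion_policy` are hypothetical stand-ins (the paper uses pre-trained vision models for detection and FoundationPose for tracking, but exposes no public API of this shape).

```python
import numpy as np

def run_episode(camera, robot, tracker, detect_affordance_frame,
                diffusion_policy, max_steps: int = 200) -> None:
    rgb, depth = camera.read()
    # One-shot detection of the affordance frame as a 4x4 world-frame pose.
    T_world_aff = detect_affordance_frame(rgb, depth)
    tracker.initialise(T_world_aff, rgb, depth)

    for _ in range(max_steps):
        rgb, depth = camera.read()
        # Continuous 6-DoF tracking as the robot interacts with the object.
        T_world_aff = tracker.update(rgb, depth)
        T_world_ee = robot.end_effector_pose()

        # Observation expressed in the affordance-centric task frame.
        T_aff_ee = np.linalg.inv(T_world_aff) @ T_world_ee

        # The state-based diffusion policy predicts the next end-effector
        # pose in the same task frame; map it back to the world to execute.
        T_aff_ee_next = diffusion_policy(T_aff_ee)
        robot.move_to(T_world_aff @ T_aff_ee_next)
```

Keeping both the policy's inputs and outputs in the task frame is what makes the learned behaviour portable: only the single world-frame anchor `T_world_aff` changes across object instances and placements.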
@misc{rana2024affordancecentricpolicylearningsample,
      title={Affordance-Centric Policy Learning: Sample Efficient and Generalisable Robot Policy Learning using Affordance-Centric Task Frames},
      author={Krishan Rana and Jad Abou-Chakra and Sourav Garg and Robert Lee and Ian Reid and Niko Suenderhauf},
      year={2024},
      eprint={2410.12124},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2410.12124},
}