Learning Task Decomposition with Order-Memory Policy Network

Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

02 Dec 2021 | ICLR 2021 Poster | Readers: Everyone

Keywords: Task Segmentation, Hierarchical Imitation Learning, Network Inductive Bias

Abstract: Many complex real-world tasks are composed of several levels of sub-tasks. Humans leverage these hierarchical structures to accelerate learning and achieve better generalization. To simulate this process, we introduce the Ordered Memory Policy Network (OMPN), which discovers task decomposition by imitation learning from demonstration. OMPN has an explicit inductive bias to model a hierarchy of sub-tasks. Experiments on Craft world and Dial demonstrate that, with behavior cloning, our model recovers task boundaries more accurately than previous methods under both unsupervised and weakly supervised settings. OMPN can also be directly applied to partially observable environments and still achieve high performance. Our visualization further confirms the intuition that OMPN learns to expand the memory at higher levels when a sub-task is close to completion.
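
To make "recovering the task boundaries" concrete, here is a minimal sketch of one common way to score predicted sub-task boundaries against reference boundaries: an F1 score with a tolerance window. This illustrates the general idea only and is not taken from the paper's code; the function name `boundary_f1`, the `tolerance` parameter, and the one-to-one greedy matching rule are all assumptions made for this example.

```python
def boundary_f1(predicted, reference, tolerance=1):
    """F1 between two lists of boundary time steps (hypothetical helper).

    A predicted boundary counts as correct if it lies within `tolerance`
    steps of a reference boundary that has not been matched yet.
    """
    predicted = sorted(predicted)
    reference = sorted(reference)
    matched = 0
    i = j = 0
    # Two-pointer sweep over both sorted lists, matching one-to-one.
    while i < len(predicted) and j < len(reference):
        diff = predicted[i] - reference[j]
        if abs(diff) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # predicted boundary is too early to match anything
        else:
            j += 1  # reference boundary was missed
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two of three predicted boundaries fall within one step of a reference.
    print(boundary_f1([5, 12, 20], [4, 13, 30], tolerance=1))
```

A larger `tolerance` forgives small timing offsets between the predicted and reference boundaries, while `tolerance=0` requires exact agreement.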

One-sentence Summary: We introduce an Ordered Memory Policy Network (OMPN) to discover task decomposition by imitation learning from demonstration.

Supplementary Material: zip

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
