Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding

NAACL-HLT (2024)

Abstract
When connecting objects and their language referents in an embodied 3D environment, it is important to note that: (1) an object can be better characterized by leveraging comparative information between itself and other objects, and (2) an object's appearance can vary with camera position. As such, we present the Multi-view Approach to Grounding in Context (MAGiC), which selects an object referent based on language that distinguishes between two similar objects. By pragmatically reasoning over both objects and across multiple views of those objects, MAGiC improves over the state-of-the-art model on the SNARE object reference task with a relative error reduction of 12.9% (representing an absolute improvement of 2.7%). Ablation studies show that reasoning jointly over object referent candidates and multiple views of each object both contribute to improved accuracy. Code: https://github.com/rcorona/magic_snare/
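
The two figures in the abstract (12.9% relative error reduction, 2.7% absolute improvement) are linked by simple arithmetic: relative reduction = absolute improvement / baseline error. The sketch below only back-solves that relation. The abstract does not state the baseline error rate, so the values computed here are implied approximations from rounded numbers, not reported results, and the function name is hypothetical.

```python
def implied_baseline_error(abs_improvement: float, rel_reduction: float) -> float:
    """Back-solve baseline error from: rel_reduction = abs_improvement / baseline_error."""
    return abs_improvement / rel_reduction


# Figures taken from the abstract (both are rounded there).
abs_improvement = 0.027  # 2.7 percentage points
rel_reduction = 0.129    # 12.9% relative error reduction

# Assumption: both figures refer to the same evaluation split.
baseline_error = implied_baseline_error(abs_improvement, rel_reduction)
new_error = baseline_error - abs_improvement

print(f"implied baseline error: {baseline_error:.3f}")
print(f"implied new error:      {new_error:.3f}")
```

Under these assumptions the implied baseline error is roughly 21% and the implied new error roughly 18%; treat both as rough consistency checks rather than numbers from the paper.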