O-STaR: Open-Vocabulary Object Search through Spatio-Temporal Reasoning on Dynamic Scene Graphs
Abstract:
Service robots in home environments must locate frequently repositioned everyday objects under partial observability. In such scenarios, exhaustive search is impractical and generic semantic priors fail to capture household-specific habits.
We therefore present Open-vocabulary object search through Spatio-Temporal Reasoning~(O-STaR), a unified framework that bridges semantic common-sense, geometric feasibility, and adaptive temporal reasoning for personalized object search.
To reflect dynamic household environments, we model them as 3D semantic scene graphs in which furniture serves as static anchors and objects are dynamic nodes with latent transition processes.
Additionally, our approach grounds large language model priors in physical scene geometry, maintains a belief over object locations using Dirichlet-Categorical models that are updated from sparse observations, and enables personalized object search through learned household relocation patterns.
We study demand-driven object search, where a robot receives episodic queries and must locate objects under partial observability and unobserved relocation.
Physical experiments on a Stretch mobile manipulator show that geometric reasoning reduces concealed-space search time by 68\%.
Simulation benchmarks over 60-day scenarios demonstrate that our adaptive model recovers performance even under corrupted initial priors, achieving a 72.67\% success rate.
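
To make the Dirichlet-Categorical belief mentioned above concrete, the following is a minimal sketch of how such a belief over object locations could be maintained from sparse sightings. It is not the authors' implementation: the class name `DirichletLocationBelief`, its methods, the anchor names, and the prior pseudo-counts are all hypothetical, and the sketch ignores the latent transition processes and the geometric feasibility checks described in the abstract.

```python
class DirichletLocationBelief:
    """Illustrative Dirichlet-Categorical belief over furniture anchors.

    Hypothetical sketch: one pseudo-count per anchor, where the initial
    counts stand in for a semantic prior (e.g. one derived from an LLM).
    """

    def __init__(self, prior_pseudo_counts):
        # Dirichlet concentration parameters must be strictly positive.
        if not prior_pseudo_counts:
            raise ValueError("at least one anchor is required")
        if any(a <= 0 for a in prior_pseudo_counts.values()):
            raise ValueError("pseudo-counts must be positive")
        self.alpha = dict(prior_pseudo_counts)

    def observe(self, anchor, count=1.0):
        # Conjugate update: a sighting at `anchor` adds to its pseudo-count.
        if anchor not in self.alpha:
            raise KeyError(f"unknown anchor: {anchor}")
        self.alpha[anchor] += count

    def predictive(self):
        # Posterior predictive of a Dirichlet-Categorical model:
        # P(location = a) = alpha_a / sum of all alpha.
        total = sum(self.alpha.values())
        return {a: v / total for a, v in self.alpha.items()}

    def search_order(self):
        # Visit anchors from most to least probable.
        p = self.predictive()
        return sorted(p, key=p.get, reverse=True)


# Assumed generic prior: the object is most likely on the kitchen counter.
belief = DirichletLocationBelief(
    {"kitchen_counter": 2.0, "sofa": 1.0, "desk": 1.0}
)

# This (hypothetical) household keeps leaving the object on the desk.
for _ in range(3):
    belief.observe("desk")

probs = belief.predictive()
order = belief.search_order()
```

After three sightings at the desk, the pseudo-counts are 2, 1, and 4, so the predictive probability of the desk is 4/7 and it moves to the front of the search order. This illustrates how sparse household-specific observations can override a generic prior.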



