Publications
Peer-reviewed work on cross-modal retrieval and question answering.
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
MADQA is a multimodal agentic document-QA benchmark that scores full search trajectories, not just answers, and shows that agents match human accuracy only by working ~5× harder.
AdaNav: Query-Adaptive Multi-Granularity Navigation for Long Document Understanding
AdaNav builds a multimodal document tree and navigates it at query-adaptive granularity, without an embedding retriever, beating open-source VLM agent systems by over 5% on MMLongBench-Doc while reading fewer pages.
Retrieval-based Question Answering with Passage Expansion Using a Knowledge Graph
A multimodal retriever that combines knowledge-graph entity features with dense text retrieval, improving open-domain QA precision on rare, entity-centric questions where dense retrievers fall short.
Fine-Grained Label Learning via Siamese Network for Cross-Modal Information Retrieval
Fine-grained labels capture the "hardness" of text–image pairs; a siamese network and a weighted pairwise loss exploit them to improve cross-modal retrieval on three benchmarks.