Publications

Peer-reviewed work on cross-modal retrieval and question answering.

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Łukasz Borchmann, Jordy Van Landeghem, Michał Turski, Shreyansh Padarha, Ryan Othniel Kearns, Adam Mahdi, Niels Rogge, Clémentine Fourrier, Siwei Han, Huaxiu Yao, Artemis Llabrés, Yiming Xu, Dimosthenis Karatzas, Hao Zhang, Anupam Datta · ICML · Oral (2026)

MADQA is a multimodal agentic document-QA benchmark that scores full search trajectories, not just answers, and shows that agents match human accuracy only by working ~5× harder.

AdaNav: Query-Adaptive Multi-Granularity Navigation for Long Document Understanding

Yiming Xu, Eric López, Artemis Llabrés, Maximiliano Hormazábal, Ernest Valveny, Dimosthenis Karatzas · ICDAR · Accepted (2026)

AdaNav builds a multimodal document tree and navigates it at query-adaptive granularity, without an embedding retriever, beating open-source VLM agent systems by over 5% on MMLongBench-Doc while reading fewer pages.

Retrieval-based Question Answering with Passage Expansion Using a Knowledge Graph

Benno Kruit, Yiming Xu, Jan-Christoph Kalo · LREC-COLING · Oral (2024)

A multimodal retriever that combines knowledge-graph entity features with dense text retrieval, improving open-domain QA precision on rare, entity-centric questions where dense retrievers fall short.

Fine-grained label learning via siamese network for cross-modal information retrieval

Yiming Xu, Jing Yu, Jingjing Guo, Yue Hu, Jianlong Tan · ICCS (2019)

Fine-grained labels capture the "hardness" of text–image pairs; a siamese network and a weighted pairwise loss exploit them to improve cross-modal retrieval on three benchmarks.