Yiming Xu
PhD Fellow working on Multimodal Document Understanding at the Computer Vision Center, Universitat Autònoma de Barcelona.
I work in the Vision & Language Group at CVC, UAB, advised by Dimosthenis Karatzas and Ernest Valveny, where I build agentic systems and foundation models for document understanding.
Before Barcelona I earned an MSc at the University of Amsterdam (thesis on Multimodal RAG) and spent two years as principal developer building e-commerce AI agents.
Selected projects
Hierarchical Planner
A navigation-first planner that walks a document's hierarchy instead of relying on flat top-k retrieval, using layered embeddings as routing hints.
E-commerce AI Agent
Flagship conversational shopping agent. Led development end-to-end; the project went on to raise close to USD 7M in funding.
Multimodal RAG
Master's thesis on retrieval-augmented generation across text and image, supervised by Benno Kruit and Jan-Christoph Kalo.
Recent writing
- 2025-10-07 Planner design
- 2025-09-21 Context engineering
- 2025-08-14 DocLens
Recent publications
- ICML 2026
- ICDAR 2026
- LREC-COLING 2024
Awards
- 2025 FPI (Formación de Personal Investigador) Fellowship
- 2017 Chinese Academy of Sciences Science and Innovation Program Scholarship