Reading List

2025-11-11 · 1 min read
| Paper | Paper Link | Code / Model | Venue | Brand |
| --- | --- | --- | --- | --- |
| MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding | Link | Code | arXiv | AIMing Lab |
| Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling | Link | Code | arXiv | Qwen |
| ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents | Link | Code | arXiv | Alibaba NLP |
| DeepSeek-OCR: Contexts Optical Compression | Link | Code | arXiv | DeepSeek |
| DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding | Link | Code | arXiv | Alibaba |
| SlideAgent: Hierarchical Agentic Framework for Multi-Page Visual Document Understanding | Link | Project | arXiv | Georgia Tech & JPMorgan Research |
| DocLens: A Tool-Augmented Multi-Agent Framework for Long Visual Document Understanding | Link | Project | arXiv | Google Cloud |