-
Linguistic Unit Discovery
-
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
-
Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval
-
Cross-modal Graph Matching Network for Image-text Retrieval
-
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval