Multimodal Generative Retrieval Model with Staged Pretraining for Food Delivery on Meituan
#Meituan #Multimodal Retrieval #Generative AI #Food Delivery #Staged Pretraining #Search Optimization #arXiv
📌 Key Takeaways
- Meituan researchers introduced a multimodal generative retrieval model to improve food delivery search accuracy.
- The model addresses the 'modality dominance' issue, where joint optimization lets one modality (often text) overshadow others such as visual data during training.
- A staged pretraining strategy was implemented to ensure balanced learning across different data types.
- The new architecture moves beyond traditional dual-tower models to provide more precise and diverse search results.
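For readers unfamiliar with the dual-tower baseline mentioned above, the following is a minimal sketch of how such a retriever typically scores items. It is not the Meituan model: the embeddings are hand-written toy vectors, and the helper names (`fuse_item`, `retrieve`), the weighted-average fusion, and the `text_weight` parameter are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_item(text_emb, image_emb, text_weight=0.5):
    """Hypothetical item tower: weighted average of text and image embeddings.

    Real systems learn this fusion; a fixed weight is an assumption here.
    """
    fused = [
        text_weight * t + (1.0 - text_weight) * i
        for t, i in zip(text_emb, image_emb)
    ]
    return l2_normalize(fused)


def retrieve(query_emb, items, k=2):
    """Score every item by cosine similarity to the query and return the top k."""
    q = l2_normalize(query_emb)
    scored = [
        (name, sum(a * b for a, b in zip(q, emb)))
        for name, emb in items.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy catalogue: each item has a made-up text embedding and image embedding.
ITEMS = {
    "spicy_noodles": fuse_item([1.0, 0.0, 0.0], [0.8, 0.2, 0.0]),
    "fruit_salad": fuse_item([0.0, 1.0, 0.0], [0.0, 0.9, 0.1]),
    "iced_tea": fuse_item([0.0, 0.0, 1.0], [0.1, 0.0, 0.9]),
}

# The query tower output is also a made-up vector.
top = retrieve([0.9, 0.1, 0.0], ITEMS, k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

In this sketch the query and item towers never interact until the final dot product, which is the property that makes dual-tower retrieval fast but limits how finely it can match queries to items. Setting `text_weight` close to 1.0 makes the fused vector ignore the image embedding entirely, a crude stand-in for one modality dominating; the paper's actual mechanism and its staged pretraining remedy are not reproduced here.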
🏷️ Themes
Artificial Intelligence, E-commerce, Machine Learning
📚 Related People & Topics
Generative artificial intelligence
Subset of AI using generative models
Generative artificial intelligence (also referred to as generative AI or GenAI) is a specialized subfield of artificial intelligence focused on the creation of original content. Utilizing advanced generative models, these systems are capable ...
Meituan
Chinese group buying and food delivery website
Meituan (Chinese: 美团; pinyin: Měituán; formerly Meituan–Dianping) is a Chinese technology company that operates a platform for local services including on‑demand food delivery, in‑store services and consumer reviews under the Dazhong Dianping brand, hotel and travel bookings, and instant retail. The...
📄 Original Source Content
arXiv:2602.06654v1. Abstract: Multimodal retrieval models are becoming increasingly important in scenarios such as food delivery, where rich multimodal features can meet diverse user needs and enable precise retrieval. Mainstream approaches typically employ a dual-tower architecture between queries and items, and perform joint optimization of intra-tower and inter-tower tasks. However, we observe that joint optimization often leads to certain modalities dominating the traini