Differentially Private Multimodal In-Context Learning
#differential privacy #multimodal learning #in-context learning #data protection #AI models
📌 Key Takeaways
- DP-MTV (Differentially Private Multimodal Task Vectors) brings formal $(\varepsilon, \delta)$-differential privacy to multimodal in-context learning, protecting sensitive demonstration data such as medical images and personal photographs.
- Hundreds of demonstrations are aggregated into compact task vectors in activation space, sidestepping the token-level privacy cost that limits prior text-only methods to few-shot settings.
- A single noise addition at aggregation time enables unlimited inference queries, preserving model utility while guaranteeing user privacy.
- At $\varepsilon = 1.0$, DP-MTV reaches 50% on VizWiz versus 55% non-private and 35% zero-shot, retaining most of the gain from in-context learning under meaningful privacy constraints.
📖 Full Retelling
arXiv:2603.04894v1
Abstract: Vision-language models are increasingly applied to sensitive domains such as medical imaging and personal photographs, yet existing differentially private methods for in-context learning are limited to few-shot, text-only settings because privacy cost scales with the number of tokens processed. We present Differentially Private Multimodal Task Vectors (DP-MTV), the first framework enabling many-shot multimodal in-context learning with formal $(\varepsilon, \delta)$-differential privacy by aggregating hundreds of demonstrations into compact task vectors in activation space. DP-MTV partitions private data into disjoint chunks, applies per-layer clipping to bound sensitivity, and adds calibrated noise to the aggregate, requiring only a single noise addition that enables unlimited inference queries. We evaluate on eight benchmarks across three VLM architectures, supporting deployment with or without auxiliary data. At $\varepsilon=1.0$, DP-MTV achieves 50% on VizWiz compared to 55% non-private and 35% zero-shot, preserving most of the gain from in-context learning under meaningful privacy constraints.
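To make the mechanism concrete, here is a minimal NumPy sketch of the aggregation step as the abstract describes it: partition demonstrations into disjoint chunks, extract a per-chunk task vector per layer (extraction itself is out of scope here), clip each layer to a norm bound, average, and add Gaussian noise once. The clip norm `C`, the sensitivity argument, and the Gaussian-mechanism calibration are assumptions for illustration, not the paper's implementation.

```python
# A minimal sketch of the aggregation step the abstract describes, under
# assumed details; it is not the authors' implementation. The clip norm and
# the Gaussian-mechanism calibration below are illustrative choices.
import numpy as np

def clip_layer(v: np.ndarray, clip_norm: float) -> np.ndarray:
    """Scale a layer's task vector so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(v)
    return v if norm <= clip_norm else v * (clip_norm / norm)

def dp_aggregate_task_vectors(chunk_vectors, clip_norm, epsilon, delta, rng=None):
    """Privately aggregate per-chunk task vectors into one noisy task vector.

    chunk_vectors: list of {layer_name: np.ndarray}, one dict per disjoint
    chunk of private demonstrations. Because chunks are disjoint, replacing
    one demonstration changes at most one chunk, so each layer's mean moves
    by at most 2 * clip_norm / n in L2; concatenated over L layers, the total
    L2 sensitivity is 2 * clip_norm * sqrt(L) / n.
    """
    rng = rng or np.random.default_rng()
    n = len(chunk_vectors)
    layers = list(chunk_vectors[0].keys())
    sensitivity = 2.0 * clip_norm * np.sqrt(len(layers)) / n
    # Classical Gaussian-mechanism calibration (valid for epsilon < 1).
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noisy = {}
    for layer in layers:
        mean = np.mean(
            [clip_layer(cv[layer], clip_norm) for cv in chunk_vectors], axis=0
        )
        noisy[layer] = mean + rng.normal(0.0, sigma, size=mean.shape)
    return noisy  # one noisy release; all later queries are post-processing
```

Because the noisy task vector is released only once, every subsequent inference query is post-processing of the same release, which is why the privacy cost does not grow with the number of queries.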
🏷️ Themes
Privacy, AI Learning
Original Source
Computer Science > Artificial Intelligence
arXiv:2603.04894 [cs.AI] (v1 submitted Thu, 5 Mar 2026 07:36:02 UTC)
Title: Differentially Private Multimodal In-Context Learning
Authors: Ivoline C. Ngong, Zarreen Reza, Joseph P. Near
Subjects: Artificial Intelligence (cs.AI)
DOI: https://doi.org/10.48550/arXiv.2603.04894 (arXiv-issued DOI via DataCite; registration pending)
The abstract is reproduced in full in the Full Retelling above.
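For intuition on how the single noise addition scales (a standard Gaussian-mechanism account under the assumed mean-of-clipped-chunks aggregation from the sketch above, not necessarily the paper's exact analysis): with $n$ disjoint chunks, per-layer clip norm $C$, and $L$ layers,

$$
\Delta_2 = \frac{2C\sqrt{L}}{n}, \qquad \sigma = \frac{\Delta_2\,\sqrt{2\ln(1.25/\delta)}}{\varepsilon},
$$

so the noise scale falls as $1/n$: aggregating hundreds of demonstrations into many chunks yields a much quieter release at the same $\varepsilon$, which is consistent with the abstract's emphasis on many-shot aggregation.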
Read full article at source