GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement
#forgery localization #weak supervision #full supervision #EM-guided decomposition #temporal refinement #video analysis #deepfake detection
📌 Key Takeaways
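- GEM-TFL (Graph-based EM-powered Temporal Forgery Localization) is a two-phase classification-regression framework for localizing manipulated segments in video or audio streams.
- It narrows the gap between weak and full supervision by using an EM-based optimization process to turn binary video-level labels into multi-dimensional latent attributes.
- A training-free temporal consistency refinement realigns frame-level predictions for smoother temporal dynamics.
- A graph-based proposal refinement module models temporal-semantic relationships among proposals for globally consistent confidence estimation.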
📖 Full Retelling
arXiv:2603.05095v1 Announce Type: cross
Abstract: Temporal Forgery Localization (TFL) aims to precisely identify manipulated segments within videos or audio streams, providing interpretable evidence for multimedia forensics and security. While most existing TFL methods rely on dense frame-level labels in a fully supervised manner, Weakly Supervised TFL (WS-TFL) reduces labeling cost by learning only from binary video-level labels. However, current WS-TFL approaches suffer from mismatched training and inference objectives, limited supervision from binary labels, gradient blockage caused by non-differentiable top-k aggregation, and the absence of explicit modeling of inter-proposal relationships.
🏷️ Themes
Video Forensics, Machine Learning
Original Source
arXiv:2603.05095 [Computer Science > Computer Vision and Pattern Recognition]
Submitted on 5 Mar 2026
Title: GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement
Authors: Xiaodong Zhu, Yuanming Zheng, Suting Wang, Junqi Yang, Yuhong Yang, Weiping Tu, Zhongyuan Wang
Abstract: Temporal Forgery Localization (TFL) aims to precisely identify manipulated segments within videos or audio streams, providing interpretable evidence for multimedia forensics and security. While most existing TFL methods rely on dense frame-level labels in a fully supervised manner, Weakly Supervised TFL (WS-TFL) reduces labeling cost by learning only from binary video-level labels. However, current WS-TFL approaches suffer from mismatched training and inference objectives, limited supervision from binary labels, gradient blockage caused by non-differentiable top-k aggregation, and the absence of explicit modeling of inter-proposal relationships. To address these issues, we propose GEM-TFL (Graph-based EM-powered Temporal Forgery Localization), a two-phase classification-regression framework that effectively bridges the supervision gap between training and inference. Built upon this foundation, (1) we enhance weak supervision by reformulating binary labels into multi-dimensional latent attributes through an EM-based optimization process; (2) we introduce a training-free temporal consistency refinement that realigns frame-level predictions for smoother temporal dynamics; and (3) we design a graph-based proposal refinement module that models temporal-semantic relationships among proposals for globally consistent confidence estimation.
Extensive experiments on benchmark datasets demonstrate that GEM-TFL achieves more accurate a...
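The "gradient blockage" that the abstract attributes to top-k aggregation can be illustrated with a small, generic sketch. It is not the paper's code: the function names `topk_mean` and `softmax_pool`, the toy frame scores, and the softmax-weighted pooling used for contrast are all assumptions made for illustration. Under one common reading, a video-level score built from the k highest frame scores passes gradient only to those k frames, so every other frame receives no learning signal from the video-level label.

```python
import math


def topk_mean(scores, k):
    """Video-level score as the mean of the k highest frame scores.

    Returns the pooled value and its gradient w.r.t. each frame score.
    (Hypothetical helper, not taken from the GEM-TFL paper.)
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    value = sum(scores[i] for i in top) / k
    # Selected frames get gradient 1/k; every other frame gets exactly 0.
    grad = [1.0 / k if i in top else 0.0 for i in range(len(scores))]
    return value, grad


def softmax_pool(scores, temperature=1.0):
    """Softmax-weighted average of frame scores, shown only for contrast.

    This is a generic smooth pooling, not the aggregation GEM-TFL uses.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    value = sum(w * s for w, s in zip(weights, scores))
    # Analytic gradient: d(value)/d(s_i) = w_i * (1 + (s_i - value) / T)
    grad = [w * (1.0 + (s - value) / temperature) for w, s in zip(weights, scores)]
    return value, grad


if __name__ == "__main__":
    frame_scores = [0.1, 0.9, 0.2, 0.8, 0.3]  # toy per-frame forgery scores
    print(topk_mean(frame_scores, k=2))
    print(softmax_pool(frame_scores))
```

With the toy scores, only the two highest-scoring frames have a nonzero gradient under `topk_mean`, while `softmax_pool` spreads gradient across all five frames. How GEM-TFL itself avoids the blockage is not stated in the text available here.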