CASR: Refining Action Segmentation via marginalizing frame-level causal relationships

Sep 21, 2025

Keqing Du, Xinyu Yang, Hang Chen

Abstract

Integrating deep learning with causal discovery has increased the need for causal relationships between frames as evidence for explainability in Temporal Action Segmentation (TAS). However, frame-level causal relationships contain noise outside the segment, which makes it infeasible to infer macro-level action relationships from frame relationships. To address this gap, we propose a method for marginalizing noisy frame-level relationships and introduce a Causal Abstraction Segmentation Refiner (CASR) to improve segmentation. Specifically, we retain all cross-segment relationships while discarding all intra-segment relationships in the frame-level model, which yields a consistent causal-abstraction mapping, in terms of action semantics, from frames to segments. Given the pre-segmentation produced by the backbone, we treat the whitened frame relationships within the same segment and across different segments of a video as positive and negative cases, respectively. Through contrastive learning, we identify whether each frame belongs to its corresponding segment, thereby improving segmentation performance. In addition, we propose a loss function, independent of the action segmentation model, for evaluating the causal interpretability of segmentation results. Extensive experiments on mainstream datasets indicate that our method not only significantly surpasses existing methods in action segmentation performance but also performs better in the evaluation of causal models. CASR can be plugged into various action segmentation models (MS-TCN++, ASRF, C2F-TCN, CETNet) with different backbones.
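
As a rough illustration of the frame-to-segment abstraction described above, the following sketch collapses a frame-level relationship matrix into a segment-level one. It is not the authors' implementation. It assumes one reading of the marginalization step: relationships between frames in different segments are averaged into a single segment-to-segment score, and relationships between frames of the same segment are dropped. The function names `segment_ids` and `marginalize`, and the use of a mean as the aggregator, are hypothetical choices made for this example.

```python
import numpy as np


def segment_ids(labels):
    """Give each frame a running segment index based on label changes."""
    labels = np.asarray(labels)
    # A new segment starts wherever the label differs from the previous frame.
    change = np.concatenate(([False], labels[1:] != labels[:-1]))
    return np.cumsum(change)


def marginalize(frame_adj, labels):
    """Collapse a (T, T) frame-level matrix into an (S, S) segment-level one.

    Assumption for this sketch: only entries whose two frames lie in
    different segments are kept, and they are averaged per segment pair.
    """
    frame_adj = np.asarray(frame_adj, dtype=float)
    seg = segment_ids(labels)
    n_seg = int(seg.max()) + 1
    onehot = np.eye(n_seg)[seg]                   # (T, S) frame-to-segment map
    cross = seg[:, None] != seg[None, :]          # True for cross-segment pairs
    kept = np.where(cross, frame_adj, 0.0)        # drop same-segment entries
    sums = onehot.T @ kept @ onehot               # summed scores per segment pair
    counts = onehot.T @ cross.astype(float) @ onehot  # number of kept entries
    # Mean over kept entries; segment pairs with no kept entries stay at 0.
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
```

With per-frame labels such as `[0, 0, 1, 1, 1, 0]`, `segment_ids` finds three segments, so `marginalize` returns a 3 x 3 matrix whose diagonal is zero because same-segment entries are discarded under the stated assumption.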

Type

Journal article

Publication

IEEE Transactions on Multimedia