Online unsupervised video object segmentation via contrastive motion clustering

Xi, Lin ORCID: https://orcid.org/0000-0001-6075-5614, Chen, Weihai, Wu, Xingming, Liu, Zhong and Li, Zhengguo (2024) Online unsupervised video object segmentation via contrastive motion clustering. IEEE Transactions on Circuits and Systems for Video Technology, 34 (2). pp. 995-1006. ISSN 1051-8215

PDF (Online_Unsupervised_Video_Object_Segmentation_via_Contrastive_Motion_Clustering) - Accepted Version (5MB)

Abstract

Online unsupervised video object segmentation (UVOS) uses previous frames as input to automatically separate the primary object(s) from a streaming video without any manual annotation. A major challenge is that the model has no access to future frames and must rely solely on history, i.e., the segmentation mask for the current frame is predicted as soon as that frame is captured. In this work, we propose a novel contrastive motion clustering algorithm for online UVOS that takes optical flow as input and exploits the common fate principle: visual elements tend to be perceived as a group if they share the same motion pattern. We build a simple and effective auto-encoder that iteratively summarizes non-learnable prototypical bases for the motion patterns, while the bases in turn help learn the representation of the embedding network. Further, we develop a contrastive learning strategy based on a boundary prior to improve the discrimination between foreground and background features in the representation learning stage. The proposed algorithm can be optimized on data of arbitrary scale (i.e., a frame, a clip, or a dataset) and run in an online fashion. Experiments on the DAVIS 16, FBMS, and SegTrackV2 datasets show that our method surpasses the accuracy of the previous state-of-the-art (SoTA) online UVOS method by margins of 0.8%, 2.9%, and 1.1%, respectively. Furthermore, by using online deep subspace clustering to tackle motion grouping, our method achieves higher accuracy with 3× faster inference than the SoTA online UVOS method, making a good trade-off between effectiveness and efficiency. Our code is available at https://github.com/xilin1991/CluterNet.
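To make the common fate principle and the boundary prior concrete, the following is a minimal sketch, not the authors' method (their implementation is in the repository named in the abstract). It swaps the learned embedding network and auto-encoder for plain 2-means clustering on raw optical-flow vectors, keeps the prototypes as non-learnable cluster means, and uses a boundary prior to decide which cluster is background. The function name `segment_by_motion`, the two-cluster assumption, and the initialisation strategy are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Vec = Tuple[float, float]
Flow = List[List[Vec]]  # H x W grid of (u, v) optical-flow vectors


def _dist2(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two flow vectors."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def _mean(vs: List[Vec]) -> Vec:
    return (sum(v[0] for v in vs) / len(vs), sum(v[1] for v in vs) / len(vs))


def segment_by_motion(flow: Flow, iters: int = 10) -> List[List[int]]:
    """Hypothetical sketch: group pixels into two motion clusters and
    return a binary foreground mask (1 = foreground).

    2-means on raw flow stands in for the paper's learned embedding and
    prototype bases; a boundary prior picks the background cluster.
    """
    h, w = len(flow), len(flow[0])
    pixels = [flow[y][x] for y in range(h) for x in range(w)]
    border_idx = [y * w + x for y in range(h) for x in range(w)
                  if y in (0, h - 1) or x in (0, w - 1)]

    # Initialise prototypes: mean border motion (assumed background) and
    # the pixel whose motion differs most from it (assumed foreground).
    bg0 = _mean([pixels[i] for i in border_idx])
    fg0 = max(pixels, key=lambda p: _dist2(p, bg0))
    protos = [bg0, fg0]
    labels = [0] * len(pixels)

    for _ in range(iters):
        # Assignment: each pixel joins the prototype with the closest motion
        # (common fate: same motion pattern -> same group).
        labels = [min((0, 1), key=lambda k: _dist2(p, protos[k])) for p in pixels]
        # Update: prototypes are cluster means, not gradient-trained weights.
        for k in (0, 1):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                protos[k] = _mean(members)

    # Boundary prior: the cluster covering most border pixels is background.
    border_labels = [labels[i] for i in border_idx]
    bg = 0 if border_labels.count(0) >= border_labels.count(1) else 1
    return [[int(labels[y * w + x] != bg) for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    # Toy frame: background drifts right, a 2x2 object moves downward.
    bg_motion, obj_motion = (1.0, 0.0), (0.0, 3.0)
    toy = [[obj_motion if 2 <= y <= 3 and 2 <= x <= 3 else bg_motion
            for x in range(6)] for y in range(6)]
    for row in segment_by_motion(toy):
        print(row)
```

Because each call sees only the current frame's flow, the sketch is online in the sense the abstract describes. It does not reproduce the contrastive representation learning or the online deep subspace clustering, which is where the paper's accuracy and speed claims come from.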

Item Type: Article
Additional Information: Funding Information: This work was supported in part by the National Natural Science Foundation of China under Grants 51975029 and U1909215, the Key Research and Development Program of Zhejiang Province under Grant 2021C03050, the Scientific Research Project of Agriculture and Social Development of Hangzhou under Grant 2020ZDSJ0881, and in part by the National Natural Science Foundation of China under Grants 61620106012 and 61573048.
Uncontrolled Keywords: clustering methods, image motion analysis, object segmentation, optical flow, self-supervised learning, unsupervised learning, media technology, electrical and electronic engineering
Faculty \ School: Faculty of Science > School of Computing Sciences
Depositing User: LivePure Connector
Date Deposited: 05 Nov 2024 12:30
Last Modified: 12 Nov 2024 13:30
URI: https://ueaeprints.uea.ac.uk/id/eprint/97505
DOI: 10.1109/TCSVT.2023.3288878
