Omnimatte: Associating Objects And Their Effects In Video

Current computer vision technologies can segment objects in images and videos; however, little attention has been paid to identifying the effects those objects have on the scene, such as the shadow a person casts on the floor or a distant wall, or a reflection in a window.

A recent study proposes a method to identify those effects in videos.

Video editing. Image credit: DaleshTV via Wikimedia, CC-BY-SA-4.0

Given an input video with roughly segmented moving subjects, the model produces, for each subject, an opacity map (alpha matte) and a color image that cover the subject together with the scene regions correlated with it. The model is trained in a self-supervised manner on the input video alone, without manual labels or additional training examples. It handles a variety of subjects, such as animals, cars, and people, and captures different effects such as shadows, reflections, dust, and smoke. The task is useful for video editing applications such as object removal or background replacement.

Computer vision is increasingly effective at segmenting objects in images and videos; however, scene effects related to the objects—shadows, reflections, generated smoke, etc—are typically overlooked. Identifying such scene effects and associating them with the objects producing them is important for improving our fundamental understanding of visual scenes, and can also assist a variety of applications such as removing, duplicating, or enhancing objects in video. In this work, we take a step towards solving this novel problem of automatically associating objects with their effects in video. Given an ordinary video and a rough segmentation mask over time of one or more subjects of interest, we estimate an omnimatte for each subject—an alpha matte and color image that includes the subject along with all its related time-varying scene elements. Our model is trained only on the input video in a self-supervised manner, without any manual labels, and is generic—it produces omnimattes automatically for arbitrary objects and a variety of effects. We show results on real-world videos containing interactions between different types of subjects (cars, animals, people) and complex effects, ranging from semi-transparent elements such as smoke and reflections, to fully opaque effects such as objects attached to the subject.
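To make the idea of an omnimatte more concrete, the sketch below shows how per-subject layers, each an alpha matte plus a color image, could be recombined into a frame and how dropping one layer removes a subject together with its effects. It assumes the standard back-to-front "over" compositing operator; the function names `composite` and `remove_subject` and the array layout are illustrative assumptions, not code from the paper.

```python
import numpy as np

def composite(background, layers):
    """Composite (color, alpha) layers over a background, back to front.

    background: float array of shape (H, W, 3) with values in [0, 1]
    layers: list of (color, alpha) pairs, ordered back to front;
            color has shape (H, W, 3), alpha has shape (H, W)
    """
    out = background.astype(np.float64)
    for color, alpha in layers:
        a = alpha[..., None]  # broadcast alpha over the color channels
        # Standard "over" operator (an assumption for this sketch)
        out = a * color + (1.0 - a) * out
    return out

def remove_subject(background, layers, index):
    """Re-composite the frame without the layer at position `index`."""
    kept = [layer for i, layer in enumerate(layers) if i != index]
    return composite(background, kept)
```

Because a shadow or reflection lives in the same layer as the subject that causes it, leaving that layer out of the composite removes both at once, which is what makes this representation convenient for object removal and background replacement.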

Research paper: Lu, E., Cole, F., Dekel, T., Zisserman, A., Freeman, W. T., and Rubinstein, M., “Omnimatte: Associating Objects and Their Effects in Video”, 2021. Link: https://arxiv.org/abs/2105.06993