Generative adversarial networks are widely used for video generation. However, the foundations of the synthesis process are not fully understood, and the results show characteristic flaws. For instance, fine details appear to be fixed to pixel coordinates rather than to the surfaces of depicted objects.
A recent study tries to create a more natural architecture, in which the exact position of each feature is inherited exclusively from the underlying coarse features. The researchers find that current upsampling filters are not aggressive enough in suppressing aliasing, which is an important reason why networks partially bypass the hierarchical construction.
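To make the point about upsampling filters concrete, here is a minimal 1D NumPy sketch, not the study's code. It upsamples a sine wave by a factor of 2 in two ways and measures how much spectral energy ends up above the original Nyquist limit, where a band-limited signal should have none. Nearest-neighbor repetition stands in for a weak filter and a Hamming-windowed sinc for a more aggressive one; both choices, along with the function names, tap count, and test frequency, are assumptions made for illustration.

```python
import numpy as np

def image_energy_fraction(up, factor):
    """Fraction of spectral energy above the original Nyquist limit.

    A Hann window is applied first so that the abrupt ends of the
    finite signal do not dominate the measurement.
    """
    spec = np.abs(np.fft.rfft(up * np.hanning(len(up)))) ** 2
    freqs = np.fft.rfftfreq(len(up))  # cycles/sample at the upsampled rate
    return spec[freqs > 0.5 / factor].sum() / spec.sum()

def upsample_nearest(x, factor):
    """Weak filter: repeat every sample `factor` times."""
    return np.repeat(x, factor)

def upsample_sinc(x, factor, num_taps=65):
    """Stronger filter: zero insertion followed by a windowed-sinc low-pass."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    cutoff = 0.5 / factor  # original Nyquist limit at the upsampled rate
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    h /= h.sum()  # unit gain at DC
    up = np.zeros(len(x) * factor)
    up[::factor] = x
    # Multiply by `factor` to restore the amplitude lost to zero insertion.
    return factor * np.convolve(up, h, mode="same")

# Illustrative input: a sine at 0.2 cycles/sample (hypothetical test signal).
x = np.sin(2 * np.pi * 0.2 * np.arange(256))
leak_nearest = image_energy_fraction(upsample_nearest(x, 2), 2)
leak_sinc = image_energy_fraction(upsample_sinc(x, 2), 2)
print("nearest-neighbor leakage:", leak_nearest)
print("windowed-sinc leakage:   ", leak_sinc)
```

In this toy setting, the weaker filter leaves a much larger share of energy in the band that should be empty. That residue is the kind of signal later layers could latch onto as a pixel-grid reference.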
The study proposes a solution to the aliasing caused by pointwise nonlinearities: considering their effect in the continuous domain and filtering the results appropriately. After these adjustments, details are correctly attached to underlying surfaces, and the quality of generated videos is improved.
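The idea of filtering around a pointwise nonlinearity can be sketched in 1D as follows. This is a rough approximation under stated assumptions, not the authors' implementation: the continuous domain is approximated by temporarily upsampling by a factor of 2, ReLU is the nonlinearity, and a Hamming-windowed sinc serves as the low-pass filter. The names `lowpass_kernel`, `upsample`, and `filtered_relu`, as well as the factor and tap count, are hypothetical.

```python
import numpy as np

def lowpass_kernel(cutoff, num_taps=65):
    """Windowed-sinc low-pass FIR filter; cutoff in cycles/sample (0 < cutoff <= 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # unit gain at DC

def upsample(x, factor, kernel):
    """Zero insertion followed by low-pass filtering."""
    up = np.zeros(len(x) * factor)
    up[::factor] = x
    return factor * np.convolve(up, kernel, mode="same")

def filtered_relu(x, factor=2, num_taps=65):
    """ReLU applied at a temporarily higher sampling rate.

    1. Upsample, as a stand-in for the continuous signal.
    2. Apply the nonlinearity, which creates new high frequencies.
    3. Low-pass filter to remove content above the original Nyquist limit.
    4. Drop back to the original sampling rate.
    Assumes len(x) * factor >= num_taps so the output keeps the input length.
    """
    kernel = lowpass_kernel(0.5 / factor, num_taps)
    hi = upsample(x, factor, kernel)
    hi = np.maximum(hi, 0.0)
    hi = np.convolve(hi, kernel, mode="same")
    return hi[::factor]

# Illustrative input: a sine wave, half of which ReLU clips to zero.
x = np.sin(2 * np.pi * 0.23 * np.arange(256))
y = filtered_relu(x)
print(y.shape)
```

Because of the final low-pass step, the output can dip slightly below zero near sharp transitions, unlike a plain ReLU. A finite upsampling factor also only approximates the continuous case, so some aliasing can remain.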
We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.