In-Domain GAN Inversion for Real Image Editing
A GAN learns a deep generative model that can synthesize novel, high-dimensional data samples such as images. Its latent space is known to encode rich semantic information: moving a latent code in a suitable direction changes the corresponding attribute in the generated image. You can learn more about generative models and latent spaces in the Towards Deep Generative Modeling with W&B report.
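To make "moving a latent code in a suitable direction" concrete, here is a minimal sketch of the latent-space arithmetic z' = z + αn in Python with NumPy. It assumes a 512-dimensional latent space, and `smile_dir` is a hypothetical random stand-in for a semantic direction (in practice such directions are found from labeled samples). No generator is included, so the sketch shows only how the code itself changes.

```python
import numpy as np

def edit_latent(z, direction, alpha):
    """Move latent code z by step alpha along the unit-normalized direction."""
    n = direction / np.linalg.norm(direction)
    return z + alpha * n

rng = np.random.default_rng(0)
z = rng.standard_normal(512)          # a sampled latent code (512-d is an assumption)
smile_dir = rng.standard_normal(512)  # hypothetical "smile" direction, random placeholder

z_more = edit_latent(z, smile_dir, alpha=3.0)   # push the attribute one way
z_less = edit_latent(z, smile_dir, alpha=-3.0)  # and the other way

# The component of the change along the direction equals alpha.
n = smile_dir / np.linalg.norm(smile_dir)
print(float((z_more - z) @ n), float((z_less - z) @ n))
```

With a real generator and a well-chosen direction, decoding `z_more` and `z_less` would ideally strengthen or weaken the associated attribute while leaving the rest of the image largely unchanged.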
What if you want to apply a specific manipulation to a photo of your own face, such as adding sunglasses? A GAN cannot do this directly: its generator maps latent codes to images, but it provides no way to take a particular real image as input and infer its latent code.
GAN inversion overcomes this limitation. It maps a real image (for example, a face) back to a latent code, aiming to find the code from which the generator reconstructs the input image as accurately as possible. This is done either by training an extra encoder alongside the GAN or by directly optimizing the latent code for each individual image. Techniques like in-domain GAN inversion combine these two ideas: the learned encoder produces an initialization, which optimization then refines.
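The hybrid strategy (encoder initialization followed by per-image optimization) can be sketched with a toy model. This is not the authors' implementation: the generator is an assumed linear map `G(z) = W @ z`, the encoder is a hypothetical perturbed pseudo-inverse standing in for a trained network, and the objective is plain squared pixel error minimized by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_x = 8, 64                     # toy latent and "image" sizes (assumed)
W = rng.standard_normal((d_x, d_z))  # toy linear generator weights

def generator(z):
    return W @ z

# Hypothetical encoder: a slightly perturbed pseudo-inverse of W, mimicking
# an encoder network that is close to, but not exactly, the true inverse.
E = np.linalg.pinv(W) + 0.05 * rng.standard_normal((d_z, d_x))

def encoder(x):
    return E @ x

def invert(x, steps=500):
    z = encoder(x)                                # 1) encoder provides the starting point
    lr = 1.0 / (2 * np.linalg.norm(W, 2) ** 2)    # 1 / Lipschitz constant of the gradient
    for _ in range(steps):
        grad = 2 * W.T @ (generator(z) - x)       # gradient of ||G(z) - x||^2
        z = z - lr * grad                         # 2) refine the code by gradient descent
    return z

z_true = rng.standard_normal(d_z)
x = generator(z_true)   # a target the toy generator can reproduce exactly
z_init = encoder(x)
z_inv = invert(x)
print("encoder only residual:", np.linalg.norm(generator(z_init) - x))
print("after optimization:   ", np.linalg.norm(generator(z_inv) - x))
```

Because the toy target lies in the generator's range, optimization recovers `z_true` almost exactly, while the encoder alone leaves a visible residual. With a real GAN, G and E are deep networks and the gradient would come from automatic differentiation rather than the closed-form expression used here.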
Given a pre-trained GAN model, GAN inversion aims to find the latent code from which the GAN's generator most accurately reconstructs the input image. However, existing inversion methods focus on reconstructing the target image at the pixel level without considering the semantic information carried by the inverted latent code. Without that semantic information, the inverted code fails to support high-quality semantic editing. We certainly don't want our edited face to look like someone else.
A suitable GAN inversion method should therefore not only reconstruct the target image at the pixel level but also align the inverted latent code with the semantic information encoded in the latent space. The authors named their method in-domain GAN inversion because it produces in-domain codes, that is, latent codes that are semantically meaningful within the GAN's latent space.
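One way to express "aligning the inverted code with the latent space" is to add a regularizer that penalizes codes the encoder would not map back to themselves, ||z − E(G(z))||². The sketch below is a hypothetical toy version of that idea, not the authors' code: it reuses an assumed linear generator and perturbed encoder, combines only a pixel loss with the regularizer weighted by `lam`, and, because this toy objective is quadratic, solves it in closed form instead of by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)
d_z, d_x = 8, 64
W = rng.standard_normal((d_x, d_z))                              # toy linear generator G(z) = W @ z
E = np.linalg.pinv(W) + 0.1 * rng.standard_normal((d_z, d_x))    # hypothetical imperfect encoder

# A "real" image the generator cannot reproduce exactly (off-range noise added).
x = W @ rng.standard_normal(d_z) + 0.5 * rng.standard_normal(d_x)

B = np.eye(d_z) - E @ W   # for this linear toy model, z - E(G(z)) = B @ z

def pixel_loss(z):
    return float(np.sum((W @ z - x) ** 2))

def domain_loss(z):
    return float(np.sum((B @ z) ** 2))

def invert(lam):
    # Minimizer of ||W z - x||^2 + lam * ||B z||^2 via its normal equations.
    return np.linalg.solve(W.T @ W + lam * (B.T @ B), W.T @ x)

z_pix = invert(lam=0.0)    # pixel-level reconstruction only
z_dom = invert(lam=10.0)   # pixel loss plus domain regularizer
for name, z in [("pixel only", z_pix), ("domain-regularized", z_dom)]:
    print(f"{name:>18}: pixel={pixel_loss(z):.4f}  domain={domain_loss(z):.6f}")
```

Raising `lam` trades a slightly higher pixel error for a code that the encoder treats as self-consistent, which mirrors the balance between pixel-level reconstruction and semantic alignment described above.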