DeepFaceDrawing: An Overview
Last updated
Image-to-image translation is a class of computer vision deep learning tasks where the goal is to learn a mapping between an input image and an output image. By analogy with automatic language translation (e.g., English to French), image-to-image translation is the task of translating one possible representation of a scene into another. At a high level, it is predicting pixels from pixels. Example tasks include:
Deep Image Inpainting - The input image is corrupted with missing pixel values, and the task is to fill in the missing patches. Here is an introduction to image inpainting with deep learning.
Image Colorization - We feed in black-and-white images and want the most realistic colorized versions as output. Here is a report by Boris Dayma covering DeOldify.
Sketch to Image - We feed in a sketch of an object and want a realistic image as output. The paper covered in this report is an example of sketch-to-image translation. Let us dive into it.
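All three tasks share the same supervised setup: a model sees an input image and is trained to predict a target image. As a rough illustration (not taken from any of the linked works), the NumPy sketch below builds (input, target) training pairs for the first two tasks from a single RGB array. The function names, patch position, and patch size are arbitrary choices made for this example.

```python
import numpy as np


def make_inpainting_pair(image, top, left, size):
    """(input, target) pair for inpainting: zero out a square patch.

    Returns the corrupted image, a binary mask (1 = missing pixel), and the target.
    """
    corrupted = image.copy()
    mask = np.zeros(image.shape[:2], dtype=np.float32)
    mask[top:top + size, left:left + size] = 1.0
    corrupted[mask.astype(bool)] = 0.0         # wipe every channel inside the patch
    return corrupted, mask, image


def make_colorization_pair(image):
    """(input, target) pair for colorization: grayscale in, RGB out.

    Collapses RGB with the ITU-R BT.601 luma weights.
    """
    weights = np.array([0.299, 0.587, 0.114], dtype=image.dtype)
    gray = image @ weights                     # (H, W, 3) @ (3,) -> (H, W)
    return gray[..., None], image              # keep a channel axis: (H, W, 1)


rng = np.random.default_rng(0)
img = rng.random((64, 64, 3), dtype=np.float32)   # stand-in for an RGB photo in [0, 1)
x_inp, mask, y_inp = make_inpainting_pair(img, top=16, left=16, size=8)
x_col, y_col = make_colorization_pair(img)
```

In both cases the target is the untouched original image, so the model learns to predict the missing pixels (or colors) from the pixels it is given.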
Recent developments in image-to-image translation allow fast generation of face images from freehand sketches. However, these techniques overfit to input sketches, so they require professionally drawn sketches, which limits the number of people who can use applications built on them. These deep learning-based solutions treat an input sketch as a hard constraint and try to infer the missing texture and shading information, so the task is formulated as a reconstruction problem. Further, these models are trained on pairs of realistic images and their corresponding edge maps, which is why input sketches at test time need to be of a quality similar to edge maps.
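To make the edge-map point concrete, here is a crude way to derive an edge map from a photo for paired training: threshold the finite-difference gradient magnitude. This is an illustrative stand-in. The function name `edge_map` and the `threshold` value are hypothetical, and a more careful pipeline would use a dedicated edge detector. The output is a clean, machine-generated contour image. A model trained on such inputs expects exactly this kind of input, and freehand sketches rarely look like it.

```python
import numpy as np


def edge_map(gray, threshold=0.1):
    """Binary edge map from a 2-D grayscale image via gradient magnitude."""
    gy, gx = np.gradient(gray.astype(np.float64))  # derivatives along rows, cols
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8)  # 1 = edge, 0 = background


# A bright square on a dark background: edges appear only along its border.
gray = np.zeros((32, 32))
gray[8:24, 8:24] = 1.0
edges = edge_map(gray)
```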
The authors of DeepFaceDrawing propose a novel sketch-to-image synthesis approach that tackles both the overfitting problem and the need for professionally drawn sketches. Its main contributions are:
Input sketches as a soft constraint instead of a hard constraint to guide image synthesis - The key idea is to implicitly learn a space of plausible face sketches from real face sketch images and find the closest point in this space (using manifold projection) to approximate an input sketch. This lets the method produce high-quality face images even from rough or incomplete sketches.
Local-to-global approach - Learning a space of plausible face sketches globally is not feasible given the limited training data. The authors therefore learn feature embeddings of key face components, such as the eyes, mouth, and nose. The idea is to push each component of the input sketch towards the corresponding learned component manifold.
A novel deep neural network that maps the embedded component features to realistic images, using multi-channel feature maps as intermediate results to improve information flow (more in the next section).
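The soft-constraint and local-to-global ideas in the list above can be sketched for a single component. The steps are to crop a fixed window, embed it, and replace the input's feature vector with its projection onto the set of training features. This is a minimal NumPy sketch that rests on several assumptions. The window coordinates in `COMPONENT_BOXES` are hypothetical, and PCA stands in for a learned per-component encoder. The projection rebuilds the query from its `k` nearest training features using locally linear embedding (LLE) style weights. That is one common way to approximate a manifold projection, not necessarily the paper's exact procedure.

```python
import numpy as np

# Hypothetical component windows (top, left, height, width) on a 64x64 sketch.
COMPONENT_BOXES = {
    "left_eye": (16, 8, 10, 20),
    "right_eye": (16, 36, 10, 20),
    "nose": (26, 24, 14, 16),
    "mouth": (42, 18, 10, 28),
}


def crop(sketch, box):
    """Cut one component window out of a 2-D sketch."""
    top, left, h, w = box
    return sketch[top:top + h, left:left + w]


def fit_pca(patches, dim):
    """Linear stand-in for a learned component encoder: PCA via SVD."""
    X = patches.reshape(len(patches), -1)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim]


def encode(patch, mean, basis):
    """Map a component patch to its low-dimensional feature vector."""
    return basis @ (patch.ravel() - mean)


def project_to_manifold(z, bank, k=10, reg=1e-3):
    """LLE-style projection: rebuild z as an affine combination (weights sum
    to 1) of its k nearest neighbours among the training features."""
    k = min(k, len(bank))
    idx = np.argsort(np.linalg.norm(bank - z, axis=1))[:k]
    neighbours = bank[idx]                       # (k, d)
    diff = neighbours - z                        # offsets from z to each neighbour
    gram = diff @ diff.T                         # (k, k) local Gram matrix
    gram += (reg * np.trace(gram) + 1e-12) * np.eye(k)  # regularise for stability
    w = np.linalg.solve(gram, np.ones(k))
    w /= w.sum()                                 # enforce the sum-to-one constraint
    return w @ neighbours, idx, w


# Usage with random arrays standing in for real sketches.
rng = np.random.default_rng(0)
train = rng.random((200, 64, 64))
box = COMPONENT_BOXES["nose"]
patches = np.stack([crop(s, box) for s in train])          # (200, 14, 16)
mean, basis = fit_pca(patches, dim=16)
bank = np.stack([encode(p, mean, basis) for p in patches])  # (200, 16)
query = encode(crop(rng.random((64, 64)), box), mean, basis)
z_proj, idx, w = project_to_manifold(query, bank, k=10)
```

Because `z_proj` is built only from nearby training features, a rough or incomplete component is pulled towards something that resembles real data. This is what treating the sketch as a soft constraint means in practice.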
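The last item in the list above mentions multi-channel feature maps as intermediate results. One way to picture this is to decode each component's feature vector into a small multi-channel spatial map and place it at that component's window on a shared canvas, which a synthesis network would then turn into an image. The NumPy sketch below shows only that assembly step. The linear decoder, random weights, channel count, and `BOXES` coordinates are all hypothetical, and the actual network is described in the next section.

```python
import numpy as np

# Hypothetical non-overlapping windows (top, left, height, width) on a 64x64 canvas.
BOXES = {
    "left_eye": (16, 8, 10, 20),
    "right_eye": (16, 36, 10, 20),
    "nose": (26, 24, 14, 16),
    "mouth": (42, 18, 10, 28),
}


def decode_to_feature_map(z, weight, channels, h, w):
    """Hypothetical linear decoder: feature vector -> (channels, h, w) map."""
    return (weight @ z).reshape(channels, h, w)


def assemble_feature_maps(features, weights, channels=8, size=64):
    """Place each component's decoded map at its window on a shared canvas."""
    canvas = np.zeros((channels, size, size))
    for name, (top, left, h, w) in BOXES.items():
        fmap = decode_to_feature_map(features[name], weights[name], channels, h, w)
        canvas[:, top:top + h, left:left + w] = fmap
    return canvas


rng = np.random.default_rng(0)
dim, channels = 16, 8
features = {name: rng.standard_normal(dim) for name in BOXES}
weights = {
    name: rng.standard_normal((channels * h * w, dim))
    for name, (_, _, h, w) in BOXES.items()
}
canvas = assemble_feature_maps(features, weights, channels=channels)
```

Each location on the canvas carries several channels instead of a single sketch intensity. This richer intermediate representation is the kind of information flow the multi-channel design aims for.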
Before we go through the proposed deep learning framework, here is a video by TwoMinutePapers. This will help build more intuition about this particular deep learning task.