FAIR has introduced TextStyleBrush, the first self-supervised neural network that copies the style of text in a photo. TextStyleBrush can replace text in an image using only a single example word as input.
The difficulty of the problem TextStyleBrush solves comes not only from the unlimited number of text styles, but also from transformations that resist classification, such as rotations, curved baselines, and deformations of the surface the text is printed on. For these reasons it is impossible to segment text precisely from its background, and impractical to create annotated examples of every possible text style for the entire alphabet and all digits.
Unlike previous approaches, which target specific text parameters such as the font, TextStyleBrush takes a more general learning approach: it disentangles the content of the text from every aspect of its appearance, and then transfers that style to new text.
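The disentanglement idea can be sketched as two separate encoders: one maps the desired text string to a content code, the other maps the example word image to a style code, and a generator would consume both. Below is a minimal numpy sketch; the encoders, dimensions, and hashing scheme are purely illustrative assumptions, not FAIR's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def content_encoder(text, dim=64):
    # Hypothetical toy encoder: hash each character of the target
    # text into a fixed-size content code.
    code = np.zeros(dim)
    for i, ch in enumerate(text):
        code[(ord(ch) * (i + 1)) % dim] += 1.0
    return code / max(len(text), 1)

def style_encoder(image, dim=64):
    # Hypothetical toy encoder: pool pixel statistics of the
    # example word image into a style code of the same size.
    flat = image.reshape(-1)
    return np.array([flat[i::dim].mean() for i in range(dim)])

style_img = rng.random((32, 32, 3))   # photo of the example word
c = content_encoder("hello")          # content of the NEW text
s = style_encoder(style_img)          # appearance of the EXAMPLE
assert c.shape == s.shape == (64,)
```

The point of the separation is that swapping the content code `c` while keeping the style code `s` fixed is exactly what lets the model rewrite a word in the photographed style.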
The network architecture is based on StyleGAN2, which, however, has limitations for the task of copying text style. First, StyleGAN2 is an unconditional model: it generates images by sampling a random latent vector. Transferring a text style, by contrast, requires conditioning the output on two separate sources: the target text content and the style example. Second, text style spans both large-scale features (such as font and size) and fine-scale ones (such as individual handwriting quirks).

To work around these constraints, TextStyleBrush extracts layer-specific style information and injects it into each layer of the generator. In addition to the target image in the desired style, the generator also produces a soft mask marking the foreground (text) pixels. This lets the generator account for both large- and fine-scale style features. TextStyleBrush surpasses state-of-the-art accuracy on both printed and handwritten text.
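The per-layer style injection and the soft mask can be illustrated with a toy numpy sketch. The modulation below is AdaIN-style (normalize the feature map, then scale and shift it with style-derived parameters), which conveys the idea of injecting style at every resolution but is a simplification of StyleGAN2's actual weight demodulation; all shapes and the "generator" itself are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_style(features, style, eps=1e-5):
    # AdaIN-style modulation: normalize the feature map, then apply
    # a per-channel scale (gamma) and shift (beta) derived from the
    # style code.
    mean = features.mean(axis=(0, 1), keepdims=True)
    std = features.std(axis=(0, 1), keepdims=True) + eps
    gamma, beta = style
    return gamma * (features - mean) / std + beta

# Toy "generator": three resolutions, style injected at every layer.
channels = 8
style = (1.0 + 0.1 * rng.standard_normal(channels),
         0.1 * rng.standard_normal(channels))
x = rng.standard_normal((4, 4, channels))
for size in (8, 16, 32):
    x = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)  # nearest upsample
    x = inject_style(x, style)                          # per-layer injection

# The head emits an RGB image plus a soft foreground mask in [0, 1];
# the mask composites the generated glyphs onto the original background.
rgb = np.tanh(x[..., :3])
mask = 1.0 / (1.0 + np.exp(-x[..., 3]))
background = rng.random(rgb.shape)
out = mask[..., None] * rgb + (1.0 - mask[..., None]) * background
assert out.shape == (32, 32, 3)
```

The soft mask is what allows only the text pixels to be replaced while the photographed surface behind them is preserved.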