Aquileo | Cycle Generative Adversarial Network (CycleGAN)

CycleGAN is a GAN architecture used for image-to-image translation without requiring paired training data. It uses two generators and two discriminators to transform images between domains and reconstruct the original image using cycle consistency loss.

Figure: Paired vs. unpaired images

Performs image translation without paired images
Uses two generators and two discriminators
Learns reversible mappings between image domains
Uses cycle consistency loss for reconstruction
Applied in style transfer and image transformation tasks

Architecture of CycleGAN

CycleGAN uses two generators and two discriminators to perform image translation between two domains without paired data.

1. Generators

Create new images in the target style.

CycleGAN has two generators G and F:

G transforms images from domain X (e.g. photos) to domain Y (e.g. artwork).
F transforms images from domain Y back to domain X.

The generator mapping functions are as follows:

\begin{array}{l} G : X \rightarrow Y \\ F : Y \rightarrow X \end{array}

where X is the input image distribution and Y is the desired output distribution such as Van Gogh styles.

2. Discriminators

Decide if images are real (from dataset) or fake (generated). There are two discriminators Dₓ and Dᵧ.

Dₓ distinguishes between real images from X and generated images F(y).
Dᵧ distinguishes between real images from Y and generated images G(x).

Cycle Consistency Loss

To further regularize the mappings, CycleGAN adds a cycle consistency loss, made up of two terms, to the adversarial loss.

1. Forward Cycle Consistency Loss: Ensures that when we apply G and then F to an image from X, we get back the original image.

For example: x \xrightarrow{G} G(x) \xrightarrow{F} F(G(x)) \approx x

2. Backward Cycle Consistency Loss: Ensures that when we apply F and then G to an image from Y, we get back the original image.

For example: y \xrightarrow{F} F(y) \xrightarrow{G} G(F(y)) \approx y

Generator Architecture

Each CycleGAN generator consists of an encoder, transformer and decoder for image translation.

Encoder: The input image is passed through three convolution layers, which extract features and compress the image while increasing the number of channels. For example, a 256×256×3 image is reduced to 64×64×256 after this step.
Transformer: The encoded image is processed through 6 residual blocks (for 128×128 inputs) or 9 residual blocks (for 256×256 and larger inputs), which helps retain important image details.
Decoder: The transformed features are up-sampled by two deconvolution (fractional-stride convolution) layers, and a final 7×7 convolution maps them back to a 3-channel image of the original size.

Generator Structure:

c7s1-64 → d128 → d256 → R256 (×6 or 9) → u128 → u64 → c7s1-3

c7s1-k: 7×7 convolution layer with k filters and stride 1.
dk: 3×3 convolution with k filters and stride 2 (down-sampling).
Rk: Residual block with two 3×3 convolutions, each with k filters.
uk: Fractional-stride deconvolution with k filters and stride 1/2 (up-sampling).
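The sizes quoted for the encoder and decoder can be checked with the standard convolution output-size formulas. The Python sketch below traces one spatial dimension through the layer string. The padding values (3 for the 7×7 layers, 1 for the 3×3 layers, and an output padding of 1 for the up-sampling layers) are assumptions made for illustration and are not stated in the text; `trace_generator` is a hypothetical helper name.

```python
# Trace the feature-map size through
# c7s1-64 -> d128 -> d256 -> R256 (xN) -> u128 -> u64 -> c7s1-3.
# All padding values are assumptions chosen so sizes halve and double cleanly.

def conv_out(size, kernel, stride, padding):
    """Spatial output size of a standard convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding, output_padding):
    """Spatial output size of a transposed (fractional-stride) convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_generator(size=256, n_res_blocks=9):
    """Return a list of (layer_name, height_or_width, channels)."""
    shapes = [("input", size, 3)]
    size = conv_out(size, kernel=7, stride=1, padding=3)
    shapes.append(("c7s1-64", size, 64))
    for k in (128, 256):
        size = conv_out(size, kernel=3, stride=2, padding=1)
        shapes.append((f"d{k}", size, k))
    # Residual blocks keep both the spatial size and the channel count.
    shapes.append((f"R256 x{n_res_blocks}", size, 256))
    for k in (128, 64):
        size = deconv_out(size, kernel=3, stride=2, padding=1, output_padding=1)
        shapes.append((f"u{k}", size, k))
    size = conv_out(size, kernel=7, stride=1, padding=3)
    shapes.append(("c7s1-3", size, 3))
    return shapes


for name, s, c in trace_generator():
    print(f"{name:10s} {s}x{s}x{c}")
```

Under these assumptions the trace reaches 64×64×256 after d256 and returns to 256×256×3 at the output, matching the encoder example in the text.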

Discriminator Architecture (PatchGAN)

In CycleGAN the discriminator uses a PatchGAN instead of a regular GAN discriminator.

A regular GAN discriminator looks at the entire image (e.g. 256×256 pixels) and outputs a single score that says whether the whole image is real or fake.
PatchGAN instead classifies overlapping patches of the image (e.g. 70×70 pixels). It outputs a grid of scores (30×30 for a 256×256 input in commonly used implementations) where each value judges whether the corresponding patch is real or fake.

Discriminator Structure

C64 → C128 → C256 → C512 → Final Convolution

Ck: 4×4 convolution with k filters followed by InstanceNorm and LeakyReLU; InstanceNorm is omitted in the first layer (C64).
The final convolution produces a single-channel output map in which each value marks a patch as real or fake.
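How the patch size relates to the size of the output grid can be worked out with a little arithmetic. The sketch below assumes strides of 2, 2, 2, 1, 1 and a padding of 1 for the five convolutions, as in widely used public implementations. These values are not given in the text, and `PATCHGAN_LAYERS`, `output_grid` and `receptive_field` are hypothetical names.

```python
# (kernel, stride, padding) for C64, C128, C256, C512 and the final convolution.
# Strides and padding are assumptions for this sketch.
PATCHGAN_LAYERS = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]


def output_grid(size, layers=PATCHGAN_LAYERS):
    """Side length of the score grid for a square input of the given size."""
    for kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


def receptive_field(layers=PATCHGAN_LAYERS):
    """Side length of the input patch that influences one output score."""
    rf, jump = 1, 1
    for kernel, stride, _ in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k-1)*jump
        jump *= stride             # distance between neighbouring outputs, in input pixels
    return rf


print(output_grid(256))    # 30
print(receptive_field())   # 70
```

With these settings each of the 30×30 scores depends on a 70×70 region of the input, which is where the "70×70 PatchGAN" name comes from.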

Cost Function in CycleGAN

CycleGAN uses a combined loss function to train generators and discriminators effectively. The total cost function consists of adversarial loss and cycle consistency loss.

1. Adversarial Loss

Adversarial loss helps generators produce realistic images that can fool the discriminators.

L_{advers}\left ( G, D_y, X, Y \right ) =\frac{1}{m}\sum_{i=1}^{m} \left ( 1 - D_y\left ( G\left ( x_i \right ) \right ) \right )^{2}
L_{advers}\left ( F, D_x, Y, X \right ) =\frac{1}{m}\sum_{i=1}^{m} \left ( 1 - D_x\left ( F\left ( y_i \right ) \right ) \right )^{2}

Encourages generators to create realistic images
Helps discriminators distinguish real and fake images
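As a rough illustration, the generator-side formula can be evaluated on a handful of discriminator scores. The discriminator-side loss shown here is an assumption: it is the usual least-squares counterpart (push scores on real images toward 1 and scores on generated images toward 0) and is not spelled out in the text. The function names and score values are made up for the example.

```python
def generator_adv_loss(d_on_fake):
    """(1/m) * sum_i (1 - D(G(x_i)))^2: small when fakes are scored close to 1."""
    return sum((1.0 - d) ** 2 for d in d_on_fake) / len(d_on_fake)


def discriminator_adv_loss(d_on_real, d_on_fake):
    """Assumed least-squares counterpart: real scores -> 1, fake scores -> 0."""
    real_term = sum((1.0 - d) ** 2 for d in d_on_real) / len(d_on_real)
    fake_term = sum(d ** 2 for d in d_on_fake) / len(d_on_fake)
    return real_term + fake_term


# Hypothetical discriminator outputs for two generated and two real images.
scores_fake = [0.2, 0.4]
scores_real = [0.9, 0.8]

print(generator_adv_loss(scores_fake))                   # approximately 0.5
print(discriminator_adv_loss(scores_real, scores_fake))  # approximately 0.125
```

The generator loss falls as the discriminator is fooled more often, while the discriminator loss falls as it separates the two sets of scores.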

2. Cycle Consistency Loss

Cycle consistency loss ensures that translating an image to another domain and back reconstructs the original image.

Preserves important image content during translation
Ensures meaningful reversible mappings between domains

L_{cycl}\left ( G, F, X, Y \right ) =\frac{1}{m}\sum_{i=1}^{m}\left [ \left \| F\left ( G\left ( x_i \right ) \right )-x_i \right \|_1 +\left \| G\left ( F\left ( y_i \right ) \right )-y_i \right \|_1 \right ]
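A minimal sketch of this loss in plain Python, assuming images are flat lists of pixel values, the reconstruction error is the L1 distance, and both batches have the same size m. The toy mappings `G`, `F_good` and `F_bad` are hypothetical stand-ins for trained generators.

```python
def l1_distance(a, b):
    """Sum of absolute pixel differences between two equally sized images."""
    return sum(abs(p - q) for p, q in zip(a, b))


def cycle_consistency_loss(G, F, xs, ys):
    """(1/m) * sum_i [ |F(G(x_i)) - x_i|_1 + |G(F(y_i)) - y_i|_1 ]."""
    assert len(xs) == len(ys), "this sketch assumes equally sized batches"
    total = 0.0
    for x, y in zip(xs, ys):
        total += l1_distance(F(G(x)), x)  # forward cycle: X -> Y -> X
        total += l1_distance(G(F(y)), y)  # backward cycle: Y -> X -> Y
    return total / len(xs)


# Toy "generators": G brightens an image; F_good undoes it exactly, F_bad does not.
G = lambda img: [p + 0.5 for p in img]
F_good = lambda img: [p - 0.5 for p in img]
F_bad = lambda img: [p - 0.25 for p in img]

xs = [[0.0, 1.0], [0.5, 0.25]]
ys = [[1.0, 1.5], [0.75, 1.0]]

print(cycle_consistency_loss(G, F_good, xs, ys))  # 0.0
print(cycle_consistency_loss(G, F_bad, xs, ys))   # 1.0
```

The loss is zero only when the two mappings invert each other on the samples, which is the property the cycle terms are meant to enforce.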

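The total cost function is the sum of the two adversarial losses and the cycle consistency loss, where λ controls the relative weight of the cycle term (the original paper sets λ = 10):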

L\left ( G, F, D_x, D_y \right ) = L_{advers}\left (G, D_y, X, Y \right ) + L_{advers}\left (F, D_x, Y, X \right ) + \lambda L_{cycl}\left ( G, F, X, Y \right )

The aim is to solve:

G^{*}, F^{*} = \arg \underset{G, F}{\min}\,\underset{D_x, D_y}{\max}\, L\left ( G, F, D_x, D_y \right )
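To make the weighting concrete, the sketch below evaluates the generator side of the total objective for fixed discriminators. It assumes flat-list images, equally sized batches and λ = 10. The constant-output discriminators and the function name `generator_objective` are hypothetical, and in real training the discriminators are updated in alternation with the generators rather than held fixed.

```python
def l1(a, b):
    """L1 distance between two flat images."""
    return sum(abs(p - q) for p, q in zip(a, b))


def generator_objective(G, F, D_x, D_y, xs, ys, lam=10.0):
    """Adversarial terms for G and F plus lam times the cycle consistency term."""
    m = len(xs)
    adv_g = sum((1.0 - D_y(G(x))) ** 2 for x in xs) / m
    adv_f = sum((1.0 - D_x(F(y))) ** 2 for y in ys) / m
    cyc = sum(l1(F(G(x)), x) + l1(G(F(y)), y) for x, y in zip(xs, ys)) / m
    return adv_g + adv_f + lam * cyc


# Toy setup: G and F invert each other, D_y is fully fooled, D_x is undecided.
G = lambda img: [p + 0.5 for p in img]
F = lambda img: [p - 0.5 for p in img]
D_y = lambda img: 1.0
D_x = lambda img: 0.5

xs = [[0.0, 1.0], [0.5, 0.25]]
ys = [[1.0, 1.5], [0.75, 1.0]]

total = generator_objective(G, F, D_x, D_y, xs, ys)
print(total)  # 0.25: only the adversarial term for F is non-zero
```

Because the cycle term is multiplied by λ, even a small reconstruction error would outweigh the adversarial terms in this toy setting.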

Applications

1. Collection Style Transfer: CycleGAN can learn styles from entire artwork collections such as Van Gogh, Monet and Cezanne, allowing it to generate diverse artistic image styles.

Figure: Comparison of different style transfer results

2. Object Transformation: CycleGAN can transform objects between different classes, such as apples to oranges or zebras to horses, making it useful for image editing and content generation.

3. Seasonal Transfer: CycleGAN can transform images between different seasons, such as converting winter scenes into summer landscapes and vice versa.

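4. Photo Generation from Paintings: CycleGAN can transform paintings into realistic photos and convert photos into artistic paintings, making it useful for artistic and image editing applications. For this task an additional identity loss is used, which encourages each generator to leave an image unchanged when it already belongs to the target domain and so helps preserve the color composition of the input. This loss can be defined as

L_{identity}\left ( G, F \right ) =\mathbb{E}_{y \sim p\left ( y \right )}\left [ \left \| G(y)-y \right \|_1 \right ] + \mathbb{E}_{x \sim p\left ( x \right )}\left [ \left \| F(x)-x \right \|_1 \right ]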
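A small sketch of this identity term, assuming the expectations are estimated by sample means over two batches of flat-list images. The toy mappings below are hypothetical and only illustrate when the term is zero.

```python
def l1(a, b):
    """L1 distance between two flat images."""
    return sum(abs(p - q) for p, q in zip(a, b))


def identity_loss(G, F, xs, ys):
    """Mean |G(y) - y|_1 over ys plus mean |F(x) - x|_1 over xs."""
    g_term = sum(l1(G(y), y) for y in ys) / len(ys)  # G fed target-domain images
    f_term = sum(l1(F(x), x) for x in xs) / len(xs)  # F fed source-domain images
    return g_term + f_term


keep = lambda img: list(img)                 # leaves the image untouched
brighten = lambda img: [p + 0.5 for p in img]
darken = lambda img: [p - 0.5 for p in img]

xs = [[0.0, 1.0], [0.5, 0.25]]
ys = [[1.0, 1.5], [0.75, 1.0]]

print(identity_loss(keep, keep, xs, ys))        # 0.0
print(identity_loss(brighten, darken, xs, ys))  # 2.0
```

A generator that shifts the colors of images already in its target domain is penalized, while one that leaves them alone is not.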

5. Photo Enhancement: CycleGAN can enhance smartphone photos to resemble DSLR-quality images by improving visual quality and depth effects.

Evaluating CycleGAN’s Performance

CycleGAN performance is evaluated using both human perception and quantitative metrics.

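AMT Perceptual Studies: Human raters on Amazon Mechanical Turk (AMT) compare generated images with real images to judge visual realism.
FCN Scores: A pre-trained fully convolutional network (FCN) segments the generated images, and the predicted labels are compared with the ground truth using metrics such as pixel accuracy and Intersection over Union (IoU).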
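The two metrics named above can be sketched in a few lines, assuming the predicted and ground-truth label maps have been flattened into lists of integer class ids. The label values below are invented for the example and are not results from any experiment.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class matches the ground truth."""
    correct = sum(p == t for p, t in zip(pred, target))
    return correct / len(target)


def mean_iou(pred, target, num_classes):
    """Average over classes of intersection / union of the class masks."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(pred, target))
        union = sum(p == c or t == c for p, t in zip(pred, target))
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Made-up label maps with three classes (0, 1, 2).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]

print(pixel_accuracy(pred, target))  # 4 of 6 pixels match
print(mean_iou(pred, target, 3))     # per-class IoUs are 1/3, 2/3 and 1/2
```

Pixel accuracy rewards getting the large regions right, whereas mean IoU weights every class equally, so the two are usually reported together.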

Limitations

CycleGAN is effective for texture and style transformation but has limitations in handling major structural changes.

Works better for changing textures and colors than object shapes
Struggles with large structural modifications
Generated images may sometimes contain distortions or unrealistic details
Can produce unpredictable results in complex transformations
