Discrete Distribution Networks
A Novel Generative Model with Simple Principles and Unique Properties
Anonymous authors
(Code/Talk Coming soon)
Contributions of this paper:
Left: Illustrates the process of image reconstruction and latent acquisition in DDN. Each layer of DDN outputs multiple candidate images; the candidate closest to the target image is selected, fed to the next layer as its condition, and its index is recorded as part of the latent.
Right: Shows the tree-structured representation space of DDN's latent variables. Each sample can be mapped to a leaf node on this tree.
We introduce a novel generative model, the Discrete Distribution Networks (DDN), that approximates data distribution using hierarchical discrete distributions. We posit that since the features within a network inherently contain distributional information, liberating the network from a single output to concurrently generate multiple samples proves to be highly effective. Therefore, DDN fits the target distribution, including continuous ones, by generating multiple discrete sample points. To capture finer details of the target data, DDN selects the output that is closest to the Ground Truth (GT) from the coarse results generated in the first layer. This selected output is then fed back into the network as a condition for the second layer, thereby generating new outputs more similar to the GT. As the number of DDN layers increases, the representational space of the outputs expands exponentially, and the generated samples become increasingly similar to the GT. This hierarchical output pattern of discrete distributions endows DDN with two intriguing properties: highly compressed representation and more general zero-shot conditional generation. We demonstrate the efficacy of DDN and these intriguing properties through experiments on CIFAR-10 and FFHQ.
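To make the core mechanism concrete, here is a minimal toy sketch of the hierarchical selection principle described above. Random perturbations stand in for a trained network, and the coarse-to-fine noise schedule is an illustrative assumption, not the paper's architecture:

```python
# Toy sketch of DDN's hierarchical selection principle (not the actual
# network): each "layer" proposes K candidates around the previous
# selection, and the candidate closest to the ground truth is kept.
import numpy as np

rng = np.random.default_rng(0)
K, L, D = 8, 10, 16               # candidates per layer, layers, data dim

gt = rng.normal(size=D)           # stand-in for a training sample
x = np.zeros(D)                   # fixed initial input
latent = []                       # z = sequence of selected indices

for l in range(L):
    scale = 1.0 / (l + 1)         # coarse-to-fine refinement (assumption)
    candidates = x + scale * rng.normal(size=(K, D))   # K discrete samples
    k = int(np.argmin(((candidates - gt) ** 2).sum(axis=1)))
    x, latent = candidates[k], latent + [k]
    print(f"layer {l}: picked {k}, L2 to GT = {np.linalg.norm(x - gt):.3f}")

print("latent:", latent, f"(one of K**L = {K**L} possible codes)")
```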
DDN enables more general zero-shot conditional generation. DDN supports zero-shot conditional generation with conditions from non-pixel domains and, notably, without relying on gradients, such as text-to-image generation using a black-box CLIP model. Images enclosed in yellow borders serve as the ground truth. The abbreviations in the table header correspond to their respective tasks as follows: 'SR' stands for Super-Resolution, with the following digit indicating the resolution of the condition; 'ST' denotes Style Transfer, which computes Perceptual Losses against the condition.
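Because selection only requires ranking candidates, the reconstruction distance can be swapped for any black-box scorer without gradients. A hedged sketch, in which `generate_candidates` and `clip_score` are hypothetical stand-ins for a trained DDN layer and a black-box CLIP similarity:

```python
# Gradient-free zero-shot guidance (sketch): the only change from
# reconstruction is the scoring function used to pick a candidate.
from typing import Callable, Sequence

def guided_sample(generate_candidates: Callable[[object, int], Sequence],
                  score: Callable[[object], float],
                  num_layers: int) -> object:
    x = None                                    # initial condition for layer 0
    for l in range(num_layers):
        candidates = generate_candidates(x, l)  # K outputs of layer l
        # No gradients needed: we only *rank* candidates by the score,
        # so the scorer may be a black-box API (e.g., CLIP similarity).
        x = max(candidates, key=score)
    return x

# e.g. guided_sample(ddn_layer_fn, lambda im: clip_score(im, "a red car"), L)
```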
(a) The data flow during the training phase of DDN is shown at the top. As the network depth increases, the generated images become increasingly similar to the training images. Within each Discrete Distribution Layer (DDL), the candidate closest to the training image is selected; only this selected candidate receives the training loss and is passed on as the condition for the next layer (see the sketch below).
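A minimal PyTorch sketch of this training step, assuming the select-then-supervise rule described above (module and tensor names are illustrative, not the paper's code):

```python
# One DDL training step: pick the candidate closest to the ground truth,
# then apply the L2 loss only to that candidate.
import torch
import torch.nn as nn

class ToyDDL(nn.Module):
    """Produces K candidate outputs from shared features."""
    def __init__(self, dim: int, k: int):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(dim, dim) for _ in range(k)])

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return torch.stack([h(feat) for h in self.heads])  # (K, B, dim)

dim, K = 16, 8
layer = ToyDDL(dim, K)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)

feat = torch.randn(4, dim)              # features from the previous layer
gt = torch.randn(4, dim)                # ground-truth target

candidates = layer(feat)                # (K, B, dim)
dist = ((candidates - gt) ** 2).sum(-1)         # (K, B)
k_star = dist.argmin(dim=0)                     # per-sample best index
# Loss on the selected candidate only; gradients flow through the chosen
# head, so unselected candidates are untouched this step.
selected = candidates[k_star, torch.arange(feat.shape[0])]
loss = ((selected - gt) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()
print(f"selected indices: {k_star.tolist()}, loss = {loss.item():.4f}")
```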
The DDN model consists of $L$ Discrete Distribution Layers (DDLs). The $l$-th layer $f_l$ maps the sample selected at the previous layer to $K$ new candidate samples, from which the one closest to the ground truth is selected:

$$\{\mathbf{x}_l^1, \ldots, \mathbf{x}_l^K\} = f_l(\mathbf{x}_{l-1}^*), \qquad k_l^* = \operatorname*{arg\,min}_{k \in \{1,\ldots,K\}} \lVert \mathbf{x}_l^k - \mathbf{x} \rVert_2^2, \qquad \mathbf{x}_l^* = \mathbf{x}_l^{k_l^*}.$$

Here, $\mathbf{x}$ is the ground-truth sample, $\mathbf{x}_l^*$ is the candidate selected at layer $l$, and $\mathbf{x}_0^*$ is a fixed initial input.

By recursively unfolding the above equations, we can derive the latent variable $\mathbf{z} = (k_1^*, k_2^*, \ldots, k_L^*)$, the sequence of indices selected along the way.

Here, each index $k_l^* \in \{1, \ldots, K\}$, so $\mathbf{z}$ traverses a $K$-ary tree of depth $L$, and the representation space grows exponentially to $K^L$ distinct codes.
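Under this formulation, encoding a sample records the index path and decoding replays it. A toy sketch with a deterministic stand-in for the trained layers (the per-layer seeding is purely illustrative):

```python
# Latent encode/decode under the formulation above. The toy layer is
# deterministic in (x_prev, l), so replaying the recorded indices
# exactly reproduces the reconstruction.
import numpy as np

K, L, D = 8, 6, 4

def layer_candidates(x_prev: np.ndarray, l: int) -> np.ndarray:
    # Hypothetical stand-in for a trained layer f_l, producing (K, D).
    rng = np.random.default_rng(1000 * l)   # fixed per-layer seed
    return x_prev + rng.normal(size=(K, D)) / (l + 1)

def encode(gt: np.ndarray):
    x, z = np.zeros(D), []
    for l in range(L):
        cands = layer_candidates(x, l)
        k = int(np.argmin(((cands - gt) ** 2).sum(axis=1)))
        x, z = cands[k], z + [k]
    return z, x

def decode(z: list) -> np.ndarray:
    x = np.zeros(D)
    for l, k in enumerate(z):
        x = layer_candidates(x, l)[k]
    return x

z, recon = encode(np.ones(D))
print("z =", z)                              # path in the K-ary latent tree
print("decode matches encode:", np.allclose(decode(z), recon))
```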
The numerical values at the bottom of each figure represent the Kullback-Leibler (KL) divergence. Due to phenomena such as 'dead nodes' and 'density shift', gradient descent alone fails to properly fit the ground-truth (GT) density. By employing the Split-and-Prune strategy, however, the KL divergence is reduced to a level even lower than that of the real samples.
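A toy 1-D sketch of a Split-and-Prune-style update (the thresholds and schedule here are assumptions, not the paper's exact algorithm). With a poor initialization, plain nearest-node updates leave most nodes dead near the origin; periodically pruning the idlest node and splitting the busiest one recycles capacity onto both modes:

```python
# K output "nodes" fit a bimodal target density in 1-D.
import numpy as np

rng = np.random.default_rng(0)
K, steps, lr = 16, 2000, 0.05
nodes = rng.normal(0.0, 0.1, size=K)         # poor init: all near 0
counts = np.zeros(K)                         # selection frequency per node

def sample_target(n):                        # bimodal GT density
    comp = rng.random(n) < 0.5
    return np.where(comp, rng.normal(-2, 0.3, n), rng.normal(2, 0.3, n))

for t in range(steps):
    x = sample_target(1)[0]
    k = int(np.argmin(np.abs(nodes - x)))    # "selection", as in a DDL
    nodes[k] += lr * (x - nodes[k])          # gradient-descent-like move
    counts[k] += 1
    if (t + 1) % 200 == 0:                   # Split-and-Prune (assumed schedule)
        dead, busy = int(counts.argmin()), int(counts.argmax())
        nodes[dead] = nodes[busy] + 1e-3     # prune dead node, split busy one
        counts[dead] = counts[busy] = counts[busy] / 2.0

print("final nodes:", np.sort(nodes).round(2))  # clustered around -2 and +2
```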
The text at the top is the guide text for that column.
The DDN balances the steering forces of CLIP and Inpainting according to their associated weights (a sketch of this weighted selection follows this caption).
Columns 4 and 5 display the generated results under the guidance of other images; the produced image adheres to the style of the guide image as closely as possible while still satisfying the condition. The resolution of the generated images is 256x256.
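A sketch of how such weighted multi-condition selection could be composed (all function names hypothetical; the paper's exact weighting may differ):

```python
# Rank candidates by a weighted sum of per-condition scores, e.g. CLIP
# similarity to a prompt plus an inpainting consistency term.
from typing import Callable, Sequence

def combined_score(scores: Sequence[Callable[[object], float]],
                   weights: Sequence[float]) -> Callable[[object], float]:
    return lambda cand: sum(w * s(cand) for s, w in zip(scores, weights))

# e.g. pick = max(candidates, key=combined_score(
#          [lambda im: clip_score(im, "blue eyes"),
#           lambda im: -masked_l2(im, known_pixels)],   # inpainting term
#          [0.7, 0.3]))
```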
We trained a DDN with output level