Image Manipulation
- Mike Klase
- May 11
- 1 min read
Updated: May 16
Modern AI models can create detailed images from simple text descriptions. Generative image AI systems (such as Midjourney, Stable Diffusion, and DALL-E) take text or image prompts and produce high-quality images. These models are trained on datasets of billions of curated images. Two main families of generative model dominate today: Diffusion Models and Generative Adversarial Networks (GANs).
DEFINITIONS:
Diffusion models are inspired by how food coloring spreads through water. During training, the model takes a clear image and gradually turns it into static (random noise), then learns to reverse that process, turning static back into a clear image. Generating a new image runs this learned reversal: the model starts from pure static and denoises it step by step. Because the starting static is random, diffusion models create a different image each time, even with the same prompt.
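The "image to static" half of this process can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the "image" is a toy 1-D array, the noise schedule is the common linear one, and the neural network that real models train to reverse the noising is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, t, betas):
    """Noise a clean input x0 directly to step t (closed-form q(x_t | x_0))."""
    alphas = 1.0 - betas
    alpha_bar = np.prod(alphas[: t + 1])      # fraction of signal kept by step t
    noise = rng.standard_normal(x0.shape)     # the "static"
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

betas = np.linspace(1e-4, 0.02, 1000)         # toy linear noise schedule
x0 = np.linspace(-1.0, 1.0, 8)                # toy "clear image"
x_early = forward_diffuse(x0, 50, betas)      # still mostly signal
x_late = forward_diffuse(x0, 999, betas)      # almost pure static
```

At early steps the output is dominated by the original signal, while by the final step almost nothing of it remains, which is exactly the gradient of corruption the model learns to undo.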
Generative Adversarial Networks (GANs) pit two models against each other: a Generator and a Discriminator. The Generator creates images, while the Discriminator judges whether they look real. Whenever the Discriminator spots a fake, that feedback pushes the Generator to improve. The contest repeats until the Generator produces images convincing enough to fool the Discriminator.
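The adversarial loop can be sketched with a deliberately tiny stand-in: here the real "data" are just numbers drawn from a Gaussian centered at 3, the Generator is a single shift parameter theta (fake sample = theta + noise), and the Discriminator is a one-variable logistic regression. All of these are illustrative assumptions; real GANs use deep networks on images, but the alternating feedback loop is the same.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

theta = 0.0                # Generator parameter; real data is centered at 3
w, b = 0.0, 0.0            # Discriminator parameters
lr, batch = 0.05, 64

for step in range(5000):
    real = rng.normal(3.0, 1.0, batch)
    fake = theta + rng.normal(0.0, 1.0, batch)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    grad_w = np.mean(-(1 - d_real) * real) + np.mean(d_fake * fake)
    grad_b = np.mean(-(1 - d_real)) + np.mean(d_fake)
    w -= lr * grad_w
    b -= lr * grad_b

    # Generator step: use the Discriminator's feedback to look more "real"
    # (non-saturating loss: increase log D(fake)).
    d_fake = sigmoid(w * fake + b)
    theta += lr * np.mean((1 - d_fake) * w)
```

With these toy settings theta drifts from 0 toward the real data's mean of 3, mirroring how a GAN's Generator gradually learns to match the data distribution as the Discriminator's feedback sharpens.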