Original DDPM paper: Ho et al., 2020
One-sentence definition: a diffusion model is a generative model that learns to reverse a gradual noising process. During training, noise is added to an image step by step until it becomes pure noise; during generation, the model starts from pure noise and denoises step by step to recover an image.
At its core are two processes:

- Forward process (diffusion): gradually add Gaussian noise to the data until only noise remains.
- Reverse process (denoising): learn to remove the noise step by step, turning pure noise back into data.
Representative models: DDPM (Denoising Diffusion Probabilistic Models), Stable Diffusion, DALL-E 2/3, Midjourney.
Imagine a famous painting: every day a little more ink gets splattered onto it, until nothing but ink is left. A restorer then removes the ink, one day's worth at a time, until the painting reappears.

The key point: the restorer has no idea what the original painting looks like! He has only learned the rules of "how the ink gets splattered on", and then runs them in reverse.

The diffusion model is exactly this restorer:

- The forward process is the ink being splattered (adding noise).
- The reverse process is the restoration (removing noise), learned from how the noise was added.
[Interactive demo: watch the forward diffusion (noising) and reverse diffusion (denoising) process step by step.]

💡 Forward diffusion turns the image into noise | reverse diffusion recovers the image from noise
| Model | Pros | Cons |
|---|---|---|
| GAN | Fast generation, high quality | Unstable training, mode collapse |
| VAE | Stable training, has a latent space | Blurry generated images |
| Diffusion | Highest quality, stable training | Slow generation (requires many steps) |
Why diffusion models win: training is stable (a simple MSE objective, with no adversarial game to balance) while sample quality is the highest of the three; the price is slow, multi-step generation.
Starting from data $x_0$, noise is added step by step:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\; \sqrt{1-\beta_t}\, x_{t-1},\; \beta_t \mathbf{I}\big)$$

where $\beta_t$ is the noise schedule (e.g. increasing linearly from $10^{-4}$ to $0.02$).
Reparameterization trick: with $\alpha_t = 1-\beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$, you can jump directly from $x_0$ to $x_t$ in a single step:

$$x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, \mathbf{I})$$
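To make the schedule concrete, here is a minimal sketch (the tensor shapes are arbitrary placeholders; the schedule values match the implementation below) that prints the two coefficients: early on $x_t$ is mostly signal, and near $t = T$ it is almost pure noise.

```python
import torch

# Linear schedule as in the DDPM paper: beta from 1e-4 to 0.02 over T=1000 steps
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1 - betas, dim=0)

x0 = torch.randn(1, 3, 32, 32)  # stand-in for a real image tensor
for t in [0, 99, 499, 999]:
    signal = alpha_bars[t].sqrt().item()             # coefficient on x0
    noise_scale = (1 - alpha_bars[t]).sqrt().item()  # coefficient on eps
    xt = signal * x0 + noise_scale * torch.randn_like(x0)  # one-shot jump to x_t
    print(f"t={t + 1:4d}  sqrt(alpha_bar)={signal:.3f}  sqrt(1-alpha_bar)={noise_scale:.3f}")
```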
Learning to reverse the noise addition:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\; \mu_\theta(x_t, t),\; \sigma_t^2 \mathbf{I}\big)$$

Rather than predicting $\mu_\theta$ directly, the network $\epsilon_\theta(x_t, t)$ predicts the noise, from which the mean follows (with $\sigma_t^2$ typically fixed to $\beta_t$):

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right)$$
The simplified loss function:

$$L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\left[\big\|\epsilon - \epsilon_\theta(x_t, t)\big\|^2\right]$$

i.e. a plain MSE between the true noise and the predicted noise.
Starting from $x_T \sim \mathcal{N}(0, \mathbf{I})$, denoise step by step:
```
for t = T, T-1, ..., 1:
    z ~ N(0, I) if t > 1 else z = 0
    x_{t-1} = (1/√α_t) * (x_t - (β_t/√(1-ᾱ_t)) * ε_θ(x_t, t)) + σ_t * z
```

A minimal PyTorch implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDiffusion:
    def __init__(self, num_steps=1000, beta_start=1e-4, beta_end=0.02):
        self.num_steps = num_steps

        # Linear noise schedule
        self.betas = torch.linspace(beta_start, beta_end, num_steps)
        self.alphas = 1 - self.betas
        self.alpha_bars = torch.cumprod(self.alphas, dim=0)

    def forward_diffusion(self, x0, t):
        """Forward diffusion: x0 -> xt in one step."""
        # Gather the cumulative schedule terms for each timestep in the batch
        alpha_bar = self.alpha_bars.to(x0.device)[t].view(-1, 1, 1, 1)

        # Sample noise
        noise = torch.randn_like(x0)

        # Reparameterization: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
        xt = torch.sqrt(alpha_bar) * x0 + torch.sqrt(1 - alpha_bar) * noise

        return xt, noise

    def training_step(self, model, x0):
        """One training step."""
        batch_size = x0.shape[0]

        # Sample a random timestep for each example
        t = torch.randint(0, self.num_steps, (batch_size,), device=x0.device)

        # Forward diffusion
        xt, noise = self.forward_diffusion(x0, t)

        # Predict the noise
        noise_pred = model(xt, t)

        # MSE loss between true and predicted noise (L_simple)
        loss = F.mse_loss(noise_pred, noise)

        return loss

    @torch.no_grad()
    def sample(self, model, shape):
        """Generate images starting from pure noise."""
        device = next(model.parameters()).device

        # Start from pure noise
        x = torch.randn(shape, device=device)

        # Denoise step by step
        for t in reversed(range(self.num_steps)):
            t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)

            # Predict the noise
            noise_pred = model(x, t_batch)

            # Schedule terms for this step
            alpha = self.alphas[t]
            alpha_bar = self.alpha_bars[t]
            beta = self.betas[t]

            # No fresh noise is added at the final step (t = 0)
            if t > 0:
                noise = torch.randn_like(x)
            else:
                noise = 0

            # Denoising update: posterior mean plus sigma_t * z
            x = (1 / torch.sqrt(alpha)) * (
                x - (beta / torch.sqrt(1 - alpha_bar)) * noise_pred
            ) + torch.sqrt(beta) * noise

        return x
```
```python
# A minimal noise-prediction network (real models use a full UNet)
class SimpleNoisePredictor(nn.Module):
    def __init__(self, channels=3, dim=64):
        super().__init__()
        self.time_embed = nn.Embedding(1000, dim)
        self.net = nn.Sequential(
            nn.Conv2d(channels, dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(dim, channels, 3, padding=1),
        )

    def forward(self, x, t):
        # Simplified: real time embeddings (sinusoidal + MLP) are more elaborate.
        # The embedding is broadcast over space and truncated to the image channels.
        t_emb = self.time_embed(t)[:, :, None, None]
        return self.net(x) + t_emb.expand(-1, -1, x.shape[2], x.shape[3])[:, :x.shape[1]]
```
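To show how the pieces fit together, here is a minimal training-loop sketch wiring `SimpleDiffusion` and `SimpleNoisePredictor` together. The random batch, learning rate, and step count are illustrative assumptions, not values from the original.

```python
# Hypothetical wiring of the two classes above; data, lr, and steps are made up.
diffusion = SimpleDiffusion(num_steps=1000)
model = SimpleNoisePredictor(channels=3, dim=64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(1000):
    x0 = torch.randn(16, 3, 32, 32)  # placeholder batch; use real images in practice
    loss = diffusion.training_step(model, x0)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, sample a batch of images from pure noise
samples = diffusion.sample(model, shape=(4, 3, 32, 32))
```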
In practice you rarely train from scratch; with the diffusers library you can use a pretrained Stable Diffusion model directly:

```python
from diffusers import StableDiffusionPipeline
import torch

# Load a pretrained model
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Text-to-image generation
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut.png")

# Use a negative prompt to improve quality
image = pipe(
    prompt="a beautiful sunset over mountains, 4k, detailed",
    negative_prompt="blurry, low quality, distorted",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
```
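The comparison table above listed slow, multi-step sampling as diffusion's main drawback. One common mitigation, sketched here as an addition rather than part of the original example, is to swap the pipeline's scheduler for a faster one such as diffusers' DPMSolverMultistepScheduler, which usually gives good results in roughly 20-25 steps:

```python
from diffusers import DPMSolverMultistepScheduler

# Swap in a faster scheduler; far fewer denoising steps are then needed
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="a beautiful sunset over mountains, 4k, detailed",
    num_inference_steps=25,
).images[0]
```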