
Diffusion Model

news 2025/4/19 5:58:28

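A diffusion model is an image generation model. It works by restoring a noisy image step by step into a clean image; text-to-image systems additionally use a prompt to steer the final result. This article uses only an unconditional diffusion model and does not pass in any prompt.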

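The figure below shows the network architecture of Stable Diffusion. This article uses only a UNet, with neither the Text part nor the Latent part.
[Figure: Stable Diffusion network architecture]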

Download the Model and Generate an Image

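The model used here is google/ddpm-celebahq-256, a DDPM model. It has not been optimized in any way and needs 1000 steps to generate one image, so it is used here purely for learning and should be run on a GPU.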
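To give a feel for where the 1000 steps come from, here is a minimal sketch of a linear noise schedule in plain Python. The values (1000 steps, beta going from 1e-4 to 0.02) are the commonly cited DDPM settings and are assumed here, not read from this checkpoint; `linear_betas` and `cumulative_alphas` are illustrative helper names, not diffusers functions.

```python
def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances (assumed DDPM-style values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t): the share of signal variance left after t steps."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


betas = linear_betas()
alphas_cumprod = cumulative_alphas(betas)

# Signal kept after the first, middle, and last step of the schedule.
for t in (0, 499, 999):
    print(f"t={t:4d}  alpha_bar={alphas_cumprod[t]:.6f}")
```

Under these assumptions almost no signal is left at the last step, so sampling starts from nearly pure noise and walks back through every step, removing only a little noise each time.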

from diffusers import DDPMPipeline

model_id = "google/ddpm-celebahq-256"

# load model and scheduler
image_pipe = DDPMPipeline.from_pretrained(model_id)
image_pipe.to("cuda")

# run the full denoising loop and take the first generated image
image = image_pipe().images[0]

The pipeline outputs a single image.

Using the Church Dataset

from diffusers import UNet2DModel

repo_id = "google/ddpm-church-256"
model = UNet2DModel.from_pretrained(repo_id)

The model uses the UNet architecture:

model.config

The UNet is what drives the transition from a noise image to an image, so the first step is to create a noise image.

import torch

torch.manual_seed(0)

noisy_sample = torch.randn(
    1, model.config.in_channels, model.config.sample_size, model.config.sample_size
)
noisy_sample.shape

Given a timestep, the model returns its output (a noise residual) for that step. The input and output have the same shape.

with torch.no_grad():
    noisy_residual = model(sample=noisy_sample, timestep=2).sample

The scheduler in a diffusion model is responsible for adding noise to images during training. It is needed during inference as well.

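from diffusers import DDPMScheduler

# from_pretrained loads the scheduler config from the repo;
# passing a repo id to from_config is deprecated in recent diffusers versions
scheduler = DDPMScheduler.from_pretrained(repo_id)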
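The noise-adding role of the scheduler can be sketched in plain Python. The snippet applies the standard DDPM forward formula x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise to a short list of toy pixel values. The linear schedule values are assumed, and `make_alphas_cumprod` and `add_noise_sketch` are hypothetical helpers for illustration, not the diffusers implementation (diffusers schedulers expose a comparable `add_noise` method).

```python
import math
import random


def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear schedule (assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    out, prod = [], 1.0
    for i in range(num_steps):
        prod *= 1.0 - (beta_start + i * step)
        out.append(prod)
    return out


def add_noise_sketch(x0, noise, t, alphas_cumprod):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    signal_scale = math.sqrt(alphas_cumprod[t])
    noise_scale = math.sqrt(1.0 - alphas_cumprod[t])
    return [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]


rng = random.Random(0)
alphas_cumprod = make_alphas_cumprod()
x0 = [-1.0, -0.5, 0.0, 0.5, 1.0]           # toy "pixels" scaled to [-1, 1]
noise = [rng.gauss(0.0, 1.0) for _ in x0]  # Gaussian noise, one value per pixel

slightly_noisy = add_noise_sketch(x0, noise, 0, alphas_cumprod)   # close to x0
mostly_noise = add_noise_sketch(x0, noise, 999, alphas_cumprod)   # close to noise
```

At the first timestep the result stays close to the original values, while at the last timestep it is nearly indistinguishable from the noise itself.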

Run inference and display the intermediate images along the way.

import PIL.Image
import numpy as np
import tqdm
from IPython.display import display  # needed when display is not already a notebook builtin


def display_sample(sample, i):
    # (N, C, H, W) in [-1, 1] -> (N, H, W, C) in [0, 255]
    image_processed = sample.cpu().permute(0, 2, 3, 1)
    image_processed = (image_processed + 1.0) * 127.5
    image_processed = image_processed.numpy().astype(np.uint8)

    image_pil = PIL.Image.fromarray(image_processed[0])
    display(f"Image at step {i}")
    display(image_pil)


model.to("cuda")
noisy_sample = noisy_sample.to("cuda")
sample = noisy_sample

for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
    # 1. predict noise residual
    with torch.no_grad():
        residual = model(sample, t).sample

    # 2. compute less noisy image and set x_t -> x_t-1
    sample = scheduler.step(residual, t, sample).prev_sample

    # 3. optionally look at image
    if (i + 1) % 300 == 0:
        display_sample(sample, i + 1)

You can see the image gradually becoming clearer.
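For reference, the "compute less noisy image" step of the loop can be illustrated with the textbook DDPM update applied to a single scalar value. This is a sketch of the published sampling formula under an assumed linear schedule. The actual `scheduler.step` in diffusers is organized differently and may use another variance choice, and `make_schedule` and `ddpm_reverse_step` are hypothetical names.

```python
import math
import random


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear betas and their cumulative alpha products (assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alphas_cumprod, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    return betas, alphas_cumprod


def ddpm_reverse_step(x_t, predicted_noise, t, betas, alphas_cumprod, rng):
    """One textbook DDPM update x_t -> x_{t-1} for a scalar sample."""
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    # Remove the predicted noise contribution and rescale.
    noise_coeff = beta_t / math.sqrt(1.0 - alphas_cumprod[t])
    mean = (x_t - noise_coeff * predicted_noise) / math.sqrt(alpha_t)
    if t == 0:
        return mean  # no fresh noise is added on the final step
    # sigma_t^2 = beta_t is one of the variance choices discussed for DDPM.
    return mean + math.sqrt(beta_t) * rng.gauss(0.0, 1.0)


betas, alphas_cumprod = make_schedule()
rng = random.Random(0)

# Toy check: noise a known value at t = 0, then undo it using the true noise.
x0, eps = 0.25, rng.gauss(0.0, 1.0)
x_t = math.sqrt(alphas_cumprod[0]) * x0 + math.sqrt(1.0 - alphas_cumprod[0]) * eps
recovered = ddpm_reverse_step(x_t, eps, 0, betas, alphas_cumprod, rng)
```

In the real loop the noise is predicted by the UNet rather than known exactly, so each step only moves the sample a little closer to a clean image.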