
Paper Reading Notes: Denoising Diffusion Implicit Models (2)

0. Quick Access

Paper Reading Notes: Denoising Diffusion Implicit Models (1)
Paper Reading Notes: Denoising Diffusion Implicit Models (2)
Paper Reading Notes: Denoising Diffusion Implicit Models (3)
Paper Reading Notes: Denoising Diffusion Implicit Models (4)

3. Non-Markovian Forward Noising Process

Unlike the forward noising process in DDPM, the forward process in DDIM is non-Markovian. Following the paper, it is defined by Eq. (1) and Eq. (2).
\begin{equation}
\begin{split}
q_{\sigma}(x_{1:T}|x_0) := q_{\sigma}(x_T|x_0)\prod_{t=2}^{T}q_{\sigma}(x_{t-1}|x_t,x_0)
\end{split}
\end{equation}
where
\begin{equation}
\begin{split}
q_{\sigma}(x_T|x_0)&=N\big(x_T;\sqrt{\alpha_T}\,x_0,(1-\alpha_T)I\big)\Leftrightarrow x_T=\sqrt{\alpha_T}\cdot x_0+\sqrt{1-\alpha_T}\cdot z,\quad z\sim N(0,I) \\
q_{\sigma}(x_{t-1}|x_t,x_0)&=N\Bigg(x_{t-1};\underbrace{\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0}_{=:\,\mu(x_t,x_0)},\ \sigma_t^2 I\Bigg) \\
&=N\big(x_{t-1};\mu(x_t,x_0),\sigma_t^2 I\big)
\end{split}
\end{equation}
The figure below illustrates this noising process.
[Figure: the non-Markovian forward noising process]
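
To make Eq. (2) concrete, here is a minimal NumPy sketch of the conditional $q_{\sigma}(x_{t-1}|x_t,x_0)$. The function names `ddim_posterior_mean` and `sample_x_prev` and the numeric values of `alpha_t`, `alpha_prev`, and `sigma_t` are assumptions made for illustration and do not come from the paper. The sketch also assumes $1-\alpha_{t-1}-\sigma_t^2 \ge 0$ so that the square root is real. The input `x_t` is drawn with the Gaussian form that Eq. (2) gives for $x_T$, applied at step $t$ only to have something to feed in.

```python
import numpy as np

def ddim_posterior_mean(x_t, x_0, alpha_t, alpha_prev, sigma_t):
    """Mean mu(x_t, x_0) of q_sigma(x_{t-1} | x_t, x_0), as written in Eq. (2)."""
    # Coefficient multiplying x_t.
    k = np.sqrt((1.0 - alpha_prev - sigma_t**2) / (1.0 - alpha_t))
    # Coefficient multiplying x_0: sqrt(alpha_{t-1}) - sqrt(alpha_t) * k.
    c = np.sqrt(alpha_prev) - np.sqrt(alpha_t) * k
    return k * x_t + c * x_0

def sample_x_prev(x_t, x_0, alpha_t, alpha_prev, sigma_t, rng):
    """Draw one sample x_{t-1} ~ N(mu(x_t, x_0), sigma_t^2 I)."""
    mean = ddim_posterior_mean(x_t, x_0, alpha_t, alpha_prev, sigma_t)
    return mean + sigma_t * rng.standard_normal(np.shape(x_t))

# Illustrative values only (hypothetical schedule entries).
rng = np.random.default_rng(0)
x_0 = rng.standard_normal(4)
alpha_t, alpha_prev, sigma_t = 0.5, 0.7, 0.2
x_t = np.sqrt(alpha_t) * x_0 + np.sqrt(1.0 - alpha_t) * rng.standard_normal(4)
x_prev = sample_x_prev(x_t, x_0, alpha_t, alpha_prev, sigma_t, rng)
print(x_prev)
```

With `sigma_t = 0` the sample equals the mean exactly, so the step from $x_t$ to $x_{t-1}$ becomes deterministic given $x_0$.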

For this process, we first prove the following lemma:
\begin{equation}
\begin{split}
\text{Lemma 1}:\quad q_{\sigma}(x_t|x_0)&=N\big(x_t;\sqrt{\alpha_t}\,x_0,(1-\alpha_t)I\big) \\
\Leftrightarrow x_t&=\sqrt{\alpha_t}\cdot x_0+\sqrt{1-\alpha_t}\cdot z
\end{split}
\end{equation}
Lemma 1 is proved by induction, going backward from $t=T$, in three steps:

  1. Base case: when $t=T$, $q_{\sigma}(x_T|x_0)$ satisfies $x_T=\sqrt{\alpha_T}\cdot x_0+\sqrt{1-\alpha_T}\cdot z$ by definition, which agrees with Lemma 1.
  2. Inductive hypothesis: assume that at step $t$, $q_{\sigma}(x_t|x_0)$ satisfies Lemma 1, i.e. $q_{\sigma}(x_t|x_0)=N\big(x_t;\sqrt{\alpha_t}\,x_0,(1-\alpha_t)I\big)$.
  3. Inductive step: show that at step $t-1$, $q_{\sigma}(x_{t-1}|x_0)$ also satisfies Lemma 1. There are two ways to prove this.
    Method 1:
    $q_{\sigma}(x_{t-1}|x_0)$ is the marginal of $q_{\sigma}(x_{t-1},x_t|x_0)$ over $x_t$, so it satisfies Eq. (4).
    \begin{equation}
    \begin{split}
    q_{\sigma}(x_{t-1}|x_0)&=\int q_{\sigma}(x_{t-1},x_t|x_0)\cdot dx_t \\
    &=\int q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t
    \end{split}
    \end{equation}
    $q_{\sigma}(x_{t-1}|x_0)$ is the distribution of $x_{t-1}$ given $x_0$. It is Gaussian, because $q_{\sigma}(x_t|x_0)$ is Gaussian and $q_{\sigma}(x_{t-1}|x_t,x_0)$ is Gaussian with a mean that is linear in $x_t$. Denote its mean and variance by $\mu$ and $\sigma^2$; they are computed in Eq. (5) and Eq. (6), respectively. The variance computation is written for a single coordinate (scalar $x$); since every covariance involved is isotropic, the same result holds coordinate by coordinate.
    \begin{equation}
    \begin{split}
    \mu&=E_{x_{t-1}\sim q_{\sigma}(x_{t-1}|x_0)}(x_{t-1}) \\
    &=\int x_{t-1}\cdot q_{\sigma}(x_{t-1}|x_0)\cdot dx_{t-1} \\
    &=\int x_{t-1}\cdot\Bigg(\int q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\Bigg)\cdot dx_{t-1} \\
    &=\iint x_{t-1}\cdot q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\,dx_{t-1} \\
    &=\int\Big(\int x_{t-1}\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_{t-1}\Big)\,q_{\sigma}(x_t|x_0)\cdot dx_t \\
    &=\int\mu(x_t,x_0)\cdot q_{\sigma}(x_t|x_0)\cdot dx_t \\
    &=E_{x_t\sim q_{\sigma}(x_t|x_0)}\Big(\mu(x_t,x_0)\Big) \\
    &=E_{x_t\sim q_{\sigma}(x_t|x_0)}\Bigg(\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0\Bigg) \\
    &=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot\underbrace{E_{x_t\sim q_{\sigma}(x_t|x_0)}(x_t)}_{=\sqrt{\alpha_t}\cdot x_0}+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0 \\
    &=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot\sqrt{\alpha_t}\cdot x_0+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0 \\
    &=\sqrt{\alpha_{t-1}}\cdot x_0
    \end{split}
    \end{equation}
    \begin{equation}
    \begin{split}
    \sigma^2&=\mathrm{Var}_{x_{t-1}\sim q_{\sigma}(x_{t-1}|x_0)}(x_{t-1}) \\
    &=\int(x_{t-1}-\mu)^2\cdot q_{\sigma}(x_{t-1}|x_0)\cdot dx_{t-1} \\
    &=\int(x_{t-1}^2-2\mu\cdot x_{t-1}+\mu^2)\cdot\Big(\int q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\Big)\cdot dx_{t-1} \\
    &=\iint(x_{t-1}^2-2\mu\cdot x_{t-1}+\mu^2)\cdot q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\cdot dx_{t-1} \\
    &=\iint x_{t-1}^2\cdot q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\cdot dx_{t-1}-\iint 2\mu\cdot x_{t-1}\cdot q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\cdot dx_{t-1} \\
    &\quad+\iint\mu^2\cdot q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\cdot dx_{t-1} \\
    &=\int\underbrace{\Bigg(\int x_{t-1}^2\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_{t-1}\Bigg)}_{=E(x_{t-1}^2)=\mu(x_t,x_0)^2+\sigma_t^2}\cdot q_{\sigma}(x_t|x_0)\cdot dx_t-2\cdot\mu\int\underbrace{\Bigg(\int x_{t-1}\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_{t-1}\Bigg)}_{=E(x_{t-1})=\mu(x_t,x_0)}\cdot q_{\sigma}(x_t|x_0)\cdot dx_t \\
    &\quad+\mu^2\cdot\underbrace{\iint q_{\sigma}(x_t|x_0)\cdot q_{\sigma}(x_{t-1}|x_t,x_0)\cdot dx_t\cdot dx_{t-1}}_{=1} \\
    &=\int\bigg(\mu(x_t,x_0)^2+\sigma_t^2\bigg)\cdot q_{\sigma}(x_t|x_0)\cdot dx_t-2\cdot\mu\underbrace{\int\mu(x_t,x_0)\cdot q_{\sigma}(x_t|x_0)\cdot dx_t}_{=\mu}+\mu^2 \\
    &=\int\mu(x_t,x_0)^2\cdot q_{\sigma}(x_t|x_0)\cdot dx_t+\sigma_t^2\cdot\underbrace{\int q_{\sigma}(x_t|x_0)\cdot dx_t}_{=1}-2\cdot\mu^2+\mu^2 \\
    &=\int\Bigg(\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\underbrace{\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0}_{\text{constant in }x_t\text{, denoted }A}\Bigg)^2\cdot q_{\sigma}(x_t|x_0)\cdot dx_t+\sigma_t^2-\mu^2 \\
    &=\int\Bigg(\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+A\Bigg)^2\cdot q_{\sigma}(x_t|x_0)\cdot dx_t+\sigma_t^2-\mu^2 \\
    &=\int\Bigg(\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}\cdot x_t^2+2\cdot A\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+A^2\Bigg)\cdot q_{\sigma}(x_t|x_0)\cdot dx_t+\sigma_t^2-\mu^2 \\
    &=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}\cdot\underbrace{\int x_t^2\cdot q_{\sigma}(x_t|x_0)\cdot dx_t}_{=E_{x_t\sim q_{\sigma}(x_t|x_0)}(x_t^2)=\alpha_t x_0^2+(1-\alpha_t)}+2\cdot A\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot\underbrace{\int x_t\cdot q_{\sigma}(x_t|x_0)\cdot dx_t}_{=E_{x_t\sim q_{\sigma}(x_t|x_0)}(x_t)=\sqrt{\alpha_t}\cdot x_0} \\
    &\quad+A^2\cdot\underbrace{\int q_{\sigma}(x_t|x_0)\cdot dx_t}_{=1}+\sigma_t^2-\mu^2 \\
    &=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}\cdot\bigg[\alpha_t x_0^2+(1-\alpha_t)\bigg]+2\cdot\Bigg(\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0\Bigg)\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot\sqrt{\alpha_t}\cdot x_0 \\
    &\quad+\Bigg(\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0\Bigg)^2+\sigma_t^2-\mu^2 \\
    &=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}\cdot\bigg[\alpha_t x_0^2+(1-\alpha_t)\bigg]+2\cdot\sqrt{\alpha_{t-1}}\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}-\frac{2\cdot\alpha_t\cdot x_0^2\cdot(1-\alpha_{t-1}-\sigma_t^2)}{1-\alpha_t} \\
    &\quad+x_0^2\cdot\alpha_{t-1}-\frac{2\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}}\cdot\sqrt{1-\alpha_{t-1}-\sigma_t^2}}{\sqrt{1-\alpha_t}}+\frac{x_0^2\cdot\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}{1-\alpha_t}+\sigma_t^2-\mu^2 \\
    &=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}\cdot\bigg[\bcancel{\alpha_t x_0^2}+(1-\alpha_t)-\bcancel{2\cdot\alpha_t\cdot x_0^2}+\bcancel{\alpha_t\cdot x_0^2}\bigg]+2\cdot\sqrt{\alpha_{t-1}}\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}} \\
    &\quad+x_0^2\cdot\alpha_{t-1}-\frac{2\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}}\cdot\sqrt{1-\alpha_{t-1}-\sigma_t^2}}{\sqrt{1-\alpha_t}}+\sigma_t^2-\mu^2 \\
    &=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}\cdot(1-\alpha_t)+2\cdot\sqrt{\alpha_{t-1}}\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}+\bcancel{x_0^2\cdot\alpha_{t-1}} \\
    &\quad-\frac{2\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}}\cdot\sqrt{1-\alpha_{t-1}-\sigma_t^2}}{\sqrt{1-\alpha_t}}+\sigma_t^2-\underbrace{\bcancel{\mu^2}}_{=\alpha_{t-1}\cdot x_0^2} \\
    &=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}\cdot(1-\alpha_t)+\bcancel{2\cdot\sqrt{\alpha_{t-1}}\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}}-\bcancel{\frac{2\cdot x_0^2\cdot\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}}\cdot\sqrt{1-\alpha_{t-1}-\sigma_t^2}}{\sqrt{1-\alpha_t}}}+\sigma_t^2 \\
    &=\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}\cdot(1-\alpha_t)+\sigma_t^2 \\
    &=1-\alpha_{t-1}-\sigma_t^2+\sigma_t^2 \\
    &=1-\alpha_{t-1}
    \end{split}
    \end{equation}
    Eq. (5) and Eq. (6) together give Eq. (7), which completes the inductive step and proves Lemma 1.
    \begin{equation}
    \begin{split}
    q_{\sigma}(x_{t-1}|x_0)=N\big(x_{t-1};\sqrt{\alpha_{t-1}}\cdot x_0,(1-\alpha_{t-1})I\big)
    \end{split}
    \end{equation}
    Method 2:
    This is the proof given in the paper. It uses Eqs. (2.113), (2.114), and (2.115) on page 93 of Pattern Recognition and Machine Learning (PRML), shown in the figure below.
    [Figure: PRML Eqs. (2.113), (2.114), and (2.115)]
    \begin{equation}
    \begin{split}
    q_{\sigma}(x_t|x_0)&=N\big(x_t;\sqrt{\alpha_t}\cdot x_0,(1-\alpha_t)I\big) \\
    &\Updownarrow \\
    p(x)&=N(x|\mu,\Lambda^{-1})\qquad\text{(2.113)} \\
    q_{\sigma}(x_{t-1}|x_t,x_0)&=N\Bigg(x_{t-1};\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot x_t+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0,\ \sigma_t^2 I\Bigg) \\
    &\Updownarrow \\
    p(y|x)&=N(y|Ax+b,L^{-1})\qquad\text{(2.114)} \\
    q_{\sigma}(x_{t-1}|x_0)&\Leftrightarrow p(y)=N(y|A\mu+b,L^{-1}+A\Lambda^{-1}A^T)\qquad\text{(2.115)}
    \end{split}
    \end{equation}
    Matching terms, the quantities in (2.113) and (2.114) are:
    \begin{equation}
    \begin{split}
    \mu&=\sqrt{\alpha_t}\cdot x_0 \\
    \Lambda^{-1}&=(1-\alpha_t)I \\
    A&=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\,I \\
    b&=\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0 \\
    L^{-1}&=\sigma_t^2 I
    \end{split}
    \end{equation}
    The mean and variance of $q_{\sigma}(x_{t-1}|x_0)$ are then:
    \begin{equation}
    \begin{split}
    E_{x_{t-1}\sim q_{\sigma}(x_{t-1}|x_0)}(x_{t-1})&=E_{y\sim p(y)}(y) \\
    &=A\mu+b \\
    &=\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot\sqrt{\alpha_t}\cdot x_0+\bigg[\sqrt{\alpha_{t-1}}-\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\bigg]\cdot x_0 \\
    &=\bcancel{\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot\sqrt{\alpha_t}\cdot x_0}+\sqrt{\alpha_{t-1}}\cdot x_0-\bcancel{\frac{\sqrt{\alpha_t\cdot(1-\alpha_{t-1}-\sigma_t^2)}}{\sqrt{1-\alpha_t}}\cdot x_0} \\
    &=\sqrt{\alpha_{t-1}}\cdot x_0 \\
    \\
    \mathrm{Var}_{x_{t-1}\sim q_{\sigma}(x_{t-1}|x_0)}(x_{t-1})&=\mathrm{Var}_{y\sim p(y)}(y) \\
    &=L^{-1}+A\Lambda^{-1}A^T \\
    &=\sigma_t^2 I+\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\cdot(1-\alpha_t)\cdot\sqrt{\frac{1-\alpha_{t-1}-\sigma_t^2}{1-\alpha_t}}\,I \\
    &=\big(\sigma_t^2+1-\alpha_{t-1}-\sigma_t^2\big)I \\
    &=(1-\alpha_{t-1})I
    \end{split}
    \end{equation}
    Q.E.D.
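
The inductive step can also be sanity-checked numerically. The sketch below works in the scalar case and does two things: it evaluates the closed form $A\mu+b$ and $L^{-1}+A\Lambda^{-1}A^T$ from PRML (2.115), and it estimates the same mean and variance by Monte Carlo, sampling $x_t$ from Lemma 1 and then $x_{t-1}$ from $q_{\sigma}(x_{t-1}|x_t,x_0)$. The helper names and the values of `x_0`, `alpha_t`, `alpha_prev`, and `sigma_t` are assumptions chosen for illustration, with $1-\alpha_{t-1}-\sigma_t^2 \ge 0$.

```python
import numpy as np

def linear_gaussian_terms(x_0, alpha_t, alpha_prev, sigma_t):
    """Scalar A and b of the linear-Gaussian conditional q_sigma(x_{t-1} | x_t, x_0)."""
    A = np.sqrt((1.0 - alpha_prev - sigma_t**2) / (1.0 - alpha_t))
    b = (np.sqrt(alpha_prev) - np.sqrt(alpha_t) * A) * x_0
    return A, b

def marginal_closed_form(x_0, alpha_t, alpha_prev, sigma_t):
    """Mean and variance of q_sigma(x_{t-1} | x_0) via PRML (2.115), scalar case."""
    A, b = linear_gaussian_terms(x_0, alpha_t, alpha_prev, sigma_t)
    mu = np.sqrt(alpha_t) * x_0   # mean of q_sigma(x_t | x_0), inductive hypothesis
    lam_inv = 1.0 - alpha_t       # variance of q_sigma(x_t | x_0)
    L_inv = sigma_t**2            # variance of the conditional
    return A * mu + b, L_inv + A * lam_inv * A

def marginal_monte_carlo(x_0, alpha_t, alpha_prev, sigma_t, n, rng):
    """Estimate the same mean and variance by ancestral sampling x_t -> x_{t-1}."""
    A, b = linear_gaussian_terms(x_0, alpha_t, alpha_prev, sigma_t)
    x_t = np.sqrt(alpha_t) * x_0 + np.sqrt(1.0 - alpha_t) * rng.standard_normal(n)
    x_prev = A * x_t + b + sigma_t * rng.standard_normal(n)
    return x_prev.mean(), x_prev.var()

# Illustrative values only (hypothetical schedule entries).
x_0, alpha_t, alpha_prev, sigma_t = 1.5, 0.5, 0.7, 0.2
mean_cf, var_cf = marginal_closed_form(x_0, alpha_t, alpha_prev, sigma_t)
mean_mc, var_mc = marginal_monte_carlo(
    x_0, alpha_t, alpha_prev, sigma_t, n=200_000, rng=np.random.default_rng(0)
)
print("closed form :", mean_cf, var_cf)
print("Lemma 1     :", np.sqrt(alpha_prev) * x_0, 1.0 - alpha_prev)
print("Monte Carlo :", mean_mc, var_mc)
```

The closed form should match $\sqrt{\alpha_{t-1}}\,x_0$ and $1-\alpha_{t-1}$ up to floating-point error, while the Monte Carlo estimates should agree only up to sampling noise.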
