文献分享: Muvera多向量到单向量的转化方法(Part3——引理证明的补充)
原文章
文章目录
由于 CSDN \text{CSDN} CSDN字数严格的限制,很多详细过程腰斩了,完整版见标题的超链接
1. \textbf{1. } 1. 事实 A.2 \textbf{A.2} A.2的证明
1.0. \textbf{1.0. } 1.0. 定理的内容
👉前提:设随机矩阵 S = ( s i j ) ∈ R d proj × d \mathbf{S}{=}(s_{ij}){∈}\mathbb{R}^{d_\text{proj}{×}d} S=(sij)∈Rdproj×d
- 每元素 s i j s_{ij} sij独立且同分布于 D \mathcal{D} D,其中 D \mathcal{D} D关于竖轴对称且 ∀ s i j ∼ D ∀s_{ij}{∼}\mathcal{D} ∀sij∼D有 E [ s i j 2 ] = 1 \mathbb{E}[s_{ij}^2]{=}1 E[sij2]=1
- 对任意固定向量 u ∈ R d × 1 u{∈}\mathbb{R}^{d{×}1} u∈Rd×1(即每个分量 u j u_{j} uj不随机),定义 ψ ( u ) = 1 t ( S u ) ψ{(u)}{=}\cfrac{1}{\sqrt{t}}(\mathbf{S}u) ψ(u)=t1(Su)
👉结论 1 1 1: ∥ ψ ( u ) ∥ 2 \left\|ψ(u)\right\|^2 ∥ψ(u)∥2与 ∥ u ∥ 2 \|u\|^2 ∥u∥2逼近
- 负半边:令 E [ s i j 4 ] = B < ∞ \mathbb{E}[s_{ij}^4]{=}B{<}\infty E[sij4]=B<∞,对 ∀ ε > 0 ∀{ε}{>}0 ∀ε>0有 Pr [ ∥ u ′ ∥ 2 ≤ ( 1 – ε ) ∥ u ∥ 2 ] ≤ e – ( ε 2 – ε 3 ) t 2 ( B + 1 ) \Pr\left[\left\|u^{\prime}\right\|^2{≤}(1–ε)\|u\|^2\right]{≤}e^{–\frac{\left(ε^2–ε^3\right)t}{2(B{+}1)}} Pr[∥u′∥2≤(1–ε)∥u∥2]≤e–2(B+1)(ε2–ε3)t
- 正半边:若 ∃ L > 0 \text{∃}L{>}0 ∃L>0对 ∀ m > 0 ∀m{>}0 ∀m>0满足矩 E [ s 2 m ] ≤ ( 2 m ) ! 2 m m ! L 2 m \mathbb{E}\left[s^{2m}\right]{≤}\cfrac{(2m)!}{2^mm!}L^{2m} E[s2m]≤2mm!(2m)!L2m,对 ∀ ε > 0 ∀{ε}{>}0 ∀ε>0有 Pr [ ∥ u ′ ∥ 2 ≥ ( 1 + ε ) L 2 ∥ u ∥ 2 ] ≤ ( ( 1 + ε ) e – ε ) t 2 ≤ e – ( ε 2 – ε 3 ) t 4 \Pr\left[\left\|u^{\prime}\right\|^2{≥}(1{+}ε)L^2\|u\|^2\right]{≤}\left((1{+}ε)e^{–ε}\right)^{\frac{t}{2}}{≤}e^{–\frac{\left(ε^2–ε^3\right)t}{4}} Pr[∥u′∥2≥(1+ε)L2∥u∥2]≤((1+ε)e–ε)2t≤e–4(ε2–ε3)t
1.1. \textbf{1.1. } 1.1. 矩分析
➡️设 X i = ∑ j = 1 d s i j u j = S i ⋅ u \displaystyle{}X_i{=}\sum_{j{=}1}^ds_{ij}u_{j}{=}\mathbf{S}_{i\cdot}u Xi=j=1∑dsijuj=Si⋅u,则 ψ i ( u ) = 1 d proj ( S i ⋅ u ) = 1 d proj X i ψ_i(u){=}\cfrac{1}{\sqrt{d_\text{proj}}}\left(\mathbf{S}_{i\cdot}u\right){=}\cfrac{1}{\sqrt{d_\text{proj}}}X_i ψi(u)=dproj1(Si⋅u)=dproj1Xi
![]()
➡️分析 X i X_i Xi的 1/2/4 \text{1/2/4} 1/2/4阶矩
- 一阶矩: E [ X i ] = E [ ∑ j = 1 d s i j u j ] ⇒ 由于每个 u j 固定 E [ X i ] = ∑ j = 1 d u j E [ s i j ] ⇒ s i j 的分布关于竖轴对称 E [ X i ] = 0 \displaystyle\mathbb{E}\left[X_i\right]{=}\mathbb{E}\left[\sum_{j{=}1}^ds_{ij}u_{j}\right]\xRightarrow{由于每个u_{j}固定}\mathbb{E}\left[X_i\right]{=}\sum_{j{=}1}^du_{j}\mathbb{E}\left[s_{ij}\right]\xRightarrow{s_{ij}的分布关于竖轴对称}\mathbb{E}\left[X_i\right]{=}0 E[Xi]=E[j=1∑dsijuj]由于每个uj固定E[Xi]=j=1∑dujE[sij]sij的分布关于竖轴对称E[Xi]=0
- 二阶矩: E [ X i 2 ] = E [ ( ∑ j = 1 d s i j u j ) 2 ] = E [ ∑ j = 1 d ( s i j u j ) 2 ] + E [ 2 ∑ j 1 < j 2 ( s i j 1 u j 1 ) ( s i j 2 u j 2 ) ] = 1 \displaystyle\mathbb{E}\left[X_i^2\right]{=}\mathbb{E}\left[\left(\sum_{j{=}1}^ds_{ij}u_j\right)^2\right]{=}\mathbb{E}\left[\sum_{j{=}1}^d\left(s_{ij}u_j\right)^2\right]{+}\mathbb{E}\left[2\sum_{j_1{<}j_2}\left(s_{ij_1}u_{j_1}\right)\left(s_{ij_2}u_{j_2}\right)\right]{=}1 E[Xi2]=E (j=1∑dsijuj)2 =E[j=1∑d(sijuj)2]+E[2j1<j2∑(sij1uj1)(sij2uj2)]=1
- 四阶矩: E [ X i 4 ] = ∑ j 1 , 2 , 3 , 4 E [ s i j 1 s i j 2 s i j 3 s i j 4 ] \displaystyle\mathbb{E}\left[X_i^4\right]{=}\sum_{j_{1,2,3,4}}\mathbb{E}\left[s_{ij_1}s_{ij_2}s_{ij_3}s_{ij_4}\right] E[Xi4]=j1,2,3,4∑E[sij1sij2sij3sij4]
- s i j s_{ij} sij相互独立且关于竖轴对称,故 E [ s i j 2 n +1 ] = E [ s i j 2 n ] E [ s i j ] = 0 \mathbb{E}\left[s_{ij}^{2n\text{+1}}\right]{=}\mathbb{E}\left[s_{ij}^{2n}\right]\mathbb{E}\left[s_{ij}\right]{=}0 E[sij2n+1]=E[sij2n]E[sij]=0及 E [ s i j 2 n ] ≠ 0 \mathbb{E}\left[s_{ij}^{2n}\right]≠0 E[sij2n]=0
- E [ s i j 1 s i j 2 s i j 3 s i j 4 ] \mathbb{E}\left[s_{ij_1}s_{ij_2}s_{ij_3}s_{ij_4}\right] E[sij1sij2sij3sij4]四种情况
情形 次数 E [ s i j 1 s i j 2 s i j 3 s i j 4 ] \boldsymbol{\mathbb{E}\left[s_{ij_1}s_{ij_2}s_{ij_3}s_{ij_4}\right]} E[sij1sij2sij3sij4]值 j 1 = j 2 = j 3 = j 4 j_1=j_2=j_3=j_4 j1=j2=j3=j4 1 1 1 B B B j 1 = j 2 ≠ j 3 = j 4 j_1=j_2≠j_3=j_4 j1=j2=j3=j4 C 4 2 / 2 =3 {C_4^2}/{2}\text{=3} C42/2=3 1 1 1 j 1 = j 2 ≠ j 3 ≠ j 4 j_1=j_2≠j_3≠j_4 j1=j2=j3=j4 N/A \text{N/A} N/A 0 0 0 j 1 ≠ j 2 ≠ j 3 ≠ j 4 j_1≠j_2≠j_3≠j_4 j1=j2=j3=j4 N/A \text{N/A} N/A 0 0 0 - E [ X i 4 ] \displaystyle\mathbb{E}\left[X_i^4\right] E[Xi4]的情况
索引情形 次数 E [ X i 4 ] \boldsymbol{\mathbb{E}\left[X_i^4\right]} E[Xi4] j 1 = j 2 = j 3 = j 4 j_1=j_2=j_3=j_4 j1=j2=j3=j4 1 1 1 ∑ j = 1 d u j 4 E [ s i j 4 ] = B ∑ j = 1 d u j 4 \displaystyle\sum_{j{=}1}^du_j^4\mathbb{E}\left[s_{ij}^4\right]{=}B\sum_{j{=}1}^du_j^4 j=1∑duj4E[sij4]=Bj=1∑duj4 j 1 = j 2 ≠ j 3 = j 4 j_1=j_2≠j_3=j_4 j1=j2=j3=j4 3 3 3 ∑ j α , j β u j α 2 u j β 2 – ∑ j α = j β u j α 2 u j β 2 = 1 − ∑ j = 1 d u j 4 \displaystyle\sum_{j_{α}\text{,}j_{\beta}}u_{j_{α}}^2u_{j_{\beta}}^2–\sum_{j_{α}{=}j_{\beta}}u_{j_{α}}^2u_{j_{\beta}}^2{=}1{-}\sum_{j{=}1}^du_j^4 jα,jβ∑ujα2ujβ2–jα=jβ∑ujα2ujβ2=1−j=1∑duj4 - 合并得 E [ X i 4 ] = ( B – 3 ) ∑ j = 1 d u j 4 + 3 \mathbb{E}\left[X_i^4\right]\displaystyle{=}(B–3)\sum_{j=1}^du_j^4{+}3 E[Xi4]=(B–3)j=1∑duj4+3,而 ∑ j = 1 d u j 4 ≤ ∑ j = 1 d u j 2 =1 \displaystyle\sum_{j{=}1}^du_j^4{≤}\sum_{j{=}1}^du_j^2\text{=1} j=1∑duj4≤j=1∑duj2=1故 E [ X i 4 ] ≤ B \displaystyle\mathbb{E}\left[X_i^4\right]{≤}B E[Xi4]≤B
➡️分析 X i X_i Xi的类二阶矩母函数: E [ e α X i 2 ] ≤ 1 1 – 2 α L 2 \mathbb{E}\left[e^{αX_i^2}\right]{≤}\cfrac{1}{\sqrt{1–2αL^{2}}} E[eαXi2]≤1–2αL21
- 构造变量 s i j ′ ∼ N ( 0 , L ) s_{ij}^{\prime}{∼}N(0,L) sij′∼N(0,L),定义其线性变换 Z i = ∑ j = 1 d s i j ′ u j \displaystyle{Z}_i{=}\sum_{j{=}1}^ds_{ij}^{\prime}u_{j} Zi=j=1∑dsij′uj
- 正态分布性质中有 E [ s i j ′ 2 m ] = ( 2 m – 1 ) ! ! σ 2 m = ( 2 m ) ! 2 m m ! σ 2 m \mathbb{E}\left[s_{ij}^{\prime\text{ }2m}\right]{=}(2m–1)!!σ^{2m}{=}\cfrac{(2m)!}{2^mm!}σ^{2m} E[sij′ 2m]=(2m–1)!!σ2m=2mm!(2m)!σ2m,结合条件 E [ s i j 2 m ] ≤ ( 2 m ) ! 2 m m ! L 2 m \mathbb{E}\left[s^{2m}_{ij}\right]{≤}\cfrac{(2m)!}{2^mm!}L^{2m} E[sij2m]≤2mm!(2m)!L2m故 E [ s i j 2 m ] ≤ E [ s i j ′ 2 m ] \mathbb{E}\left[s^{2m}_{ij}\right]{≤}\mathbb{E}\left[s_{ij}^{\prime\text{ }2m}\right] E[sij2m]≤E[sij′ 2m]
- 由 E [ s i j 2 m ] ≤ E [ s i j ′ 2 m ] \mathbb{E}\left[s_{ij}^{2m}\right]{≤}\mathbb{E}\left[s_{ij}^{\prime\text{ }2m}\right] E[sij2m]≤E[sij′ 2m]可得 E [ X i 2 m ] ≤ E [ Z i 2 m ] \mathbb{E}\left[X_i^{2m}\right]{≤}\mathbb{E}\left[Z_i^{2m}\right] E[Xi2m]≤E[Zi2m]
- 因 D \mathcal{D} D及 N ( 0 , L ) N(0,L) N(0,L)皆关于竖轴对称,则 E [ s i j 2 m + 1 ] = E [ s i j ′ 2 m + 1 ] = 0 \mathbb{E}\left[s^{2m{+}1}_{ij}\right]{=}\mathbb{E}\left[s^{\prime{\text{ }}2m{+}1}_{ij}\right]{=}0 E[sij2m+1]=E[sij′ 2m+1]=0,故在 E [ X i 2 m ] \mathbb{E}\left[X_i^{2m}\right] E[Xi2m]和 E [ Z i 2 m ] \mathbb{E}\left[Z_i^{2m}\right] E[Zi2m]展开直接忽略奇数项(如下)
- E [ X i 2 m ] = ∑ ∑ α = 2 m ∀ α = 2 k ( 2 m ) ! ∏ i =1 d α i ! ( ∏ j = 1 d u j α j ) ∏ j = 1 d E [ s i j α j ] \displaystyle\mathbb{E}\left[X_i^{2m}\right]{=}\sum_{\substack{\sum\alpha=2m\\∀\alpha{=}2k}}\frac{(2m)!}{\prod_{i\text{=1}}^{d}\alpha_i\text{!}}\left(\prod_{j=1}^du_j^{\alpha_j}\right)\prod_{j=1}^d\mathbb{E}\left[s_{ij}^{\alpha_j}\right] E[Xi2m]=∑α=2m∀α=2k∑∏i=1dαi!(2m)!(j=1∏dujαj)j=1∏dE[sijαj]
- E [ Z i 2 m ] \mathbb{E}\left[Z_i^{2m}\right] E[Zi2m]将上式 s i j s_{ij} sij换成 s i j ′ s_{ij}^\prime sij′即可
- 两式 E [ s i j α j ] \mathbb{E}\left[s_{ij}^{\alpha_j}\right] E[sijαj]与 E [ s i j ′ α j ] \mathbb{E}\left[s_{ij}^{\prime{\text{ }}\alpha_j}\right] E[sij′ αj]中 α j \alpha_j αj为偶数,故 E [ s i j α j ] ≤ E [ s i j ′ α j ] \mathbb{E}\left[s_{ij}^{\alpha_j}\right]{≤}\mathbb{E}\left[s_{ij}^{\prime{\text{ }}\alpha_j}\right] E[sijαj]≤E[sij′ αj],故 E [ X i 2 m ] ≤ E [ Z i 2 m ] \mathbb{E}\left[X_i^{2m}\right]{≤}\mathbb{E}\left[Z_i^{2m}\right] E[Xi2m]≤E[Zi2m]
- 由 E [ X i 2 m ] ≤ E [ Z i 2 m ] \mathbb{E}\left[X_i^{2m}\right]{≤}\mathbb{E}\left[Z_i^{2m}\right] E[Xi2m]≤E[Zi2m]得 E [ e α X i 2 ] ≤ E [ e α Z i 2 ] \mathbb{E}\left[e^{αX_i^2}\right]{≤}\mathbb{E}\left[e^{αZ_i^2}\right] E[eαXi2]≤E[eαZi2],有正态分布性质知当 α < 1 2 L 2 α{<}\cfrac{1}{2L^2} α<2L21时 E [ e α X i 2 ] ≤ E [ e α Z i 2 ] = 1 1 – 2 α L 2 \mathbb{E}\left[e^{αX_i^2}\right]{≤}\mathbb{E}\left[e^{αZ_i^2}\right]{=}\cfrac{1}{\sqrt{1–2αL^{2}}} E[eαXi2]≤E[eαZi2]=1–2αL21
1.2. \textbf{1.2. } 1.2. 负半边的证明
➡️设 Y = ∑ i = 1 d proj X i 2 \displaystyle{Y}{=}\sum_{i{=}1}^{d_\text{proj}}X_i^2 Y=i=1∑dprojXi2,默认 ∥ u ∥ = 1 \|u\|{=}1 ∥u∥=1
- 则 E [ Y ] = E [ ∑ i = 1 d proj X i 2 ] = ∑ i = 1 d proj E [ X i 2 ] = d proj \displaystyle\mathbb{E}[Y]{=}\mathbb{E}\left[\sum_{i{=}1}^{d_\text{proj}}X_i^2\right]{=}\sum_{i{=}1}^{d_\text{proj}}\mathbb{E}\left[X_i^2\right]{=}{d_\text{proj}} E[Y]=E i=1∑dprojXi2 =i=1∑dprojE[Xi2]=dproj
- 及 Y = ∑ i = 1 d proj ( ψ i ( u ) d proj ) 2 = d proj ∥ ψ ( u ) ∥ 2 \displaystyle{Y}{=}\sum_{i=1}^{d_\text{proj}}\left(ψ_i(u)\sqrt{d_\text{proj}}\right)^2{=}d_\text{proj}\left\|ψ{(u)}\right\|^2 Y=i=1∑dproj(ψi(u)dproj)2=dproj∥ψ(u)∥2,故 Pr [ ∥ ψ ( u ) ∥ 2 ≤ ( 1 – ε ) ∥ u ∥ 2 ] = Pr [ Y ≤ ( 1 – ε ) d proj ] \Pr\left[\left\|ψ{(u)}\right\|^{2}{≤}(1–ε)\|u\|^{2}\right]{=}\Pr[Y{≤}(1–ε){d_\text{proj}}] Pr[∥ψ(u)∥2≤(1–ε)∥u∥2]=Pr[Y≤(1–ε)dproj]
➡️马可夫不等式 Pr [ Y ≤ ( 1 – ε ) d proj ] = Pr [ e – α Y ≥ e – α ( 1 – ε ) d proj ] ≤ E [ e – α Y ] e – α ( 1 – ε ) d proj \Pr[Y{≤}(1–ε){d_\text{proj}}]{=}\Pr\left[e^{–αY}{≥}e^{–α(1–ε){d_\text{proj}}}\right]{≤}\cfrac{\mathbb{E}\left[e^{–αY}\right]}{e^{–α(1–ε){d_\text{proj}}}} Pr[Y≤(1–ε)dproj]=Pr[e–αY≥e–α(1–ε)dproj]≤e–α(1–ε)dprojE[e–αY]
- 分子 E [ e – α Y ] = E [ e – α ( X 1 2 + ⋅ ⋅ ⋅ + X d proj 2 ) ] = ∏ i = 1 d proj E [ e – α X i 2 ] = E d proj [ e – α X i 2 ] \mathbb{E}\left[e^{–αY}\right]{=}\displaystyle\mathbb{E}\left[e^{–α\left(X^2_1{+}···{+}X^2_{d_\text{proj}}\right)}\right]{=}\prod_{i=1}^{d_\text{proj}}\mathbb{E}\left[e^{–αX_i^2}\right]{=}\mathbb{E}^{d_\text{proj}}\left[e^{–αX_i^2}\right] E[e–αY]=E[e–α(X12+⋅⋅⋅+Xdproj2)]=i=1∏dprojE[e–αXi2]=Edproj[e–αXi2]
- 已证 E [ X i 4 ] ≤ B + 1 \mathbb{E}\left[X_i^4\right]{≤}B{+}1 E[Xi4]≤B+1,故泰勒展开得 E [ e – α X i 2 ] ≤ 1 – α + α 2 2 E [ X i 4 ] ≤ 1 – α + α 2 2 ( B + 1 ) \mathbb{E}\left[e^{–αX_i^2}\right]{≤}1–α{+}\cfrac{α^2}{2}\mathbb{E}\left[X_i^4\right]{≤}1–α{+}\cfrac{α^2}{2}(B{+}1) E[e–αXi2]≤1–α+2α2E[Xi4]≤1–α+2α2(B+1)
- 代回得 Pr [ ∥ ψ ( u ) ∥ 2 < ( 1 – ε ) ∥ u ∥ 2 ] ≤ ( ( 1 – α + α 2 2 ( B + 1 ) ) e α ( 1 – ε ) ) d proj \Pr\left[\left\|ψ{(u)}\right\|^{2}{<}(1–ε)\|u\|^{2}\right]{≤}\left(\left(1–α{+}\cfrac{α^2}{2}(B{+}1)\right)e^{α(1–ε)}\right)^{d_\text{proj}} Pr[∥ψ(u)∥2<(1–ε)∥u∥2]≤((1–α+2α2(B+1))eα(1–ε))dproj
- 又有 1 + ( – α + α 2 2 ( B + 1 ) ) ≤ e ( – α + α 2 2 ( B + 1 ) ) 1{+}\left(–α{+}\cfrac{α^2}{2}(B{+}1)\right){≤}e^{\left(–α{+}\frac{α^2}{2}(B{+}1)\right)} 1+(–α+2α2(B+1))≤e(–α+2α2(B+1)),故令 α = ε 1 + B α{=}\cfrac{ε}{1{+}B} α=1+Bε则 Pr [ ∥ ψ ( u ) ∥ 2 < ( 1 – ε ) ∥ u ∥ 2 ] ≤ e ( – ε 2 2 ( B + 1 ) ) d proj \Pr\left[\left\|ψ{(u)}\right\|^{2}{<}(1–ε)\|u\|^{2}\right]{≤}e^{\left(–\frac{ε^2}{2(B{+}1)}\right){d_\text{proj}}} Pr[∥ψ(u)∥2<(1–ε)∥u∥2]≤e(–2(B+1)ε2)dproj
- 而 1≤ e – – ε 3 2 ( B + 1 ) \text{1≤}e^{–\frac{–ε^3}{2(B{+}1)}} 1≤e–2(B+1)–ε3( ε > 0 ε{>}0 ε>0),所以 Pr [ ∥ ψ ( u ) ∥ 2 < ( 1 – ε ) ∥ u ∥ 2 ] ≤ e ( – ε 2 2 ( B + 1 ) ) t ≤ e ( – ε 2 – ε 3 2 ( B + 1 ) ) t \Pr\left[\left\|ψ{(u)}\right\|^{2}{<}(1–ε)\|u\|^{2}\right]{≤}e^{\left(–\frac{ε^2}{2(B{+}1)}\right)t}{≤}e^{\left(–\frac{ε^2–ε^3}{2(B{+}1)}\right)t} Pr[∥ψ(u)∥2<(1–ε)∥u∥2]≤e(–2(B+1)ε2)t≤e(–2(B+1)ε2–ε3)t,证毕
1.3. \textbf{1.3. } 1.3. 正半边的证明
➡️设 Y = ∑ i = 1 d proj X i 2 \displaystyle{Y}{=}\sum_{i{=}1}^{d_\text{proj}}X_i^2 Y=i=1∑dprojXi2,默认 ∥ u ∥ = 1 \|u\|{=}1 ∥u∥=1
- 则 E [ Y ] = E [ ∑ i = 1 d proj X i 2 ] = ∑ i = 1 d proj E [ X i 2 ] = d proj \displaystyle\mathbb{E}[Y]{=}\mathbb{E}\left[\sum_{i{=}1}^{d_\text{proj}}X_i^2\right]{=}\sum_{i{=}1}^{d_\text{proj}}\mathbb{E}\left[X_i^2\right]{=}{d_\text{proj}} E[Y]=E i=1∑dprojXi2 =i=1∑dprojE[Xi2]=dproj
- 及 Y = ∑ i = 1 d proj ( ψ i ( u ) d proj ) 2 = d proj ∑ i = 1 d proj ψ i 2 ( u ) = d proj ∥ ψ ( u ) ∥ 2 \displaystyle{}Y{=}\sum_{i=1}^{d_\text{proj}}\left(ψ_i(u)\sqrt{d_\text{proj}}\right)^2{=}d_\text{proj}\sum_{i{=}1}^{d_\text{proj}}ψ^2_i(u){=}d_\text{proj}\left\|ψ(u)\right\|^2 Y=i=1∑dproj(ψi(u)dproj)2=dproji=1∑dprojψi2(u)=dproj∥ψ(u)∥2,故 Pr [ ∥ ψ ( u ) ∥ 2 ≥ ( 1 + ε ) L 2 ∥ u ∥ 2 ] = Pr [ Y ≥ ( 1 + ε ) L 2 d proj ] \Pr\left[\left\|ψ{(u)}\right\|^{2}{≥}(1{+}ε)L^2\|u\|^{2}\right]{=}\Pr[Y{≥}(1{+}ε)L^2{d_\text{proj}}] Pr[∥ψ(u)∥2≥(1+ε)L2∥u∥2]=Pr[Y≥(1+ε)L2dproj]
➡️马可夫不等式 Pr [ Y ≥ ( 1 + ε ) L 2 d proj ] = Pr [ e α Y ≥ e α ( 1 + ε ) L 2 d proj ] ≤ E [ e α Y ] e α ( 1 + ε ) L 2 d proj \Pr[Y{≥}(1{+}ε)L^2{d_\text{proj}}]{=}\Pr\left[e^{αY}{≥}e^{α(1{+}ε)L^2{d_\text{proj}}}\right]{≤}\cfrac{\mathbb{E}\left[e^{αY}\right]}{e^{ α(1{+}ε)L^2{d_\text{proj}}}} Pr[Y≥(1+ε)L2dproj]=Pr[eαY≥eα(1+ε)L2dproj]≤eα(1+ε)L2dprojE[eαY]
- 对 E [ e α Y ] = E [ e α ( X 1 2 + X 2 2 + ⋅ ⋅ ⋅ + X d proj 2 ) ] = E [ e α X 1 2 ⋅ ⋅ ⋅ e α X d proj 2 ] = ∏ i = 1 d proj E [ e α X i 2 ] = E d proj [ e α X i 2 ] \displaystyle\mathbb{E}\left[e^{αY}\right]{=}\mathbb{E}\left[e^{α\left(X^2_1{+}X^2_2{+}···{+}X^2_{d_\text{proj}}\right)}\right]{=}\mathbb{E}\left[e^{αX^2_1}···e^{αX^2_{d_\text{proj}}}\right]{=}\prod_{i=1}^{d_\text{proj}}\mathbb{E}\left[e^{αX_i^2}\right]{=}\mathbb{E}^{d_\text{proj}}\left[e^{αX_i^2}\right] E[eαY]=E[eα(X12+X22+⋅⋅⋅+Xdproj2)]=E[eαX12⋅⋅⋅eαXdproj2]=i=1∏dprojE[eαXi2]=Edproj[eαXi2]
- 其中 E [ e α Y ] = E d proj [ e α X i 2 ] ≤ ( 1 1 – 2 α L 2 ) d proj 2 \displaystyle\mathbb{E}\left[e^{αY}\right]{=}\mathbb{E}^{d_\text{proj}}\left[e^{αX_i^2}\right]{≤}\left(\frac{1}{1–2\alpha L^{2}}\right)^{\frac{{d_\text{proj}}}{2}} E[eαY]=Edproj[eαXi2]≤(1–2αL21)2dproj,即 Pr [ ∥ ψ ( u ) ∥ 2 ≥ ( 1 + ε ) L 2 ∥ u ∥ 2 ] ≤ ( e – 2 α L 2 ( 1 + ε ) 1 – 2 α L 2 ) d proj 2 \Pr\left[\left\|ψ{(u)}\right\|^{2}{≥}(1{+}ε)L^2\|u\|^{2}\right]{≤}\left(\cfrac{e^{–2\alpha L^{2}(1{+}ε)}}{1–2\alpha L^{2}}\right)^{\frac{{d_\text{proj}}}{2}} Pr[∥ψ(u)∥2≥(1+ε)L2∥u∥2]≤(1–2αL2e–2αL2(1+ε))2dproj
- 分析 ϕ ( α , ε ) = ( e – 2 α L 2 ( 1 + ε ) 1 – 2 α L 2 ) d proj 2 ϕ(\alpha,ε){=}\left(\cfrac{e^{–2\alpha L^{2}(1{+}ε)}}{1–2\alpha L^{2}}\right)^{\frac{{d_\text{proj}}}{2}} ϕ(α,ε)=(1–2αL2e–2αL2(1+ε))2dproj界
- 令 α = ε 2 L 2 ( 1 + ε ) α{=}\cfrac{ε}{2L^2(1{+}ε)} α=2L2(1+ε)ε,得 ϕ ( α , ε ) = ϕ ( ε ) = ( ( 1 + ε ) e – ε ) d proj 2 ϕ(\alpha,ε){=}ϕ(ε){=}\left((1{+}ε)e^{–ε}\right)^{\frac{{d_\text{proj}}}{2}} ϕ(α,ε)=ϕ(ε)=((1+ε)e–ε)2dproj
- 泰勒展开 ϕ ( ε ) = ( ( 1 + ε ) e – ε ) d proj 2 = e d proj 2 ( ln ( 1 + ε ) – ε ) ≤ e d proj 2 ( – e 2 2 + s 3 2 ) ϕ(ε){=}\left((1{+}ε) e^{–ε}\right)^{\frac{{d_\text{proj}}}{2}}=e^{\frac{{d_\text{proj}}}{2}(\ln (1+ε)–ε)} \leq e^{\frac{{d_\text{proj}}}{2}\left(–\frac{e^2}{2}+\frac{s^3}{2}\right)} ϕ(ε)=((1+ε)e–ε)2dproj=e2dproj(ln(1+ε)–ε)≤e2dproj(–2e2+2s3)
- 故 Pr [ ∥ ψ ( u ) ∥ 2 ≥ ( 1 + ε ) L 2 ∥ u ∥ 2 ] ≤ ( ( 1 + ε ) e – ε ) d proj 2 ≤ e – ( ε 2 – ε 3 ) t 4 \Pr\left[\left\|ψ{(u)}\right\|^2{≥}(1{+}ε)L^2\|u\|^2\right]{≤}\left((1{+}ε)e^{–ε}\right)^{\frac{{d_\text{proj}}}{2}}{≤}e^{–\frac{\left(ε^2–ε^3\right)t}{4}} Pr[∥ψ(u)∥2≥(1+ε)L2∥u∥2]≤((1+ε)e–ε)2dproj≤e–4(ε2–ε3)t,证毕
1.4. \textbf{1.4. } 1.4. 事实 A.2 \textbf{A.2} A.2的推论
➡️推论一:让 s i j ∼ U { – 1 , 1 } s_{ij}{∼}U\{–1,1\} sij∼U{–1,1},则 Pr [ ∥ ψ ( u ) ∥ 2 ≤ ( 1 – ε ) ∥ u ∥ 2 ] ≤ e – ( ε 2 – ε 3 ) t 4 \Pr\left[\left\|ψ{(u)}\right\|^2{≤}(1–ε)\|u\|^2\right]{≤}e^{–\frac{\left(ε^2–ε^3\right)t}{4}} Pr[∥ψ(u)∥2≤(1–ε)∥u∥2]≤e–4(ε2–ε3)t及 Pr [ ∥ ψ ( u ) ∥ 2 ≥ ( 1 + ε ) ∥ u ∥ 2 ] ≤ e – ( ε 2 – ε 3 ) t 4 \Pr\left[\left\|ψ{(u)}\right\|^2{≥}(1{+}ε)\|u\|^2\right]{≤}e^{–\frac{\left(ε^2–ε^3\right)t}{4}} Pr[∥ψ(u)∥2≥(1+ε)∥u∥2]≤e–4(ε2–ε3)t
- 对 s i j ∼ U { – 1 , 1 } s_{ij}{∼}U\{–1,1\} sij∼U{–1,1},有 E [ s i j 2 m ] = 1 2 ( − 1 ) 2 m + 1 2 1 2 m = 1 \mathbb{E}\left[s_{ij}^{2m}\right]{=}\cfrac{1}{2}(-1)^{2m}{+}\cfrac{1}{2}1^{2m}{=}1 E[sij2m]=21(−1)2m+2112m=1和 E [ s i j ] = 0 \mathbb{E}[s_{ij}]{=}0 E[sij]=0(符合前提),且 E [ s i j 4 ] = B = 1 \mathbb{E}\left[s_{ij}^{4}\right]{=}B{=}1 E[sij4]=B=1
- 再取 L =1 L\text{=1} L=1代入套定理,推论得证
➡️推论二:对 ∀ ε δ > 0 / ∀ d ≥ 1 / ∀ x y ∈ R d ∀εδ{>}0/∀{d}{≥}1/∀xy{∈}\mathbb{R}^{d} ∀εδ>0/∀d≥1/∀xy∈Rd及 t = O ( 1 ε 2 log 1 δ ) t{=}O\left(\cfrac{1}{ε^{2}}\log\cfrac{1}{δ}\right) t=O(ε21logδ1),让 s i j ∼ U { – 1 , 1 } s_{ij}{∼}U\{–1,1\} sij∼U{–1,1}则 Pr [ ∣ ⟨ ψ ( u ) , ψ ( v ) ⟩ – ⟨ u , v ⟩ ∣ ≤ ε ] ≥ 1 – δ \text{Pr}\left[|⟨ψ(u),ψ(v)⟩–⟨u,v⟩|{≤}ε\right]{≥}1–δ Pr[∣⟨ψ(u),ψ(v)⟩–⟨u,v⟩∣≤ε]≥1–δ
- 极化恒等式
- 有 ⟨ ψ ( u ) , ψ ( v ) ⟩ = 1 d proj ⟨ S u , S v ⟩ = 1 4 ( ∥ S ( u + v ) ∥ 2 d proj – ∥ S ( u – v ) ∥ 2 d proj ) ⟨ψ(u),ψ(v)⟩{=}\cfrac{1}{d_{\text{proj}}}⟨\mathbf{S}u,\mathbf{S}v⟩{=}\cfrac{1}{4}\left(\cfrac{\|\mathbf{S}(u{+}v)\|^2}{d_{\text{proj}}}–\cfrac{\|\mathbf{S}(u–v)\|^2}{d_{\text{proj}}}\right) ⟨ψ(u),ψ(v)⟩=dproj1⟨Su,Sv⟩=41(dproj∥S(u+v)∥2–dproj∥S(u–v)∥2),及 ⟨ u , v ⟩ = 1 4 ( ∥ u + v ∥ 2 – ∥ u – v ∥ 2 ) ⟨u,v⟩{=}\cfrac{1}{4}\left(\|u{+}v\|^2–\|u–v\|^2\right) ⟨u,v⟩=41(∥u+v∥2–∥u–v∥2)
- 二者偏差为 ∣ 1 d proj ⟨ S u , S v ⟩ – ⟨ u , v ⟩ ∣ = 1 4 ( ( 1 d proj ∥ S ( u + v ) ∥ 2 – ∥ u + v ∥ 2 ) – ( 1 d proj ∥ S ( u – v ) ∥ 2 – ∥ u – v ∥ 2 ) ) \left|\cfrac{1}{d_{\text{proj}}}⟨\mathbf{S}u,\mathbf{S}v⟩–⟨u,v⟩\right|{=}\cfrac{1}{4}\left(\left(\cfrac{1}{d_{\text{proj}}}\|\mathbf{S}(u{+}v)\|^2–\|u{+}v\|^2\right)–\left(\cfrac{1}{d_{\text{proj}}}\|\mathbf{S}(u–v)\|^2–\|u–v\|^2\right)\right) dproj1⟨Su,Sv⟩–⟨u,v⟩ =41((dproj1∥S(u+v)∥2–∥u+v∥2)–(dproj1∥S(u–v)∥2–∥u–v∥2))
- 对 u + v u{+}v u+v及 u – v u–v u–v应用推论一
- 对 u ± v u{±}v u±v有 Pr [ 1 d proj ∥ S ( u ± v ) ∥ 2 ≤ ( 1 − ε ) ∥ u ± v ∥ 2 ] ≤ e – ( ε 2 – ε 3 ) t 4 \Pr\left[\cfrac{1}{d_{\text{proj}}}\|\mathbf{S}(u{±}v)\|^2{≤}(1{-}ε)\|u{±}v\|^2\right]{≤}e^{–\frac{\left(ε^2–ε^3\right)t}{4}} Pr[dproj1∥S(u±v)∥2≤(1−ε)∥u±v∥2]≤e–4(ε2–ε3)t, 1 − ε 1{-}ε 1−ε换成 1 + ε 1{+}ε 1+ε也成立
- 合并后 Pr [ 1 d proj ∥ S ( u + v ) ∥ 2 ∈ ( 1 ± ε ) ∥ u + v ∥ 2 ] ≥ 1 – 2 e – ( ε 2 – ε 3 ) t 4 \Pr\left[\cfrac{1}{d_{\text{proj}}}\left\|\mathbf{S}(u{+}v)\right\|^2{∈}(1{±}ε)\|u{+}v\|^2\right]{≥}1–2e^{–\frac{\left(ε^2–ε^3\right)t}{4}} Pr[dproj1∥S(u+v)∥2∈(1±ε)∥u+v∥2]≥1–2e–4(ε2–ε3)t, u + v u{+}v u+v换成 u − v u{-}v u−v也成立
- 进而 max { 1 d proj ∥ S ( u + v ) ∥ 2 − ∥ u + v ∥ 2 } = ε ∥ u + v ∥ 2 \max\left\{\cfrac{1}{d_{\text{proj}}}\|\mathbf{S}(u{+}v)\|^2{-}\|u{+}v\|^2\right\}{=}ε\|u{+}v\|^2 max{dproj1∥S(u+v)∥2−∥u+v∥2}=ε∥u+v∥2和 min { 1 d proj ∥ S ( u − v ) ∥ 2 − ∥ u − v ∥ 2 } = – ε ∥ u – v ∥ 2 \min\left\{\cfrac{1}{d_{\text{proj}}}\|\mathbf{S}(u{-}v)\|^2{-}\|u{-}v\|^2\right\}{=}–ε\|u–v\|^2 min{dproj1∥S(u−v)∥2−∥u−v∥2}=–ε∥u–v∥2
- 代回 ∣ 1 d proj ⟨ S u , S v ⟩ – ⟨ u , v ⟩ ∣ ≤ ε 4 ( ∥ u + v ∥ 2 + ∥ u – v ∥ 2 ) = ε \left|\cfrac{1}{d_{\text{proj}}}⟨\mathbf{S}u,\mathbf{S}v⟩–⟨u,v⟩\right|{≤}\cfrac{ε}{4}\left(\|u{+}v\|^2{+}\|u–v\|^2\right){=}ε dproj1⟨Su,Sv⟩–⟨u,v⟩ ≤4ε(∥u+v∥2+∥u–v∥2)=ε
- 概率修正:
- 由 t = O ( 1 ε 2 log 1 δ ) t{=}O\left(\cfrac{1}{ε^{2}}\log\cfrac{1}{δ}\right) t=O(ε21logδ1)假设 t ≥ 4 ε 2 – ε 3 ln 4 δ t{≥}\cfrac{4}{ε^2–ε^3}\ln\cfrac{4}{δ} t≥ε2–ε34lnδ4,即 2 e – ( ε 2 – ε 3 ) t 4 ≤ δ 2 2e^{–\frac{\left(ε^2–ε^3\right)t}{4}}{≤}\cfrac{δ}{2} 2e–4(ε2–ε3)t≤2δ
- 故 Pr [ 1 d proj ∥ S ( u + v ) ∥ 2 ∈ ( 1 ± ε ) ∥ u + v ∥ 2 ] ≥ 1 – δ 2 \Pr\left[\cfrac{1}{d_{\text{proj}}}\left\|\mathbf{S}(u{+}v)\right\|^2{∈}(1{±}ε)\|u{+}v\|^2\right]{≥}1–\cfrac{δ}{2} Pr[dproj1∥S(u+v)∥2∈(1±ε)∥u+v∥2]≥1–2δ, u + v u{+}v u+v换成 u − v u{-}v u−v也成立
- 故 Pr [ ∣ ⟨ ψ ( u ) , ψ ( v ) ⟩ – ⟨ u , v ⟩ ∣ ≤ ε ] ≥ 1 – δ \text{Pr}\left[|⟨ψ(u),ψ(v)⟩–⟨u,v⟩|{≤}ε\right]{≥}1–δ Pr[∣⟨ψ(u),ψ(v)⟩–⟨u,v⟩∣≤ε]≥1–δ
2. \textbf{2. } 2. 事实 A.4 \textbf{A.4} A.4的证明
2.1. Muvera \textbf{2.1. Muvera} 2.1. Muvera前两步
1️⃣文本嵌入:得到查询和段落文本各自的多向量嵌入
- 查询嵌入 Q Q Q: { q 1 , . . . , q m } \{q_1,...,q_m\} {q1,...,qm},其中 q i ⊆ R d q_i{⊆}\mathbb{R}^{d} qi⊆Rd即为固定 d d d维
- 段落嵌入 P P P: { p 1 , . . . , p n } \{p_1,...,p_n\} {p1,...,pn},其中 p i ⊆ R d p_i{⊆}\mathbb{R}^{d} pi⊆Rd即为固定 d d d维
2️⃣向量分桶:用 SimHash \text{SimHash} SimHash将原有空间分为 2 k sim 2^{k_{\text{sim}}} 2ksim个桶,每个桶用长为 k sim k_{\text{sim}} ksim的二进制向量编码
- 法向抽取:从高斯分布中抽取 k sim ≥ 1 k_{\text{sim}}{≥}1 ksim≥1个向量 g 1 , … , g k sim ∈ R d g_{1},\ldots,g_{k_{\text{sim}}}{∈}\mathbb{R}^{d} g1,…,gksim∈Rd,作为 k sim k_{\text{sim}} ksim个超平面的法向量
- 空间划分: φ ( x ) = ( 1 ( ⟨ g 1 , x ⟩ > 0 ) , … , 1 ( ⟨ g k sim , x ⟩ > 0 ) ) φ(x){=}(\mathbf{1}(⟨g_{1},x⟩{>}0),\ldots,\mathbf{1}(⟨g_{k_{\text{sim}}},x⟩{>}0)) φ(x)=(1(⟨g1,x⟩>0),…,1(⟨gksim,x⟩>0))
- 1 ( ⟨ g i , x ⟩ > 0 ) \mathbf{1}(⟨g_i,x⟩{>}0) 1(⟨gi,x⟩>0):当 ⟨ g i , x ⟩ > 0 ⟨g_i,x⟩{>}0 ⟨gi,x⟩>0成立(即 x x x投影在超平面 g i g_i gi的正侧)时,将该位设为 1 1 1
- 向量分桶:让所有的 m + n m{+}n m+n个嵌入通过 φ ( ⋅ ) φ(\cdot) φ(⋅)得到长 k sim k_{\text{sim}} ksim的二进制编码,相同编码者放入同一桶
2.2. \textbf{2.2. } 2.2. 声明的内容
👉记号:记 ∀ x , y ∈ R b ∀x,y{∈}\mathbb{R}^b ∀x,y∈Rb夹角为 θ ( x , y ) ∈ [ 0 , π ] θ(x,y){∈}[0,\pi] θ(x,y)∈[0,π],二进制编码 x , y x,y x,y的海明距离为 ∥ x – y ∥ 0 \|x–y\|_{0} ∥x–y∥0
👉前提 1 1 1:对 ∀ q i ∈ Q ∀q_i{∈}Q ∀qi∈Q以及 ∀ p j ∈ P ∀p_j{∈}P ∀pj∈P,给定 ∀ ε ≤ 1 2 ∀ε\text{≤}\cfrac{1}{2} ∀ε≤21(与定理 2.1 \text{2.1} 2.1统一)与 ∀ δ ≤ ε ∀δ{≤}ε ∀δ≤ε
👉前提 2 2 2:令 k sim = O ( 1 ε log m δ ) k_{\text{sim}}{=}O\left(\cfrac{1}{ε}\log{\cfrac{m}{δ}}\right) ksim=O(ε1logδm),其中 m = ∣ Q ∣ + ∣ P ∣ m{=}|Q|{+}|P| m=∣Q∣+∣P∣
👉结论: ∣ ∥ φ ( q i ) – φ ( p j ) ∥ 0 − k s i m θ ( q i , p j ) π ∣ ≤ ε k s i m \left|\|φ(q_i)–φ(p_j)\|_0{-}\cfrac{k_{\mathrm{sim}}θ(q_i,p_j)}{\pi}\right|\text{≤}\sqrt{ε}k_{\mathrm{sim}} ∥φ(qi)–φ(pj)∥0−πksimθ(qi,pj) ≤εksim以 Pr ≥ 1 – ( ε δ m 2 ) \Pr{≥}1–\left(\cfrac{εδ}{m^2}\right) Pr≥1–(m2εδ)的概率成立
2.3. \textbf{2.3. } 2.3. 声明的证明
➡️由以下分析可得 Pr [ 1 ( ⟨ g k , q i ⟩ > 0 ) ≠ 1 ( ⟨ g k , p j ⟩ > 0 ) ] = θ ( q i , p j ) π \Pr[\mathbf{1}(⟨g_k,q_i⟩{>}0)\neq\mathbf{1}(⟨g_k, p_j⟩{>}0)]{=}\cfrac{θ(q_i,p_j)}{\pi} Pr[1(⟨gk,qi⟩>0)=1(⟨gk,pj⟩>0)]=πθ(qi,pj)
- 通过 Gram-Schmidt \text{Gram-Schmidt} Gram-Schmidt过程,对 q i , p j q_i,p_j qi,pj进行 R \mathbf{R} R旋转
- 令 d d d维空间基为 B = { e ⃗ 1 , e ⃗ 2 , . . . , e ⃗ d } B{=}\{\vec{e}_1,\vec{e}_2,...,\vec{e}_d\} B={e1,e2,...,ed}满足 ∀ e ⃗ i , e ⃗ j ∈ B ∀{\vec{e}_i,\vec{e}_j}{∈}B ∀ei,ej∈B有 e ⃗ i ⊥ e ⃗ j \vec{e}_i\text{⊥}\vec{e}_j ei⊥ej,让旋转矩阵 R \mathbf{R} R将 q i q_i qi旋转到 e ⃗ 1 \vec{e}_1 e1方向即 R q i = ( 1 , 0 , . . . , 0 ) d \mathbf{R}q_i{=}(1,0,...,0)^d Rqi=(1,0,...,0)d
- 只考虑 e ⃗ 1 \vec{e}_1 e1及 e ⃗ 2 \vec{e}_2 e2组成的二维平面,则 R p j = e ⃗ 1 cos ( R q i , R p j ) + e ⃗ 2 sin ( R q i , R p j ) \mathbf{R}p_j{=}\vec{e}_1\cos{(\mathbf{R}q_i,\mathbf{R}p_j)}{+}\vec{e}_2\sin{(\mathbf{R}q_i,\mathbf{R}p_j)} Rpj=e1cos(Rqi,Rpj)+e2sin(Rqi,Rpj)
- 因 θ ( R q i , R p j ) = θ ( q i , p j ) θ{(\mathbf{R}q_i,\mathbf{R}p_j)}{=}θ{(q_i,p_j)} θ(Rqi,Rpj)=θ(qi,pj),则 R p j = e 1 cos θ ( q i , p j ) + e 2 sin θ ( q i , p j ) = ( cos θ ( q i , p j ) , sin θ ( q i , p j ) , 0 , . . . , 0 ) d \mathbf{R}p_j{=}e_1\cos{θ(q_i,p_j)}{+}e_2\sinθ{(q_i,p_j)}{=}(\cos{θ(q_i,p_j)},\sin{θ(q_i,p_j)},0,...,0)^d Rpj=e1cosθ(qi,pj)+e2sinθ(qi,pj)=(cosθ(qi,pj),sinθ(qi,pj),0,...,0)d
- 对 g k g_k gk进行 R \mathbf{R} R旋转
- (旋转不变性)对高斯向量 g k ∈ R d × 1 ∼ N ( 0 , I d ) g_k{∈}\mathbb{R}^{d\text{×}1}\text{∼}\mathcal{N}(\textbf{0},\boldsymbol{\text{I}_d}) gk∈Rd×1∼N(0,Id)及旋转矩阵 R ∈ R d × d \mathbf{R}{∈}\mathbb{R}^{d\text{×}d} R∈Rd×d,有 R g k ∼ N ( 0 , I d ) \mathbf{R}g_k\text{∼}\mathcal{N}(\textbf{0},\boldsymbol{\text{I}_d}) Rgk∼N(0,Id)
- 故 R g k = ( g k 1 , . . . , g k d ) ∼ N ( 0 , I d ) \mathbf{R}g_k{=}(g_{k_1},...,g_{k_d})\text{∼}\mathcal{N}(\textbf{0},\boldsymbol{I_d}) Rgk=(gk1,...,gkd)∼N(0,Id),其中 g k 1 ∼ N ( 0 , 1 ) g_{k_1}\text{∼}{\mathcal{N}(0,1)} gk1∼N(0,1)和 g k 2 ∼ N ( 0 , 1 ) g_{k_2}\text{∼}{\mathcal{N}(0,1)} gk2∼N(0,1)
- 而 R \mathbf{R} R在对 x , y x,y x,y旋转时仅前两位起作用,故可令 R g k = ( g k 1 , g k 2 , 0 , . . . , 0 ) d \mathbf{R}g_k{=}(g_{k_1},g_{k_2},0,...,0)^d Rgk=(gk1,gk2,0,...,0)d,或 R g = ( cos ϕ , sin ϕ , 0 , . . . , 0 ) d \mathbf{R}g{=}(\cosϕ,\sinϕ,0,...,0)^d Rg=(cosϕ,sinϕ,0,...,0)d
- 求相应内积
- ⟨ g k , q i ⟩ = ⟨ R g k , R q i ⟩ = ( g k 1 , g k 2 , . . . , g k d ) ( 1 , 0 , . . . , 0 ) d T = cos ϕ ⟨g_k,q_i⟩{=}⟨\mathbf{R}g_k,\mathbf{R}q_i⟩{=}(g_{k_1},g_{k_2},...,g_{k_d})(1,0,...,0)^{dT}{=}\cosϕ ⟨gk,qi⟩=⟨Rgk,Rqi⟩=(gk1,gk2,...,gkd)(1,0,...,0)dT=cosϕ
- ⟨ g k , p j ⟩ = ⟨ R g k , R p j ⟩ = ( cos ϕ , sin ϕ , 0 , . . . , 0 ) d ( cos θ ( q i , p j ) , sin θ ( q i , p j ) , 0 , . . . , 0 ) d T = cos ( ϕ – θ ( q i , p j ) ) ⟨g_k,p_j⟩{=}⟨\mathbf{R}g_k,\mathbf{R}p_j⟩{=}(\cosϕ,\sinϕ,0,...,0)^d(\cos{θ(q_i,p_j)},\sin{θ(q_i,p_j)},0,...,0)^{dT}{=}\cos(ϕ–θ(q_i,p_j)) ⟨gk,pj⟩=⟨Rgk,Rpj⟩=(cosϕ,sinϕ,0,...,0)d(cosθ(qi,pj),sinθ(qi,pj),0,...,0)dT=cos(ϕ–θ(qi,pj))
- 整理 Pr [ 1 ( ⟨ g k , q i ⟩ > 0 ) ≠ 1 ( ⟨ g k , p j ⟩ > 0 ) ] \Pr[\mathbf{1}(⟨g_k,q_i⟩{>}0)\neq\mathbf{1}(⟨g_{k},p_j⟩{>}0)] Pr[1(⟨gk,qi⟩>0)=1(⟨gk,pj⟩>0)],变为 Pr [ 1 ( cos ϕ > 0 ) ≠ 1 ( cos ( ϕ – θ ( q i , p j ) ) > 0 ) ] \Pr\left[\mathbf{1}\left(\cosϕ{>}0\right)\neq\mathbf{1}\left(\cos(ϕ–θ(q_i,p_j)){>}0\right)\right] Pr[1(cosϕ>0)=1(cos(ϕ–θ(qi,pj))>0)]
- 情形 1 1 1: cos ϕ > 0 \cosϕ{>}0 cosϕ>0且 cos ( ϕ – θ ( q i , p j ) ) ≤ 0 \cos(ϕ–θ(q_i,p_j))\text{≤}0 cos(ϕ–θ(qi,pj))≤0,则 ϕ ∈ ( – π 2 , π 2 ) ϕ{∈}\left(–\cfrac{\pi}{2},\cfrac{\pi}{2}\right) ϕ∈(–2π,2π)及 ϕ ∈ ( θ ( q i , p j ) + π 2 , θ ( q i , p j ) + 3 π 2 ) ϕ{∈}\left(θ(q_i,p_j){+}\cfrac{\pi}{2},θ(q_i,p_j){+}\cfrac{3\pi}{2}\right) ϕ∈(θ(qi,pj)+2π,θ(qi,pj)+23π),交集长 ∣ ϕ ∣ = θ ( q i , p j ) |ϕ|{=}θ(q_i,p_j) ∣ϕ∣=θ(qi,pj)
- 情形 2 2 2: cos ϕ < 0 \cosϕ\text{<}0 cosϕ<0且 cos ( ϕ – θ ( q i , p j ) ) ≥ 0 \cos(ϕ–θ(q_i,p_j)){≥}0 cos(ϕ–θ(qi,pj))≥0,则 ϕ ∈ ( π 2 , 3 π 2 ) ϕ{∈}\left(\cfrac{\pi}{2},\cfrac{3\pi}{2}\right) ϕ∈(2π,23π)及 ϕ ∈ ( θ ( q i , p j ) – π 2 , θ ( q i , p j ) + π 2 ) ϕ{∈}\left(θ(q_i,p_j)\text{–}\cfrac{\pi}{2},θ(q_i,p_j){+}\cfrac{\pi}{2}\right) ϕ∈(θ(qi,pj)–2π,θ(qi,pj)+2π),交集长 ∣ ϕ ∣ = θ ( q i , p j ) |ϕ|{=}θ(q_i,p_j) ∣ϕ∣=θ(qi,pj)
- 而 ϕ ϕ ϕ总范围为 ∣ ϕ ∣ max = 2 π |ϕ|_{\max}{=}2\pi ∣ϕ∣max=2π,故 Pr [ 1 ( ⟨ g k , q i ⟩ > 0 ) ≠ 1 ( ⟨ g k , p j ⟩ > 0 ) ] = 2 θ ( q i , p j ) 2 π \Pr[\mathbf{1}(⟨g_k,q_i⟩{>}0)\neq\mathbf{1}(⟨g_{k},p_j⟩{>}0)]{=}\cfrac{2θ(q_i,p_j)}{2\pi} Pr[1(⟨gk,qi⟩>0)=1(⟨gk,pj⟩>0)]=2π2θ(qi,pj)
➡️构造变量 Z k Z_k Zk
- 对每个 k ∈ { 1 , 2 , . . . , k sim } k{∈}\{1,2,...,k_{\text{sim}}\} k∈{1,2,...,ksim}及对应的高斯向量 g k g_k gk,定义 Z k = 1 ( ⟨ g k , q i ⟩ > 0 ) ⊕ 1 ( ⟨ g k , p j ⟩ > 0 ) Z_k{=}\mathbf{1}(⟨ g_k, q_i⟩{>}0){⊕}\mathbf{1}(⟨ g_k, p_j⟩{>}0) Zk=1(⟨gk,qi⟩>0)⊕1(⟨gk,pj⟩>0),即二者不相等时 Z k = 1 Z_k{=}1 Zk=1
- 由海明距离的定义(两二进制编码上下对齐后有多少对应位不同),则有 ∥ φ ( q i ) – φ ( p j ) ∥ 0 = ∑ k = 1 k sim Z k \|φ(q_i)–φ(p_j)\|_0{=}\displaystyle\sum_{k{=}1}^{k_{\text{sim}}}Z_k ∥φ(qi)–φ(pj)∥0=k=1∑ksimZk
- 而 Pr [ 1 ( ⟨ g k , q i ⟩ > 0 ) ≠ 1 ( ⟨ g k , p j ⟩ > 0 ) ] = θ ( q i , p j ) π \Pr[\mathbf{1}(⟨g_k,q_i⟩{>}0)\neq\mathbf{1}(⟨g_k, p_j⟩{>}0)]{=}\cfrac{θ(q_i,p_j)}{\pi} Pr[1(⟨gk,qi⟩>0)=1(⟨gk,pj⟩>0)]=πθ(qi,pj)故 E [ Z k ] = 1 ( θ ( q i , p j ) π ) + 0 ( 1 – θ ( q i , p j ) π ) = θ ( q i , p j ) π \mathbb{E}\left[Z_k\right]{=}1\left(\cfrac{θ(q_i,p_j)}{\pi}\right){+}0\left(1–\cfrac{θ(q_i,p_j)}{\pi}\right){=}\cfrac{θ(q_i,p_j)}{\pi} E[Zk]=1(πθ(qi,pj))+0(1–πθ(qi,pj))=πθ(qi,pj)
- 故 ∣ ∥ φ ( q i ) – φ ( p j ) ∥ 0 − k s i m θ ( q i , p j ) π ∣ = ∣ ∑ k = 1 k sim Z k − E [ ∑ k = 1 k sim Z k ] ∣ \left|\|φ(q_i)–φ(p_j)\|_0{-}\cfrac{k_{\mathrm{sim}}θ(q_i,p_j)}{\pi}\right|{=}\left|\displaystyle\sum_{k{=}1}^{k_{\text{sim}}}Z_k{-}\mathbb{E}\left[\sum_{k{=}1}^{k_{\text{sim}}}Z_k\right]\right| ∥φ(qi)–φ(pj)∥0−πksimθ(qi,pj) = k=1∑ksimZk−E[k=1∑ksimZk] ,故需证 Pr [ ∣ ∑ k = 1 k sim Z k − E [ ∑ k = 1 k sim Z k ] ∣ ≥ ε k sim ] ≤ ( ε δ m 2 ) \Pr\left[\left|\displaystyle\sum_{k{=}1}^{k_{\text{sim}}}Z_k{-}\mathbb{E}\left[\sum_{k{=}1}^{k_{\text{sim}}}Z_k\right]\right|{≥}\sqrt{ε}k_{\text{sim}}\right]\text{≤}\left(\cfrac{εδ}{m^2}\right) Pr[ k=1∑ksimZk−E[k=1∑ksimZk] ≥εksim]≤(m2εδ)
➡️进一步转换
- 由 Hoeffding \text{Hoeffding} Hoeffding不等式,则 Z k ∈ [ 0 , 1 ] Z_k{∈}[0,1] Zk∈[0,1]有 Pr [ ∣ ∑ k = 1 n Z k − E [ ∑ k = 1 n Z k ] ∣ ≥ t ] ≤ 2 e – 2 t 2 n \Pr\left[\left|\displaystyle\sum_{k{=}1}^{n}Z_k{-}\mathbb{E}\left[\sum_{k{=}1}^{n}Z_k\right]\right|{≥}t\right]\text{≤}2e^{–\frac{2t^2}{n}} Pr[ k=1∑nZk−E[k=1∑nZk] ≥t]≤2e–n2t2
- 令 t = ε k sim t{=}\sqrt{ε}k_{\text{sim}} t=εksim与 n = k sim n{=}k_{\text{sim}} n=ksim,则 Pr [ ∣ ∑ k = 1 k sim Z k − E [ ∑ k = 1 k sim Z k ] ∣ ≥ ε k sim ] ≤ 2 e – 2 ε k sim \Pr\left[\left|\displaystyle\sum_{k{=}1}^{k_{\text{sim}}}Z_k{-}\mathbb{E}\left[\sum_{k{=}1}^{k_{\text{sim}}}Z_k\right]\right|{≥}\sqrt{ε}k_{\text{sim}}\right]\text{≤}2e^{–2εk_{\text{sim}}} Pr[ k=1∑ksimZk−E[k=1∑ksimZk] ≥εksim]≤2e–2εksim,需证 2 e – 2 ε k sim ≤ ε δ m 2 2e^{–2εk_{\text{sim}}}\text{≤}\cfrac{εδ}{m^2} 2e–2εksim≤m2εδ即 k s i m ≥ 1 2 ε ln 2 m 2 ε δ k_{\mathrm{sim}}{≥}\cfrac{1}{2ε}\ln\cfrac{2m^2}{εδ} ksim≥2ε1lnεδ2m2
- 只需验证 k sim = O ( 1 ε log m δ ) ≥ 1 2 ε ln 2 m 2 ε δ k_{\text{sim}}{=}O\left(\cfrac{1}{ε}\log{\cfrac{m}{δ}}\right){≥}\cfrac{1}{2ε}\ln\cfrac{2m^2}{εδ} ksim=O(ε1logδm)≥2ε1lnεδ2m2
- 令 C ε ln m δ ≥ k sim ≥ 1 2 ε ln 2 m 2 ε δ \cfrac{C}{ε}\ln\cfrac{m}{δ}{≥}k_{\text{sim}}{≥}\cfrac{1}{2ε}\ln\cfrac{2m^2}{ε δ} εClnδm≥ksim≥2ε1lnεδ2m2,需验证 C ε ln m δ ≥ 1 2 ε ln 2 m 2 ε δ \cfrac{C}{ε}\ln\cfrac{m}{δ}{≥}\cfrac{1}{2ε}\ln\cfrac{2m^2}{εδ} εClnδm≥2ε1lnεδ2m2,即上界足以覆盖下界
- 故 ( 2 C – 2 ) ln m – ( 2 C – 1 ) ln δ ≥ ln 2 – ln ε (2C–2)\ln{m}–(2C–1)\ln{δ}{≥}\ln2–\ln{ε} (2C–2)lnm–(2C–1)lnδ≥ln2–lnε
- 令 { 2 C – 2 ≥ 1 2 C – 1 ≥ 1 \begin{cases}2C–2{≥}1\\2C–1{≥}1\end{cases} {2C–2≥12C–1≥1及 { 2 ≤ m δ ≤ ε \begin{cases}2{≤}m\\δ{≤}ε\end{cases} {2≤mδ≤ε则上式成立,解得 C ≥ 3 2 C{≥}\cfrac{3}{2} C≥23及 m ≥ 2 , δ ≤ ε m{≥}2,δ{≤}ε m≥2,δ≤ε,其中 m = ∣ Q ∣ + ∣ P ∣ ≥ 2 m{=}|Q|{+}|P|{≥}2 m=∣Q∣+∣P∣≥2隐性成立
- 只需让 δ ≤ ε δ{≤}ε δ≤ε结论 O ( 1 ε log m δ ) ≥ 1 2 ε ln 2 m 2 ε δ O\left(\cfrac{1}{ε}\log{\cfrac{m}{δ}}\right){≥}\cfrac{1}{2ε}\ln\cfrac{2m^2}{εδ} O(ε1logδm)≥2ε1lnεδ2m2就成立,证毕