Cross-entropy vs. focal loss
Cross-entropy
$$CE(p,t)=CE(p_t)=-\log(p_t)$$
Here $p_t$ is the predicted probability of the true class.
- Easy sample with a high predicted probability, e.g. 0.9:
$$p_t=0.9,\quad CE=-\log_2(0.9)=0.152$$
- Hard sample with a low predicted probability, e.g. 0.2:
$$p_t=0.2,\quad CE=-\log_2(0.2)=2.32$$
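The two values above can be checked with a few lines of Python. This is a minimal sketch: the function name `cross_entropy` is made up for illustration, and the base-2 logarithm is used only to match the numbers in the examples (deep learning frameworks usually use the natural logarithm).

```python
import math


def cross_entropy(p_t: float) -> float:
    """Cross-entropy of one sample, given the predicted probability p_t
    of its true class. Base-2 log, to match the worked examples."""
    return -math.log2(p_t)


# Easy sample (p_t = 0.9) versus hard sample (p_t = 0.2).
for p_t in (0.9, 0.2):
    print(f"p_t={p_t}: CE={cross_entropy(p_t):.3f}")
```

The hard sample's loss is roughly 15 times that of the easy sample, but with many easy samples in a batch their summed loss can still be large.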
Focal loss
$$FL=-(1-p_t)^\gamma\log(p_t),\quad \text{e.g. } \gamma=2$$
- Easy sample with a high predicted probability, e.g. 0.9: the loss shrinks by a factor of 100 relative to cross-entropy.
$$p_t=0.9,\quad FL=-(1-0.9)^2\times\log_2(0.9)=0.00152$$
- Hard sample with a low predicted probability, e.g. 0.2: the loss shrinks by less than a factor of 2.
$$p_t=0.2,\quad FL=-(1-0.2)^2\times\log_2(0.2)=1.486$$
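The shrink factors quoted above follow directly from the modulating term: the ratio CE / FL equals $1/(1-p_t)^\gamma$. The sketch below recomputes both examples. The name `focal_loss` is hypothetical, and base-2 logs are again assumed only to match the numbers in the text.

```python
import math


def focal_loss(p_t: float, gamma: float = 2.0) -> float:
    """Focal loss of one sample: cross-entropy scaled by (1 - p_t)^gamma.
    Base-2 log, to match the worked examples."""
    return -((1.0 - p_t) ** gamma) * math.log2(p_t)


for p_t in (0.9, 0.2):
    ce = -math.log2(p_t)        # plain cross-entropy for comparison
    fl = focal_loss(p_t)        # gamma = 2, as in the examples
    print(f"p_t={p_t}: CE={ce:.3f}  FL={fl:.5f}  CE/FL={ce / fl:.2f}")
```

With `gamma=0` the modulating term is 1 and the function reduces to plain cross-entropy, which is a quick sanity check on the formula.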
Focal loss in practice
- In practice, a per-class weight $\alpha$ is also applied:
$$FL=-\alpha(1-p_t)^\gamma\log(p_t),\quad \text{e.g. } \gamma=2$$
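A minimal sketch of the weighted form, assuming the caller passes in the weight for the sample's true class. The name `alpha_focal_loss` and the value `alpha=0.25` are illustrative assumptions, not taken from the text, and base-2 logs are kept for consistency with the earlier examples.

```python
import math


def alpha_focal_loss(p_t: float, alpha: float, gamma: float = 2.0) -> float:
    """Focal loss with a class weight alpha for the sample's true class.
    Base-2 log, to match the worked examples."""
    return -alpha * ((1.0 - p_t) ** gamma) * math.log2(p_t)


# Hypothetical weight: alpha = 0.25 for the class this sample belongs to.
hard = alpha_focal_loss(0.2, alpha=0.25)
easy = alpha_focal_loss(0.9, alpha=0.25)
print(f"hard={hard:.4f}  easy={easy:.6f}")
```

Since $\alpha$ is a constant multiplier per class, it rescales the contribution of each class without changing how easy and hard samples are weighted relative to each other within that class.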