未初始化引用检测工具UBITech
UBITech
- 1.Introduction
- 2.Qualifier Inference And Check
- 参考文献
artifact地址
1.Introduction
作者实现了一个未初始化变量引用(use-before initialization, UBI)检测工具,比如下面代码中,当 cpg->eng_st != ENGINE_IDLE
时,backlog
不一定为空指针但可能未初始化,同时 backlog->complete
也可能指向任意地址。作者发现很多UBI bug需要path-sensitive的分析才能精确找到。为了在效率很精度取得平衡,作者结合flow-sensitive type qualifier inference(参考文献 [ 2 ] ^{[2]} [2])和符号执行(klee)实现UBITech,整个工具的workflow包括:
-
1.调用分析: 包括 (1).间接调用分析,根据paper描述推断采用的是FLTA算法,也就是纯类型匹配,不过实现上提供了一个data-flow分析方法、(2).构造caller-callee依赖树、(3).分析递归调用链。
-
2.type qualifier inference: 这是一个flow- & field- & context-sensitive的跨函数分析。
-
3.under-constrained符号执行:针对一个UBI report(包含变量allocate处以及use处),作者使用klee尝试从allocate处找到一个到use处的可行路径,没有的话则过滤掉。由于采用的只是partial execution,符号执行的精度会比full execution时候差,也会导致一些误报没被过滤掉。
static int queue_manag(void *data) {/* backlog is defined without initialization */struct crypto_async_request *backlog;if (cpg->eng_st == ENGINE_IDLE)backlog = crypto_get_backlog(&cpg->queue);if (backlog)/* uninitialized pointer dereferenced! */backlog->complete(backlog, -EINPROGRESS);return 0;
}
Call Graph Construction
提供了两种间接调用分析策略:简单的data-flow分析以及保守的类型匹配。data-flow分析规则如下所示,这里 pt ( v ) \text{pt}(v) pt(v) 表示value或者指令 v v v 引用的函数集合,store
, load
, ret
直接用指令的id而不是对应value进行标识。这个data-flow分析还是比较简单,没有考虑 gep
指令也没有考虑别名情况,感觉比类型匹配会多出不少误报。
语句类型 | 规则 |
---|---|
constant : v = f v = f v=f (这里value直接是函数) | f ∈ pt ( v ) f \in \text{pt}(v) f∈pt(v) |
copy (包括 cast ): l : v = v 1 l: v = v_1 l:v=v1 | pt ( v 1 ) ⊆ pt ( v ) \text{pt}(v_1) \subseteq \text{pt}(v) pt(v1)⊆pt(v) |
phi l : v = ϕ ( v 1 , v 2 , . . . , v n ) l: v = \phi(v_1, v_2, ..., v_n) l:v=ϕ(v1,v2,...,vn) | pt ( v ) = ∪ 1 ≤ i ≤ n pt ( v i ) \text{pt}(v) = \cup_{1 \leq i \leq n} \text{pt}(v_i) pt(v)=∪1≤i≤npt(vi) |
select l : v = v 1 ⊙ v 2 l: v = v_1 \odot v_2 l:v=v1⊙v2 (中间表示 select 操作) | pt ( v ) = pt ( v 1 ) ∪ pt ( v 2 ) \text{pt}(v) = \text{pt}(v_1) \cup \text{pt}(v_2) pt(v)=pt(v1)∪pt(v2) |
binary l : v = v 1 ⊙ v 2 l: v = v_1 \odot v_2 l:v=v1⊙v2, 只考虑 v 1 v_1 v1 或者 v 2 v_2 v2 为常量的情况 | pt ( v ) = pt ( v 1 / v 2 ) \text{pt}(v) = \text{pt}(v_1/v_2) pt(v)=pt(v1/v2) |
ret l : ret v l: \text{ret} \; v l:retv | pt ( v ) ⊆ pt ( l ) \text{pt}(v) \subseteq \text{pt}(l) pt(v)⊆pt(l) |
call l : v = f ( . . . ) l: v = f(...) l:v=f(...) | pt ( ret ( f ) ) ⊆ pt ( v ) \text{pt}(\text{ret}(f)) \subseteq \text{pt}(v) pt(ret(f))⊆pt(v) |
load l : v = ∗ p l: v = *p l:v=∗p | pt ( l ) ⊆ pt ( v ) \text{pt}(l) \subseteq \text{pt}(v) pt(l)⊆pt(v) |
store l : ∗ p = v l: *p = v l:∗p=v | pt ( v ) ⊆ pt ( l ) \text{pt}(v) \subseteq \text{pt}(l) pt(v)⊆pt(l) |
Type Qualifier Inference
type qualifier inference是静态分析部分的核心,其架构如下图所示,首先会以bottom-up的方式遍历调用图,为每个函数生成function summary并将summary应用到caller分析过程。而一个函数的summary生成过程涉及point-to analysis、alias analysis和qualifier inference 3个部分。这3个部分都是基于worklist data flow分析(参考blog))算法实现。
point-to analysis是基于一个函数内的flow-sensitive指针分析算法,这里的fact是pts集,FuncAnalysis::nPtsGraph保存每个Instruction Out处的pts集,这里UBITech会先将LLVM的 Value
映射为 id
然后用 bitvec
进行运算。这里的指针分析不同的倒是在于 store
的语句处理: l : ∗ p = q l: *p = q l:∗p=q,中有 ∀ o ∈ pts ( p ) pts ( o ) = pts ( q ) \forall_{o \in \text{pts}(p)} \text{pts}(o) = \text{pts}(q) ∀o∈pts(p)pts(o)=pts(q),也就是直接进行strong update。然后就是call语句的处理,只考虑 malloc
和 memcpy
处理;对于 malloc
调用会按照SVF的方式创建heap object,对于 l: memcpy(p, q)
如果 p, q
都是指针,那么 pts ( q ) ⊆ pts ( p ) \text{pts}(q) \subseteq \text{pts}(p) pts(q)⊆pts(p)。
alias analysis同样基于flow-sensitive的worklist data flow分析算法。每个指令对应的fact为别名集合,类型为 map<unsigned, set<unsigned>>
,也就是将 Value
v v v 映射到具有别名关系的 Value
集合。具体规则如下,这里 aa l ( v ) \text{aa}_l(v) aal(v) 表示 v v v 在语句 l l l 处的别名集合, aa l p \text{aa}_{l_p} aalp 表示 l l l 的In处的别名集合,可能是前一条指令的别名集合也可能是多个直接前驱指令别名集合的并集。copy
包括 bitcast
, zext
, sext
, trunc
;对于 ptrtoint
,如果有 z = inttoptr(y), y = ptrtoint(x)
或者 y = ptrtoint(x) + offset
,那么存在 copy
: z = x z = x z=x。update语句的AA set前会先将 aa l p ⊆ aa l \text{aa}_{l_p} \subseteq \text{aa}_l aalp⊆aal。store
则会直接进行strong update。gep
语句的处理则很保守,只考虑偏移为0的情况,其它情况则直接将 aa l ( v ) = v \text{aa}_l(v) = v aal(v)=v,kill掉其它值。这个别名分析理论上并不sound,不过或许engineer方面好用。
语句 | update规则 |
---|---|
alloca : l : v = & o l: v = \&o l:v=&o | v ∈ aa l ( v ) v \in \text{aa}_l(v) v∈aal(v) |
copy : l : v = p l: v = p l:v=p | v ∈ aa l ( v ) , aa l p ( p ) ⊆ aa l ( v ) v \in \text{aa}_l(v), \; \text{aa}_{l_p}(p) \subseteq \text{aa}_l(v) v∈aal(v),aalp(p)⊆aal(v) |
phi : l : v = ϕ ( v 1 , . . . , v n ) l: v = \phi(v_1, ..., v_n) l:v=ϕ(v1,...,vn) | v ∈ aa l ( v ) , ∀ 1 ≤ i ≤ n ( v i ∈ aa l ( v ) , aa l p ( v i ) ⊆ aa l ( v ) ) v \in \text{aa}_l(v), \; \forall_{1 \leq i \leq n}(v_i \in \text{aa}_l(v), \; \text{aa}_{l_p}(v_i) \subseteq \text{aa}_l(v)) v∈aal(v),∀1≤i≤n(vi∈aal(v),aalp(vi)⊆aal(v)) |
select : l : v = v 1 ⊙ v 2 l: v = v_1 \odot v_2 l:v=v1⊙v2 | { v , v 1 , v 2 } ⊆ aa l ( v ) \{v, v_1, v_2\} \subseteq \text{aa}_l(v) {v,v1,v2}⊆aal(v) |
load : l : v = ∗ p l:v = *p l:v=∗p | v ∈ aa l ( v ) , ∀ o ∈ pts ( p ) aa l p ( o ) ⊆ aa l ( v ) v \in \text{aa}_l(v), \; \forall_{o \in \text{pts}(p)} \text{aa}_{l_p}(o) \subseteq \text{aa}_l(v) v∈aal(v),∀o∈pts(p)aalp(o)⊆aal(v) |
store : l : ∗ v = p l: *v = p l:∗v=p | ∀ o ∈ pts ( v ) aa l ( o ) = aa l p ( p ) \forall_{o \in \text{pts}(v)} \text{aa}_l(o) = \text{aa}_{l_p}(p) ∀o∈pts(v)aal(o)=aalp(p) |
gep : l : v = p . f l: v = p.f l:v=p.f | aa l ( v ) = { v , p } ∣ f = 0 \text{aa}_l(v) = \{v, p\} \mid f = 0 aal(v)={v,p}∣f=0 |
other | Non |
2.Qualifier Inference And Check
Qualify Inference将每个value(top-level pointer和address-taken variable)映射到一个type qualifier。qualifier包括 { init , uninit , unk } \{ \text{init}, \text{uninit}, \text{unk}\} {init,uninit,unk}(偏序关系从左到右依次增加,也就是 init ⊔ uninit = uninit \text{init} \sqcup \text{uninit} = \text{uninit} init⊔uninit=uninit,不过对于 unk \text{unk} unk 通常会重新计算值而不是参与偏序运算),初始时每个value的状态为 unk \text{unk} unk。用 tq l ( v ) \text{tq}_l(v) tql(v) 表示value v v v 在 l l l 处的type qualifier。除了type qualifier外还定义了一个辅助用的 related map
rm \text{rm} rm,保存与value v v v 相关的其它值。系统调用UBITech对 malloc
, zalloc
, memset
(init
类), memcpy
(copy
类), copy_to_user
(transfer
类) 等均进行了建模,这里暂且不管,分析规则如下:
语句类型 | 处理规则 |
---|---|
alloca : l : v = & o l: v = \&o l:v=&o | tq l ( v ) = init , tq l ( o ) = uninit \text{tq}_l(v) = \text{init}, \; \text{tq}_l(o) = \text{uninit} tql(v)=init,tql(o)=uninit |
binary : l : v = v 1 ⊙ v 2 l: v = v_1 \odot v_2 l:v=v1⊙v2 | (1). v i ∈ rm ( v ) , rm ( v i ) ⊆ rm ( v ) ∣ tq l p ( v i ) = unk v_i \in \text{rm}(v), \; \text{rm}(v_i) \subseteq \text{rm}(v) \mid \text{tq}_{l_p}(v_i) = \text{unk} vi∈rm(v),rm(vi)⊆rm(v)∣tqlp(vi)=unk. (2). tq l ( v ) = ⊔ ( tq l p ( v i ) ) \text{tq}_l(v) = \sqcup (\text{tq}_{l_p}(v_i)) tql(v)=⊔(tqlp(vi)) |
load : l : v = ∗ p l: v = *p l:v=∗p | tq l ( v ) = ⊔ o ∈ pts l ( p ) tq l p ( o ) \text{tq}_l(v) = \sqcup_{o \in \text{pts}_l(p)} \text{tq}_{l_p}(o) tql(v)=⊔o∈ptsl(p)tqlp(o) |
store : l : ∗ p = q l: *p = q l:∗p=q | tq l ( p ) = init , ∀ o ∈ pts l ( p ) tq l ( o ) = tq l p ( q ) \text{tq}_l(p) = \text{init}, \; \forall_{o \in \text{pts}_l(p)} \text{tq}_l(o) = \text{tq}_{l_p}(q) tql(p)=init,∀o∈ptsl(p)tql(o)=tqlp(q) |
gep : l : v = p . f l: v = p.f l:v=p.f | tq l ( v ) = tq l p ( p ) \text{tq}_l(v) = \text{tq}_{l_p}(p) tql(v)=tqlp(p) |
cmp : l : r = cmp v 1 , v 2 l: r = \text{cmp} \; v_1, v_2 l:r=cmpv1,v2 | tq l ( r ) = ⊔ ( tq l p ( v i ) ) \text{tq}_l(r) = \sqcup (\text{tq}_{l_p}(v_i)) tql(r)=⊔(tqlp(vi)) |
copy : l : v = p l: v = p l:v=p | tq l ( v ) = tq l p ( p ) \text{tq}_l(v) = \text{tq}_{l_p}(p) tql(v)=tqlp(p) |
select : l : v = select c , v 1 , v 2 l: v = \text{select} \; c, v_1, v_2 l:v=selectc,v1,v2 | tq l ( v ) = ⊔ ( { tq l p ( c ) , tq l p ( v 1 ) , tq l p ( v 2 ) } ) \text{tq}_l(v) = \sqcup (\{\text{tq}_{l_p}(c), \text{tq}_{l_p}(v_1), \text{tq}_{l_p}(v_2)\}) tql(v)=⊔({tqlp(c),tqlp(v1),tqlp(v2)}) |
malloc call : l : v = & o l: v = \&o l:v=&o | tq l ( v ) = init , tq l ( o ) = uninit \text{tq}_l(v) = \text{init}, \; \text{tq}_l(o) = \text{uninit} tql(v)=init,tql(o)=uninit |
zalloc call : l : v = & o l: v = \&o l:v=&o | tq l ( v ) = init , tq l ( o ) = init \text{tq}_l(v) = \text{init}, \; \text{tq}_l(o) = \text{init} tql(v)=init,tql(o)=init |
init call : l : init(v, d, size) l: \text{init(v, d, size)} l:init(v, d, size) | ∀ o ∈ pts l ( v ) tq l ( o ) = init ∣ ( tq l ( v ) = = init & & tq l ( d ) = = init tq l ( size ) = = init ) \forall_{o \in \text{pts}_l(v)}\text{tq}_l(o) = \text{init} \mid (\text{tq}_l(v) == \text{init} \; \&\& \; \text{tq}_l(d) == \text{init} \; \; \text{tq}_l(\text{size}) == \text{init}) ∀o∈ptsl(v)tql(o)=init∣(tql(v)==init&&tql(d)==inittql(size)==init) |
copy/transfer call : l : copy(d, s, size) l: \text{copy(d, s, size)} l:copy(d, s, size) | ∀ o d ∈ pts l ( d ) ∀ o s ∈ pts l ( s ) tq l ( o d ) = tq l ( o d ) ⊔ tq l ( o s ) ∣ ( tq l ( d ) = = init & & tq l ( s ) = = init tq l ( size ) = = init ) \forall_{o_d \in \text{pts}_l(d)}\forall_{o_s \in \text{pts}_l(s)} \text{tq}_l(o_d) = \text{tq}_l(o_d) \sqcup \text{tq}_l(o_s) \mid (\text{tq}_l(d) == \text{init} \; \&\& \; \text{tq}_l(s) == \text{init} \; \; \text{tq}_l(\text{size}) == \text{init}) ∀od∈ptsl(d)∀os∈ptsl(s)tql(od)=tql(od)⊔tql(os)∣(tql(d)==init&&tql(s)==inittql(size)==init) |
漏洞检测的规则包括
语句类型 | 漏洞触发条件 |
---|---|
load : l : v = ∗ p l: v = *p l:v=∗p, gep : l : v = p . f l: v = p.f l:v=p.f | tq l ( p ) = = uninit \text{tq}_l(p) == \text{uninit} tql(p)==uninit |
store : l : ∗ p = q l: *p = q l:∗p=q | tq l ( q ) ≠ init or ∃ o ∈ pts l ( q ) tq l ( o ) ≠ init ∣ isHeap ( p ) \text{tq}_l(q) \neq \text{init} \; \text{or} \; \exist_{o \in \text{pts}_l(q)} \text{tq}_l(o) \neq \text{init} \mid \text{isHeap}(p) tql(q)=initor∃o∈ptsl(q)tql(o)=init∣isHeap(p) |
cmp : l : r = cmp v 1 , v 2 l: r = \text{cmp} \; v_1, v_2 l:r=cmpv1,v2 | tq l ( v 1 ) = = uninit ∣ ∣ tq l ( v 2 ) = = uninit \text{tq}_l(v_1) == \text{uninit} \; || \; \text{tq}_l(v_2) == \text{uninit} tql(v1)==uninit∣∣tql(v2)==uninit |
switch : l : switch (cond) { ... } l: \text{switch (cond) \{ ... \}} l:switch (cond) { ... } | tq l ( cond ) = = uninit \text{tq}_l(\text{cond}) == \text{uninit} tql(cond)==uninit |
copy call : l : copy(d, s, size) l: \text{copy(d, s, size)} l:copy(d, s, size) | tq l ( d ) = = uninit ∣ ∣ tq l ( s ) = = uninit ∣ ∣ tq l ( size ) = = uninit ∣ ∣ ∃ o ∈ pts l ( s ) tq l ( o ) = = uninit \text{tq}_l(d) == \text{uninit} \; || \; \text{tq}_l(s) == \text{uninit} \; || \; \text{tq}_l(\text{size}) == \text{uninit} \; || \; \exists_{o \in \text{pts}_l(s)} \text{tq}_l(o) == \text{uninit} tql(d)==uninit∣∣tql(s)==uninit∣∣tql(size)==uninit∣∣∃o∈ptsl(s)tql(o)==uninit |
transfer call : l : transfer(d, s, size) l: \text{transfer(d, s, size)} l:transfer(d, s, size) | tq l ( s ) = = uninit ∣ ∣ tq l ( size ) = = uninit ∣ ∣ ∃ o ∈ pts l ( s ) tq l ( o ) = = uninit \text{tq}_l(s) == \text{uninit} \; || \; \text{tq}_l(\text{size}) == \text{uninit} \; || \; \exists_{o \in \text{pts}_l(s)} \text{tq}_l(o) == \text{uninit} tql(s)==uninit∣∣tql(size)==uninit∣∣∃o∈ptsl(s)tql(o)==uninit |
参考文献
[1].Zhai Y, Hao Y, Zhang H, et al. UBITect: a precise and scalable method to detect use-before-initialization bugs in Linux kernel[C]//Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2020: 221-232.
[2].Jeffrey S. Foster, Tachio Terauchi, and Alex Aiken. 2002. Flow-sensitive type qualifiers. In Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation (PLDI '02). Association for Computing Machinery, New York, NY, USA, 1–12