当前位置: 首页 > news >正文

未初始化引用检测工具UBITech

UBITech

  • 1.Introduction
  • 2.Qualifier Inference And Check
  • 参考文献

artifact地址

1.Introduction

作者实现了一个未初始化变量引用(use-before initialization, UBI)检测工具,比如下面代码中,当 cpg->eng_st != ENGINE_IDLE 时,backlog 不一定为空指针但可能未初始化,同时 backlog->complete 也可能指向任意地址。作者发现很多UBI bug需要path-sensitive的分析才能精确找到。为了在效率很精度取得平衡,作者结合flow-sensitive type qualifier inference(参考文献 [ 2 ] ^{[2]} [2])和符号执行(klee)实现UBITech,整个工具的workflow包括:

  • 1.调用分析: 包括 (1).间接调用分析,根据paper描述推断采用的是FLTA算法,也就是纯类型匹配,不过实现上提供了一个data-flow分析方法、(2).构造caller-callee依赖树、(3).分析递归调用链。

  • 2.type qualifier inference: 这是一个flow- & field- & context-sensitive的跨函数分析。

  • 3.under-constrained符号执行:针对一个UBI report(包含变量allocate处以及use处),作者使用klee尝试从allocate处找到一个到use处的可行路径,没有的话则过滤掉。由于采用的只是partial execution,符号执行的精度会比full execution时候差,也会导致一些误报没被过滤掉。

static int queue_manag(void *data) {/* backlog is defined without initialization */struct crypto_async_request *backlog;if (cpg->eng_st == ENGINE_IDLE)backlog = crypto_get_backlog(&cpg->queue);if (backlog)/* uninitialized pointer dereferenced! */backlog->complete(backlog, -EINPROGRESS);return 0;
}

Call Graph Construction

提供了两种间接调用分析策略:简单的data-flow分析以及保守的类型匹配。data-flow分析规则如下所示,这里 pt ( v ) \text{pt}(v) pt(v) 表示value或者指令 v v v 引用的函数集合,store, load, ret 直接用指令的id而不是对应value进行标识。这个data-flow分析还是比较简单,没有考虑 gep 指令也没有考虑别名情况,感觉比类型匹配会多出不少误报。

语句类型规则
constant: v = f v = f v=f (这里value直接是函数) f ∈ pt ( v ) f \in \text{pt}(v) fpt(v)
copy (包括 cast): l : v = v 1 l: v = v_1 l:v=v1 pt ( v 1 ) ⊆ pt ( v ) \text{pt}(v_1) \subseteq \text{pt}(v) pt(v1)pt(v)
phi l : v = ϕ ( v 1 , v 2 , . . . , v n ) l: v = \phi(v_1, v_2, ..., v_n) l:v=ϕ(v1,v2,...,vn) pt ( v ) = ∪ 1 ≤ i ≤ n pt ( v i ) \text{pt}(v) = \cup_{1 \leq i \leq n} \text{pt}(v_i) pt(v)=1inpt(vi)
select l : v = v 1 ⊙ v 2 l: v = v_1 \odot v_2 l:v=v1v2 (中间表示 select 操作) pt ( v ) = pt ( v 1 ) ∪ pt ( v 2 ) \text{pt}(v) = \text{pt}(v_1) \cup \text{pt}(v_2) pt(v)=pt(v1)pt(v2)
binary l : v = v 1 ⊙ v 2 l: v = v_1 \odot v_2 l:v=v1v2, 只考虑 v 1 v_1 v1 或者 v 2 v_2 v2 为常量的情况 pt ( v ) = pt ( v 1 / v 2 ) \text{pt}(v) = \text{pt}(v_1/v_2) pt(v)=pt(v1/v2)
ret l : ret v l: \text{ret} \; v l:retv pt ( v ) ⊆ pt ( l ) \text{pt}(v) \subseteq \text{pt}(l) pt(v)pt(l)
call l : v = f ( . . . ) l: v = f(...) l:v=f(...) pt ( ret ( f ) ) ⊆ pt ( v ) \text{pt}(\text{ret}(f)) \subseteq \text{pt}(v) pt(ret(f))pt(v)
load l : v = ∗ p l: v = *p l:v=p pt ( l ) ⊆ pt ( v ) \text{pt}(l) \subseteq \text{pt}(v) pt(l)pt(v)
store l : ∗ p = v l: *p = v l:p=v pt ( v ) ⊆ pt ( l ) \text{pt}(v) \subseteq \text{pt}(l) pt(v)pt(l)

Type Qualifier Inference

type qualifier inference是静态分析部分的核心,其架构如下图所示,首先会以bottom-up的方式遍历调用图,为每个函数生成function summary并将summary应用到caller分析过程。而一个函数的summary生成过程涉及point-to analysis、alias analysis和qualifier inference 3个部分。这3个部分都是基于worklist data flow分析(参考blog))算法实现。

在这里插入图片描述
point-to analysis是基于一个函数内的flow-sensitive指针分析算法,这里的fact是pts集,FuncAnalysis::nPtsGraph保存每个Instruction Out处的pts集,这里UBITech会先将LLVM的 Value 映射为 id 然后用 bitvec 进行运算。这里的指针分析不同的倒是在于 store 的语句处理: l : ∗ p = q l: *p = q l:p=q,中有 ∀ o ∈ pts ( p ) pts ( o ) = pts ( q ) \forall_{o \in \text{pts}(p)} \text{pts}(o) = \text{pts}(q) opts(p)pts(o)=pts(q),也就是直接进行strong update。然后就是call语句的处理,只考虑 mallocmemcpy 处理;对于 malloc 调用会按照SVF的方式创建heap object,对于 l: memcpy(p, q) 如果 p, q 都是指针,那么 pts ( q ) ⊆ pts ( p ) \text{pts}(q) \subseteq \text{pts}(p) pts(q)pts(p)

alias analysis同样基于flow-sensitive的worklist data flow分析算法。每个指令对应的fact为别名集合,类型为 map<unsigned, set<unsigned>>,也就是将 Value v v v 映射到具有别名关系的 Value 集合。具体规则如下,这里 aa l ( v ) \text{aa}_l(v) aal(v) 表示 v v v 在语句 l l l 处的别名集合, aa l p \text{aa}_{l_p} aalp 表示 l l l 的In处的别名集合,可能是前一条指令的别名集合也可能是多个直接前驱指令别名集合的并集。copy 包括 bitcast, zext, sext, trunc;对于 ptrtoint,如果有 z = inttoptr(y), y = ptrtoint(x) 或者 y = ptrtoint(x) + offset,那么存在 copy: z = x z = x z=x。update语句的AA set前会先将 aa l p ⊆ aa l \text{aa}_{l_p} \subseteq \text{aa}_l aalpaalstore 则会直接进行strong update。gep 语句的处理则很保守,只考虑偏移为0的情况,其它情况则直接将 aa l ( v ) = v \text{aa}_l(v) = v aal(v)=v,kill掉其它值。这个别名分析理论上并不sound,不过或许engineer方面好用。

语句update规则
alloca: l : v = & o l: v = \&o l:v=&o v ∈ aa l ( v ) v \in \text{aa}_l(v) vaal(v)
copy: l : v = p l: v = p l:v=p v ∈ aa l ( v ) , aa l p ( p ) ⊆ aa l ( v ) v \in \text{aa}_l(v), \; \text{aa}_{l_p}(p) \subseteq \text{aa}_l(v) vaal(v),aalp(p)aal(v)
phi: l : v = ϕ ( v 1 , . . . , v n ) l: v = \phi(v_1, ..., v_n) l:v=ϕ(v1,...,vn) v ∈ aa l ( v ) , ∀ 1 ≤ i ≤ n ( v i ∈ aa l ( v ) , aa l p ( v i ) ⊆ aa l ( v ) ) v \in \text{aa}_l(v), \; \forall_{1 \leq i \leq n}(v_i \in \text{aa}_l(v), \; \text{aa}_{l_p}(v_i) \subseteq \text{aa}_l(v)) vaal(v),1in(viaal(v),aalp(vi)aal(v))
select: l : v = v 1 ⊙ v 2 l: v = v_1 \odot v_2 l:v=v1v2 { v , v 1 , v 2 } ⊆ aa l ( v ) \{v, v_1, v_2\} \subseteq \text{aa}_l(v) {v,v1,v2}aal(v)
load: l : v = ∗ p l:v = *p l:v=p v ∈ aa l ( v ) , ∀ o ∈ pts ( p ) aa l p ( o ) ⊆ aa l ( v ) v \in \text{aa}_l(v), \; \forall_{o \in \text{pts}(p)} \text{aa}_{l_p}(o) \subseteq \text{aa}_l(v) vaal(v),opts(p)aalp(o)aal(v)
store: l : ∗ v = p l: *v = p l:v=p ∀ o ∈ pts ( v ) aa l ( o ) = aa l p ( p ) \forall_{o \in \text{pts}(v)} \text{aa}_l(o) = \text{aa}_{l_p}(p) opts(v)aal(o)=aalp(p)
gep: l : v = p . f l: v = p.f l:v=p.f aa l ( v ) = { v , p } ∣ f = 0 \text{aa}_l(v) = \{v, p\} \mid f = 0 aal(v)={v,p}f=0
otherNon

2.Qualifier Inference And Check

Qualify Inference将每个value(top-level pointer和address-taken variable)映射到一个type qualifier。qualifier包括 { init , uninit , unk } \{ \text{init}, \text{uninit}, \text{unk}\} {init,uninit,unk}(偏序关系从左到右依次增加,也就是 init ⊔ uninit = uninit \text{init} \sqcup \text{uninit} = \text{uninit} inituninit=uninit,不过对于 unk \text{unk} unk 通常会重新计算值而不是参与偏序运算),初始时每个value的状态为 unk \text{unk} unk。用 tq l ( v ) \text{tq}_l(v) tql(v) 表示value v v v l l l 处的type qualifier。除了type qualifier外还定义了一个辅助用的 related map rm \text{rm} rm,保存与value v v v 相关的其它值。系统调用UBITech对 malloc, zalloc, memset (init 类), memcpy (copy 类), copy_to_user (transfer 类) 等均进行了建模,这里暂且不管,分析规则如下:

语句类型处理规则
alloca: l : v = & o l: v = \&o l:v=&o tq l ( v ) = init , tq l ( o ) = uninit \text{tq}_l(v) = \text{init}, \; \text{tq}_l(o) = \text{uninit} tql(v)=init,tql(o)=uninit
binary: l : v = v 1 ⊙ v 2 l: v = v_1 \odot v_2 l:v=v1v2(1). v i ∈ rm ( v ) , rm ( v i ) ⊆ rm ( v ) ∣ tq l p ( v i ) = unk v_i \in \text{rm}(v), \; \text{rm}(v_i) \subseteq \text{rm}(v) \mid \text{tq}_{l_p}(v_i) = \text{unk} virm(v),rm(vi)rm(v)tqlp(vi)=unk. (2). tq l ( v ) = ⊔ ( tq l p ( v i ) ) \text{tq}_l(v) = \sqcup (\text{tq}_{l_p}(v_i)) tql(v)=(tqlp(vi))
load: l : v = ∗ p l: v = *p l:v=p tq l ( v ) = ⊔ o ∈ pts l ( p ) tq l p ( o ) \text{tq}_l(v) = \sqcup_{o \in \text{pts}_l(p)} \text{tq}_{l_p}(o) tql(v)=optsl(p)tqlp(o)
store: l : ∗ p = q l: *p = q l:p=q tq l ( p ) = init , ∀ o ∈ pts l ( p ) tq l ( o ) = tq l p ( q ) \text{tq}_l(p) = \text{init}, \; \forall_{o \in \text{pts}_l(p)} \text{tq}_l(o) = \text{tq}_{l_p}(q) tql(p)=init,optsl(p)tql(o)=tqlp(q)
gep: l : v = p . f l: v = p.f l:v=p.f tq l ( v ) = tq l p ( p ) \text{tq}_l(v) = \text{tq}_{l_p}(p) tql(v)=tqlp(p)
cmp: l : r = cmp v 1 , v 2 l: r = \text{cmp} \; v_1, v_2 l:r=cmpv1,v2 tq l ( r ) = ⊔ ( tq l p ( v i ) ) \text{tq}_l(r) = \sqcup (\text{tq}_{l_p}(v_i)) tql(r)=(tqlp(vi))
copy: l : v = p l: v = p l:v=p tq l ( v ) = tq l p ( p ) \text{tq}_l(v) = \text{tq}_{l_p}(p) tql(v)=tqlp(p)
select: l : v = select c , v 1 , v 2 l: v = \text{select} \; c, v_1, v_2 l:v=selectc,v1,v2 tq l ( v ) = ⊔ ( { tq l p ( c ) , tq l p ( v 1 ) , tq l p ( v 2 ) } ) \text{tq}_l(v) = \sqcup (\{\text{tq}_{l_p}(c), \text{tq}_{l_p}(v_1), \text{tq}_{l_p}(v_2)\}) tql(v)=({tqlp(c),tqlp(v1),tqlp(v2)})
malloc call: l : v = & o l: v = \&o l:v=&o tq l ( v ) = init , tq l ( o ) = uninit \text{tq}_l(v) = \text{init}, \; \text{tq}_l(o) = \text{uninit} tql(v)=init,tql(o)=uninit
zalloc call: l : v = & o l: v = \&o l:v=&o tq l ( v ) = init , tq l ( o ) = init \text{tq}_l(v) = \text{init}, \; \text{tq}_l(o) = \text{init} tql(v)=init,tql(o)=init
init call: l : init(v, d, size) l: \text{init(v, d, size)} l:init(v, d, size) ∀ o ∈ pts l ( v ) tq l ( o ) = init ∣ ( tq l ( v ) = = init & & tq l ( d ) = = init tq l ( size ) = = init ) \forall_{o \in \text{pts}_l(v)}\text{tq}_l(o) = \text{init} \mid (\text{tq}_l(v) == \text{init} \; \&\& \; \text{tq}_l(d) == \text{init} \; \; \text{tq}_l(\text{size}) == \text{init}) optsl(v)tql(o)=init(tql(v)==init&&tql(d)==inittql(size)==init)
copy/transfer call: l : copy(d, s, size) l: \text{copy(d, s, size)} l:copy(d, s, size) ∀ o d ∈ pts l ( d ) ∀ o s ∈ pts l ( s ) tq l ( o d ) = tq l ( o d ) ⊔ tq l ( o s ) ∣ ( tq l ( d ) = = init & & tq l ( s ) = = init tq l ( size ) = = init ) \forall_{o_d \in \text{pts}_l(d)}\forall_{o_s \in \text{pts}_l(s)} \text{tq}_l(o_d) = \text{tq}_l(o_d) \sqcup \text{tq}_l(o_s) \mid (\text{tq}_l(d) == \text{init} \; \&\& \; \text{tq}_l(s) == \text{init} \; \; \text{tq}_l(\text{size}) == \text{init}) odptsl(d)osptsl(s)tql(od)=tql(od)tql(os)(tql(d)==init&&tql(s)==inittql(size)==init)

漏洞检测的规则包括

语句类型漏洞触发条件
load: l : v = ∗ p l: v = *p l:v=p, gep: l : v = p . f l: v = p.f l:v=p.f tq l ( p ) = = uninit \text{tq}_l(p) == \text{uninit} tql(p)==uninit
store: l : ∗ p = q l: *p = q l:p=q tq l ( q ) ≠ init or ∃ o ∈ pts l ( q ) tq l ( o ) ≠ init ∣ isHeap ( p ) \text{tq}_l(q) \neq \text{init} \; \text{or} \; \exist_{o \in \text{pts}_l(q)} \text{tq}_l(o) \neq \text{init} \mid \text{isHeap}(p) tql(q)=initoroptsl(q)tql(o)=initisHeap(p)
cmp: l : r = cmp v 1 , v 2 l: r = \text{cmp} \; v_1, v_2 l:r=cmpv1,v2 tq l ( v 1 ) = = uninit ∣ ∣ tq l ( v 2 ) = = uninit \text{tq}_l(v_1) == \text{uninit} \; || \; \text{tq}_l(v_2) == \text{uninit} tql(v1)==uninit∣∣tql(v2)==uninit
switch: l : switch (cond) { ... } l: \text{switch (cond) \{ ... \}} l:switch (cond) { ... } tq l ( cond ) = = uninit \text{tq}_l(\text{cond}) == \text{uninit} tql(cond)==uninit
copy call: l : copy(d, s, size) l: \text{copy(d, s, size)} l:copy(d, s, size) tq l ( d ) = = uninit ∣ ∣ tq l ( s ) = = uninit ∣ ∣ tq l ( size ) = = uninit ∣ ∣ ∃ o ∈ pts l ( s ) tq l ( o ) = = uninit \text{tq}_l(d) == \text{uninit} \; || \; \text{tq}_l(s) == \text{uninit} \; || \; \text{tq}_l(\text{size}) == \text{uninit} \; || \; \exists_{o \in \text{pts}_l(s)} \text{tq}_l(o) == \text{uninit} tql(d)==uninit∣∣tql(s)==uninit∣∣tql(size)==uninit∣∣optsl(s)tql(o)==uninit
transfer call: l : transfer(d, s, size) l: \text{transfer(d, s, size)} l:transfer(d, s, size) tq l ( s ) = = uninit ∣ ∣ tq l ( size ) = = uninit ∣ ∣ ∃ o ∈ pts l ( s ) tq l ( o ) = = uninit \text{tq}_l(s) == \text{uninit} \; || \; \text{tq}_l(\text{size}) == \text{uninit} \; || \; \exists_{o \in \text{pts}_l(s)} \text{tq}_l(o) == \text{uninit} tql(s)==uninit∣∣tql(size)==uninit∣∣optsl(s)tql(o)==uninit

参考文献

[1].Zhai Y, Hao Y, Zhang H, et al. UBITect: a precise and scalable method to detect use-before-initialization bugs in Linux kernel[C]//Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2020: 221-232.

[2].Jeffrey S. Foster, Tachio Terauchi, and Alex Aiken. 2002. Flow-sensitive type qualifiers. In Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation (PLDI '02). Association for Computing Machinery, New York, NY, USA, 1–12


http://www.mrgr.cn/news/95611.html

相关文章:

  • 架构思维:如何设计一个支持海量数据存储的高扩展性架构
  • 快速入手:Nacos融合SpringCloud成为注册配置中心
  • kotlin知识体系(三) : Android Kotlin 中的函数式编程实践指南
  • 通往自主智能之路:探索自我成长的AI
  • UDP套接字编程(代码)
  • SpringMVC_day02
  • 分布式系统设计陷阱,白话CAP理论
  • 运行时智控:PanLang 开发者指南(一)运行时系统核心模块实现——PanLang 原型全栈设计方案与实验性探索5
  • nacos未经授权创建用户漏洞
  • 快速入手-基于Django的Form和ModelForm操作(七)
  • SAP-ABAP:SAP BW模块架构与实战应用详解
  • 网心云OEC/OEC-turbo刷机问题——刷机教程、救砖方法、技术要点及下载boot失败异常解决尝试
  • 银河麒麟桌面版包管理器(二)
  • 【Linux】线程库
  • 重温Ubuntu 24.04 LTS
  • 麒麟Win32运行环境
  • PyTorch 面试题及参考答案(精选100道)
  • 图解AUTOSAR_DefaultErrorTracer
  • 本地部署Dify 添加Ollama模型DeepSeek
  • 大模型提示词工程师的自我修养-提示技巧二(思维树、RAG检索增强生成) -(专题2)