初始化分析(Initialization Analysis)
写于2022年1月21日。
初始化分析主要计算出经传递后的Path
读写、移动之处,变量的初始化、移动情况,并找出因移动而导致的错误(即试图访问被移动或部分移动的变量)。
1. 将路径的读写、移动情况传递至其子路径
所有路径(除入参外)在刚进入函数体内时,均视为已移动。
输入
path_moved_at (Path, Point) :- path_moved_at_base (Path, Point).
path_assigned_at(Path, Point) :- path_assigned_at_base(Path, Point).
path_accessed_at(Path, Point) :- path_accessed_at_base(Path, Point).
ancestor_path(ParentPath, ChildPath) :- child_path(ChildPath, ParentPath).
path_begin_with_var(Path, Variable) :- path_is_var(Path, Variable).
其中,path_begin_with_var
代表某个Path
是否属于某个变量(含递归),如a.b
、a.b.c
都属于变量a
。
推导
- 传递
ancestor_path
。
ancestor_path(GrandparentPath, ChildPath) :-
ancestor_path(ParentPath, ChildPath),
child_path(ParentPath, GrandparentPath).
- 移动、赋值、访问某一路径
ParentPath
时,也将移动、赋值、访问其子路径ChildPath
。
path_moved_at(ChildPath, Point) :-
path_moved_at(ParentPath, Point),
ancestor_path(ParentPath, ChildPath).
path_assigned_at(ChildPath, point) :-
path_assigned_at(ParentPath, point),
ancestor_path(ParentPath, ChildPath).
path_accessed_at(ChildPath, point) :-
path_accessed_at(ParentPath, point),
ancestor_path(ParentPath, ChildPath).
- 若某个路径
Path
属于某个变量Variable
时,其子路径也属于该变量。
path_begins_with_var(Child, Variable) :-
path_begins_with_var(Parent, Variable)
ancestor_path(Parent, Child).
2. 计算变量初始化情况,并求解移动造成的错误
推导
- 对于每一个路径,从每个 赋值 处开始,沿CFG边向后追溯直到该路径被 移动 时,该路径都视为 已初始化 状态。
path_maybe_initialized_on_exit(Path, Point) :-s
path_assigned_at(Path, Point).
path_maybe_initialized_on_exit(Path, TargetPoint) :-
path_maybe_initialized_on_exit(Path, SourcePoint),
cfg_edge(SourcePoint, TargetPoint),
!path_moved_at(Path, TargetPoint).
- 对于每一个路径,从每个 移动 处开始,沿CFG边向后追溯直到该路径被 赋值 时,该路径都视为 未初始化 状态。
path_maybe_uninitialized_on_exit(Path, Point) :-
path_moved_at(Path, Point).
path_maybe_uninitialized_on_exit(Path, TargetPoint) :-
path_maybe_uninitialized_on_exit(Path, SourcePoint),
cfg_edge(SourcePoint, TargetPoint)
!path_assigned_at(Path, TargetPoint).
- 已初始化的路径,其对应的变量视为(半)初始化状态。
var_maybe_partly_initialized_on_exit(Variable, Point) :-
path_maybe_initialized_on_exit(Path, Point),
path_begins_with_var(Path, Variable).
- 计算 「移动后访问」错误 :路径
Path
在某处若未被始化,且在下一位置访问时,视位「移动后访问」的错误。
move_error(Path, TargetPoint) :-
path_maybe_uninitialized_on_exit(Path, SourcePoint),
cfg_edge(SourcePoint, TargetPoint),
path_accessed_at(Path, TargetPoint).
输出
初始化分析之后,将输出两个信息用于后续分析
var_maybe_partly_initialized_on_exit(Variable, Point)
:在Point
处,变量Variable
(或其部分子路径)已初始化。将用于后续「存活性分析」阶段。move_error(Path, Point)
:在Point
处,路径Path
发生了移动后访问错误。
思考
1. 为什么需要path_maybe_uninitialized_on_exit
与path_maybe_initialized_on_exit
两个式子,看似结论截然相反,而非用一个式子统一表示呢?
理论上是可行的,因为同一路径在同一位置不可能既被赋值又被移动。类似这样的例子
fn main() { let mut a = String::new(); a = a; }
其中第2行的a = a
看似对a
既赋值又移动,但到MIR层会被转化为
// ...
let mut _1: std::string::String;
let mut _2: std::string::String;
scope 1 {
debug a => _1;
}
bb0: { /* ... */ }
bb1: {
// ...
// ...
_2 = move _1;
replace(_1 <- move _2) -> [return: bb2, unwind: bb5];
}
bb2: { /* ... */ }
// ...
可见,被赋值的a
与被移动的a
分别对应了_1
与_2
,实际上是先移动、再赋值。
如果我们引入path_mustbe_initialized_on_exit(Path, Point)
代表在位置Point
处,路径Path
被完全初始化。
那么有
path_mustbe_initialized_on_exit(Path, Point) :-
path_assigned_at(Path, Point).
!path_mustbe_initialized_on_exit(Path, Point) :-
path_moved_at(Path, Point).
path_mustbe_initialized_on_exit(Path, TargetPoint) :-
path_mustbe_initialized_on_exit(Path, SourcePoint),
cfg_edge(SourcePoint, TargetPoint),
!path_moved_at(Path, TargetPoint),
!path_assigned_at(Path, TargetPoint).
但datalog的语法并不支持第二条推论的写法。因此,只能将path_mustbe_initialized_on_exit
分裂为两个式子path_maybe_initialized_on_exit
与path_maybe_uninitialized_on_exit
,来实现相应的计算。
2. 如何理解var_maybe_partly_initialized_on_exit
?既然不允许在变量移动后,对其部份字段赋值,为何会存在半初始化的变量?
这是因为允许只移动变量的部份字段。当部份字段被移动后,剩余字段仍处于已初始化状态,因此整个变量此时处于「半初始化」的状态。