初始化分析(Initialization Analysis)

写于2022年1月21日。

初始化分析主要计算出经传递后的Path读写、移动之处,变量的初始化、移动情况,并找出因移动而导致的错误(即试图访问被移动或部分移动的变量)。

1. 将路径的读写、移动情况传递至其子路径

所有路径(除入参外)在刚进入函数体内时,均视为已移动。

输入

path_moved_at   (Path, Point) :- path_moved_at_base   (Path, Point).
path_assigned_at(Path, Point) :- path_assigned_at_base(Path, Point).
path_accessed_at(Path, Point) :- path_accessed_at_base(Path, Point).

ancestor_path(ParentPath, ChildPath) :- child_path(ChildPath, ParentPath).

path_begin_with_var(Path, Variable) :- path_is_var(Path, Variable).

其中,path_begin_with_var代表某个Path是否属于某个变量(含递归),如a.ba.b.c都属于变量a

推导

  1. 传递ancestor_path
ancestor_path(GrandparentPath, ChildPath) :-
    ancestor_path(ParentPath, ChildPath),
    child_path(ParentPath, GrandparentPath).
  1. 移动、赋值、访问某一路径ParentPath时,也将移动、赋值、访问其子路径ChildPath
path_moved_at(ChildPath, Point) :-
    path_moved_at(ParentPath, Point),
    ancestor_path(ParentPath, ChildPath).

path_assigned_at(ChildPath, point) :-
    path_assigned_at(ParentPath, point),
    ancestor_path(ParentPath, ChildPath).

path_accessed_at(ChildPath, point) :-
    path_accessed_at(ParentPath, point),
    ancestor_path(ParentPath, ChildPath).
  1. 若某个路径Path属于某个变量Variable时,其子路径也属于该变量。
path_begins_with_var(Child, Variable) :-
    path_begins_with_var(Parent, Variable)
    ancestor_path(Parent, Child).

2. 计算变量初始化情况,并求解移动造成的错误

推导

  1. 对于每一个路径,从每个 赋值 处开始,沿CFG边向后追溯直到该路径被 移动 时,该路径都视为 已初始化 状态。
path_maybe_initialized_on_exit(Path, Point) :-s
    path_assigned_at(Path, Point).

path_maybe_initialized_on_exit(Path, TargetPoint) :-
    path_maybe_initialized_on_exit(Path, SourcePoint),
    cfg_edge(SourcePoint, TargetPoint),
    !path_moved_at(Path, TargetPoint).
  1. 对于每一个路径,从每个 移动 处开始,沿CFG边向后追溯直到该路径被 赋值 时,该路径都视为 未初始化 状态。
path_maybe_uninitialized_on_exit(Path, Point) :-
    path_moved_at(Path, Point).

path_maybe_uninitialized_on_exit(Path, TargetPoint) :-
    path_maybe_uninitialized_on_exit(Path, SourcePoint),
    cfg_edge(SourcePoint, TargetPoint)
    !path_assigned_at(Path, TargetPoint).
  1. 已初始化的路径,其对应的变量视为(半)初始化状态。
var_maybe_partly_initialized_on_exit(Variable, Point) :-
    path_maybe_initialized_on_exit(Path, Point),
    path_begins_with_var(Path, Variable).
  1. 计算 「移动后访问」错误 :路径Path在某处若未被始化,且在下一位置访问时,视位「移动后访问」的错误。
move_error(Path, TargetPoint) :-
    path_maybe_uninitialized_on_exit(Path, SourcePoint),
    cfg_edge(SourcePoint, TargetPoint),
    path_accessed_at(Path, TargetPoint).

输出

初始化分析之后,将输出两个信息用于后续分析

  • var_maybe_partly_initialized_on_exit(Variable, Point):在Point处,变量Variable(或其部分子路径)已初始化。将用于后续「存活性分析」阶段。
  • move_error(Path, Point):在Point处,路径Path发生了移动后访问错误。

思考

1. 为什么需要path_maybe_uninitialized_on_exitpath_maybe_initialized_on_exit两个式子,看似结论截然相反,而非用一个式子统一表示呢?

理论上是可行的,因为同一路径在同一位置不可能既被赋值又被移动。类似这样的例子

fn main() {
let mut a = String::new();
a = a;
} 

其中第2行的a = a看似对a既赋值又移动,但到MIR层会被转化为

// ...
let mut _1: std::string::String;
let mut _2: std::string::String;
scope 1 {
    debug a => _1;
}

bb0: { /* ... */ }

bb1: {
    // ...
    // ...
    _2 = move _1;
    replace(_1 <- move _2) -> [return: bb2, unwind: bb5];
}

bb2: { /* ... */ }
// ...

可见,被赋值的a与被移动的a分别对应了_1_2,实际上是先移动、再赋值。

如果我们引入path_mustbe_initialized_on_exit(Path, Point)代表在位置Point处,路径Path被完全初始化。

那么有

path_mustbe_initialized_on_exit(Path, Point) :-
    path_assigned_at(Path, Point).

!path_mustbe_initialized_on_exit(Path, Point) :-
    path_moved_at(Path, Point).

path_mustbe_initialized_on_exit(Path, TargetPoint) :-
    path_mustbe_initialized_on_exit(Path, SourcePoint),
    cfg_edge(SourcePoint, TargetPoint),
    !path_moved_at(Path, TargetPoint),
    !path_assigned_at(Path, TargetPoint).

但datalog的语法并不支持第二条推论的写法。因此,只能将path_mustbe_initialized_on_exit分裂为两个式子path_maybe_initialized_on_exitpath_maybe_uninitialized_on_exit,来实现相应的计算。

2. 如何理解var_maybe_partly_initialized_on_exit?既然不允许在变量移动后,对其部份字段赋值,为何会存在半初始化的变量?

这是因为允许只移动变量的部份字段。当部份字段被移动后,剩余字段仍处于已初始化状态,因此整个变量此时处于「半初始化」的状态。