Skip to content

Commit

Permalink
Add field Projection/Injection metadata in Lineage graph
Browse files Browse the repository at this point in the history
Summary:
The ongoing work on interprocedural analysis aims at improving dataflow precision by removing
spurious Derive edges generated on function calls, based on a better understanding of fields
concerned by these calls. It relies on the fact that it is sufficient to follow Derive edges only,
not Call-then-Return paths, when looking for data flows -- in other words, the Call and Return edges
remain largely unaffected by this work, for various technical and theoretical reasons.

Yet this assumption is not true when mixing in flows imported from another analysis such as a dynamic
one: some actual flows may exist that will go though Call/Return edges (of the same function) and do
not have a corresponding Derive edge. This jeopardises the precision improvement expected from
interprocedural work.

To alleviate this, we add some shape information into those edges, by annotating them with the fields
that are projected from or injected into some node. Injection is generated on Call edges and on edges
that have a special Return node as destination, meaning that the source only flows into the
corresponding field of the destination. Projection is generated on edges from a formal parameter and
on Return edges, meaning that only the corresponding field of the source flows into the destination.

It becomes then possible to filter out impossible paths by removing those that involves an edge that
inject into some field `f`, then only does some Copying, then projects from another field `f'` --
similar to what can already be done on Call stacks to prevent "returning" from a function that wasn't
called in the first place. Note that, until further understanding, Derive edges between injection and
projection should "reset" this tracking (as they could themselves involve a projection of some sort),
and nested fields are to be understood in a "prefix" sense: injecting into `#foo#bar` then projecting
from `#foo` doesn't break the chain either.

*Implementation details.*

Within the internal analysis, we add two knew new kinds of edges, `Project` and `Inject`, and also
add field information on the `Call` and `Return` edges. The former are generated to complete the
summary of the "current" procedure (replacing some Copies when formals fields are involved), whereas
the latter keep being generated upon calling another procedure.

The Json output however retains its old edge kinds, but they may now have an additional
`edge_metadata` component. When present, its value will be either `{ "projection" : ["nested";
"field"; "list"] }` or `{ "injection" : ... }`. The current diff will not generate "trivial"
projections or injections, that is with an empty field list. The schema doesn't forbid an edge that
both projects and injects, but that will not currently happen. If the metadata is trivial, il will
not be present at all on the edge entity.

Reviewed By: rgrig

Differential Revision: D43401524

fbshipit-source-id: 12bd5470920868964c23a9252ebfbd8259e57f55
  • Loading branch information
thizanne authored and facebook-github-bot committed Feb 23, 2023
1 parent 699c30d commit c16e1cc
Showing 1 changed file with 145 additions and 34 deletions.
179 changes: 145 additions & 34 deletions infer/src/checkers/SimpleLineage.ml
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,18 @@ end
module Fields = struct
(** A module to help manipulating lists of (nested) fields. *)

(** We define a type alias to be able use our own json exporting function *)
type fieldname = Fieldname.t [@@deriving compare, equal, sexp]

let yojson_of_fieldname fieldname =
(* The ppx-generated [yojson_of] is unnecessarily complex for our purposes (it's a record that
involves "class_name", which incidentally seems to always be "_".) *)
`String (Fieldname.to_string fieldname)


(* The list is to be understood in "syntactic order": [["a"; "b"]] represents the field part of
[X#a#b].*)
type t = Fieldname.t list [@@deriving compare, equal, sexp]
type t = fieldname list [@@deriving compare, equal, sexp, yojson_of]

let pp = Fmt.(list ~sep:nop (any "#" ++ Fieldname.pp))
end
Expand Down Expand Up @@ -181,8 +190,10 @@ module LineageGraph = struct
- INV2: There is no loop, because they don't mean anything. *)
type t =
| Direct (** Immediate copy; e.g., assigment or passing an argument *)
| Call (** Target is ArgumentOf *)
| Return (** Source is ReturnOf *)
| Inject of Fields.t
| Project of Fields.t
| Call of Fields.t (** Target is ArgumentOf, fields are injected into arg *)
| Return of Fields.t (** Source is ReturnOf, fields are projected from return *)
| Capture (** [X=1, F=fun()->X end] has Capture flow from X to F *)
| Summary (** Summarizes the effect of a procedure call *)
| DynamicCallFunction
Expand Down Expand Up @@ -253,10 +264,14 @@ module LineageGraph = struct
Format.fprintf fmt "Capture"
| Direct ->
Format.fprintf fmt "Direct"
| Call ->
Format.fprintf fmt "Call"
| Return ->
Format.fprintf fmt "Return"
| Call fields ->
Format.fprintf fmt "Call%a" Fields.pp fields
| Inject fields ->
Format.fprintf fmt "Inject%a" Fields.pp fields
| Project fields ->
Format.fprintf fmt "Project%a" Fields.pp fields
| Return fields ->
Format.fprintf fmt "Return%a" Fields.pp fields
| Summary ->
Format.fprintf fmt "Summary"
| DynamicCallFunction ->
Expand Down Expand Up @@ -291,13 +306,13 @@ module LineageGraph = struct
graph (* skip Summary loops*)
| _, true, _, _ ->
L.die InternalError "There shall be no fancy (%a) loops!" pp_flow_kind kind
| Call, _, ArgumentOf _, _ ->
| Call _, _, ArgumentOf _, _ ->
added
| Call, _, _, _ ->
| Call _, _, _, _ ->
L.die InternalError "Call edges shall return ArgumentOf!"
| Return, _, _, ReturnOf _ ->
| Return _, _, _, ReturnOf _ ->
added
| Return, _, _, _ ->
| Return _, _, _, _ ->
L.die InternalError "Return edges shall come form ReturnOf!"
| _ ->
added
Expand Down Expand Up @@ -382,6 +397,15 @@ module LineageGraph = struct
| DynamicCallModule
| Return

type edge_metadata =
{ inject: Fields.t [@default []] [@yojson_drop_default.equal]
; project: Fields.t [@default []] [@yojson_drop_default.equal] }
[@@deriving yojson_of]

let metadata_inject fields = {inject= fields; project= []}

let metadata_project fields = {inject= []; project= fields}

let yojson_of_edge_type typ =
match typ with
| Capture ->
Expand All @@ -400,7 +424,12 @@ module LineageGraph = struct
`String "Return"


type _edge = {source: node_id; target: node_id; edge_type: edge_type; location: location_id}
type _edge =
{ source: node_id
; target: node_id
; edge_type: edge_type
; edge_metadata: edge_metadata option [@yojson.option]
; location: location_id }
[@@deriving yojson_of]

type edge = {edge: _edge} [@@deriving yojson_of]
Expand Down Expand Up @@ -647,27 +676,44 @@ module LineageGraph = struct
let edge_id = Id.of_list [source_id; target_id; kind_id; location_id] in
let edge_type =
match kind with
| Call ->
| Call _ ->
Json.Call
| Capture ->
Json.Capture
| Direct ->
Json.Copy
| Inject _ ->
Json.Copy
| Project _ ->
Json.Copy
| Summary ->
Json.Derive
| DynamicCallFunction ->
Json.DynamicCallFunction
| DynamicCallModule ->
Json.DynamicCallModule
| Return ->
| Return _ ->
Json.Return
in
let edge_metadata =
match kind with
| Direct | Capture | Summary | DynamicCallFunction | DynamicCallModule ->
None
| Call [] | Return [] ->
(* Don't generate "trivial" metadata *)
None
| Call fields | Inject fields ->
Some (Json.metadata_inject fields)
| Return fields | Project fields ->
Some (Json.metadata_project fields)
in
write_json Edge edge_id
(Json.yojson_of_edge
{ edge=
{ source= Id.out source_id
; target= Id.out target_id
; edge_type
; edge_metadata
; location= Id.out location_id } } ) ;
edge_id

Expand Down Expand Up @@ -845,7 +891,14 @@ module Summary = struct
match kind with
| Direct ->
false
| Call | Return | Capture | Summary | DynamicCallFunction | DynamicCallModule ->
| Call _
| Return _
| Capture
| Summary
| DynamicCallFunction
| DynamicCallModule
| Inject _
| Project _ ->
true
in
(* The set [todo] contains vertices considered for removal. We initialize this set with all
Expand Down Expand Up @@ -993,6 +1046,25 @@ module TransferFunctions = struct
end
end

(** If an expression is made of a single variable, return it *)
let exp_as_single_var (e : Exp.t) : Var.t option =
match e with
| Exp.Lvar pvar ->
Some (Var.of_pvar pvar)
| Exp.Var id ->
Some (Var.of_id id)
| Exp.Lfield (_, _, _)
| Exp.UnOp (_, _, _)
| Exp.BinOp (_, _, _)
| Exp.Exn _
| Exp.Closure _
| Exp.Const _
| Exp.Cast (_, _)
| Exp.Lindex (_, _)
| Exp.Sizeof _ ->
None


(** If an expression is made of a single variable index, return it *)
let exp_as_single_var_index (e : Exp.t) : _ VariableIndex.t option =
let rec aux fields_acc = function
Expand Down Expand Up @@ -1124,21 +1196,46 @@ module TransferFunctions = struct
List.fold ~init:astate ~f:one_exp (Sil.exps_of_instr instr)


let warn_on_complex_arg arg_exp =
L.debug Analysis Verbose
"SimpleLineage: the analysis assumes that the frontend uses only single-var expressions as \
actual arguments, otherwise it will lose precision (found actual argument `%a` instead).@;"
Exp.pp arg_exp


(* Add Call flow from the concrete arguments to the special ArgumentOf nodes *)
let add_arg_flows shapes (call_node : PPNode.t) (callee_pname : Procname.t)
(argument_list : Exp.t list) (astate : Domain.t) : Domain.t =
let add_one_arg_flows index ((last_writes, has_unsupported_features), local_edges) arg =
let add_one_source_edge local_edges source =
LineageGraph.add_flow ~kind:Call ~node:call_node ~source
let add_one_source_edge local_edges source ~injected_arg_field =
LineageGraph.add_flow ~kind:(Call injected_arg_field) ~node:call_node ~source
~target:(ArgumentOf (index, callee_pname))
local_edges
in
let add_one_local_flows local_edges (local : Local.t) =
let add_one_local_flows local_edges (local : Local.t) ~injected_arg_field =
let sources = sources_of_local call_node last_writes local in
List.fold sources ~f:add_one_source_edge ~init:local_edges
List.fold sources ~f:(add_one_source_edge ~injected_arg_field) ~init:local_edges
in
let local_edges =
match exp_as_single_var arg with
| None ->
(* The Erlang frontend probably doesn't generate non-single-var arguments. If that
happens however, we may simply collect all the locals from the actual argument and
have them flow on the full formal parameter (that is inject into the empty list of
nested fields) *)
warn_on_complex_arg arg ;
let read_set = free_locals_of_exp shapes arg in
Set.fold read_set ~init:local_edges ~f:(add_one_local_flows ~injected_arg_field:[])
| Some arg_var ->
(* The concrete argument is a single var: we collect all its terminal fields and have
them flow onto the corresponding field of the formal parameter. *)
let arg_as_index = VariableIndex.var arg_var in
VariableIndex.fold_terminal shapes arg_as_index
~f:(fun acc arg_terminal ->
add_one_local_flows acc (VariableIndex arg_terminal)
~injected_arg_field:(VariableIndex.get_fields arg_terminal) )
~init:local_edges
in
let read_set = free_locals_of_exp shapes arg in
let local_edges = Set.fold read_set ~init:local_edges ~f:add_one_local_flows in
((last_writes, has_unsupported_features), local_edges)
in
List.foldi argument_list ~init:astate ~f:add_one_arg_flows
Expand All @@ -1148,9 +1245,12 @@ module TransferFunctions = struct
let add_ret_flows shapes node (callee_pname : Procname.t) (ret_id : Ident.t) (astate : Domain.t) :
Domain.t =
let last_writes, local_edges = astate in
let add_one_return_edge local_edges target_index =
LineageGraph.add_flow ~kind:Return ~node ~source:(ReturnOf callee_pname)
~target:(Local (VariableIndex target_index, node))
let add_one_return_edge local_edges ret_terminal_index =
(* Add flow from the formal return to a terminal subfield of the actual ret_id: project the
return into the corresponding fields. *)
let target_field = VariableIndex.get_fields ret_terminal_index in
LineageGraph.add_flow ~kind:(Return target_field) ~node ~source:(ReturnOf callee_pname)
~target:(Local (VariableIndex ret_terminal_index, node))
local_edges
in
let local_edges =
Expand Down Expand Up @@ -1245,11 +1345,7 @@ module TransferFunctions = struct
(* The Erlang frontend probably doesn't generate arguments that are not put in
intermediate variables first, but in any case we can just add edges from all
occurring locals and soundly not use the TITO information. *)
L.debug Analysis Verbose
"SimpleLineage: the analysis assumes that the frontend uses only single-var \
expressions as actual arguments, otherwise it will lose precision (found actual \
argument `%a` instead).@;"
Exp.pp arg ;
warn_on_complex_arg arg ;
Local.Set.union acc (free_locals_of_exp shapes arg)
| Some var_index ->
Tito.FieldSet.fold (collect_one_index var_index) (Tito.get index tito_arguments) acc
Expand Down Expand Up @@ -1438,21 +1534,27 @@ let unskipped_checker ({InterproceduralAnalysis.proc_desc} as analysis) shapes_o
List.fold ~init:Domain.Real.LastWrites.bottom ~f:add_arg (captured @ formals)
in
let local_edges =
let add_proc_start_flow ~source local_edges var_index =
LineageGraph.add_flow ~kind:Direct ~node:start_node ~source
let add_proc_start_flow ~source ~kind local_edges var_index =
LineageGraph.add_flow ~kind ~node:start_node ~source
~target:(Local (VariableIndex var_index, start_node))
local_edges
in
let arg_edge_kind arg_fields : LineageGraph.FlowKind.t =
(* Record that edges from fields of a formal parameter will represent a Projection of those
fields from the eventual summarised Argument node *)
match arg_fields with [] -> Direct | subfields -> Project subfields
in
let add_arg_flow i local_edges arg_var =
VariableIndex.fold_terminal
~f:(fun acc idx ->
let fields = VariableIndex.get_fields idx in
add_proc_start_flow ~source:(Argument (i, fields)) acc idx )
add_proc_start_flow ~kind:(arg_edge_kind fields) ~source:(Argument (i, fields)) acc idx
)
~init:local_edges shapes (VariableIndex.var arg_var)
in
let add_cap_flow i local_edges var =
VariableIndex.fold_terminal
~f:(add_proc_start_flow ~source:(Captured i))
~f:(add_proc_start_flow ~kind:Direct ~source:(Captured i))
~init:local_edges shapes (VariableIndex.var var)
in
let local_edges = [] in
Expand All @@ -1477,8 +1579,17 @@ let unskipped_checker ({InterproceduralAnalysis.proc_desc} as analysis) shapes_o
post_edges :: edges
in
let ret_var = VariableIndex.pvar (Procdesc.get_ret_var proc_desc) in
let ret_edge_kind into_ret_index : LineageGraph.FlowKind.t =
(* Record that edges into fields of the formal return will represent an Injection of those
fields into the eventual summarised Return node *)
match VariableIndex.get_fields into_ret_index with
| [] ->
Direct
| subfields ->
Inject subfields
in
let add_one_ret_index_edge local_edges ret_index ret_node =
LineageGraph.add_flow ~kind:Direct ~node:ret_node
LineageGraph.add_flow ~kind:(ret_edge_kind ret_index) ~node:ret_node
~source:(Local (VariableIndex ret_index, ret_node))
~target:Return local_edges
in
Expand Down

0 comments on commit c16e1cc

Please sign in to comment.