The influences relation
Kythe stores information about dataflow in the graph using
influences edges. A node A
influences a node B if A directly affects B during the evaluation
of a program. This relationship is not intended to model all dataflow
interactions. In particular, it excludes nontrivial aliasing (that can’t
be decided within the scope of a single compilation unit). Despite this,
the influence relation can be used to trace simple paths through a program
by exploring the subgraph iteratively, allowing for non-universal queries
that fit the following examples:
-
find places where a flag passed to a program directly affects a different program value
-
find places where the return value from a sensitive function is passed to an insensitive sink, like leaking a key to a logging function
-
identify the places that affect a program value that was assigned something unexpected at runtime
This document is intended to describe how to add the influences relationship
to a Kythe indexer based on common language features. Note that this document
is still under active development.
General framework
We assume that the indexer being amended performs an AST walk over a tree with resolved references (so it’s possible to construct VNames for, e.g., a variable that is referenced). The indexer should maintain a stack of sets of influencers (using whichever type is appopriate; for C++, we use pointers to declarations). As the indexer enters and leaves various kinds of AST node and encounters others, it will manipulate this stack.
Names follow this convention:
-
vis a simple program variable. -
pis a parameter. -
eis an expression.
Popping the set "onto" an object o with node n means removing the top
set from the influencers stack and recording i influences n edges for all
nodes i for the objects in that top set.
The blame context is the node or nodes that would be assigned responsibility for making a function call at the current point in the AST. For example:
int f() { g(); }
Here, the call to g would be assigned to blame context f.
Variables
Simple reference
v
-
Insert
v.
Simple assignment
v := e
-
Push
{}. -
Traverse
e. -
Pop onto
v.
A simple expression. (C++)
void f() {
//- @x defines/binding VarX
int x = 0;
//- @y defines/binding VarY
int y = 1;
//- // VarX influences VarY
y = x;
}
Functions
Simple calls
e(e1 ... eN)
-
Traverse
e. -
For each of
e1 ... eNwith corresponding parametersp1 ... pN: -
Push
{}. -
Traverse
eI. -
Pop onto
pI. -
Insert
e.
Returns
return e;
-
Push
{}. -
Traverse
e. -
Pop onto the blame context.
A function call. (C++)
//- @x defines/binding ParamX
//- @g defines/binding FnG
int g(int x) {
//- ParamX influences FnG
return x + 1;
}
void f() {
//- @y defines/binding VarY
int y = 0;
//- @z defines/binding VarZ
int z;
//- VarY influences ParamX
//- FnG influences VarZ
z = y + g(y);
}
Forward declarations
f'(p'1 ... p'N);
f(p1 ... pN) { ... }
-
p'IinfluencespI. -
finfluencesf'.
Forward declarations. (C++)
//- @f defines/binding FwdFnF
//- @x defines/binding FwdParamX
//- @y defines/binding FwdParamY
int f(int x, int y);
//- @f defines/binding DefnFnF
//- @x defines/binding DefnParamX
//- @y defines/binding DefnParamY
int f(int x, int y) {
return x + y;
}
//- FwdParamX influences DefnParamX
//- FwdParamY influences DefnParamY
//- DefnFnF influences FwdFnF