@@ -19,8 +19,76 @@ private import codeql.ruby.dataflow.internal.DataFlowDispatch as DataFlowDispatc
1919 */
2020module API {
2121 /**
22- * An abstract representation of a definition or use of an API component such as a Ruby module,
23- * or the result of a method call.
22+ * A node in the API graph, representing a value that has crossed the boundary between this
23+ * codebase and an external library (or in general, any external codebase).
24+ *
25+ * ### Basic usage
26+ *
27+ * API graphs are typically used to identify "API calls", that is, calls to an external function
28+ * whose implementation is not necessarily part of the current codebase.
29+ *
30+ * The most basic use of API graphs is typically as follows:
31+ * 1. Start with `API::getTopLevelMember` for the relevant library.
32+ * 2. Follow up with a chain of accessors such as `getMethod` describing how to get to the relevant API function.
33+ * 3. Map the resulting API graph nodes to data-flow nodes, using `asSource` or `asSink`.
34+ *
35+ * For example, a simplified way to get the first argument passed to `Foo.bar` would be:
36+ * ```codeql
37+ * API::getTopLevelMember("Foo").getMethod("bar").getParameter(0).asSink()
38+ * ```
39+ *
40+ * The most commonly used accessors are `getMember`, `getMethod`, `getParameter`, and `getReturn`.
41+ *
42+ * ### API graph nodes
43+ *
44+ * There are two kinds of nodes in the API graphs, distinguished by who is "holding" the value:
45+ * - **Use-nodes** represent values held by the current codebase, which came from an external library.
46+ * (The current codebase is "using" a value that came from the library).
47+ * - **Def-nodes** represent values held by the external library, which came from this codebase.
48+ * (The current codebase "defines" the value seen by the library).
49+ *
50+ * API graph nodes are associated with data-flow nodes in the current codebase.
51+ * (Since external libraries are not part of the database, there is no way to associate them with concrete
52+ * data-flow nodes from the external library.)
53+ * - **Use-nodes** are associated with data-flow nodes where a value enters the current codebase,
54+ * such as the return value of a call to an external function.
55+ * - **Def-nodes** are associated with data-flow nodes where a value leaves the current codebase,
56+ * such as an argument passed in a call to an external function.
57+ *
58+ *
59+ * ### Access paths and edge labels
60+ *
61+ * Nodes in the API graph are associated with a set of access paths, describing a series of operations
62+ * that may be performed to obtain that value.
63+ *
64+ * For example, the access path `API::getTopLevelMember("Foo").getMethod("bar")` represents the action of
65+ * reading the top-level constant `Foo` and then accessing the method `bar` on the resulting object.
66+ * It would be associated with a call such as `Foo.bar()`.
67+ *
68+ * Each edge in the graph is labelled by such an "operation". For an edge `A->B`, the type of the `A` node
69+ * determines who is performing the operation, and the type of the `B` node determines who ends up holding
70+ * the result:
71+ * - An edge starting from a use-node describes what the current codebase is doing to a value that
72+ * came from a library.
73+ * - An edge starting from a def-node describes what the external library might do to a value that
74+ * came from the current codebase.
75+ * - An edge ending in a use-node means the result ends up in the current codebase (at its associated data-flow node).
76+ * - An edge ending in a def-node means the result ends up in external code (its associated data-flow node is
77+ * the place where it was "last seen" in the current codebase before flowing out).
78+ *
79+ * Because the implementation of the external library is not visible, it is not known exactly what operations
80+ * it will perform on values that flow there. Instead, the edges starting from a def-node represent operations that would
81+ * lead to an observable effect within the current codebase if performed, even though it is not known whether the library
82+ * actually performs them. (When constructing these edges, we assume the library is somewhat well-behaved.)
83+ *
84+ * For example, given this snippet:
85+ * ```ruby
86+ * Foo.bar(->(x) { doSomething(x) })
87+ * ```
88+ * A callback is passed to the external function `Foo.bar`. We can't know if `Foo.bar` will actually invoke this callback.
89+ * But _if_ the library should decide to invoke the callback, then a value will flow into the current codebase via the `x` parameter.
90+ * For that reason, an edge is generated representing the argument-passing operation that might be performed by `Foo.bar`.
91+ * This edge goes from the def-node associated with the callback to the use-node associated with the parameter `x` of the lambda.
2492 */
2593 class Node extends Impl::TApiNode {
2694 /**