|
2 | 2 | * This file provides the second phase of the `cpp/invalid-pointer-deref` query that identifies flow |
3 | 3 | * from the out-of-bounds pointer identified by the `AllocationToInvalidPointer.qll` library to |
4 | 4 | * a dereference of the out-of-bounds pointer. |
| 5 | + * |
| 6 | + * Consider the following snippet: |
| 7 | + * ```cpp |
| 8 | + * 1. char* begin = (char*)malloc(size); |
| 9 | + * 2. char* end = begin + size; |
| 10 | + * 3. for(int *p = begin; p <= end; p++) { |
| 11 | + * 4. use(*p); |
| 12 | + * 5. } |
| 13 | + * ``` |
| 14 | + * this file identifies the flow from `base + size` to `end`. We call `base + size` the "dereference source" and `end` |
| 15 | + * the "dereference sink" (even though `end` is not actually dereferenced - it will be used to find the correct |
| 16 | + * dereference eventually). |
| 17 | + * |
| 18 | + * Merely _constructing_ a pointer that's out-of-bounds is fine if the pointer is never dereferenced (in reality, the |
| 19 | + * standard only guarentees that it's safe to move the pointer one element past the last element. But we ignore that |
| 20 | + * here). So this step is about identifying which of those out-of-bounds pointers identified from step 1 that are |
| 21 | + * actually being dereferenced. We do this using a regular dataflow configuration (see `InvalidPointerToDerefConfig`). |
| 22 | + * |
| 23 | + * This dataflow traversal defines the set of sources as any dataflow node that is non-strictly upper-bounded by the |
| 24 | + * pointer-arithmetic instruction identified by `AllocationToInvalidPointer.qll`. (TOOD: I'm pretty sure this is incorrect, |
| 25 | + * and we should define the set of sources as anything that is non-strictly _lower_ bounded by the pointer-arithmetic |
| 26 | + * instruction). That is, the set of sources is any dataflow node `source` such that `source.asInstruction <= pai + delta1` |
| 27 | + * for some `delta1 >= 0`. |
| 28 | + * |
| 29 | + * The set of sinks is defined to be any address operand `addr` that is non-strictly upper-bounded by the sink. That is, |
| 30 | + * any dataflow node `n` such that `addr <= sink.asInstruction() + delta2` for some `delta2`. We call the instruction that |
| 31 | + * consumes the address operand the "operation". |
| 32 | + * |
| 33 | + * For example, consider the flow from `begin + size` to `end` above. The sink is `end` on line 3 because that is a dataflow |
| 34 | + * node whose underlying instruction non-strictly upper bounds the address operand `p` in `use(*p)`. The load attached to `*p` |
| 35 | + * is the "operation". To ensure that the path makes intuitive sense, we only pick operations that are control-flow reachable |
| 36 | + * from the dereference sink. |
| 37 | + * |
| 38 | + * To compute the amount of the dereference is away from the final entry of the allocation, we sum the two deltas `delta1` and |
| 39 | + * `delta2`. This is done in the `operationIsOffBy` predicate (which is the only predicate exposed by this file). |
| 40 | + * |
| 41 | + * Handling false positives: |
| 42 | + * |
| 43 | + * Consider the following snippet: |
| 44 | + * ```cpp |
| 45 | + * 1. char *p = new char[size]; |
| 46 | + * 2. char *end = p + size; // $ alloc=L363 |
| 47 | + * 3. if (p < end) { |
| 48 | + * 4. p += 1; |
| 49 | + * 5. } |
| 50 | + * 6. if (p < end) { |
| 51 | + * 7. int val = *p; // GOOD |
| 52 | + * 8. } |
| 53 | + * ``` |
| 54 | + * this is safe because `p` is guarded to be strictly less than `end` on line 6 before the dereference on line 7. However, if we |
| 55 | + * run the query on the above without further modifications we'd see an alert on line 7. This is because range analysis infers |
| 56 | + * that `p <= end` after the increment on line 4, and thus the result of `p += 1` is seen as a valid dereference source. This |
| 57 | + * node then flows to `p` on line 6 (which is a valid dereference sink since it non-strictly upper bounds an address operand), and |
| 58 | + * range analysis then infers that the address operand of `*p` (i.e., `p`) is non-strictly upper bounded by `p`, and thus reports |
| 59 | + * an alert on line 7. |
| 60 | + * |
| 61 | + * In order to handle this false positive, we define a barrier that identifies guards such as `p < end` that ensures that a value |
| 62 | + * is less than the pointer-arithmetic instruction that computed the invalid pointer. This is done in the `InvalidPointerToDerefBarrier` |
| 63 | + * module. Since the node we're tracking isn't necessarily _equal_ to the pointer-arithmetic instruction, but rather satisfies |
| 64 | + * `node.asInstruction() <= pai + delta`, we need to account for the delta when checking if a guard is sufficiently strong to infer |
| 65 | + * that a future dereference is safe. To do this, we check that the guard guarantees that a node `n` satisfies `n < node + d` where |
| 66 | + * `node` is a node we know is equal to the value of the dereference source (i.e., it satisfies `node.asInstruction() <= pai + delta`) |
| 67 | + * and `d <= delta`. Combining this we have `n < node + d <= node + delta <= pai + 2*delta` (TODO: Oops. This math doesn't quite work |
| 68 | + * out. This is because we need to redefine the `BarrierConfig` to start flow at the pointer-arithmetic instruction instead of at the |
| 69 | + * dereference source. When combined with TODO above it's easy to show that this guard ensures that the dereference is safe). |
5 | 70 | */ |
6 | 71 |
|
7 | 72 | private import cpp |
|
0 commit comments