
Commit a567dba

Merge pull request #2823 from felicitymay/2166-python-pre-migration-tasks
CodeQL: Python topics (2166) - WIP
2 parents 5a7a3f7 + f1238f1 commit a567dba

9 files changed

Lines changed: 159 additions & 158 deletions

docs/language/learn-ql/python/control-flow-graph.rst

Lines changed: 0 additions & 9 deletions
This file was deleted.

docs/language/learn-ql/python/control-flow.rst

Lines changed: 32 additions & 17 deletions
@@ -1,7 +1,12 @@
-Tutorial: Control flow analysis
-===============================
+Analyzing control flow in Python
+================================
 
-To analyze the `Control-flow graph <http://en.wikipedia.org/wiki/Control_flow_graph>`__ of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood.
+You can write CodeQL queries to explore the control-flow graph of a Python program, for example, to discover unreachable code or mutually exclusive blocks of code.
+
+About analyzing control flow
+--------------------------------------
+
+To analyze the control-flow graph of a ``Scope`` we can use the two CodeQL classes ``ControlFlowNode`` and ``BasicBlock``. These classes allow you to ask such questions as "can you reach point A from point B?" or "Is it possible to reach point B *without* going through point A?". To report results we use the class ``AstNode``, which represents a syntactic element and corresponds to the source code - allowing the results of the query to be more easily understood. For more information, see `Control-flow graph <http://en.wikipedia.org/wiki/Control_flow_graph>`__ on Wikipedia.
 
 The ``ControlFlowNode`` class
 -----------------------------
@@ -19,11 +24,18 @@ To show why this complex relation is required consider the following Python code
 finally:
     close_resource()
 
-There are many paths through the above code. There are three different paths through the call to ``close_resource();`` one normal path, one path that breaks out of the loop, and one path where an exception is raised by ``might_raise()``. (An annotated flow graph can be seen :doc:`here <control-flow-graph>`.)
+There are many paths through the above code. There are three different paths through the call to ``close_resource();`` one normal path, one path that breaks out of the loop, and one path where an exception is raised by ``might_raise()``.
+
+An annotated flow graph:
+
+|Python control flow graph|
+
+.. |Python control flow graph| image:: ../../images/python-flow-graph.png
 
 The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find unreachable code. There is one ``ControlFlowNode`` per path through any ``AstNode`` and any ``AstNode`` that is unreachable has no paths flowing through it. Therefore, any ``AstNode`` without a corresponding ``ControlFlowNode`` is unreachable.
 
-**Unreachable AST nodes**
+Example finding unreachable AST nodes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. code-block:: ql
 
@@ -33,9 +45,10 @@ The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find u
 where not exists(node.getAFlowNode())
 select node
 
-➤ `See this in the query console <https://lgtm.com/query/669220024/>`__. The demo projects on LGTM.com all have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements:
+➤ `See this in the query console <https://lgtm.com/query/669220024/>`__. The demo projects on LGTM.com all have some code that has no control flow node, and is therefore unreachable. However, since the ``Module`` class is also a subclass of the ``AstNode`` class, the query also finds any modules implemented in C or with no source code. Therefore, it is better to find all unreachable statements.
 
-**Unreachable statements**
+Example finding unreachable statements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. code-block:: ql
 
@@ -45,15 +58,15 @@ The simplest use of the ``ControlFlowNode`` and ``AstNode`` classes is to find u
 where not exists(s.getAFlowNode())
 select s
 
-➤ `See this in the query console <https://lgtm.com/query/670720181/>`__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard query: `Unreachable code <https://lgtm.com/rules/3980095>`__.
+➤ `See this in the query console <https://lgtm.com/query/670720181/>`__. This query gives fewer results, but most of the projects have some unreachable nodes. These are also highlighted by the standard "Unreachable code" query. For more information, see `Unreachable code <https://lgtm.com/rules/3980095>`__ on LGTM.com.
 
 The ``BasicBlock`` class
 ------------------------
 
-The ``BasicBlock`` class represents a `basic block <http://en.wikipedia.org/wiki/Basic_block>`__ of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as what can reach what and what `dominates <http://en.wikipedia.org/wiki/Dominator_%28graph_theory%29>`__ what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory.
+The ``BasicBlock`` class represents a basic block of control flow nodes. The ``BasicBlock`` class is not that useful for writing queries directly, but is very useful for building complex analyses, such as data flow. The reason it is useful is that it shares many of the interesting properties of control flow nodes, such as, what can reach what, and what dominates what, but there are fewer basic blocks than control flow nodes - resulting in queries that are faster and use less memory. For more information, see `Basic block <http://en.wikipedia.org/wiki/Basic_block>`__ and `Dominator <http://en.wikipedia.org/wiki/Dominator_%28graph_theory%29>`__ on Wikipedia.
 
-Example: Finding mutually exclusive basic blocks
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Example finding mutually exclusive basic blocks
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Suppose we have the following Python code:
 
@@ -84,7 +97,8 @@ However, by that definition, two basic blocks are mutually exclusive if they are
 
 Combining these conditions we get:
 
-**Mutually exclusive blocks within the same function**
+Example finding mutually exclusive blocks within the same function
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. code-block:: ql
 
@@ -98,10 +112,11 @@ Combining these conditions we get:
 )
 select b1, b2
 
-➤ `See this in the query console <https://lgtm.com/query/671000028/>`__. This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis which is covered in the next tutorial.
+➤ `See this in the query console <https://lgtm.com/query/671000028/>`__. This typically gives a very large number of results, because it is a common occurrence in normal control flow. It is, however, an example of the sort of control-flow analysis that is possible. Control-flow analyses such as this are an important aid to data flow analysis. For more information, see :doc:`Analyzing data flow and tracking tainted data in Python <taint-tracking>`.
+
+Further reading
+---------------
 
-What next?
-----------
+- ":doc:`Analyzing data flow and tracking tainted data in Python <taint-tracking>`"
 
-- Experiment with the worked examples in the tutorial topic :doc:`Taint tracking and data flow analysis in Python <taint-tracking>`.
-- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
+.. include:: ../../reusables/python-other-resources.rst
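
Reviewer note on control-flow.rst: the tutorial's Python snippet is only partly visible in this diff, so the following is a hypothetical sketch rather than the file's own example. It illustrates the three ways the text says control can reach ``close_resource()``: normal completion, a ``break`` out of the loop, and an exception from ``might_raise()``. The names ``run_once``, ``log`` and the ``fail`` parameter are assumptions made for illustration; appending to ``log`` stands in for ``close_resource()``.

```python
def might_raise(fail):
    """Stand-in for the tutorial's might_raise(); raises only when asked to."""
    if fail:
        raise ValueError("simulated failure")


def run_once(item, fail, log):
    """Execute one loop iteration and record how the finally block was reached."""
    path = "exception"  # stays unchanged only if might_raise() raises
    for _ in range(1):
        try:
            if item == "stop":
                path = "break"
                break
            might_raise(fail)
            path = "normal"
        finally:
            # Stand-in for close_resource(): runs on all three paths.
            log.append(path)


log = []
run_once("item", False, log)      # normal completion
run_once("stop", False, log)      # break out of the loop
try:
    run_once("item", True, log)   # exception raised by might_raise()
except ValueError:
    pass
print(log)
```

Each entry in ``log`` corresponds to one path through the ``finally`` block, which is why the text says there is one ``ControlFlowNode`` per path through an ``AstNode``.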

docs/language/learn-ql/python/functions.rst

Lines changed: 13 additions & 7 deletions
@@ -1,7 +1,9 @@
-Tutorial: Functions
+Functions in Python
 ===================
 
-This example uses the standard CodeQL class ``Function`` (see :doc:`Introducing the Python libraries <introduce-libraries-python>`).
+You can use syntactic classes from the standard CodeQL library to find Python functions and identify calls to them.
+
+These examples use the standard CodeQL class `Function <https://help.semmle.com/qldoc/python/semmle/python/Function.qll/type.Function$Function.html>`__. For more information, see ":doc:`Introducing the Python libraries <introduce-libraries-python>`."
 
 Finding all functions called "get..."
 -------------------------------------
@@ -55,7 +57,7 @@ We can modify the query further to include only methods whose body consists of a
 and count(f.getAStmt()) = 1
 select f, "This function is (probably) a getter."
 
-➤ `See this in the query console <https://lgtm.com/query/667290044/>`__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in :doc:`Tutorial: Statements and expressions <statements-expressions>`.
+➤ `See this in the query console <https://lgtm.com/query/667290044/>`__. This query returns fewer results, but if you examine the results you can see that there are still refinements to be made. This is refined further in ":doc:`Expressions and statements in Python <statements-expressions>`."
 
 Finding a call to a specific function
 -------------------------------------
@@ -76,8 +78,12 @@ The ``Call`` class represents calls in Python. The ``Call.getFunc()`` predicate
 Due to the dynamic nature of Python, this query will select any call of the form ``eval(...)`` regardless of whether it is a call to the built-in function ``eval`` or not.
 In a later tutorial we will see how to use the type-inference library to find calls to the built-in function ``eval`` regardless of name of the variable called.
 
-What next?
-----------
+Further reading
+---------------
+
+- ":doc:`Expressions and statements in Python <statements-expressions>`"
+- ":doc:`Pointer analysis and type inference in Python <pointsto-type-infer>`"
+- ":doc:`Analyzing control flow in Python <control-flow>`"
+- ":doc:`Analyzing data flow and tracking tainted data in Python <taint-tracking>`"
 
-- Experiment with the worked examples in the following tutorial topics: :doc:`Statements and expressions <statements-expressions>`, :doc:`Control flow <control-flow>`, and :doc:`Points-to analysis and type inference <pointsto-type-infer>`.
-- Find out more about QL in the `QL language handbook <https://help.semmle.com/QL/ql-handbook/index.html>`__ and `QL language specification <https://help.semmle.com/QL/ql-spec/language.html>`__.
+.. include:: ../../reusables/python-other-resources.rst
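
Reviewer note on functions.rst: the caveat about ``eval(...)`` can be made concrete with a short Python sketch. It is an illustration only, not code from the tutorial. The user-defined ``eval`` and the alias ``run`` are hypothetical names. A purely name-based query would flag the first call even though it never reaches the built-in, and would miss the aliased call even though it does.

```python
import builtins


def eval(expression):
    """Hypothetical user function that shadows the built-in name `eval`."""
    return "not the built-in: " + expression


# Looks like eval(...) syntactically, but calls the function defined above.
shadowed_result = eval("1 + 1")

# The real built-in is still reachable through the builtins module.
builtin_result = builtins.eval("1 + 1")

# An alias: this call is not of the form eval(...), yet it runs the built-in.
run = builtins.eval
alias_result = run("2 + 3")

print(shadowed_result, builtin_result, alias_result)
```

This is the gap that the type-inference library mentioned in the text is meant to close: it reasons about which object a name refers to rather than about the spelling of the name.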
