Skip to content

Latest commit

 

History

History
138 lines (106 loc) · 6.37 KB

File metadata and controls

138 lines (106 loc) · 6.37 KB

Database Logic

In this we translate the logic of the database fields and add details that we should know

There are some resources that can be used as reference:

Tree

A tree is a set of repos and branches, normally just one pair, but more are possible. Essentially, a tree is a copy of the code base maintained separately for a particular purpose.

Maestro maintains its own database of trees, where each is given a name (Checkouts.tree_name), those names are generally accepted and often recognized, but were chosen by Maestro (KernelCI Legacy before it) developers and maintainers over a period of time, and are not standardized, other CI systems are free to report whatever they want there. Most choose not to report anything in that field.

Checkouts table

A checkout is a record of a CI system checking out a particular revision from a particular git repository and branch. The json schema also says that checkouts "represents the way the tested source code was obtained and its original location. E.g. checking out a particular commit from a git repo, and applying a set of patches on top."

Test table

Tests are linked to a build

Important thing to look at the tests table is the path column, which is the type of test that it is performing. They will be split by dots, which means that it is a child test from that test. eg: boot.start is a child of boot

  • start_time is the time that a test has started, we can use that to graph X-Axis graphs of tests over time

  • status column show a status that can be PASS SKIP or ERROR

  • misc a JSON field that is used to store miscellaneous data about the test like archicture, compiler, runtime, kernel type, etc. However, it does not have a fixed structure, and therefore can contain any type of information the CI system finds useful to store. It's possible to note this phenomenon when comparing the misc field between origins. The misc field might have data related to errors, but's generally not used since it's unreliable

example of a typical misc field of Maestro:

{
  "arch": "x86_64",
  "runtime": "lava-collabora",
  "compiler": "gcc-12",
  "kernel_type": "bzimage"
}
  • environment_misc a JSON field that we can use for things about the environment like detecting the platform, this is not set in stone and can change, so we should double check the validation. On Maestro it's expected that platform should be a field inside the environment_misc JSON.
{
  "job_id": "14750408",
  "platform": "hp-x360-14a-cb0001xx-zork"
}

NOTE: The contents of the columns misc and environment_misc are not standardized, and shouldn't be relied upon to ever be used consistently by more than one CI system ("origin"), or even by the originating CI system itself, over time. The most it could be used for is a proof-of-concept.

  • environment_compatible contains a sorted list of strings starting with the exact name of the machine, followed by an optional list of boards it is compatible with sorted from most compatible to least.

NOTE: The concept of "Hardware" used througout the dashboard also includes the platform found in the environment_misc column. Up to this point, the rule is to prioritize platform over environment_compatible since, at least on Maestro, platform provides a more informative representation of the hardware used during the test/boot/build. This, however, does not apply to the Hardware Listing and Details pages that still prioritize the environment_compatible over platform

For more information about environment_compatible these resources can be used as a reference:

  • output_files is a JSON field that has an array of objects, each of them having data about the artifacts generated by the test. The Builds table also has an input_files column that respects the same structure
[
   {
      "url":"https://storage.kernelci.org/broonie-sound/for-next/v6.11-rc3-218-gc76d5dfbfc97/arm64/defconfig/gcc-12/lab-broonie/lava-meson-sm1-s905d3-libretech-cc.json",
      "name":"lava_json"
   },
   {
      "url":"https://storage.kernelci.org/broonie-sound/for-next/v6.11-rc3-218-gc76d5dfbfc97/arm64/defconfig/gcc-12/lab-broonie/kselftest-alsa-meson-sm1-s905d3-libretech-cc.txt",
      "name":"txt"
   },
   {
      "url":"https://storage.kernelci.org/broonie-sound/for-next/v6.11-rc3-218-gc76d5dfbfc97/arm64/defconfig/gcc-12/lab-broonie/kselftest-alsa-meson-sm1-s905d3-libretech-cc.html",
      "name":"html"
   }
]

Revision

Revision has no respective table, but it is a collection of checkouts with the same git_commit_hash and patchset_hash. But patchset_hash is not commonly used these days.

Indexes

Some indexes were defined to improve the database performance, they were chosen based on if the field is used on query filters. Fields that are only selected or only filtered in the Python layer don't need to be indexed, and the excess of indexes can impact insertion performance.

For most fields, a default b-tree index is used; for array fields, a GIN index is used since it can handle containment checks. One expression index was also created for the test platform, which lives inside the environment_misc JSON field.

Django also creates some indexes automatically:

  • Index for primary key as b-tree;
  • Index for primary key as b-tree with text_pattern_ops, specialized for LIKE operations.
  • Index for foreign key as b-tree;
  • Index for foreign key as b-tree with text_pattern_ops, specialized for LIKE operations.

One of the operations that we are doing benefits from this double indexes, which is the filtering of dummy builds (builds where id LIKE maestro:_dummy%).

There is only one index that diverges from model to database, which is tests_origin_time_platform. This is because we want an index on the json field environment_misc ->> 'platform' -- with ->> -- but Django only uses -> (check https://docs.djangoproject.com/en/5.2/topics/db/queries/#module-django.db.models.fields.json). The divergence is created in the migration where this index is added.