Bug: log_class in CloudWatch agent config causes boot failure #5065

@trevjonez

I had Opus help me track down my issue after updating tonight. Here is the bug report it generated for me. I don't love the tone, but it has all the required info, so I'll leave it as is. I have validated the suggested fix below in my deployment. I'll let you decide whether the alternative fix it offered is worth considering. I don't see the harm in keeping the field in the agent config, since a terraform apply will update the log group config and the launch template together in most cases anyway.

Thanks!

Description

PR #5036 (feat(logging): add log_class parameter to runner log files configuration) added a log_class field to the logfiles local in modules/runners/logging.tf. This local is serialized via jsonencode(local.logfiles) into the CloudWatch agent configuration stored in SSM (cloudwatch_agent_config_runner).

The CloudWatch agent's JSON schema validation rejects log_class as an unknown property in collect_list entries, causing the agent to fail on startup. Since the runner's start-runner.sh treats CloudWatch agent failure as fatal (exit code 1), the instance self-terminates immediately.

Error

From EC2 console output during boot:

2026/03/12 03:06:41 E! Invalid Json input schema.
2026/03/12 03:06:41 Under path : /logs/logs_collected/files/collect_list/0 | Error : Additional property log_class is not allowed
2026/03/12 03:06:41 Under path : /logs/logs_collected/files/collect_list/1 | Error : Additional property log_class is not allowed
2026/03/12 03:06:41 Under path : /logs/logs_collected/files/collect_list/2 | Error : Additional property log_class is not allowed
2026/03/12 03:06:41 Under path : /logs/logs_collected/files/collect_list/3 | Error : Additional property log_class is not allowed
2026/03/12 03:06:41 E! configuration validation first phase failed. Agent version: 1.0. Verify the JSON input is only using features supported by this version
ERROR: runner-start-failed with exit code 1 occurred on 1

Root cause

The CloudWatch agent config schema does not have a log_class property. The correct property name is log_group_class, which accepts STANDARD or INFREQUENT_ACCESS.

The log_class field is used internally by the Terraform aws_cloudwatch_log_group resource via loggroups_classes, but the same logfiles local that feeds the agent config template also carries this field, leaking a Terraform-only attribute into the agent JSON.

Affected code

modules/runners/logging.tf:

logfiles = var.enable_cloudwatch_agent ? [for l in local.runner_log_files : {
    "log_group_name" : l.prefix_log_group ? "/github-self-hosted-runners/${var.prefix}/${l.log_group_name}" : "/${l.log_group_name}"
    "log_stream_name" : l.log_stream_name
    "file_path" : l.file_path
    "log_class" : l.log_class    # <-- this breaks the CloudWatch agent
  }] : []

This local is passed to the agent config template:

value = var.cloudwatch_config != null ? var.cloudwatch_config : templatefile("${path.module}/templates/cloudwatch_config.json", {
    logfiles = jsonencode(local.logfiles)
  })
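
One possible way to catch this kind of leak at plan time is a Terraform check block that compares the keys of each logfiles entry against an allow-list. This is only a sketch: it assumes Terraform 1.5 or newer (where check blocks exist), the name agent_allowed_keys and the sample entry are hypothetical, and the allow-list contains only the property names mentioned in this issue, not the agent's full schema.

```hcl
locals {
  # Assumed subset of keys the agent accepts in a collect_list entry.
  # Only the names mentioned in this issue are listed here.
  agent_allowed_keys = ["log_group_name", "log_stream_name", "file_path", "log_group_class"]

  # Hypothetical entry that reproduces the bug (carries log_class).
  logfiles = [{
    "log_group_name" : "/github-self-hosted-runners/example/user_data"
    "log_stream_name" : "{instance_id}"
    "file_path" : "/var/log/user-data.log"
    "log_class" : "STANDARD"
  }]
}

check "cloudwatch_agent_keys" {
  assert {
    # True only when no entry has a key outside the allow-list.
    condition = alltrue([
      for l in local.logfiles : length(setsubtract(keys(l), local.agent_allowed_keys)) == 0
    ])
    error_message = "logfiles contains a key that is not in the assumed CloudWatch agent allow-list."
  }
}
```

With the sample entry above the assertion fails because of log_class. Check blocks report a warning rather than blocking the apply, so this would flag the problem without stopping a deployment.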

Suggested fix

Rename log_class to log_group_class in the logfiles local so the CloudWatch agent accepts it and creates log groups with the intended class:

logfiles = var.enable_cloudwatch_agent ? [for l in local.runner_log_files : {
    "log_group_name" : ...
    "log_stream_name" : l.log_stream_name
    "file_path" : l.file_path
    "log_group_class" : l.log_class
  }] : []
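
To try the rename in isolation, a standalone configuration along these lines could be used. It is a sketch with hypothetical sample data (sample_log_files, prefix, and the output name are made up for illustration); the real list comes from local.runner_log_files in the module, and the log_group_name expression is copied from the affected code above.

```hcl
locals {
  # Hypothetical prefix and sample entry, for illustration only.
  prefix = "example"
  sample_log_files = [{
    log_group_name   = "user_data"
    prefix_log_group = true
    file_path        = "/var/log/user-data.log"
    log_stream_name  = "{instance_id}"
    log_class        = "STANDARD"
  }]

  # Same shape as the module's logfiles local, with the key renamed.
  logfiles = [for l in local.sample_log_files : {
    "log_group_name" : l.prefix_log_group ? "/github-self-hosted-runners/${local.prefix}/${l.log_group_name}" : "/${l.log_group_name}"
    "log_stream_name" : l.log_stream_name
    "file_path" : l.file_path
    "log_group_class" : l.log_class # renamed from log_class
  }]
}

# JSON string that would be substituted into the agent config template.
output "collect_list" {
  value = jsonencode(local.logfiles)
}
```

Running terraform apply on this file and inspecting the collect_list output shows the JSON each entry would contribute to the agent config, with log_group_class in place of log_class.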

Alternatively, exclude the field from logfiles entirely and rely on Terraform's aws_cloudwatch_log_group resource to set the log group class (via the separate loggroups_classes local that already exists).
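
A minimal sketch of that alternative, based on the affected code above with only the log_class line dropped. It assumes the rest of modules/runners/logging.tf stays unchanged and that the existing loggroups_classes local continues to drive the class on the aws_cloudwatch_log_group resources.

```hcl
locals {
  # Agent config entries carry only the keys the agent needs to ship logs.
  # The log group class is left to the Terraform-managed log groups.
  logfiles = var.enable_cloudwatch_agent ? [for l in local.runner_log_files : {
    "log_group_name" : l.prefix_log_group ? "/github-self-hosted-runners/${var.prefix}/${l.log_group_name}" : "/${l.log_group_name}"
    "log_stream_name" : l.log_stream_name
    "file_path" : l.file_path
  }] : []
}
```

The trade-off is that a log group the agent creates on its own (one not managed by Terraform) would get the agent's default class rather than the configured one.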

Impact

Any deployment that applies version 7.5.0 with enable_cloudwatch_agent = true (the default) will have all runner instances fail to boot and immediately self-terminate.

Versions
