diff --git a/README.md b/README.md
index c3c2d2f..2efb2a9 100644
--- a/README.md
+++ b/README.md
@@ -90,7 +90,7 @@
 Common unsupported-pattern buckets include `sshd_connection_closed_preauth`, `pam_faillock_account_locked`, and `pam_unix_session_closed`. These buckets keep non-finding evidence reviewable without counting it as detector evidence.

-For the parser behavior contract, supported modes, and fixture map, see [`docs/parser-contract.md`](./docs/parser-contract.md).
+For the parser behavior contract, supported modes, and fixture map, see [`docs/parser-contract.md`](./docs/parser-contract.md). For the deliberately noisy parser-coverage sample, see [`docs/parser-coverage-notes.md`](./docs/parser-coverage-notes.md).

 LogLens does not currently detect:

diff --git a/assets/noisy_auth_sample.log b/assets/noisy_auth_sample.log
new file mode 100644
index 0000000..2ab2091
--- /dev/null
+++ b/assets/noisy_auth_sample.log
@@ -0,0 +1,27 @@
+Mar 10 25:61:00 alpha-host sshd[9001]: Failed password for invalid user bad-clock from 203.0.113.1 port 50000 ssh2
+Feb 31 08:00:02 alpha-host sshd[9002]: Failed password for invalid user bad-date from 203.0.113.2 port 50001 ssh2
+
+
+Mar 10 08:00:10 alpha-host sshd[1001]: Failed password for invalid user svc+deploy from 203.0.113.10 port 50100 ssh2
+Mar 10 08:00:20 beta-host sshd[1002]: Accepted password for ops.robot from 203.0.113.11 port 50101 ssh2
+Mar 10 08:00:30 gamma-host sudo: svc-admin : TTY=pts/1 ; PWD=/home/user/project ; USER=root ; COMMAND=/usr/bin/systemctl status ssh
+Mar 10 08:00:40 alpha-host sudo[1004]: limited.user : user NOT in sudoers ; TTY=pts/2 ; PWD=/home/user/project ; USER=root ; COMMAND=/usr/bin/id
+Mar 10 08:00:50 beta-host sudo[1005]: blocked-user : command not allowed ; TTY=pts/3 ; PWD=/home/user/project ; USER=root ; COMMAND=/usr/bin/less /etc/hosts
+Mar 10 08:01:00 beta-host pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=
+Mar 10 08:01:10 gamma-host pam_faillock(sshd:auth): Authentication failure for user svc.locked from 203.0.113.13
+Mar 10 08:01:20 delta-host sshd[1008]: input_userauth_request: invalid user weird/user [preauth]
+Mar 10 08:01:30 alpha-host sshd[1009]: Connection closed by authenticating user legacy-user 203.0.113.20 port 50200 [preauth]
+Mar 10 08:01:40 beta-host sshd[1010]: Connection reset by invalid user trial.user 203.0.113.21 port 50201 [preauth]
+Mar 10 08:01:50 gamma-host sshd[1011]: Received disconnect from 203.0.113.22 port 50202:11: disconnected by user
+Mar 10 08:02:00 delta-host sshd[1012]: Timeout, client not responding from 203.0.113.23 port 50203
+Mar 10 08:02:10 alpha-host sshd[1013]: Unable to negotiate with 203.0.113.24 port 50204: no matching key exchange method found
+Mar 10 08:02:20 beta-host sshd[1014]: Unable to negotiate with 203.0.113.25 port 50205: no matching host key type found
+Mar 10 08:02:30 gamma-host pam_unix(sshd:session): session closed for user svc+deploy
+Mar 10 08:02:40 delta-host pam_unix(sudo:session): session closed for user svc-admin
+Mar 10 08:02:50 alpha-host pam_faillock(sshd:auth): Account temporarily locked for user locked.user
+Mar 10 08:03:00 beta-host pam_faillock(sshd:auth): Account temporarily locked for user svc-temp
+Mar 10 08:03:10 gamma-host pam_sss(sshd:auth): received for user unknown.shadow: 10 (User not known to the underlying authentication module)
+Mar 10 08:03:20 beta-host sudo: pam_unix(sudo:auth): authentication failure; user=limited.user
+
+rotated
+Mar 10 08:03:30 alpha-host CRON[2001]: (root) CMD (/usr/bin/true)
diff --git a/docs/parser-contract.md b/docs/parser-contract.md
index b97dc2d..2069a8a 100644
--- a/docs/parser-contract.md
+++ b/docs/parser-contract.md
@@ -79,6 +79,7 @@ Parsed successes and audit-only events remain reportable but do not count as bru
 | [`assets/parser_fixture_matrix_journalctl_short_full.log`](../assets/parser_fixture_matrix_journalctl_short_full.log) | Journalctl short-full known/unknown parser matrix |
 | [`assets/parser_auth_families_syslog.log`](../assets/parser_auth_families_syslog.log) | Syslog PAM/auth-family parser coverage |
 | [`assets/parser_auth_families_journalctl_short_full.log`](../assets/parser_auth_families_journalctl_short_full.log) | Journalctl PAM/auth-family parser coverage |
+| [`assets/noisy_auth_sample.log`](../assets/noisy_auth_sample.log) and [`tests/fixtures/parser_matrix/noisy_auth_expected.json`](../tests/fixtures/parser_matrix/noisy_auth_expected.json) | Noisy syslog parser-coverage matrix for malformed, unsupported, blank, irrelevant, multi-host, and unusual-username input |
 | [`tests/test_report_contracts.cpp`](../tests/test_report_contracts.cpp) | Stable report-shape expectations for generated artifacts |

 ## Non-goals
diff --git a/docs/parser-coverage-notes.md b/docs/parser-coverage-notes.md
new file mode 100644
index 0000000..677ef29
--- /dev/null
+++ b/docs/parser-coverage-notes.md
@@ -0,0 +1,29 @@
+# Parser coverage notes
+
+LogLens parser coverage is intentionally visible. Noisy logs should produce a useful coverage shape instead of a quiet success claim.
+
+## Noisy auth matrix
+
+[`assets/noisy_auth_sample.log`](../assets/noisy_auth_sample.log) is a sanitized `syslog_legacy` sample for reviewer inspection. It mixes recognized authentication evidence with common log noise:
+
+- malformed timestamp evidence
+- unsupported but bucketed `sshd` preauth, disconnect, and negotiation lines
+- partial PAM evidence that is either lower-confidence parsed evidence or telemetry-only warning evidence
+- sudo denial variants that still become typed audit events
+- empty, blank, rotated, and irrelevant service lines
+- multiple hosts and intentionally unusual synthetic usernames
+
+The locked expected coverage summary lives in [`tests/fixtures/parser_matrix/noisy_auth_expected.json`](../tests/fixtures/parser_matrix/noisy_auth_expected.json). It focuses on parser quality fields rather than detector findings:
+
+- `total_input_lines`: 27
+- `skipped_blank_lines`: 3
+- `parsed_lines`: 8
+- `unparsed_lines`: 16
+- `parse_success_rate`: 0.3333333333
+- `top_unknown_patterns`: the five most common unsupported-pattern buckets
+
+## Reading the numbers
+
+A low parse success rate is not automatically a bug for this fixture. The sample is deliberately noisy, and the useful property is that unsupported evidence remains explainable through `warnings` and `top_unknown_patterns`.
+
+The matrix should stay defensive and public-safe: use documentation IP ranges, synthetic hostnames, and synthetic usernames only.
diff --git a/tests/fixtures/parser_matrix/noisy_auth_expected.json b/tests/fixtures/parser_matrix/noisy_auth_expected.json
new file mode 100644
index 0000000..b08344c
--- /dev/null
+++ b/tests/fixtures/parser_matrix/noisy_auth_expected.json
@@ -0,0 +1,38 @@
+{
+ "fixture": "assets/noisy_auth_sample.log",
+ "input_mode": "syslog_legacy",
+ "assume_year": 2026,
+ "total_input_lines": 27,
+ "total_lines": 24,
+ "skipped_blank_lines": 3,
+ "parsed_lines": 8,
+ "unparsed_lines": 16,
+ "parse_success_rate": 0.3333333333,
+ "parsed_event_count": 8,
+ "warning_count": 16,
+ "top_unknown_patterns": [
+ {"pattern": "pam_faillock_account_locked", "count": 2},
+ {"pattern": "pam_unix_session_closed", "count": 2},
+ {"pattern": "sshd_connection_closed_preauth", "count": 2},
+ {"pattern": "sshd_negotiation_failure", "count": 2},
+ {"pattern": "sshd_timeout_or_disconnection", "count": 2}
+ ],
+ "warnings": [
+ {"line_number": 1, "reason": "invalid time token"},
+ {"line_number": 2, "reason": "invalid calendar date"},
+ {"line_number": 13, "reason": "unrecognized auth pattern: sshd_connection_closed_preauth"},
+ {"line_number": 14, "reason": "unrecognized auth pattern: sshd_connection_closed_preauth"},
+ {"line_number": 15, "reason": "unrecognized auth pattern: sshd_timeout_or_disconnection"},
+ {"line_number": 16, "reason": "unrecognized auth pattern: sshd_timeout_or_disconnection"},
+ {"line_number": 17, "reason": "unrecognized auth pattern: sshd_negotiation_failure"},
+ {"line_number": 18, "reason": "unrecognized auth pattern: sshd_negotiation_failure"},
+ {"line_number": 19, "reason": "unrecognized auth pattern: pam_unix_session_closed"},
+ {"line_number": 20, "reason": "unrecognized auth pattern: pam_unix_session_closed"},
+ {"line_number": 21, "reason": "unrecognized auth pattern: pam_faillock_account_locked"},
+ {"line_number": 22, "reason": "unrecognized auth pattern: pam_faillock_account_locked"},
+ {"line_number": 23, "reason": "unrecognized auth pattern: pam_sss_unknown_user"},
+ {"line_number": 24, "reason": "unrecognized auth pattern: sudo_other"},
+ {"line_number": 26, "reason": "missing syslog header fields"},
+ {"line_number": 27, "reason": "unrecognized auth pattern: program_cron"}
+ ]
+}
diff --git a/tests/test_parser.cpp b/tests/test_parser.cpp
index c874d3d..dc111a2 100644
--- a/tests/test_parser.cpp
+++ b/tests/test_parser.cpp
@@ -2,6 +2,8 @@
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -52,12 +54,67 @@ std::filesystem::path asset_path(std::string_view filename) {
     return repo_root() / "assets" / std::string(filename);
 }

+std::filesystem::path parser_matrix_fixture_path(std::string_view filename) {
+    return repo_root() / "tests" / "fixtures" / "parser_matrix" / std::string(filename);
+}
+
+std::string read_text_file(const std::filesystem::path& path) {
+    std::ifstream input(path);
+    if (!input) {
+        throw std::runtime_error("unable to read file: " + path.string());
+    }
+
+    return std::string((std::istreambuf_iterator<char>(input)), std::istreambuf_iterator<char>());
+}
+
 void expect_close(double actual, double expected, double tolerance, const std::string& message) {
     if (std::fabs(actual - expected) > tolerance) {
         throw std::runtime_error(message);
     }
 }

+std::size_t total_input_lines(const loglens::ParseReport& result) {
+    return result.quality.total_lines + result.quality.skipped_blank_lines;
+}
+
+std::string noisy_auth_coverage_json(const loglens::ParseReport& result) {
+    std::ostringstream output;
+    output << "{\n"
+           << " \"fixture\": \"assets/noisy_auth_sample.log\",\n"
+           << " \"input_mode\": \"" << loglens::to_string(result.metadata.input_mode) << "\",\n"
+           << " \"assume_year\": " << *result.metadata.assume_year << ",\n"
+           << " \"total_input_lines\": " << total_input_lines(result) << ",\n"
+           << " \"total_lines\": " << result.quality.total_lines << ",\n"
+           << " \"skipped_blank_lines\": " << result.quality.skipped_blank_lines << ",\n"
+           << " \"parsed_lines\": " << result.quality.parsed_lines << ",\n"
+           << " \"unparsed_lines\": " << result.quality.unparsed_lines << ",\n"
+           << " \"parse_success_rate\": " << std::fixed << std::setprecision(10)
+           << result.quality.parse_success_rate << ",\n"
+           << " \"parsed_event_count\": " << result.events.size() << ",\n"
+           << " \"warning_count\": " << result.warnings.size() << ",\n"
+           << " \"top_unknown_patterns\": [\n";
+
+    for (std::size_t index = 0; index < result.quality.top_unknown_patterns.size(); ++index) {
+        const auto& entry = result.quality.top_unknown_patterns[index];
+        output << " {\"pattern\": \"" << entry.pattern << "\", \"count\": " << entry.count << "}";
+        output << (index + 1 == result.quality.top_unknown_patterns.size() ? "\n" : ",\n");
+    }
+
+    output << " ],\n"
+           << " \"warnings\": [\n";
+
+    for (std::size_t index = 0; index < result.warnings.size(); ++index) {
+        const auto& warning = result.warnings[index];
+        output << " {\"line_number\": " << warning.line_number
+               << ", \"reason\": \"" << warning.reason << "\"}";
+        output << (index + 1 == result.warnings.size() ? "\n" : ",\n");
+    }
+
+    output << " ]\n"
+           << "}\n";
+    return output.str();
+}
+
 void test_invalid_user_failure() {
     const auto parser = make_syslog_parser();
     std::string error;
@@ -967,6 +1024,42 @@ void test_journalctl_fixture_matrix_file() {
     expect(result.quality.top_unknown_patterns[3].count == 1, "expected one sshd negotiation-failure journalctl line");
 }

+void test_noisy_auth_fixture_matrix_file() {
+    const auto parser = make_syslog_parser();
+    const auto result = parser.parse_file(asset_path("noisy_auth_sample.log"));
+
+    expect(result.events.size() == 8, "expected eight parsed noisy-auth events");
+    expect(result.warnings.size() == 16, "expected sixteen noisy-auth warnings");
+    expect(total_input_lines(result) == 27, "expected noisy-auth total input line count");
+    expect(result.quality.total_lines == 24, "expected noisy-auth nonblank line count");
+    expect(result.quality.skipped_blank_lines == 3, "expected noisy-auth skipped blank line count");
+    expect(result.quality.parsed_lines == 8, "expected noisy-auth parsed line count");
+    expect(result.quality.unparsed_lines == 16, "expected noisy-auth unparsed line count");
+    expect_close(result.quality.parse_success_rate, 8.0 / 24.0, 1e-9,
+                 "expected noisy-auth parse success rate");
+
+    expect(result.events[0].hostname == "alpha-host", "expected first noisy-auth host");
+    expect(result.events[0].username == "svc+deploy", "expected unusual invalid-user username");
+    expect(result.events[1].hostname == "beta-host", "expected second noisy-auth host");
+    expect(result.events[1].username == "ops.robot", "expected dotted accepted-password username");
+    expect(result.events[2].event_type == loglens::EventType::SudoCommand,
+           "expected noisy-auth sudo command event");
+    expect(result.events[3].event_type == loglens::EventType::SudoPolicyDenied,
+           "expected noisy-auth sudoers denial event");
+    expect(result.events[4].event_type == loglens::EventType::SudoPolicyDenied,
+           "expected noisy-auth command-not-allowed denial event");
+    expect(result.events[5].event_type == loglens::EventType::PamAuthFailure,
+           "expected partial pam_unix failure to remain parsed lower-confidence evidence");
+    expect(result.events[5].username.empty(), "expected partial pam_unix failure to stay username-less");
+    expect(result.events[5].source_ip.empty(), "expected partial pam_unix failure to stay source-less");
+    expect(result.events[7].hostname == "delta-host", "expected noisy-auth multi-host coverage");
+    expect(result.events[7].username == "weird/user", "expected slash username in input_userauth_request");
+
+    const auto actual = noisy_auth_coverage_json(result);
+    const auto expected = read_text_file(parser_matrix_fixture_path("noisy_auth_expected.json"));
+    expect(actual == expected, "expected noisy auth coverage summary to match fixture");
+}
+
 } // namespace

 int main() {
@@ -1016,5 +1109,6 @@ int main() {
     test_journalctl_rejects_empty_fractional_seconds();
     test_syslog_fixture_matrix_file();
     test_journalctl_fixture_matrix_file();
+    test_noisy_auth_fixture_matrix_file();
     return 0;
 }
diff --git a/tests/test_report.cpp b/tests/test_report.cpp
index 8dabc2e..858645b 100644
--- a/tests/test_report.cpp
+++ b/tests/test_report.cpp
@@ -29,6 +29,31 @@ std::string read_file(const std::filesystem::path& path) {
     return std::string((std::istreambuf_iterator<char>(input)), std::istreambuf_iterator<char>());
 }

+std::filesystem::path repo_root() {
+    const std::filesystem::path source_path{__FILE__};
+    std::vector<std::filesystem::path> candidates;
+
+    if (source_path.is_absolute()) {
+        candidates.push_back(source_path);
+    } else {
+        const auto cwd = std::filesystem::current_path();
+        candidates.push_back(cwd / source_path);
+        candidates.push_back(cwd.parent_path() / source_path);
+    }
+
+    for (const auto& candidate : candidates) {
+        if (std::filesystem::exists(candidate)) {
+            return candidate.parent_path().parent_path();
+        }
+    }
+
+    throw std::runtime_error("unable to resolve repository root from test source path");
+}
+
+std::filesystem::path asset_path(std::string_view filename) {
+    return repo_root() / "assets" / std::string(filename);
+}
+
 loglens::ReportData make_report_data() {
     loglens::ReportData data;
     data.input_path = std::filesystem::path{"assets/sample_auth.log"};
@@ -41,6 +66,48 @@ loglens::ReportData make_report_data() {
     return data;
 }

+void test_noisy_auth_report_json_keeps_unsupported_lines_visible() {
+    const auto input_path = asset_path("noisy_auth_sample.log");
+    const loglens::AuthLogParser parser(loglens::ParserConfig{
+        loglens::InputMode::SyslogLegacy,
+        2026});
+    const auto parsed = parser.parse_file(input_path);
+
+    const loglens::Detector detector;
+    const auto findings = detector.analyze(parsed.events);
+
+    loglens::ReportData data;
+    data.input_path = std::filesystem::path{"assets/noisy_auth_sample.log"};
+    data.parse_metadata = parsed.metadata;
+    data.parser_quality = parsed.quality;
+    data.events = parsed.events;
+    data.findings = findings;
+    data.warnings = parsed.warnings;
+    data.auth_signal_mappings = detector.config().auth_signal_mappings;
+
+    const auto json = loglens::render_json_report(data);
+
+    expect(findings.empty(), "expected noisy unsupported lines not to create findings");
+    expect(json.find("\"parse_success_rate\": 0.3333") != std::string::npos,
+           "expected noisy report json parse success rate");
+    expect(json.find("\"parsed_event_count\": 8") != std::string::npos,
+           "expected noisy report json parsed event count");
+    expect(json.find("\"warning_count\": 16") != std::string::npos,
+           "expected noisy report json warning count");
+    expect(json.find("\"finding_count\": 0") != std::string::npos,
+           "expected noisy report json finding count");
+    expect(json.find("\"pattern\": \"sshd_connection_closed_preauth\", \"count\": 2") != std::string::npos,
+           "expected noisy report json stable sshd preauth bucket");
+    expect(json.find("\"pattern\": \"pam_faillock_account_locked\", \"count\": 2") != std::string::npos,
+           "expected noisy report json stable pam_faillock account-lock bucket");
+    expect(json.find("\"line_number\": 13, \"reason\": \"unrecognized auth pattern: sshd_connection_closed_preauth\"")
+               != std::string::npos,
+           "expected noisy report json to keep unsupported sshd warning visible");
+    expect(json.find("\"line_number\": 24, \"reason\": \"unrecognized auth pattern: sudo_other\"")
+               != std::string::npos,
+           "expected noisy report json to keep unsupported partial sudo warning visible");
+}
+
 void test_markdown_table_cells_escape_user_controlled_values() {
     auto data = make_report_data();

@@ -244,6 +311,7 @@ void test_write_reports_reports_csv_write_failure() {
 } // namespace

 int main() {
+    test_noisy_auth_report_json_keeps_unsupported_lines_visible();
     test_markdown_table_cells_escape_user_controlled_values();
     test_json_escapes_generic_control_characters();
     test_reports_include_total_input_line_count();