Commit d3afcfb

welteki authored and alexellis committed
Add blog post on faas-cli diag for support and architecture reviews
Signed-off-by: Han Verstraete (OpenFaaS Ltd) <han@openfaas.com>
1 parent 34af017 commit d3afcfb

4 files changed: +220 -0 lines changed

---
title: "Introducing: Painless support and hands-off architecture reviews"
description: "Learn how faas-cli diag collects diagnostic data from your OpenFaaS cluster, making support requests faster and architecture reviews hands-off."
date: 2026-03-10
categories:
- kubernetes
- troubleshooting
- openfaas-pro
author_staff_member: han
dark_background: true
# image: "/images/2026-03-diag/background.png"
hide_header_image: true
---

Learn how faas-cli diag collects diagnostic data from your OpenFaaS cluster, making support requests faster and architecture reviews hands-off.

`faas-cli diag` is a plugin for the faas-cli that collects diagnostic data from your OpenFaaS cluster. We built it to take the friction out of two things: getting help when something's broken, and making sure you're getting the most out of OpenFaaS.

It generates an HTML report you can open in your browser to explore graphs and visualisations, and packages everything into an archive you can quickly share to get support. One command, no manual steps, nothing to forget.

![End-to-end flow for faas-cli diag](/images/2026-03-diag/e2e_flow.png)

**Hands-off support**

When something goes wrong in production, the last thing you want is to be sent to a troubleshooting guide and told to run half a dozen commands. Your product is on fire. People are starting to point the finger of blame. You just want it fixed.

That's why we built `faas-cli diag`, a single command that collects everything we need to help: deployments, function definitions, logs, events, pod status, and Prometheus metrics. Run it, send us the archive, and we can start working on your issue immediately, without a back-and-forth asking you to gather more data.

**Review your architecture**

Beyond troubleshooting, the data and graphs collected by `faas-cli diag` can help you answer broader questions about your setup: are you getting the most value possible from the product? Are there OpenFaaS features that could help with your type of workload? Is there a production incident waiting to happen because something's been mixed up in the `values.yaml`?

The report generated by diag gives you a starting point. You can inspect invocation rates, error rates, replica counts, and resource usage without needing to set up dashboards or port-forward to Prometheus. You can also send us the archive if you'd like help with an architecture review, and we'll come back with recommendations tailored to your setup.

## What does it collect?

The diag tool gathers the following from your cluster:

- **Deployment YAMLs**: exported specs for OpenFaaS core components and functions
- **Function CRs**: the Custom Resources that define deployed functions
- **Kubernetes events**: cluster events from the OpenFaaS and function namespaces
- **Pod status**: output from `kubectl get` and `kubectl describe` for all relevant pods
- **Container logs**: streamed via [stern](https://github.com/stern/stern) for real-time and retrospective log collection
- **Node info**: inventory and descriptions for all cluster nodes
- **Helm values**: user-supplied values for the OpenFaaS Helm release
- **Ingress & Gateway API**: Ingress, IngressClass, HTTPRoute, and GatewayClass resources
- **Network Policies**: NetworkPolicy resources from OpenFaaS and function namespaces
- **Prometheus metrics**: metrics snapshots and visualisations covering replicas, request rates, latencies, and resource usage

All collected data is written to a local directory and archived into a `.tar.gz` file for easy sharing. The tool is 100% offline: by default, no information is shared with anyone, including OpenFaaS Ltd.
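
Because the result is a standard `.tar.gz`, you can look inside it before sharing. Below is a minimal sketch using only `tar` and `grep`. The helper name `preview_archive` and the keyword list are illustrative assumptions, not part of faas-cli, and this post doesn't specify the archive's internal layout.

```bash
# Sketch: list the files in a diag archive and count secret-looking lines.
# preview_archive is a hypothetical helper, not a faas-cli command.
preview_archive() {
  local archive="$1"
  local hits
  # Show every path stored in the archive.
  tar -tzf "$archive"
  # Stream all file contents to stdout and count lines that mention common
  # secret-like keywords. The keyword list is an example, not exhaustive.
  hits=$(tar -xzOf "$archive" | grep -ciE 'password|token|secret' || true)
  echo "possible sensitive lines: ${hits}"
}
```

Call it with the path to the archive from your run. A non-zero count is only a prompt to look more closely; a zero count doesn't prove the archive is free of sensitive data.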

## Install the diag plugin

Install the diag plugin using the faas-cli plugin manager:

```bash
faas-cli plugin get diag
```

Verify the installation:

```bash
faas-cli diag version
```

## Generate a report

By default, diag runs against your currently selected kubectl context. Generate a configuration file, then run the tool:

```bash
# Generate a `diag.yaml` config file
faas-cli diag config simple

# Run diagnostics
faas-cli diag
```

The first command creates a `diag.yaml` with sensible defaults that work for most setups. The second starts the collection: it sets up port-forwards, streams logs, collects Kubernetes resources, and scrapes Prometheus metrics. Press `Control+C` once to stop gracefully; the tool will finish collecting and write all output to disk.

**Staging and production**

If you manage separate clusters for staging and production, you can run diag once against each environment. Either switch your kubectl context between runs, or create a dedicated config file per cluster:

```bash
faas-cli diag config simple > diag-staging.yaml
faas-cli diag config simple > diag-prod.yaml
```

Edit each config to set the `context` field and any other parameters for that environment, then generate a report for each:

```bash
faas-cli diag diag-staging.yaml
faas-cli diag diag-prod.yaml
```
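
The step of setting the `context` field can also be scripted. The sketch below assumes each generated file contains a top-level `context:` line, as shown in the full configuration reference at the end of this post. `set_context` and the context names in the usage comments are hypothetical.

```bash
# set_context FILE CONTEXT: rewrite the top-level `context:` line in a config.
# Hypothetical helper; assumes the file has a line starting with `context:`.
# Any trailing comment on that line is dropped. Context names containing
# `|` or `&` would need escaping first.
set_context() {
  local file="$1" ctx="$2"
  # Write to a temp file rather than `sed -i`, whose syntax differs
  # between GNU and BSD sed.
  sed "s|^context:.*|context: \"${ctx}\"|" "$file" > "${file}.tmp" \
    && mv "${file}.tmp" "$file"
}

# Example usage (context names are placeholders for your own):
#   set_context diag-staging.yaml my-staging-context
#   set_context diag-prod.yaml my-prod-context
```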

For more advanced options like targeting specific functions or using an external Prometheus instance, see the [full configuration reference](#appendix-full-configuration-reference) at the end of this post.

**Running at scale with hundreds of namespaces**

If you're running a multi-tenant setup with hundreds of function namespaces, you probably don't want to collect from all of them at once. Use the `--namespace` flag to target a specific subset:

```bash
faas-cli diag config simple --namespace staging --namespace production
```

Or use `'*'` to automatically discover all OpenFaaS function namespaces:

```bash
faas-cli diag config simple --namespace '*'
```

<script src="https://asciinema.org/a/tsVGRdQhWh7p32hp.js" id="asciicast-tsVGRdQhWh7p32hp" async="true" data-autoplay="true" data-loop="true"></script>

## Exploring the report

Output is saved to the `./run` directory in a timestamped folder, along with a `.tar.gz` archive ready to share with the OpenFaaS team or colleagues. Open the generated `index.html` file in a browser to explore the collected metrics and inspect graphs:

```bash
open ./run/2026-03-10_14-30-00/index.html
```
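
If you run diag often, typing the timestamp gets tedious. Here is a small sketch that finds the newest run folder for you. It assumes runs use the default timestamped folder names (which sort chronologically as text) rather than custom run names; `latest_report` is a hypothetical helper.

```bash
# latest_report [DIR]: print the index.html path of the most recent run.
# Hypothetical helper; DIR defaults to ./run as described above.
latest_report() {
  local dir="${1:-./run}"
  local newest
  # Names such as 2026-03-10_14-30-00 sort chronologically as plain text,
  # so the last entry after sorting is the newest run.
  newest=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | sort | tail -n 1)
  [ -n "$newest" ] || { echo "no runs found in $dir" >&2; return 1; }
  echo "$newest/index.html"
}

# Example usage:
#   open "$(latest_report)"
```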

The report includes visualisations of Prometheus metrics such as function invocation rates, error rates, and replica counts, giving you a quick overview of cluster health without needing to set up Grafana or port-forward to Prometheus yourself.

![The report summary page with quick links to metrics, CRDs, pods, events, and logs per namespace.](/images/2026-03-diag/report-summary.png)
> The report summary page with quick links to metrics, CRDs, pods, events, and logs per namespace.

![The metrics dashboard showing function replicas, request rates by status code, and execution duration.](/images/2026-03-diag/report-metrics-dashboard.png)
> The metrics dashboard showing function replicas, request rates by status code, and execution duration.

**AI ready**

The output also includes an `AGENTS.md` file that tells AI coding agents such as Claude Code, Codex, and similar tools how to interpret the collected data and diagnose issues from it. This means you can outsource the first pass of a support investigation or architecture review to an AI agent, clearing up any initial issues.

A word of caution: most AI coding plans retain data for anywhere between 30 days and 5 years, and some may train on customer data. Many providers offer a zero data retention option through API-based tokens and/or specific Enterprise plans. We advise very careful review of your provider's data handling policies before sending any potentially sensitive cluster data to an AI agent.

If data privacy is a concern, the realistic paths are:

- Scrub or redact the collected data before passing it to an AI agent
- Use a local model. OpenFaaS Ltd has tested a number of local models with physical GPUs in airgapped environments
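
For the first path, a rough sketch of a redaction pass is shown here. It works on a copy of a run folder and masks a couple of common secret shapes with `sed`. The helper name `redact_copy` and both patterns are illustrative assumptions; they will not catch every secret, so treat the result as a starting point for manual review rather than a guarantee.

```bash
# redact_copy SRC DEST: copy a run folder and mask secret-looking values.
# Hypothetical helper. DEST must not exist yet. Patterns are examples only.
redact_copy() {
  local src="$1" dest="$2" tmp f
  tmp=$(mktemp)
  cp -R "$src" "$dest"
  find "$dest" -type f | while IFS= read -r f; do
    # Skip binary files (for example images); grep -I treats them as no match.
    grep -Iq . "$f" || continue
    # Mask bearer tokens and values that follow password/token/secret keys.
    sed -E \
      -e 's/(Bearer )[A-Za-z0-9._-]+/\1REDACTED/g' \
      -e 's/((password|token|secret)[A-Za-z_]*[=:][[:space:]]*)[^[:space:]]+/\1REDACTED/g' \
      "$f" > "$tmp" && cat "$tmp" > "$f"
  done
  rm -f "$tmp"
}
```

The original folder is left untouched, so you can still send the unredacted archive to a party you trust while handing the redacted copy to an AI agent.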

## Useful flags and options

| Flag / Command | Description | Example |
|---|---|---|
| `-d/--duration` | Auto-stop after a set duration | `faas-cli diag -d 5m` |
| `--age` | Collect logs from a past time window | `faas-cli diag --age 1h` |
| `diag [run-name]` | Custom name for the run (positional argument) | `faas-cli diag incident-456` |

## Wrapping up

The `faas-cli diag` plugin gives you a fast, repeatable way to collect everything needed for support requests and architecture reviews. Instead of manually running a dozen `kubectl` commands, you get a single workflow that captures logs, events, pod status, and metrics, all archived and ready to share.

Whether you're debugging an incident or reviewing your cluster setup, the workflow is the same: run `faas-cli diag` and explore the report. If you need our help, send us the archive.

For more details, see the [Troubleshooting docs](https://docs.openfaas.com/deployment/troubleshooting/).

## Appendix: full configuration reference

Generate the full configuration template with:

```bash
faas-cli diag config full
```

```yaml
# Identify the cluster and kubectl context
clusterName: "production-cluster"
context: "" # Leave empty to use current context

# Namespaces to collect from
namespaces:
  openfaas: openfaas
  functions:
    - openfaas-fn
    - staging-fn
    - production-fn

# Function filter patterns (glob-style)
functions:
  - 'api-*'
  - 'webhook-*'

# Prometheus configuration
prometheus:
  enabled: true
  service: prometheus
  targetPort: 9090
  # Use a custom URL if Prometheus is outside the openfaas namespace
  # url: "http://prometheus.monitoring.svc.cluster.local:9090"

# Gateway configuration
gateway:
  enabled: true
  service: gateway
  targetPort: 8080
  autoAuth: true

# What to collect
collection:
  deployments: true
  functionCRs: true
  events: true
  podStatus: true
  logs: true
  metrics: true
  logAge: "1h"

# Output directory and run name
output:
  directory: "./run"
  # runName: "incident-123"
```

A few options worth noting:

- `context` - lets you target a specific kubectl context if you manage multiple clusters. Leave it empty to use whichever context is currently active.
- `functions` - uses glob patterns to filter which functions are collected. Use `'*'` for all, or patterns like `'api-*'` to narrow the scope on large clusters.
- `prometheus.url` - lets you point to an external Prometheus instance, bypassing the automatic port-forward.
- `collection` - toggles let you disable individual collectors if you only need a subset of the data.
- `logAge` - controls how far back to collect logs retrospectively. Leave it empty to collect all available logs.
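
To get a feel for which function names a pattern such as `'api-*'` would select, you can try it against shell globbing. This sketch assumes diag's glob matching behaves like the shell's, which this post doesn't confirm; `matches_any` and the function names in the usage comments are hypothetical.

```bash
# matches_any NAME PATTERN...: succeed if NAME matches at least one glob.
# Hypothetical helper that mimics a glob-style function filter.
matches_any() {
  local name="$1"
  shift
  local pat
  for pat in "$@"; do
    # An unquoted $pat in a `case` arm is interpreted as a glob pattern.
    case "$name" in
      $pat) return 0 ;;
    esac
  done
  return 1
}

# Example usage with the patterns from the config above:
#   matches_any api-users 'api-*' 'webhook-*' && echo "collected"
#   matches_any cron-cleanup 'api-*' 'webhook-*' || echo "skipped"
```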

images/2026-03-diag/e2e_flow.png

66.9 KB
109 KB
26.7 KB