
Troubleshooting

Debug image

GGBridge provides two types of images for different use cases:

  • Production image: ghcr.io/gitguardian/ggbridge:latest - Minimal, secure image without shell access
  • Debug image: ghcr.io/gitguardian/ggbridge:latest-shell - Includes debugging tools

Available debug tools: bash, curl, net-tools, bind-tools, openssl, dig, nslookup

How to switch to debug image:

  • Docker Compose: Update the image tag in docker-compose.yaml
  • Helm: Update the image tag in values.yaml
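For example, with Helm the override might look like this (a minimal sketch; the exact key depends on how the chart exposes the image tag, so verify against the chart's values.yaml):

image:
  tag: latest-shell # assumed key path; check your chart's values.yaml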

Basic checks

A few basic commands can be run to check deployment health before going further into debugging.

Check pod status:

kubectl get pods -n ggbridge -o wide

Look for a Running status, a READY column showing 1/1 (or 2/2), and a low restart count.

Check pod details:

kubectl describe pod $pod_name -n ggbridge 

Have a look at the Events section for suspicious warnings or errors.
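You can also list recent events in the namespace directly, sorted by time:

kubectl get events -n ggbridge --sort-by=.lastTimestamp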

Connectivity Tests

1. Client-Side Healthcheck

Verify basic connectivity from the client to the server:

kubectl exec -it $ggbridge_pod -- bash -c "curl http://127.0.0.1:9081/healthz"

Expected output:

OK

2. SOCKS Proxy Test (Server-Side)

Test SOCKS proxy connectivity and DNS resolution.

Important

To run this test on the client side, you first need to enable the SOCKS tunnel in your values.yaml by adding these lines:

client:
  tunnels:
    socks:
      enabled: true

Then upgrade your deployment:

helm -n ggbridge upgrade -i ggbridge oci://ghcr.io/gitguardian/ggbridge/helm/ggbridge -f values.yaml

Finally, test the connection. On the client side, the service name defaults to ggbridge-proxy (different from the server side). Only endpoints on the allowed list can be reached; for testing, you can use https://api.gitguardian.com.

kubectl run debug -it --rm \
                      --restart=Never \
                      -n ggbridge \
                      --image=nicolaka/netshoot:latest \
                      -- zsh -c "curl -sILk --proxy socks5h://ggbridge-proxy.ggbridge.svc.cluster.local:1080 https://api.gitguardian.com" 

Quick test (HTTP status code only):

curl -sLk \
     -o /dev/null \
     -w "%{http_code}" \
     --connect-timeout 60 \
     --proxy "socks5h://${PROXY_HOST}:${PROXY_PORT}" "${VCS_URL}"

Verbose test (with headers):

curl -sILk --connect-timeout 60 \
           --proxy "socks5h://${PROXY_HOST}:${PROXY_PORT}" "${VCS_URL}"

Real-world example:

# Replace $uid with your actual bridge UID
kubectl run debug -it --rm \
                      --restart=Never \
                      -n ggbridge \
                      --image=nicolaka/netshoot:latest \
                      -- zsh -c "curl -sILk --proxy socks5h://$uid.ggbridge.svc.cluster.local https://vcs.example.local"

Expected responses:

  • 200: Success
  • 301/302: Redirect

Note

The socks5h scheme performs DNS resolution on the remote (proxy) side of the tunnel rather than locally.
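To illustrate, with a hypothetical internal hostname: socks5:// resolves the name where curl runs (and fails if the name only resolves inside the customer network), while socks5h:// defers resolution to the proxy inside the tunnel:

# DNS resolved locally -- may fail for internal-only hostnames
curl -sIL --proxy "socks5://${PROXY_HOST}:${PROXY_PORT}" https://vcs.example.local

# DNS resolved on the remote side of the tunnel (preferred)
curl -sIL --proxy "socks5h://${PROXY_HOST}:${PROXY_PORT}" https://vcs.example.local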

3. Git Repository Test (Server-Side)

Test Git operations through the SOCKS proxy:

git -c http.proxy="socks5h://${PROXY_HOST}" \
    -c http.sslVerify=false \
    -c http.timeout=30 \
    ls-remote --heads "${REPO_URL_WITH_AUTH}"

Example with authentication:

git -c http.proxy="socks5h://$uid-proxy-socks:1080" \
    -c http.sslVerify=false \
    -c http.timeout=30 \
    ls-remote --heads "https://admin:token@gitlab.example.local/group1/myrepo.git"

Expected output: List of Git branches and their commit hashes
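For illustration only (placeholder hashes and branch names), the output is one line per branch:

9f86d081884c7d659a2feaa0c55ad015a3bf4f1b	refs/heads/main
2c26b46b68ffc68ff99b453c1d30413413422d70	refs/heads/develop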

Tip

Please consider using the CronJob probes available here if you want a permanent check.

4. Reverse tunneling

When reverse tunneling is enabled on the client side, you can check whether you can connect to api.gitguardian.com. Execute this command on the customer's cluster:

kubectl run debug -it --rm \
                      --restart=Never \
                      -n ggbridge \
                      --image=nicolaka/netshoot \
                      -- zsh -c "curl -IL --resolve api.gitguardian.com:443:$(kubectl get svc ggbridge-proxy-tls -n ggbridge -o jsonpath='{.spec.clusterIP}') https://api.gitguardian.com"

Check that DNS resolution in the customer's environment properly resolves to the custom Kubernetes endpoint (implementation specific) instead of the public IP address of hook.gitguardian.com/api.gitguardian.com. Execute the following commands, for example from the customer's VCS server:

dig hook.gitguardian.com
dig api.gitguardian.com
traceroute hook.gitguardian.com
traceroute api.gitguardian.com
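In a correctly configured environment, the answer section should show the internal endpoint (placeholder address below), not the public GitGuardian IP:

hook.gitguardian.com.  300  IN  A  10.0.42.17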

Log Analysis

Tip

To collect and package client-side logs in a .tgz archive, you can use the dedicated script.

1. Client/Server Healthcheck Logs

Check the nginx sidecar logs for connectivity issues:

Server side:

# Check ggbridge server nginx container logs for a specific tenant and index
kubectl logs -l tenant=$uid,index=$index,app.kubernetes.io/component=server -c nginx -n ggbridge

Client side:

# Check ggbridge client nginx container logs for a specific index
kubectl logs -l index=$index,app.kubernetes.io/instance=ggbridge -c nginx -n ggbridge

Example of a healthy log entry for the health-check probe (nginx container in the server/client pod):

health 127.0.0.1 [30/Sep/2025:12:04:38 +0000] 127.0.0.1 "GET /healthz HTTP/1.1" 200 3 "-" "Go-http-client/1.1"

No logs = No connectivity from the other tunnel endpoint.
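To watch health-check traffic live while reproducing an issue, follow the same logs (client-side selector shown):

kubectl logs -f -l index=$index,app.kubernetes.io/instance=ggbridge -c nginx -n ggbridge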

2. Client/Server tunnel Logs

Server side:

# Check ggbridge server main container logs for a specific tenant and index
kubectl logs -l tenant=$uid,index=$index,app.kubernetes.io/component=server -c ggbridge -n ggbridge

Client side:

# Check ggbridge client main container logs for a specific index
kubectl logs -l index=$index,app.kubernetes.io/instance=ggbridge -c ggbridge -n ggbridge

You should see INFO logs mentioning opened/closed connections:

2025-10-16T08:21:06.156024Z  INFO wstunnel::protocols::tls::server: Doing TLS handshake using SNI DnsName("jpynh30wscp60zs4lbdf4m4p8qe9idgu.ggbridge.gitguardian.com") with the server jpynh30wscp60zs4lbdf4m4p8qe9idgu.ggbridge.gitguardian.com:443
2025-10-16T08:21:06.570872Z  INFO tunnel{id="0199ec1b-c14b-7f41-9492-e538c7a90f97" remote="127.0.0.1:8081"}: wstunnel::tunnel::transport::io: Closing local => remote tunnel
2025-10-16T08:21:06.571213Z  INFO tunnel{id="0199ec1b-c14b-7f41-9492-e538c7a90f97" remote="127.0.0.1:8081"}: wstunnel::tunnel::transport::io: Closing local <= remote tunnel
2025-10-16T08:21:08.489704Z  INFO tunnel{id="0199ec1b-af3c-70e3-8595-cd82aaf74cf4" remote="0.0.0.0:9081"}: wstunnel::tunnel::transport::io: Closing local => remote tunnel
2025-10-16T08:21:08.738773Z  INFO wstunnel::protocols::tls::server: Doing TLS handshake using SNI DnsName("jpynh30wscp60zs4lbdf4m4p8qe9idgu.ggbridge.gitguardian.com") with the server jpynh30wscp60zs4lbdf4m4p8qe9idgu.ggbridge.gitguardian.com:443
2025-10-16T08:21:10.693531Z  INFO tunnel{id="0199ec1b-b789-7362-b52a-5853e726c484" remote="0.0.0.0:9081"}: wstunnel::tunnel::transport::io: Closing local => remote tunnel
2025-10-16T08:21:10.947501Z  INFO wstunnel::protocols::tls::server: Doing TLS handshake using SNI DnsName("jpynh30wscp60zs4lbdf4m4p8qe9idgu.ggbridge.gitguardian.com") with the server jpynh30wscp60zs4lbdf4m4p8qe9idgu.ggbridge.gitguardian.com:443

Note

Any log entries at WARN or ERROR level are worth highlighting if present.

Note

You can also increase verbosity if needed, to DEBUG or TRACE level (default: INFO):

logLevel: INFO # --> set to DEBUG or TRACE in the server/client values.yaml
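After editing values.yaml, re-apply the chart so the new log level takes effect (same command as in the SOCKS proxy test):

helm -n ggbridge upgrade -i ggbridge oci://ghcr.io/gitguardian/ggbridge/helm/ggbridge -f values.yaml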

3. Server-Side Proxy Logs

Monitor traffic through the SOCKS proxy:

kubectl logs -l tenant=$uid,index=$index,app.kubernetes.io/component=proxy -n ggbridge

Port meanings:

  • 8081: Health checks
  • 1080: SOCKS proxy traffic
  • 443: HTTPS/TLS traffic
  • 80: HTTP traffic
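To filter out health-check noise (upstream :8081 per the log format below) and keep only proxied traffic, a simple grep sketch:

kubectl logs -l tenant=$uid,index=$index,app.kubernetes.io/component=proxy -n ggbridge | grep -v ':8081'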

Log format explanation:

| Position | Value | Nginx variable | Description | Unit |
|---|---|---|---|---|
| 1 | 127.0.0.1 | $remote_addr | Local client (health check) | IP |
| 2 | [24/Sep/2025:09:46:28 +0000] | [$time_local] | Connection timestamp | Date |
| 3 | TCP | $protocol | Transport protocol | Protocol |
| 4 | 200 | $status | Status code | Code |
| 5 | 150 | $bytes_sent | Bytes sent by nginx → client | Bytes |
| 6 | 102 | $bytes_received | Bytes received by nginx ← client | Bytes |
| 7 | 0.077 | $session_time | Session duration | Seconds |
| 8 | "172.20.167.124:8081" | "$upstream_addr" | Healthcheck backend server | IP:Port |
| 9 | "102" | "$upstream_bytes_sent" | Data sent nginx → backend | Bytes |
| 10 | "150" | "$upstream_bytes_received" | Data received nginx ← backend | Bytes |
| 11 | "0.000" | "$upstream_connect_time" | Connection time | Seconds |

Tip

If the session duration reaches 5 seconds for the health check (port 8081), a timeout occurred.

Tunnel Disruption Analysis (Server-Side)

When investigating mass bridge disconnections or tunnel outages, the wstunnel server logs provide key indicators about the disruption lifecycle. This section documents the log messages emitted by the wstunnel server process during a tunnel disruption and recovery, along with their source in the codebase and their meaning.

Disruption sequence

When a WebSocket tunnel is interrupted (e.g. ingress pod eviction, network disruption, Karpenter node consolidation), the server-side logs follow a predictable sequence:

error while writing to tx tunnel              <-- tunnel is dead
error while handling pending operations       <-- ping/pong fails
    |
    v
New reverse connection failed to be           <-- listener closing
picked by client after 30s                        (connections arrive but
                                                   nobody consumes them)
No client connected to reverse tunnel         <-- listener closing
server for 30s                                    (no traffic at all)
    |
    v
Stopping listening reverse server             <-- port unbound
    |
    v
connected to ReverseTcp                       <-- recovery

Tunnel disconnection indicators

These logs appear when the WebSocket connection breaks. They occur on every normal connection close (~every 7s for health probes), but a disruption is identified when they appear without a subsequent Accepting connection within a few seconds.

| Log message | Source | Meaning |
|---|---|---|
| Closing local => remote tunnel | io.rs:105 | The local-to-remote forwarder exits (WebSocket writer errored or local reader closed) |
| Closing local <= remote tunnel | io.rs:183 | The remote-to-local forwarder exits |

Error messages during disruption

| Log message | Source | Meaning |
|---|---|---|
| error while writing to tx tunnel {err} | io.rs:166 | Write error on the tunnel (broken pipe, connection reset) |
| error while handling pending operations {err} | io.rs:138 | Ping/pong handling failure (connection dead) |
| error while reading incoming bytes from local tx tunnel: {err} | io.rs:159 | Read error on the tunnel |
| Error while listening for incoming connections {err} | reverse_tunnel.rs:91 | TCP listener error on the reverse tunnel port |

Reverse tunnel listener shutdown

After the WebSocket handler dies, the reverse tunnel listener (port 9081) does not close immediately. It runs in a separate spawned task and checks periodically whether anyone is still consuming connections. There are two shutdown triggers:

| Log message | Source | Meaning |
|---|---|---|
| New reverse connection failed to be picked by client after {N}s. Closing reverse tunnel server | reverse_tunnel.rs:96 | A TCP connection arrived on the reverse tunnel port but no WebSocket handler picked it up within the idle timeout. This happens when nginx health probes keep arriving but the tunnel is dead. |
| No client connected to reverse tunnel server for {N}s. Closing reverse tunnel server | reverse_tunnel.rs:107 | Idle timeout with zero activity. No WebSocket handler is consuming the channel (receiver_count <= 1) and no new client has registered. This happens when nginx has already marked the upstream as down and stopped sending probes. |
| Stopping listening reverse server | reverse_tunnel.rs:113 | The TCP listener is dropped and the reverse tunnel port is unbound. From this point, any connection to the port returns Connection refused. |

The idle timeout is controlled by SERVER_IDLE_TIMEOUT (default: 30 seconds). The listener closes between 0 and SERVER_IDLE_TIMEOUT seconds after the last WebSocket handler exits, depending on where in the timer interval the disconnect occurred.

Recovery indicators

| Log message | Source | Meaning |
|---|---|---|
| Accepting connection | server.rs:412 | New incoming TCP connection (client WebSocket arriving) |
| Tunnel accepted due to matched restriction: {name} | server.rs:131 | Tunnel authorized by restriction rules |
| connected to {protocol} {host}:{port} | server.rs:144 | Reverse tunnel re-established; the port is re-bound and accepting connections |
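To check from kubectl whether a shutdown was followed by a recovery, grep both markers in the server tunnel logs (same selector as the tunnel-logs command above):

kubectl logs -l tenant=$uid,index=$index,app.kubernetes.io/component=server -c ggbridge -n ggbridge --since=1h | grep -E 'Stopping listening reverse server|connected to ReverseTcp'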

Observability queries

Detect a mass tunnel disruption (search in your log aggregator):

"Closing reverse tunnel server" OR "Stopping listening reverse server"

A spike in these messages across multiple bridges simultaneously indicates a mass disconnection event (e.g. ingress disruption, network outage).

Detect recovery:

"connected to ReverseTcp"

A spike in connected to ReverseTcp messages following a disruption indicates clients are reconnecting.

Detect tunnel errors:

"error while writing to tx tunnel" OR "error while handling pending operations"

These errors precede the listener shutdown and indicate the WebSocket connection is broken.

Example: Coralogix queries (adapt to your log aggregator):

Disruption detection (listener shutdown):

resource.attributes.k8s.namespace.name="ggbridge" AND resource.attributes.k8s.container.name="ggbridge" AND resource.attributes.k8s.deployment.name="*-server-*" AND (body:"Closing reverse tunnel server" OR body:"Stopping listening reverse server")

Tunnel errors (broken WebSocket):

resource.attributes.k8s.namespace.name="ggbridge" AND resource.attributes.k8s.container.name="ggbridge" AND resource.attributes.k8s.deployment.name="*-server-*" AND (body:"error while writing to tx tunnel" OR body:"error while handling pending operations")

Recovery detection (clients reconnecting):

resource.attributes.k8s.namespace.name="ggbridge" AND resource.attributes.k8s.container.name="ggbridge" AND resource.attributes.k8s.deployment.name="*-server-*" AND body:"connected to ReverseTcp"

Proxy-side connection refused (nginx container):

resource.attributes.k8s.namespace.name="ggbridge" AND resource.attributes.k8s.container.name="nginx" AND resource.attributes.k8s.deployment.name="*-proxy-*" AND body:"Connection refused"

Tip

During a disruption, correlate the timestamp of Stopping listening reverse server with the proxy nginx logs showing Connection refused to confirm the causal chain. The proxy starts seeing Connection refused within seconds of the listener shutting down.
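A kubectl sketch of that proxy-side check, assuming the proxy pods carry the same tenant/index labels and an nginx container as in the commands above:

kubectl logs -l tenant=$uid,index=$index,app.kubernetes.io/component=proxy -c nginx -n ggbridge | grep 'Connection refused'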

Note

The wstunnel server process itself does not crash during a tunnel disruption. It stays alive and continues accepting new WebSocket connections on the main port (9000). Only the reverse tunnel listener port (9081) is closed. When a client reconnects, the listener is automatically re-created.

Client Monitoring/Alerting Guidelines

Overview

Note

This guide provides generic recommendations for monitoring GGBridge client health and stability. These guidelines are platform-agnostic and can be adapted to your existing monitoring infrastructure.

Replica count

Ensure that all 3 GGBridge client deployments are properly deployed, each with 1 replica:

$ kubectl get deployments -n ggbridge
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
ggbridge-client-0   1/1     1            1           25h
ggbridge-client-1   1/1     1            1           25h
ggbridge-client-2   1/1     1            1           25h

What to monitor:

  • All deployments should show 1/1 in the READY column

Alert condition:

  • Any deployment showing 0/1 or missing deployments

Prometheus query example:

kube_deployment_status_replicas_ready{namespace="ggbridge", deployment=~"ggbridge-client-.*"}

Count deployments in the correct state (should be 3):

sum(
  (kube_deployment_status_replicas_ready{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1) and
  (kube_deployment_spec_replicas{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1) and
  (kube_deployment_status_replicas_available{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1)
)
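As a sketch, the same expression could back a Prometheus alerting rule (alert name and thresholds are illustrative):

groups:
  - name: ggbridge-client
    rules:
      - alert: GGBridgeClientDeploymentsDegraded # illustrative name
        expr: |
          sum(
            (kube_deployment_status_replicas_ready{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1) and
            (kube_deployment_spec_replicas{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1) and
            (kube_deployment_status_replicas_available{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1)
          ) < 3
        for: 5m
        labels:
          severity: critical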

Pod Status and Readiness

Check that all pods are running and ready to accept connections:

$ kubectl get pods -n ggbridge
NAME                                  READY   STATUS    RESTARTS   AGE
ggbridge-client-0-76687c7f6f-h6zrj   2/2     Running   0          25h
ggbridge-client-1-89abc123de-xyz45   2/2     Running   0          25h
ggbridge-client-2-12def456gh-abc78   2/2     Running   0          25h

What to monitor:

  • All pods should show 2/2 in the READY column (ggbridge + nginx containers)
  • STATUS should be Running
  • Monitor restart count - frequent restarts indicate issues

Alert conditions:

  • Pod showing 1/2 ready (connection issues with server)
  • Pod in CrashLoopBackOff, Error, or Pending status
  • High restart count (>5 restarts in 1 hour)

Prometheus query example:

kube_pod_status_ready{condition="true", namespace="ggbridge", pod=~"ggbridge-client-.*"} 
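To cover the restart-count alert condition above (assuming kube-state-metrics is deployed):

# fires when a client pod container restarted more than 5 times in the last hour
increase(kube_pod_container_status_restarts_total{namespace="ggbridge", pod=~"ggbridge-client-.*"}[1h]) > 5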

Container Logs Analysis

Monitor logs from the ggbridge container for connection issues:

Key error patterns to watch for:

WebSocket handshake failures (server connectivity issues):

2025-09-30T15:35:11.627155Z ERROR tunnel{id="01999b43-6b64-7a61-bab6-6ff55b03aade" remote="127.0.0.1:8081"}: wstunnel::tunnel::client::client: failed to do websocket handshake with the server wss://jpynh30wscp60zs4lbdf4m4p8qe9idgu.ggbridge.gitguardian.com:443

What to monitor:

  • Frequency of ERROR log entries
  • Specific error patterns indicating connectivity issues
  • Connection establishment success/failure rates

Loki query example:

{k8s_namespace_name="ggbridge", k8s_pod_name=~"ggbridge-client-.*"} |= "ERROR"
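To target the WebSocket handshake failures specifically (same stream selector):

{k8s_namespace_name="ggbridge", k8s_pod_name=~"ggbridge-client-.*"} |= "failed to do websocket handshake"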

Resource Usage

Monitor pod resource consumption:

$ kubectl top pods -n ggbridge
NAME                                 CPU(cores)   MEMORY(bytes)   
ggbridge-client-0-76687c7f6f-h6zrj   8m           7Mi             
ggbridge-client-1-bd75768f4-cr59l    10m          8Mi             
ggbridge-client-2-689f9d7c5-bz9k5    9m           7Mi

What to monitor:

  • CPU usage
  • Memory usage
  • Sudden spikes in resource usage

Prometheus query example:

# CPU (millicores)
rate(container_cpu_usage_seconds_total{namespace="ggbridge", pod=~"ggbridge-client-.*", container!="POD", container!=""}[5m]) * 1000

# Memory (MB) 
container_memory_working_set_bytes{namespace="ggbridge", pod=~"ggbridge-client-.*", container!="POD", container!=""} / 1024 / 1024

Getting Support

For technical support, please contact support@gitguardian.com with:

  1. Environment details: Kubernetes version, GGBridge version
  2. Error logs: Include relevant nginx and application logs
  3. Configuration: Sanitized values.yaml or docker-compose.yaml
  4. Test results: Output from the connectivity tests above
  5. Network setup: Information about firewalls, proxies, DNS configuration