This repository was archived by the owner on Dec 18, 2023. It is now read-only.

Commit d2b4b35

Event logging - docs and a single event example. (#85)
* documents
* first warning was implemented
* first version of docs complete
1 parent af9a10e commit d2b4b35


5 files changed

+209
-2
lines changed

docs/error-handling.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
# Error handling in Open Census C# SDK
2+
3+
Open Census is a library that in many cases runs in the context of a customer
4+
app, performing operations that are non-essential to the app's business logic.
5+
The Open Census SDK also can, and often will, be enabled via platform
6+
extensibility mechanisms, potentially only at runtime. This makes the use of the
7+
SDK non-obvious for the end user and sometimes outside the end user's control.
8+
9+
This creates some unique requirements for Open Census error handling practices.
10+
11+
## Basic error handling principles
12+
13+
Open Census SDK must not throw or leak unhandled or user unhandled exceptions.
14+
15+
1. APIs must not throw or leak unhandled or user unhandled exceptions when the
16+
API is used incorrectly by the developer. Smart defaults should be used so
17+
that the SDK generally works.
18+
2. SDK must not throw or leak unhandled or user unhandled exceptions for
19+
configuration errors.
20+
3. SDK must not throw or leak unhandled or user unhandled exceptions for errors
21+
in its own operations. Examples: telemetry cannot be sent because the
22+
endpoint is down, or location information is not available because the device
23+
owner has disabled it.
24+
25+
## Guidance
26+
27+
1. In .NET 4.0 and above, catching all exceptions will not catch corrupted
28+
state exceptions (CSEs).
29+
- We want this behavior: don't catch CSEs.
30+
- This allows exceptions such as stack overflow and access violation to flow through.
31+
- More information: http://msdn.microsoft.com/en-us/magazine/dd419661.aspx
32+
2. Every background operation callback, Task or Thread method should have a
33+
global `try{}catch` statement to ensure reliability of an app.
34+
3. When catching all exceptions in other cases, reduce the scope of the `try` as
35+
much as possible.
36+
4. In general, don't catch, filter, and rethrow
37+
- Catch all exceptions and log error
38+
- If you must rethrow, use `throw;`, not `throw ex;`. This ensures the
39+
original call stack is preserved.
40+
5. Beware of any call to external callbacks or overridable interfaces. Expect
41+
them to throw.
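
The guidance above can be sketched as follows. This is only an illustration, not code from the SDK: `GuardedCallbacks`, `RunGuarded`, and the `logError` delegate are hypothetical names. It wraps a background operation in a single top-level `try`/`catch`, treats the external callback as something that is expected to throw, and logs instead of rethrowing.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the SDK.
public static class GuardedCallbacks
{
    // Runs an external callback on a background task and makes sure that
    // no exception escapes into the host application.
    public static Task RunGuarded(Action callback, Action<Exception> logError)
    {
        return Task.Run(() =>
        {
            try
            {
                // External code: expect it to throw.
                callback();
            }
            catch (Exception ex)
            {
                try
                {
                    // Log the error and do not rethrow.
                    logError(ex);
                }
                catch (Exception)
                {
                    // A failing logger must not break the host app either.
                }
            }
        });
    }
}
```

The returned task completes successfully even when the callback throws, so callers that wait on it never observe the exception.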

docs/error-logging.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1+
# Error logging
2+
3+
This document explains how Open Census SDK logs information about its own
4+
execution.
5+
6+
There are the following scenarios for SDK manageability:
7+
8+
1. Send error & warning logs to the back-end for customer self-troubleshooting.
9+
2. Visualize OC SDK health in external tools.
10+
3. Visualize OC SDK health in Z-Pages.
11+
4. Show errors/warnings/information in Visual Studio F5 debug window.
12+
5. Testing – no debugger troubleshooting.
13+
6. Customer support – collect verbose logs.
14+
15+
## Definition of verbosity levels
16+
17+
The following severity levels are defined for SDK logs.
18+
19+
### Severity `Error`
20+
21+
A problem in SDK operation that resulted in data loss or an inability to collect data.
22+
23+
### Severity `Warning`
24+
25+
Problem in SDK operation that MAY result in data loss if not attended to.
26+
The `Warning` level may also identify a data quality problem.
27+
28+
### Severity `Informational`
29+
30+
Completion of a major operation, typically one that happens rarely.
31+
32+
### Severity `Verbose`
33+
34+
All other logs. Typically used for troubleshooting hard-to-reproduce
35+
issues or issues happening in specific production environments.
36+
37+
## Logging with EventSource
38+
39+
1. Find or create an assembly-specific `internal` class inherited from
40+
`EventSource`.
41+
2. Prefix the name of the EventSource with `OpenCensus-` using a class attribute like
42+
this: `[EventSource(Name = "OpenCensus-Base")]`.
43+
3. Create a new `Event` method with the arguments that need to be logged. Each
44+
event should have an index, a message, and an event severity (level). It is a good
45+
practice to include the event severity (level) in the method name.
46+
4. Use the following rules to pick event index:
47+
1. Do not reorder existing event method indexes. Otherwise versioning of
48+
logs metadata will not work well.
49+
2. Do not put large gaps between indices. E.g. use sequential indices
50+
instead of event categorization based on index (`1X` for one category,
51+
`2X` for another). Unassigned indices in `1X` category will affect
52+
logging performance.
53+
5. Use the following rules to author the event message:
54+
1. Make event description actionable and explain the effect of the problem.
55+
For instance, instead of *"No span in current context"* use something
56+
like *"No span in current context. Span name will not be updated. It may
57+
indicate incorrect usage of Open Census API - please ensure span wasn't
58+
overridden explicitly in your code or by other module."*
59+
6. Use the severity definitions from the section above.
60+
7. Follow the performance optimization techniques.
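
As a rough sketch of the steps above, an EventSource with two sequentially indexed events could look like this. `ExampleEventSource`, the `OpenCensus-Example` name, and the `NoCurrentSpanWarning` event are made up for illustration; only the first event mirrors the one added in this commit.

```csharp
using System.Diagnostics.Tracing;

// Hypothetical event source used only to illustrate the naming and indexing rules.
[EventSource(Name = "OpenCensus-Example")]
internal sealed class ExampleEventSource : EventSource
{
    public static readonly ExampleEventSource Log = new ExampleEventSource();

    // Index 1. The severity is part of the method name.
    [Event(1, Message = "Exporter failed to export items. Exception: {0}", Level = EventLevel.Warning)]
    public void ExporterThrownExceptionWarning(string ex)
    {
        this.WriteEvent(1, ex);
    }

    // Index 2 follows index 1 directly, leaving no gap. The message says
    // what happened and what the effect is.
    [Event(2, Message = "No span in current context. Span name will not be updated to '{0}'.", Level = EventLevel.Warning)]
    public void NoCurrentSpanWarning(string spanName)
    {
        this.WriteEvent(2, spanName);
    }
}
```

New events would be appended with the next free index (3, 4, and so on) rather than inserted between or reordered with existing ones.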
61+
62+
## Minimizing logging performance impact
63+
64+
### Pass object references
65+
66+
EventSource requires primitive types such as `int` or `string` in the `Write`
67+
method. Because of this limitation, complex types such as `Exception` must be
68+
formatted before the trace statement is called.
69+
70+
Since formatting happens before the `Write` method is called, it runs
71+
unconditionally, whether or not a listener is enabled. To minimize the performance
72+
hit, create `NonEvent` methods in EventSource that accept complex types and check
73+
`Log.IsEnabled` before serializing them and passing them to `Event` methods.
74+
75+
### Diagnostics events throttling
76+
77+
Throttling is required for the following scenarios:
78+
79+
- Minimize the traffic we use to report problems to the portal
80+
- Make sure `*.etl` files are not overloaded with similar errors
81+
82+
Log subscribers will implement throttling logic. However, a log producer may have
83+
additional logic to prevent excessive logging. For instance, if a problem
84+
cannot be resolved at runtime, the producer of the `Error` log may decide to
85+
log it only once, or once in a while. Note that this technique should be used
86+
carefully: not every log subscriber is enabled from process start, so a
87+
subscriber may miss this important error message.
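
One possible shape for producer-side "log it only once" logic is sketched below. `LogOnce` is a hypothetical helper, not an SDK type, and it assumes the caller passes the actual logging call as a delegate.

```csharp
using System;
using System.Threading;

// Hypothetical helper: lets a producer emit a given log entry at most once.
public sealed class LogOnce
{
    private int logged; // 0 = not logged yet, 1 = already logged

    // Invokes the log action only for the first caller, even when several
    // threads race. Returns true if the action was invoked by this call.
    public bool TryLog(Action log)
    {
        if (Interlocked.CompareExchange(ref this.logged, 1, 0) != 0)
        {
            return false;
        }

        log();
        return true;
    }
}
```

A producer would keep one `LogOnce` instance per error it wants to throttle. The caveat from the paragraph above still applies: a subscriber attached after the first call will never see the entry.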
88+
89+
## Subscribing to EventSource
90+
91+
EventSource allows us to separate the logic of tracing from delivering those traces
92+
to different channels. The default ETW subscriber works out of the box. For all
93+
other channels, in-process subscribers can be used for data delivery.
94+
95+
![event-source-listeners](event-source-listeners.png)
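
A minimal sketch of an in-process subscriber, assuming the `OpenCensus-` name prefix described earlier. `OpenCensusEventListener` and its `Messages` queue are hypothetical. A real subscriber would forward events to its own channel instead of keeping them in memory.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics.Tracing;
using System.Globalization;
using System.Linq;

// Hypothetical in-process subscriber that collects formatted SDK log messages.
internal sealed class OpenCensusEventListener : EventListener
{
    public ConcurrentQueue<string> Messages { get; } = new ConcurrentQueue<string>();

    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // Subscribe only to SDK event sources, at Warning level and above.
        if (eventSource.Name.StartsWith("OpenCensus-", StringComparison.Ordinal))
        {
            this.EnableEvents(eventSource, EventLevel.Warning, EventKeywords.All);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        try
        {
            // Message holds the format string; Payload holds its arguments.
            object[] payload = eventData.Payload?.ToArray() ?? new object[0];
            string text = eventData.Message == null
                ? string.Join(", ", payload)
                : string.Format(CultureInfo.InvariantCulture, eventData.Message, payload);
            this.Messages.Enqueue(eventData.Level + ": " + text);
        }
        catch (Exception)
        {
            // A subscriber must not throw back into the code that is logging.
        }
    }
}
```

The listener starts receiving events as soon as it is constructed and stops when it is disposed.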
96+
97+
## EventSource vs. using SDK itself
98+
99+
1. No support for `IsEnabled` when exporter/listener exists. It's important for
100+
verbose logging.
101+
2. ETW channel is not supported.
102+
3. In-process subscription/extensibility is not supported.
103+
4. Logging should be more reliable than the SDK itself.

docs/event-source-listeners.png

17.8 KB
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
1+
// <copyright file="OpenCensusEventSource.cs" company="OpenCensus Authors">
2+
// Copyright 2018, OpenCensus Authors
3+
//
4+
// Licensed under the Apache License, Version 2.0 (the "License");
5+
// you may not use this file except in compliance with the License.
6+
// You may obtain a copy of the License at
7+
//
8+
// http://www.apache.org/licenses/LICENSE-2.0
9+
//
10+
// Unless required by applicable law or agreed to in writing, software
11+
// distributed under the License is distributed on an "AS IS" BASIS,
12+
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
// See the License for the specific language governing permissions and
14+
// limitations under the License.
15+
// </copyright>
16+
17+
namespace OpenCensus.Implementation
18+
{
19+
using System;
20+
using System.Diagnostics.Tracing;
21+
using System.Globalization;
22+
using System.Threading;
23+
24+
[EventSource(Name = "OpenCensus-Base")]
25+
internal class OpenCensusEventSource : EventSource
26+
{
27+
public static readonly OpenCensusEventSource Log = new OpenCensusEventSource();
28+
29+
[NonEvent]
30+
public void ExporterThrownExceptionWarning(Exception ex)
31+
{
32+
if (Log.IsEnabled(EventLevel.Warning, EventKeywords.All))
33+
{
34+
this.ExporterThrownExceptionWarning(ToInvariantString(ex));
35+
}
36+
}
37+
38+
[Event(1, Message = "Exporter failed to export items. Exception: {0}", Level = EventLevel.Warning)]
39+
public void ExporterThrownExceptionWarning(string ex)
40+
{
41+
this.WriteEvent(1, ex);
42+
}
43+
44+
/// <summary>
45+
/// Returns a culture-independent string representation of the given <paramref name="exception"/> object,
46+
/// appropriate for diagnostics tracing.
47+
/// </summary>
48+
private static string ToInvariantString(Exception exception)
49+
{
50+
CultureInfo originalUICulture = Thread.CurrentThread.CurrentUICulture;
51+
52+
try
53+
{
54+
Thread.CurrentThread.CurrentUICulture = CultureInfo.InvariantCulture;
55+
return exception.ToString();
56+
}
57+
finally
58+
{
59+
Thread.CurrentThread.CurrentUICulture = originalUICulture;
60+
}
61+
}
62+
}
63+
}

src/OpenCensus/Trace/Export/SpanExporterWorker.cs

Lines changed: 2 additions & 2 deletions
@@ -20,6 +20,7 @@ namespace OpenCensus.Trace.Export
2020
using System.Collections.Concurrent;
2121
using System.Collections.Generic;
2222
using OpenCensus.Common;
23+
using OpenCensus.Implementation;
2324

2425
internal class SpanExporterWorker : IDisposable
2526
{
@@ -139,8 +140,7 @@ private void Export(IEnumerable<ISpanData> export)
139140
}
140141
catch (Exception ex)
141142
{
142-
// TODO Log warning
143-
Console.WriteLine(ex);
143+
OpenCensusEventSource.Log.ExporterThrownExceptionWarning(ex);
144144
}
145145
}
146146
}
