|
| 1 | +# Error logging |
| 2 | + |
| 3 | +This document explains how the Open Census SDK logs information about its own |
| 4 | +execution. |
| 5 | + |
| 6 | +SDK manageability covers the following scenarios: |
| 7 | + |
| 8 | +1. Send error & warning logs to the back-end for customer self-troubleshooting. |
| 9 | +2. Visualize OC SDK health in external tools. |
| 10 | +3. Visualize OC SDK health in Z-Pages. |
| 11 | +4. Show errors/warnings/information in Visual Studio F5 debug window. |
| 12 | +5. Testing – troubleshooting without a debugger. |
| 13 | +6. Customer support – collect verbose logs. |
| 14 | + |
| 15 | +## Definition of verbosity levels |
| 16 | + |
| 17 | +The following severity levels are defined for SDK logs. |
| 18 | + |
| 19 | +### Severity `Error` |
| 20 | + |
| 21 | +A problem in SDK operation that resulted in data loss or an inability to collect data. |
| 22 | + |
| 23 | +### Severity `Warning` |
| 24 | + |
| 24 | +A problem in SDK operation that MAY result in data loss if not attended to. |
| 25 | +The `Warning` level may also indicate a data quality problem. |
| 27 | + |
| 28 | +### Severity `Informational` |
| 29 | + |
| 30 | +Completion of a major operation, typically one that happens rarely. |
| 31 | + |
| 32 | +### Severity `Verbose` |
| 33 | + |
| 34 | +All other logs. Typically used for troubleshooting hard-to-reproduce issues |
| 35 | +or issues that happen in specific production environments. |
| 36 | + |
| 37 | +## Logging with EventSource |
| 38 | + |
| 39 | +1. Find or create an assembly-specific `internal` class inherited from |
| 40 | + `EventSource`. |
| 41 | +2. Prefix the name of the EventSource with `OpenCensus-` using a class attribute |
| 42 | + like this: `[EventSource(Name = "OpenCensus-Base")]`. |
| 43 | +3. Create a new `Event` method with the arguments that need to be logged. Each |
| 44 | + event should have an index, a message, and an event severity (level). It is |
| 45 | + good practice to include the event severity (level) in the method name. |
| 46 | +4. Use the following rules to pick event index: |
| 47 | + 1. Do not reorder existing event method indexes. Otherwise versioning of |
| 48 | + logs metadata will not work well. |
| 49 | + 2. Do not put large gaps between indices. E.g. use sequential indices |
| 50 | + instead of events categorization based on index (`1X` for one category, |
| 51 | + `2X` for another). Unassigned indices in `1X` category will affect |
| 52 | + logging performance. |
| 53 | +5. Use the following rules to author the event message: |
| 54 | + 1. Make event description actionable and explain the effect of the problem. |
| 55 | + For instance, instead of *"No span in current context"* use something |
| 56 | + like *"No span in current context. Span name will not be updated. It may |
| 57 | + indicate incorrect usage of Open Census API - please ensure span wasn't |
| 58 | + overridden explicitly in your code or by other module."* |
| 59 | +6. Pick the event severity using the definitions in the "Definition of verbosity levels" section above. |
| 60 | +7. Follow the performance optimization techniques in the "Minimizing logging performance impact" section. |
| 61 | + |
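The steps above can be sketched as follows. This is a minimal illustration, not the SDK's actual event source: the class name `ExampleEventSource`, the method names, the event indices, and the text of the first message are all assumptions made for the example. The second message reuses the actionable wording suggested in step 5.

```csharp
using System.Diagnostics.Tracing;

// Hypothetical event source for illustration only. The class name, method
// names, indices, and the first message are assumptions, not SDK code.
[EventSource(Name = "OpenCensus-Base")]
internal sealed class ExampleEventSource : EventSource
{
    public static readonly ExampleEventSource Log = new ExampleEventSource();

    // Index 1: severity is part of the method name, and the message explains
    // the effect of the problem.
    [Event(1, Message = "Exporter failed to send spans: '{0}'. Spans in this batch are lost.", Level = EventLevel.Error)]
    public void ExporterFailedError(string exception)
    {
        this.WriteEvent(1, exception);
    }

    // Index 2: sequential, with no gap after index 1.
    [Event(2, Message = "No span in current context. Span name will not be updated. " +
        "It may indicate incorrect usage of Open Census API - please ensure span wasn't " +
        "overridden explicitly in your code or by other module.", Level = EventLevel.Warning)]
    public void NoCurrentSpanWarning()
    {
        this.WriteEvent(2);
    }
}
```

A caller would then write `ExampleEventSource.Log.NoCurrentSpanWarning();` at the point where the problem is detected.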
| 62 | +## Minimizing logging performance impact |
| 63 | + |
| 64 | +### Pass object references |
| 65 | + |
| 66 | +EventSource requires primitive types like `int` or `string` in the `Write` |
| 67 | +method. Because of this limitation, complex types like `Exception` must be |
| 68 | +formatted before calling the trace statement. |
| 69 | + |
| 70 | +Since formatting happens before the `Write` method is called, it runs |
| 71 | +unconditionally, whether or not a listener is enabled. To minimize the performance hit, |
| 72 | +create `NonEvent` methods in the EventSource that accept complex types and check |
| 73 | +`Log.IsEnabled` before serializing them and passing them to `Event` methods. |
| 74 | + |
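One way to apply this pattern is sketched below. The names `ExporterEventSource` and `ExporterFailedError`, the source name `OpenCensus-Example`, and the message text are hypothetical. The sketch assumes the `IsEnabled(EventLevel, EventKeywords)` overload of `EventSource` is an acceptable check for this purpose.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical event source; names and message are illustrative assumptions.
[EventSource(Name = "OpenCensus-Example")]
internal sealed class ExporterEventSource : EventSource
{
    public static readonly ExporterEventSource Log = new ExporterEventSource();

    // Accepts the complex type. The potentially expensive ToString() call
    // runs only when some listener has enabled Error-level events.
    [NonEvent]
    public void ExporterFailedError(Exception ex)
    {
        if (this.IsEnabled(EventLevel.Error, EventKeywords.All))
        {
            this.ExporterFailedError(ex.ToString());
        }
    }

    // The actual event takes only primitive arguments.
    [Event(1, Message = "Exporter failed: '{0}'. Data in this batch is lost.", Level = EventLevel.Error)]
    public void ExporterFailedError(string exception)
    {
        this.WriteEvent(1, exception);
    }
}
```

Call sites pass the exception object directly, for example `ExporterEventSource.Log.ExporterFailedError(ex);`, so no formatting cost is paid when nothing is listening.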
| 75 | +### Diagnostics events throttling |
| 76 | + |
| 77 | +Throttling is required for the following scenarios: |
| 78 | + |
| 79 | +- Minimize the traffic used to report problems to the portal |
| 80 | +- Make sure `*.etl` files are not overloaded with similar errors |
| 81 | + |
| 82 | +Log subscribers will implement throttling logic. However, a log producer may have |
| 83 | +additional logic to prevent excessive logging. For instance, if a problem |
| 84 | +cannot be resolved at runtime, the producer of the `Error` log may decide to |
| 85 | +log it only once or once in a while. Note that this technique should be used carefully: |
| 86 | +not every log subscriber is enabled from process start, so a subscriber may miss |
| 87 | +this important error message. |
| 88 | + |
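A producer-side "log only once" guard could look roughly like this. `OnceGate` is a hypothetical helper invented for this sketch, not part of the SDK or of .NET. It uses `Interlocked.Exchange` so that exactly one caller wins even when several threads hit the same error concurrently.

```csharp
using System.Threading;

// Hypothetical helper: lets the first caller through and rejects all others.
internal sealed class OnceGate
{
    private int fired; // 0 = not yet fired, 1 = already fired

    // Returns true exactly once per instance, even under concurrent calls.
    public bool TryEnter()
    {
        return Interlocked.Exchange(ref this.fired, 1) == 0;
    }
}
```

The producer would hold one gate per unrecoverable error and wrap the log call in `if (gate.TryEnter()) { ... }`. As the paragraph above warns, a subscriber attached after the first occurrence will never see the message, so this trades completeness for lower volume.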
| 89 | +## Subscribing to EventSource |
| 90 | + |
| 91 | +EventSource allows us to separate the logic of tracing from the delivery of those traces |
| 92 | +to different channels. The default ETW subscriber works out of the box. For all |
| 93 | +other channels, in-process subscribers can be used for data delivery. |
| 94 | + |
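As an illustration of an in-process subscriber, the sketch below uses the standard `EventListener` class to forward events to the console. The class name `ConsoleEventListener`, the choice of `Warning` as the minimum level, and the console as the delivery channel are assumptions made for the example.

```csharp
using System;
using System.Diagnostics.Tracing;
using System.Globalization;
using System.Linq;

// Hypothetical in-process subscriber that prints SDK events to the console.
internal sealed class ConsoleEventListener : EventListener
{
    // Called for every EventSource in the process, including ones created
    // before this listener. Subscribe only to the "OpenCensus-" sources.
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name.StartsWith("OpenCensus-", StringComparison.Ordinal))
        {
            this.EnableEvents(eventSource, EventLevel.Warning);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        string message = eventData.Message ?? string.Empty;
        if (eventData.Payload != null && eventData.Payload.Count > 0)
        {
            // Message is the attribute template ("... {0} ..."); fill in the payload.
            message = string.Format(CultureInfo.InvariantCulture, message, eventData.Payload.ToArray());
        }

        Console.WriteLine($"[{eventData.Level}] {eventData.EventSource.Name}: {message}");
    }
}
```

Creating one instance at process start (`var listener = new ConsoleEventListener();`) and keeping it alive is enough to begin receiving events; disposing it stops delivery.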
| 95 | + |
| 96 | + |
| 97 | +## EventSource vs. using SDK itself |
| 98 | + |
| 99 | +1. No support for `IsEnabled` when exporter/listener exists. It's important for |
| 100 | + verbose logging. |
| 101 | +2. ETW channel is not supported. |
| 102 | +3. In-process subscription/extensibility is not supported. |
| 103 | +4. Logging should be more reliable than the SDK itself. |