| 1 | +# OpenCensus Agent Proto |
| 2 | + |
| 3 | +This package describes the OpenCensus Agent protocol. |
| 4 | + |
| 5 | +## Architecture Overview |
| 6 | + |
| 7 | +TODO(songya): move this section to the README under agent/service repo. |
| 8 | + |
| 9 | +On a typical VM/container, there are user applications running in some processes/pods with |
| 10 | +OpenCensus Library (Library). Previously, Library did all the recording, collecting, sampling and |
| 11 | +aggregation on spans/stats/metrics, and exported them to other persistent storage backends via the |
| 12 | +Library exporters, or displayed them on local zpages. This pattern has several drawbacks, for |
| 13 | +example: |
| 14 | + |
| 15 | +1. For each OpenCensus Library, exporters/zpages need to be re-implemented in native languages. |
| 16 | +2. In some programming languages (e.g. Ruby, PHP), it is difficult to do the stats aggregation |
| 17 | +in-process. |
| 18 | +3. To enable exporting OpenCensus spans/stats/metrics, application users need to manually add |
| 19 | +library exporters and redeploy their binaries. This is especially difficult when there’s already |
| 20 | +an incident and users want to use OpenCensus to investigate what’s going on right away. |
| 21 | +4. Application users need to take responsibility for configuring and initializing exporters. |
| 22 | +This is error-prone (e.g. they may not set up the correct credentials/monitored resources), and |
| 23 | +users may be reluctant to “pollute” their code with OpenCensus. |
| 24 | + |
| 25 | +To resolve the issues above, we are introducing OpenCensus Agent (Agent). Agent runs as a daemon |
| 26 | +in the VM/container and can be deployed independently of Library. Once Agent is deployed and |
| 27 | +running, it should be able to retrieve spans/stats/metrics from Library and export them to other |
| 28 | +backends. We MAY also give Agent the ability to push configurations (e.g. sampling probability) to |
| 29 | +Library. Libraries in languages that cannot do stats aggregation in-process should also be |
| 30 | +able to send raw measurements and have Agent do the aggregation. In addition, Agent can be |
| 31 | +extended to accept spans/stats/metrics from other tracing/monitoring libraries, such as Zipkin, |
| 32 | +Prometheus, etc. |
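As a rough illustration of the "send raw measurements and have Agent do the aggregation" idea above, here is a minimal Python sketch. The `Aggregate` type, the `aggregate` function, and the `(measure_name, value)` tuple shape are all hypothetical names chosen for this example; they are not part of the OpenCensus Agent protocol, which does not yet define a stats service.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Aggregate:
    """Hypothetical running aggregate for one measure (not an OpenCensus type)."""
    count: int = 0
    total: float = 0.0

    @property
    def mean(self) -> float:
        # Avoid dividing by zero when no measurement has been recorded yet.
        return self.total / self.count if self.count else 0.0


def aggregate(raw_measurements):
    """Fold (measure_name, value) pairs into per-measure count/sum aggregates.

    This stands in for work an Agent could do on behalf of a Library that
    cannot aggregate in-process; the input shape is an assumption.
    """
    result = defaultdict(Aggregate)
    for name, value in raw_measurements:
        agg = result[name]
        agg.count += 1
        agg.total += value
    return dict(result)
```

With this shape, a Library only needs to forward each raw measurement as it is recorded, and the aggregation state lives entirely on the Agent side.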
| 33 | + |
| 34 | + |
| 35 | + |
| 36 | +To support Agent, Library should have “agent exporters”, similar to the existing exporters to |
| 37 | +other backends. There should be 3 separate agent exporters for tracing/stats/metrics |
| 38 | +respectively. Agent exporters will be responsible for sending spans/stats/metrics and (possibly) |
| 39 | +receiving configuration updates from Agent. |
| 40 | + |
| 41 | +Communication between Library and Agent should use a bi-directional gRPC stream. Library should |
| 42 | +initiate the connection, since there’s only one dedicated port for Agent, while there could be |
| 43 | +multiple processes with Library running. |
| 44 | + |
| 45 | +## Protocol Workflow |
| 46 | + |
| 47 | +1. Library will try to directly establish connections for Config and Export streams. |
| 48 | +2. As the first message in each stream, Library must send its identifier. Each identifier should |
| 49 | +uniquely identify Library within the VM/container. The identifier is no longer needed once the |
| 50 | +streams are established. |
| 51 | +3. If the streams are disconnected and retries fail, the Library identifier is considered |
| 52 | +expired on the Agent side. Library needs to start a new connection with a unique identifier |
| 53 | +(MAY be different from the previous one). |
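The three workflow steps above can be sketched with a small in-memory simulation in Python. This is only an illustration under stated assumptions: it does not use gRPC, and the `Agent` and `Library` classes, the dictionary message shape, and the use of a random UUID as the identifier are all hypothetical, not taken from the actual proto definitions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Agent:
    """In-memory stand-in for Agent; tracks which Library identifiers are live."""
    active: set = field(default_factory=set)

    def open_stream(self, first_message: dict) -> str:
        # Step 2: the first message on each stream must carry the identifier.
        identifier = first_message.get("identifier")
        if not identifier:
            raise ValueError("first message must carry the Library identifier")
        self.active.add(identifier)
        return identifier

    def expire(self, identifier: str) -> None:
        # Step 3: after a disconnect and failed retries, the identifier expires.
        self.active.discard(identifier)


class Library:
    """In-memory stand-in for a Library process that talks to one Agent."""

    def __init__(self, agent: Agent):
        self.agent = agent
        self.identifier = None

    def connect(self) -> str:
        # Steps 1 and 2: Library initiates both streams and sends its
        # identifier first. A fresh random identifier is an assumption here;
        # the text only requires uniqueness within the VM/container.
        self.identifier = uuid.uuid4().hex
        for stream in ("config", "export"):
            self.agent.open_stream(
                {"stream": stream, "identifier": self.identifier}
            )
        return self.identifier


if __name__ == "__main__":
    agent = Agent()
    library = Library(agent)
    first = library.connect()
    agent.expire(first)         # simulate disconnect plus failed retries
    second = library.connect()  # Library starts over with a new identifier
    print(first in agent.active, second in agent.active)
```

In a real deployment the two streams would be bi-directional gRPC streams opened by Library against Agent's dedicated port; the simulation only captures the identifier lifecycle.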
| 54 | + |
| 55 | +## Packages |
| 56 | + |
| 57 | +1. `common` package contains the common messages shared between different services, such as |
| 58 | +`Node`, `Service` and `Library` identifiers. |
| 59 | +2. `trace` package contains the Trace Service protos. |
| 60 | +3. (Coming soon) `stats` package contains the Stats Service protos. |
| 61 | +4. (Coming soon) `metrics` package contains the Metrics Service protos. |