Dissecting the OpenTelemetry Collector: An Overview

November 4, 2021

The OpenTelemetry Collector is the central data collection mechanism for the OpenTelemetry project. We’re going to focus on different angles in subsequent articles, but for now let’s look at it more generally.

Deployment

With “single agent” we refer to a scenario where the observability vendor provides a single agent that customers deploy to their systems, which then acts as the data collection mechanism. In the case of Instana, this would be the Instana Agent: you deploy it to your systems via one of the supported mechanisms and basically leave it at that.

The OpenTelemetry Collector supports this scenario, and the project describes it as follows: “A Collector instance running with the application or on the same host as the application (e.g. binary, sidecar, or daemonset)”.
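
To make the agent scenario a bit more concrete, here is a minimal sketch of what such a configuration could look like: the Collector accepts OTLP from local applications and forwards it upstream. The backend endpoint is a placeholder; in practice it would point at your gateway Collector or your vendor’s ingestion endpoint.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlp:
    # placeholder; replace with your gateway Collector or vendor endpoint
    endpoint: backend.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]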

Components & Pipelines

The OpenTelemetry (OTel) Collector architecture – source: https://opentelemetry.io/docs/collector/

OpenTelemetry recognizes four signal types. Three are explicit: spans (traces), metrics, and logs. The fourth is the resource description that accompanies them. Each of the three explicit signals is processed in its own pipeline, which gives you the opportunity to post-process and export them individually to the desired targets. If you were to implement a fully open-source approach to observability, you might use Prometheus for metrics, Jaeger for traces, and an ELK stack for your logging needs. Each of these tools expects its signals in a dedicated format.

The collector nicely separates those signals into “pipelines,” which helps in tailoring the inputs for the desired output. An example definition for pipelines might look like this:

service:
  pipelines:
    metrics:
      receivers: [opencensus, prometheus]
      exporters: [opencensus, prometheus]
    traces:
      receivers: [opencensus, jaeger]
      processors: [batch]
      exporters: [zipkin]

This defines two pipelines, metrics and traces. In the metrics pipeline, the opencensus-receiver essentially opens an HTTP endpoint to receive OpenCensus data, while the prometheus-receiver can be used for scraping. The traces pipeline applies the OpenCensus receiver as well, plus the jaeger-receiver, which opens a Jaeger-compatible endpoint on the Collector.

The traces pipeline also applies batching logic through the batch-processor: the Collector gathers many individual signals into batches before they are dispatched to the exporter, so that the receiving backends are not hammered with lots of small requests.
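
To give an idea of what that looks like in practice, the batch-processor can be tuned with a few settings; the values below are purely illustrative, not recommendations:

processors:
  batch:
    # flush a batch after this interval, even if it is not full yet
    timeout: 5s
    # flush once this many spans, metric points, or log records have been collected
    send_batch_size: 512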

After processing, the exporters are applied: the OpenCensus and Prometheus exporters forward the metrics to compatible remote endpoints, and the trace data is forwarded to a remote Zipkin endpoint.

On a higher level, the components involved in handling these signals are the following (a configuration skeleton showing how they fit together follows the list):

  • receivers: receive data from other sources. We can consider them inputs to the Collector. The only core receiver is the OTLP receiver, which ingests the OpenTelemetry Protocol (OTLP).
  • extensions: add optional functionality to the collector executable, such as health checks
  • processors: work on the telemetry data in the pipelines, for example controlling batching, adding attributes, performing necessary conversions, etc.
  • exporters: take the data and make it available to outside consumers, for example an observability platform that can make the data useful by aggregating it and providing insights
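
As a rough sketch of how these component types fit together, the skeleton below wires an OTLP receiver, the batch processor, an OTLP exporter, and the health_check extension into one traces pipeline. It assumes a distribution that ships these components, and the backend endpoint is just a placeholder:

extensions:
  health_check: {}

receivers:
  otlp:
    protocols:
      grpc: {}

processors:
  batch: {}

exporters:
  otlp:
    # placeholder; point this at your backend or gateway Collector
    endpoint: backend.example.com:4317

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]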

It is important to note that the core OpenTelemetry Collector only ships OTLP receivers and exporters, so the project can concentrate on being compliant with OTLP and delegate other protocols to the community.

The opentelemetry-collector-contrib distribution then bundles additional plugins on top of the core Collector, extending the vanilla Collector with more vendor-specific exporters, processors, and other components.
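
If you only need a handful of those components, another option is to assemble your own distribution with the OpenTelemetry Collector Builder. The manifest below is only a rough sketch: the exporter module path is a hypothetical placeholder, and the paths and versions have to match the component releases you actually build against.

dist:
  name: otelcol-custom
  output_path: ./dist

exporters:
  # hypothetical contrib component; substitute the real exporter module and version you need
  - gomod: "github.com/open-telemetry/opentelemetry-collector-contrib/exporter/somevendorexporter v0.38.0"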

Why is the Collector important for Instana?

OpenTelemetry is an amazing project, and it’s great to see the community of observability vendors and developers coming together and further evolving the data collection process that we all tackle individually.

Instana’s high-granularity data model is currently bound to our Instana Host Agent and the in-process collectors we provide. A while ago we opened our Agent to ingest locally produced OTLP trace data, and we are currently analyzing where we can provide the most value for our customers going forward as OpenTelemetry continues to gain traction.

As the OpenTelemetry Collector is the central piece of the collection story, it is a great target for us to make a dent in the universe. It provides good mechanisms for enriching the telemetry signals with the data points we need and for transforming them into our own data model. The challenge, from a vendor perspective, is to strike the right balance between open ingress and precise output from the Collector.

Stay tuned as we continue to work with and support the OpenTelemetry project, and check out our earlier posts on Instana’s OpenTelemetry support for more background.

 
