Control Your Services With OTEL, Jaeger, and Prometheus

Let’s discuss an important question: how do we monitor our services if something goes wrong?

On the one hand, we have Prometheus with alerts and Kibana for dashboards and other helpful features. We also know how to gather logs — the ELK stack is our go-to solution. However, simple logging isn’t always enough: it doesn’t provide a holistic view of a request’s journey across the entire ecosystem of components.

You can find more info about ELK here.

But what if we want to visualize requests? What if we need to correlate requests traveling between systems? This applies to both microservices and monoliths — it doesn’t matter how many services we have; what matters is how we manage their latency.

Indeed, each user request might pass through a whole chain of independent services, databases, message queues, and external APIs.

In such a complex environment, it becomes extremely difficult to pinpoint exactly where delays occur, identify which part of the chain acts as a performance bottleneck, and quickly find the root cause of failures when they happen.

To address these challenges effectively, we need a centralized, consistent system to collect telemetry data — traces, metrics, and logs. This is where OpenTelemetry and Jaeger come to the rescue.

Let’s Look at the Basics

There are two main terms we have to understand:

Trace ID

A Trace ID is a 16-byte identifier, often represented as a 32-character hexadecimal string. It’s automatically generated at the start of a trace and stays the same across all spans created by a particular request. This makes it easy to see how a request travels through different services or components in a system.

Span ID

Every individual operation within a trace gets its own Span ID, which is typically a randomly generated 64-bit value. Spans share the same Trace ID, but each one has a unique Span ID, so you can pinpoint exactly which part of the workflow each span represents (like a database query or a call to another microservice).

How Are They Related?

Trace ID and Span ID complement each other. 

When a request is initiated, a Trace ID is generated and passed to all involved services. Each service, in turn, creates a span with a unique Span ID linked to the Trace ID, enabling you to visualize the full lifecycle of the request from start to finish.
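
For example, inside any instrumented handler you can read both identifiers straight from the request context and stamp your logs with them. A minimal sketch using the OpenTelemetry Go API (the logger call is purely illustrative; requires the context, log, and go.opentelemetry.io/otel/trace packages):

Go

// Read the current Trace ID and Span ID from the request context.
func logWithTrace(ctx context.Context, msg string) {
    sc := trace.SpanContextFromContext(ctx)
    log.Printf("trace_id=%s span_id=%s msg=%s",
        sc.TraceID().String(), // identical across every span of this request
        sc.SpanID().String(),  // unique per operation
        msg,
    )
}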

Okay, so why not just use Jaeger? Why do we need OpenTelemetry (OTEL) and all its specifications? That’s a great question! Let’s break it down step by step.

Find more about Jaeger here.

TL;DR

  • Jaeger is a system for storing and visualizing distributed traces. It collects, stores, searches, and displays data showing how requests “travel” through your services.
  • OpenTelemetry (OTEL) is a standard (and a set of libraries) for collecting telemetry data (traces, metrics, logs) from your applications and infrastructure. It isn’t tied to any single visualization tool or backend.

Put simply:

  • OTEL is like a “universal language” and set of libraries for telemetry collection.
  • Jaeger is a backend and UI for viewing and analyzing distributed traces.

Why Do We Need OTEL if We Already Have Jaeger?

1. A Single Standard for Collection

In the past, there were projects like OpenTracing and OpenCensus. OpenTelemetry unifies these approaches to collecting metrics and traces into one universal standard.

2. Easy Integration

You write your code in Go (or another language), add OTEL libraries for auto-injecting interceptors and spans, and that’s it. Afterward, it doesn’t matter where you want to send that data—Jaeger, Tempo, Zipkin, Datadog, a custom backend—OpenTelemetry takes care of the plumbing. You just swap out the exporter.

3. Not Just Traces

OpenTelemetry covers not only traces but also metrics and logs. You end up with a single toolset for all your telemetry needs, not just tracing.

4. Jaeger as a Backend

Jaeger is an excellent choice if you’re primarily interested in visualizing distributed traces, but it doesn’t provide cross-language instrumentation on its own. OpenTelemetry, on the other hand, gives you a standardized way to collect the data, and then you decide where to send it (including to Jaeger).

In practice, they often work together:

Your application uses OpenTelemetry → sends data over the OTLP protocol → the OpenTelemetry Collector receives it (via gRPC or HTTP) → the Collector exports the traces to Jaeger for visualization.


Tech Part

System Design (A Little Bit)

Let’s quickly sketch out a couple of services that will do the following:

  1. Purchase Service – processes a payment and records it in MongoDB
  2. CDC with Debezium – listens for changes in the MongoDB collection and sends them to Kafka
  3. Purchase Processor – consumes the message from Kafka and calls the Auth Service to look up the user_id for validation
  4. Auth Service – a simple user service

In summary:

  • 3 Go services
  • Kafka
  • CDC (Debezium)
  • MongoDB

Code Part

Let’s start with the infrastructure. To tie everything together into one system, we’ll create a large Docker Compose file. We’ll begin by setting up telemetry.

Note: All the code is available via a link at the end of the article, including the infrastructure.
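
The full Compose file lives in the repository; as a rough sketch (image tags, ports, and service names here are assumptions, not the exact listing), the telemetry part can look like this:

YAML

services:
  jaeger:
    image: jaegertracing/all-in-one:1.55      # assumption: any recent tag works
    environment:
      - COLLECTOR_OTLP_ENABLED=true           # let Jaeger accept OTLP directly
    ports:
      - "16686:16686"                         # Jaeger UI

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.98.0   # assumption
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"                           # OTLP gRPC in from the Go services
    depends_on:
      - jaeger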


We’ll also configure the collector — the component that gathers telemetry.

Here, we choose gRPC for data transfer, which means communication will happen over HTTP/2:
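
A minimal collector config along these lines should do (the endpoints are assumptions; adjust them to your setup):

YAML

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317      # services send OTLP over gRPC (HTTP/2) here

processors:
  batch:

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317           # assumption: Jaeger's OTLP port inside the Compose network
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]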


Make sure to adjust any addresses as needed, and you’re done with the base configuration.

We already know OpenTelemetry (OTEL) uses two key concepts — Trace ID and Span ID — that help track and monitor requests in distributed systems.

Implementing the Code

Now, let’s look at how to get this working in your Go code. We need the following imports:
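
The exact set depends on your exporter; for OTLP over gRPC it looks roughly like this (the semconv version in the path is an assumption):

Go

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/propagation"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)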


Then, we add a function to initialize our tracer in main() when the application starts:
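
A minimal version might look like this (the collector address and service name are assumptions; call it from main() and defer the returned shutdown function):

Go

func initTracer(ctx context.Context) (func(context.Context) error, error) {
    // Exporter: ship spans to the OTEL Collector over OTLP/gRPC.
    exporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("otel-collector:4317"), // assumption: Compose service name
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, err
    }

    // Resource: the service name Jaeger will group spans under.
    res, err := resource.New(ctx,
        resource.WithAttributes(semconv.ServiceNameKey.String("auth-service")),
    )
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(res),
    )
    otel.SetTracerProvider(tp)

    // Propagate W3C trace context (and baggage) on outgoing calls.
    otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
        propagation.TraceContext{}, propagation.Baggage{},
    ))

    return tp.Shutdown, nil
}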


With tracing set up, we just need to place spans in the code to track calls. For example, if we want to measure database calls (since that’s usually the first place we look for performance issues), we can write something like this:
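
A service-layer sketch (the tracer name, method, and repository interface are assumptions):

Go

func (s *AuthService) GetUser(ctx context.Context, userID string) (User, error) {
    // Start a child span; it reuses the Trace ID already carried in ctx.
    ctx, span := otel.Tracer("auth-service").Start(ctx, "AuthService.GetUser")
    defer span.End()

    u, err := s.repo.FindByID(ctx, userID)
    if err != nil {
        span.RecordError(err) // the error becomes visible on the span in Jaeger
    }
    return u, err
}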


We have tracing at the service layer — great! But we can go even deeper, instrumenting the database layer:
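
Something along these lines, one level down in the repository (the collection name and attributes are illustrative; this also needs the go.opentelemetry.io/otel/trace, .../otel/attribute, and mongo-driver bson packages):

Go

func (r *UserRepo) FindByID(ctx context.Context, userID string) (User, error) {
    ctx, span := otel.Tracer("auth-repo").Start(ctx, "mongo.FindOne",
        trace.WithAttributes(
            attribute.String("db.system", "mongodb"),
            attribute.String("db.collection", "users"),
        ),
    )
    defer span.End()

    var u User
    err := r.collection.FindOne(ctx, bson.M{"_id": userID}).Decode(&u)
    if err != nil {
        span.RecordError(err)
    }
    return u, err
}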


Now, we have a complete view of the request journey. Head to the Jaeger UI, query for the last 20 traces under auth-service, and you’ll see all the spans and how they connect in one place.

A view of all the spans

Now, everything is visible. If you need it, you can include the entire query in the tags. However, keep in mind that you shouldn’t overload your telemetry — add data deliberately. I’m simply demonstrating what’s possible; including the full query this way isn’t something I’d generally recommend.

gRPC client-server


If you want to see a trace that spans two gRPC services, it’s quite straightforward. All you need is to add the out-of-the-box interceptors from the library. For example, on the server side:
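
Roughly like this, using the otelgrpc instrumentation package (go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc):

Go

// Spans are created automatically for every incoming RPC.
srv := grpc.NewServer(
    grpc.UnaryInterceptor(otelgrpc.UnaryServerInterceptor()),
    grpc.StreamInterceptor(otelgrpc.StreamServerInterceptor()),
    // Note: recent otelgrpc releases deprecate the interceptors in favor of
    // grpc.StatsHandler(otelgrpc.NewServerHandler()).
)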


On the client side, the code is just as short:
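
A matching client sketch (the target address is an assumption):

Go

conn, err := grpc.Dial("auth-service:50051",
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    // Injects the current trace context into outgoing gRPC metadata.
    grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
    // Or, with recent otelgrpc: grpc.WithStatsHandler(otelgrpc.NewClientHandler()).
)
if err != nil {
    log.Fatal(err)
}
defer conn.Close()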


That’s it! Ensure your exporters are configured correctly, and you’ll see a single Trace ID logged across these services when the client calls the server.

Handling CDC Events and Tracing

Want to handle events from the CDC as well? One simple approach is to embed the Trace ID in the object that MongoDB stores. That way, when Debezium captures the change and sends it to Kafka, the Trace ID is already part of the record.

For instance, if you’re using MongoDB, you can do something like this:
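
A sketch of the repository write in the Purchase Service (field and collection names are assumptions; besides the Trace ID, this version also stores the Span ID, since the consumer will need both to rebuild a valid parent context):

Go

func (r *PurchaseRepo) Insert(ctx context.Context, p Purchase) error {
    // Identifiers of the span that is current at write time.
    sc := trace.SpanContextFromContext(ctx)

    doc := bson.M{
        "user_id":  p.UserID,
        "amount":   p.Amount,
        "trace_id": sc.TraceID().String(), // travels with the record through Debezium into Kafka
        "span_id":  sc.SpanID().String(),  // parent span for the consumer side
    }
    _, err := r.collection.InsertOne(ctx, doc)
    return err
}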


Debezium captures the change and sends it to Kafka

Debezium then picks up this object (including trace_id) and sends it to Kafka. On the consumer side, you simply parse the incoming message, extract the trace_id, and merge it into your tracing context:
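
A consumer-side sketch (the event shape is an assumption; note that the OTEL SDK only accepts a remote parent when both a Trace ID and a Span ID are present, which is why the previous snippet stores both):

Go

// Rebuild a remote span context from the identifiers carried in the CDC payload.
func contextFromIDs(ctx context.Context, traceIDHex, spanIDHex string) context.Context {
    traceID, err1 := trace.TraceIDFromHex(traceIDHex)
    spanID, err2 := trace.SpanIDFromHex(spanIDHex)
    if err1 != nil || err2 != nil {
        return ctx // fall back to starting a fresh trace
    }
    sc := trace.NewSpanContext(trace.SpanContextConfig{
        TraceID:    traceID,
        SpanID:     spanID,
        TraceFlags: trace.FlagsSampled, // keep the continuation sampled
        Remote:     true,
    })
    return trace.ContextWithSpanContext(ctx, sc)
}

// Kafka handler: parse the Debezium message and continue the same trace.
func handlePurchaseEvent(ctx context.Context, value []byte) error {
    var evt struct {
        UserID  string `json:"user_id"`
        TraceID string `json:"trace_id"`
        SpanID  string `json:"span_id"`
    }
    if err := json.Unmarshal(value, &evt); err != nil {
        return err
    }

    ctx = contextFromIDs(ctx, evt.TraceID, evt.SpanID)
    ctx, span := otel.Tracer("purchase-processor").Start(ctx, "ProcessPurchaseEvent")
    defer span.End()

    // ... call the Auth Service with ctx so the Trace ID keeps propagating.
    return nil
}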


Alternative: Using Kafka Headers

Sometimes, it’s easier to store the Trace ID in Kafka headers rather than in the payload itself. For CDC workflows, this might not be available out of the box — Debezium can limit what’s added to headers. But if you control the producer side (or if you’re using a standard Kafka producer), you can do something like this with Sarama:

Injecting a Trace ID into Headers
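
With the standard W3C propagator, the Trace ID rides inside the traceparent header; all you need is a tiny carrier over Sarama headers. A sketch (the Sarama import path may be github.com/IBM/sarama or github.com/Shopify/sarama, depending on your version):

Go

// A minimal propagation.TextMapCarrier over producer message headers.
type producerHeadersCarrier struct{ msg *sarama.ProducerMessage }

func (c producerHeadersCarrier) Get(key string) string {
    for _, h := range c.msg.Headers {
        if string(h.Key) == key {
            return string(h.Value)
        }
    }
    return ""
}

func (c producerHeadersCarrier) Set(key, value string) {
    c.msg.Headers = append(c.msg.Headers, sarama.RecordHeader{Key: []byte(key), Value: []byte(value)})
}

func (c producerHeadersCarrier) Keys() []string {
    keys := make([]string, 0, len(c.msg.Headers))
    for _, h := range c.msg.Headers {
        keys = append(keys, string(h.Key))
    }
    return keys
}

// Producer side: write the current trace context (traceparent) into the headers.
func produce(ctx context.Context, producer sarama.SyncProducer, topic string, payload []byte) error {
    msg := &sarama.ProducerMessage{Topic: topic, Value: sarama.ByteEncoder(payload)}
    otel.GetTextMapPropagator().Inject(ctx, producerHeadersCarrier{msg})
    _, _, err := producer.SendMessage(msg)
    return err
}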


Extracting a Trace ID on the Consumer Side
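
The consumer mirrors it: extract the context from the headers, then start a span with it (again a sketch with hypothetical names):

Go

// The same idea over consumed message headers.
type consumerHeadersCarrier struct{ msg *sarama.ConsumerMessage }

func (c consumerHeadersCarrier) Get(key string) string {
    for _, h := range c.msg.Headers {
        if h != nil && string(h.Key) == key {
            return string(h.Value)
        }
    }
    return ""
}

func (c consumerHeadersCarrier) Set(key, value string) {} // not needed when extracting

func (c consumerHeadersCarrier) Keys() []string {
    keys := make([]string, 0, len(c.msg.Headers))
    for _, h := range c.msg.Headers {
        if h != nil {
            keys = append(keys, string(h.Key))
        }
    }
    return keys
}

// Consumer side: restore the trace context and continue it with a new span.
func consume(ctx context.Context, msg *sarama.ConsumerMessage) {
    ctx = otel.GetTextMapPropagator().Extract(ctx, consumerHeadersCarrier{msg})
    ctx, span := otel.Tracer("purchase-processor").Start(ctx, "ConsumeKafkaMessage")
    defer span.End()

    // ... process msg.Value with ctx so downstream calls share the same Trace ID.
}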


Depending on your use case and how your CDC pipeline is set up, you can choose the approach that works best:

  1. Embed the Trace ID in the database record so it flows naturally via CDC.
  2. Use Kafka headers if you have more control over the producer side or you want to avoid inflating the message payload.

Either way, you can keep your traces consistent across multiple services—even when events are asynchronously processed via Kafka and Debezium.

Conclusion

Using OpenTelemetry and Jaeger provides detailed request traces, helping you pinpoint where and why delays occur in distributed systems.

Adding Prometheus completes the picture with metrics — key indicators of performance and stability. Together, these tools form a comprehensive observability stack, enabling faster issue detection and resolution, performance optimization, and overall system reliability.

I can say that this approach significantly speeds up troubleshooting in a microservices environment and is one of the first things we implement in our projects.

Thank you

Links

Source:
https://dzone.com/articles/control-services-otel-jaeger-prometheus