In today’s dynamic landscape, Distributed Tracing has emerged as an indispensable practice.
It helps you understand what happens under the hood of distributed transactions, providing answers to pivotal questions: What comprises these diverse requests? What contextual information accompanies them? How long do they take?
This tooling aligns naturally with APIs and synchronous transactions, covering a broad spectrum of scenarios.
However, what about asynchronous transactions?
The necessity for clarity becomes even more pronounced in such cases.
Particularly in architectures built upon messaging or event streaming brokers, attaining a holistic view of the entire transaction becomes arduous.
Why does this challenge arise?
It’s a consequence of functional transactions fragmenting into two loosely coupled subprocesses: the production of a message on one side, and its consumption on the other.
Fortunately, you can rope OpenTelemetry in to shed light on them.
What about the main concepts of Distributed Tracing?
In this article, I will explain how to set up and plug in OpenTelemetry to gather asynchronous transaction traces using Apache Camel and Artemis.
The first part will use Jaeger, and the second, Tempo and Grafana, to be more production ready.
All the code snippets are part of this project on GitHub.
(Normally) you should be able to run it locally on your desktop.
For those who are impatient, here is a short explanation of this configuration file (a minimal sketch follows the list below):
Where to pull data? (the receivers section)
Where to store data? (the exporters section)
What to do with it? (the processors section)
What are the workloads to activate? (the service section and its pipelines)
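To make this concrete, here is a minimal sketch of such a Collector configuration; the jaeger:4317 endpoint is an assumption for this setup, and the actual file lives in the GitHub project:

receivers:            # where to pull data: accept OTLP over gRPC and HTTP
  otlp:
    protocols:
      grpc:
      http:

processors:           # what to do with it: batch spans before exporting
  batch:

exporters:            # where to store data: forward everything to Jaeger via OTLP
  otlp:
    endpoint: jaeger:4317    # assumed hostname of the Jaeger container
    tls:
      insecure: true

service:              # which workloads to activate: a single traces pipeline
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]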
2.3 What about the code?
The configuration to apply is pretty straightforward.
To cut a long story short, you need to include the libraries, add a few configuration lines, and run your application with an agent which will be responsible for broadcasting the spans.
2.3.1 Libraries to add
For an Apache Camel based Java application, you need to add this starter first:
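As a sketch, assuming a Spring Boot based Camel application, the dependency would look like this (check the GitHub project for the exact artifact and version):

<dependency>
    <groupId>org.apache.camel.springboot</groupId>
    <artifactId>camel-opentelemetry-starter</artifactId>
</dependency>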
The instrumentation can then be tuned through environment variables or Java system properties.
For instance, by default, the exporter targets the OpenTelemetry Collector endpoint http://localhost:4317.
You can alter it by setting the OTEL_EXPORTER_OTLP_ENDPOINT environment variable or the otel.exporter.otlp.endpoint Java system property (e.g., using the -Dotel.exporter.otlp.endpoint option).
In my example, we use Maven configuration to download the agent JAR file and run our application with it as an agent.
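Outside of Maven, the equivalent plain command line would look roughly like this (the JAR names and paths are illustrative):

java -javaagent:opentelemetry-javaagent.jar \
     -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
     -jar target/my-camel-app.jar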
If you dig into one transaction, you will see it end to end:
And now, you can correlate the two sub-transactions:
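This correlation works because camel-opentelemetry propagates the trace context through the message headers. As a hypothetical sketch (the real routes live in the GitHub project, and the endpoint names here are invented), the two sub-transactions could be as simple as:

import org.apache.camel.builder.RouteBuilder;

// Illustrative routes only; assumes the camel-platform-http and camel-jms
// (Artemis) components are on the classpath.
public class OrderRoutes extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        // Sub-transaction 1 (producer application): handle an HTTP request
        // and publish a message to an Artemis queue.
        from("platform-http:/orders")
            .log("Order received: ${body}")
            .to("jms:queue:orders");

        // Sub-transaction 2 (consumer application, a separate process):
        // consume the message. The trace context carried in the message
        // headers lets the backend stitch both spans into one trace.
        from("jms:queue:orders")
            .log("Order processed: ${body}");
    }
}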
3 Tempo & Grafana
This solution is pretty similar to the previous one.
Instead of pushing all the data to Jaeger, we will use Tempo to store data and Grafana to render them.
We don’t need to modify the configuration made in the existing Java applications.
As mentioned above, the architecture is much the same.
Now, the collector broadcasts data to Tempo.
We will then configure Grafana to query it to get the traces.
3.2 Collector configuration
Modifying the Collector is easy (for this example).
We only have to specify the Tempo URL.
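Concretely, this boils down to pointing the Collector's OTLP exporter at Tempo instead of Jaeger, roughly as follows (tempo:4317 assumes the OTLP gRPC receiver enabled in the Tempo configuration below):

exporters:
  otlp:
    endpoint: tempo:4317    # assumed hostname of the Tempo container
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]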
3.3 Tempo configuration
Tempo itself is configured through the following file:

server:
  http_listen_port: 3200

distributor:
  receivers:        # this configuration will listen on all ports and protocols that tempo is capable of.
    jaeger:         # the receivers all come from the OpenTelemetry collector. more configuration information can
      protocols:    # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
        thrift_http:
        grpc:       # for a production deployment you should only enable the receivers you need!
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:

ingester:
  max_block_duration: 5m    # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally

compactor:
  compaction:
    block_retention: 1h     # overall Tempo trace retention. set for demo purposes

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: docker-compose
  storage:
    path: /tmp/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true

storage:
  trace:
    backend: local          # backend configuration to use
    wal:
      path: /tmp/tempo/wal  # where to store the wal locally
    local:
      path: /tmp/tempo/blocks

overrides:
  metrics_generator_processors: [service-graphs, span-metrics]  # enables metrics generator

search_enabled: true
3.4 Grafana configuration
Now we must configure Grafana to enable querying our Tempo instance.
The configuration is done here through a provisioning file provided at startup.
The datasource file:
apiVersion: 1

datasources:
  # Prometheus backend where metrics are sent
  - name: Prometheus
    type: prometheus
    uid: prometheus
    url: http://prometheus:9090
    jsonData:
      httpMethod: GET
    version: 1
  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo:3200
    jsonData:
      httpMethod: GET
      serviceMap:
        datasourceUid: 'prometheus'
    version: 1
As we have done before, we must start the infrastructure using Docker Compose:
docker compose -f docker-compose-grafana.yml up
Then, using the same Maven commands as before (no rocket science required), we can browse Grafana (http://localhost:3000) to see our traces:
We saw how to highlight asynchronous transactions and correlate them through OpenTelemetry, with either Jaeger or Tempo & Grafana.
It was deliberately kept simple.
I then showed how to enable this feature on Apache Camel applications.
It can easily be reproduced with several other stacks.
Last but not least, which solution is the best?
I have not benchmarked Distributed Tracing solutions.
However, for a real-life production setup, I would dive into Grafana and Tempo and check their features.
I am particularly interested in mixing logs and traces to orchestrate efficient alerting mechanisms.