7 Days of OpenTelemetry: Day 7 - Visualization and Analysis

   Jun 1, 2025     12 min read

Day 7: Visualization and Analysis

Welcome to the final day of our “7 Days of OpenTelemetry” challenge! Over the past six days, we’ve built a solid foundation for observability with OpenTelemetry. We’ve covered the fundamentals, set up the OpenTelemetry Collector, implemented both manual and automatic instrumentation, and explored context propagation and logs correlation.

Today, we’ll complete our journey by connecting our telemetry data to visualization tools and learning how to analyze it effectively. This is where all our hard work pays off, as we transform raw telemetry data into actionable insights.

The Value of Visualization

While the debug output from our Collector has been useful for learning and testing, a proper visualization tool provides:

  1. Interactive Exploration: Drill down into traces and spans
  2. Search and Filter: Find specific traces based on various criteria
  3. Performance Analysis: Identify bottlenecks and slow operations
  4. Error Detection: Quickly spot and diagnose errors
  5. Dependency Mapping: Understand service relationships
  6. Alerting: Set up alerts for performance issues or errors

Let’s explore how to connect our OpenTelemetry data to visualization tools and how to derive insights from it.

Overview of Visualization Options

There are several options for visualizing OpenTelemetry data:

Open Source Options

  1. Jaeger: A popular distributed tracing system with a powerful UI
  2. Zipkin: Another distributed tracing system with a focus on simplicity
  3. Grafana Tempo: A high-scale, minimal-dependency distributed tracing backend
  4. SigNoz: An open-source alternative to Datadog, New Relic, etc.

Commercial Options

  1. Datadog: A comprehensive monitoring and analytics platform
  2. New Relic: An observability platform with APM, infrastructure monitoring, etc.
  3. Honeycomb: A platform designed for high-cardinality observability
  4. Lightstep: A platform focused on understanding system behavior
  5. Dynatrace: An AI-powered observability platform

For this tutorial, we’ll use Jaeger, which is open source, easy to set up, and provides a good introduction to trace visualization.

Setting Up Jaeger

Let’s set up Jaeger and configure our Collector to send traces to it.

Step 1: Add Jaeger to Docker Compose

Update our docker-compose.yaml file in the otel-collector directory to include Jaeger:

version: '3'
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector/config.yaml"]
    volumes:
      - ./config:/etc/otel-collector
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
    depends_on:
      - jaeger
    restart: unless-stopped

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"  # Jaeger UI
      - "14250:14250"  # Jaeger gRPC
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    restart: unless-stopped

This adds a Jaeger container to our setup, with the UI accessible on port 16686.

Step 2: Update the Collector Configuration

Now, let’s update our Collector configuration to send traces to Jaeger. Modify the config.yaml file in the otel-collector/config directory:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  debug:
    verbosity: detailed

  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]

This adds an OTLP exporter (named otlp/jaeger) to our traces pipeline. Recent Jaeger versions ingest OTLP natively, and the dedicated jaeger exporter has been removed from the Collector, so we simply point an OTLP exporter at the Jaeger container's OTLP gRPC port (4317) inside the Docker network. The deprecated logging exporter is also dropped; the debug exporter already gives us console output.

Step 3: Restart the Collector

Now, let’s restart our Docker Compose setup:

docker-compose -f otel-collector/docker-compose.yaml down
docker-compose -f otel-collector/docker-compose.yaml up -d

Step 4: Generate Some Traces

Let’s run one of our previous examples to generate some traces. For instance, you could run the context propagation example from Day 6:

cd otel-context
go run cmd/backend/main.go

In another terminal:

cd otel-context
go run cmd/frontend/main.go

And make some requests:

curl "http://localhost:8080/api?id=123&value=test"
curl "http://localhost:8080/api?id=456&value=example"
curl "http://localhost:8080/api?id=789&value=demo"

Step 5: Access the Jaeger UI

Now, open a web browser and navigate to http://localhost:16686. You should see the Jaeger UI with our traces.

Exploring Traces in Jaeger

Let’s explore the Jaeger UI and learn how to analyze traces effectively.

The Search Interface

The main Jaeger UI shows a search interface where you can:

  1. Select a Service: Choose which service’s traces to view
  2. Set a Time Range: Narrow down to a specific time period
  3. Filter by Tags: Search for traces with specific attributes
  4. Limit Results: Control how many traces are returned
  5. Find Traces: Execute the search

Try selecting one of our services (e.g., “otel-context-frontend”) and clicking “Find Traces”. You should see a list of traces for that service.

Trace View

Click on one of the traces to open the trace view. This shows:

  1. Trace Timeline: A visual representation of the spans in the trace
  2. Span Details: Information about each span, including:
    • Service name
    • Operation name
    • Duration
    • Start time
    • Tags (attributes)
    • Logs (events)
    • Process information

The trace timeline is particularly useful for identifying bottlenecks, as it visually shows which operations take the most time.

Span Details

Click on a span to see its details. This includes:

  1. Tags: Key-value pairs that provide context about the span
  2. Logs: Time-stamped events within the span
  3. Process: Information about the process that generated the span

These details help you understand what happened during the span and why it might have taken a long time or resulted in an error.
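
To make these details useful, your spans have to carry them. Here is a minimal Go sketch of how that looks with the SDK we set up on earlier days; the function, tracer name, and attribute keys are illustrative rather than taken from our existing services. Attributes surface as Jaeger's Tags, and span events surface as its Logs:

package backend

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// handleRequest shows where Jaeger's "Tags" and "Logs" come from:
// span attributes appear as tags, span events as timestamped logs.
func handleRequest(ctx context.Context, id string) {
	tracer := otel.Tracer("otel-context-backend")
	ctx, span := tracer.Start(ctx, "handleRequest")
	defer span.End()

	// Appears under "Tags" in the Jaeger span details.
	span.SetAttributes(attribute.String("request.id", id))

	// Appears under "Logs" as a timestamped event inside the span.
	span.AddEvent("cache miss", trace.WithAttributes(
		attribute.String("cache.key", id),
	))

	_ = ctx // pass ctx to downstream calls so their spans join this trace
}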

Analyzing Traces for Performance Issues

Now that we can visualize our traces, let’s learn how to analyze them for performance issues.

Identifying Bottlenecks

Bottlenecks are operations that take a disproportionate amount of time. In the trace timeline, they appear as wide spans. To identify bottlenecks:

  1. Look for spans that take a long time relative to the total trace duration
  2. Check if the bottleneck is in your application code or in a dependency
  3. Look at the span’s tags and logs for clues about why it’s slow

For example, if a database query is taking a long time, you might see a span for the query with a long duration. The span’s tags might include the SQL query, which could help you optimize it.
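
As a sketch of how that SQL ends up on the span (the function, attribute key, and query below are illustrative, not part of our earlier services), you can wrap the query in its own child span and attach the statement as an attribute:

package backend

import (
	"context"
	"database/sql"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// queryUsers gives the database call its own span, so a slow query shows up
// as a wide child span in the Jaeger timeline with the SQL attached as a tag.
func queryUsers(ctx context.Context, db *sql.DB, minAge int) error {
	const query = "SELECT id, name FROM users WHERE age > ?"

	tracer := otel.Tracer("otel-context-backend")
	ctx, span := tracer.Start(ctx, "db.query.users")
	defer span.End()

	// Attach the statement so the span's tags show exactly what ran.
	span.SetAttributes(attribute.String("db.statement", query))

	rows, err := db.QueryContext(ctx, query, minAge)
	if err != nil {
		return err
	}
	defer rows.Close()
	return rows.Err()
}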

Analyzing Error Paths

Errors in traces are typically marked with an error tag or status. To analyze error paths:

  1. Look for spans with error tags or status
  2. Check the span’s logs for error messages
  3. Trace the error back to its source
  4. Look at the context in which the error occurred

For example, if a service returns an error, you might see a span with an error status. The span’s logs might include the error message, and you can trace back through parent spans to understand what led to the error.
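
The error status and messages that Jaeger highlights have to be recorded on the span by the application. A hedged Go sketch (the function and error are made up for illustration):

package backend

import (
	"context"
	"errors"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/codes"
)

// chargeCard records a failure on its span so Jaeger flags the span as an
// error and shows the message in the span's logs.
func chargeCard(ctx context.Context, amount int) error {
	tracer := otel.Tracer("otel-context-backend")
	_, span := tracer.Start(ctx, "chargeCard")
	defer span.End()

	if amount <= 0 {
		err := errors.New("amount must be positive")
		span.RecordError(err)                    // shows up under the span's "Logs"
		span.SetStatus(codes.Error, err.Error()) // marks the span as failed
		return err
	}

	span.SetStatus(codes.Ok, "")
	return nil
}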

Comparing Normal and Abnormal Traces

One powerful analysis technique is to compare normal and abnormal traces. For example:

  1. Find a trace for a successful request with normal performance
  2. Find a trace for a slow or failed request
  3. Compare the two traces to identify differences
  4. Look for missing spans, different execution paths, or timing differences

This can help you understand what conditions lead to performance issues or errors.

Advanced Visualization Techniques

Beyond basic trace visualization, there are several advanced techniques that can provide deeper insights:

Service Dependency Graphs

Jaeger can generate service dependency graphs that show how services interact. To access this:

  1. Click on “System Architecture” in the Jaeger UI
  2. Select a time range
  3. View the graph of service dependencies

This helps you understand the architecture of your system and identify potential bottlenecks or single points of failure.

Trace Comparison

Jaeger allows you to compare two traces side by side. To use this feature:

  1. Find two traces you want to compare
  2. Click on the “Compare” button for one trace
  3. Select the second trace to compare
  4. View the traces side by side

This is useful for comparing normal and abnormal traces, or before and after a change.

Trace Statistics

Jaeger provides statistics about traces, such as the distribution of durations. To access this:

  1. Run a search for traces
  2. Look at the scatter plot of trace durations shown above the search results
  3. Identify patterns or outliers

This helps you understand the overall performance profile of your system.

Connecting to Other Backends

While we’ve used Jaeger for this tutorial, OpenTelemetry’s vendor-neutral approach means you can easily switch to a different backend. Let’s look at how to configure the Collector for some other common backends:

Zipkin

exporters:
  zipkin:
    endpoint: "http://zipkin:9411/api/v2/spans"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [zipkin]

Prometheus (for metrics)

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]

Elasticsearch

exporters:
  elasticsearch:
    endpoints: ["http://elasticsearch:9200"]
    traces_index: "traces"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [elasticsearch]

This flexibility is one of the key benefits of OpenTelemetry: you can change your backend without changing your instrumentation code.
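
One way to see this concretely: the application-side setup from earlier days never mentions Jaeger, Zipkin, or Elasticsearch at all; it only exports OTLP to the Collector. A rough sketch of that setup, with the endpoint and options as assumed throughout this series:

package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// The app only ever speaks OTLP to the Collector; which backend the
	// Collector forwards to (Jaeger, Zipkin, Tempo, ...) is invisible here.
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatalf("creating OTLP exporter: %v", err)
	}

	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	defer func() { _ = tp.Shutdown(ctx) }()
	otel.SetTracerProvider(tp)

	// ... run the service as usual; swapping backends is a Collector-only change.
}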

Best Practices for Trace Analysis

Based on our exploration, here are some best practices for effective trace analysis:

  1. Start with the Big Picture: Look at the overall trace before diving into details
  2. Focus on Outliers: Investigate traces that are unusually slow or result in errors
  3. Compare and Contrast: Compare normal and abnormal traces to identify differences
  4. Look for Patterns: Identify recurring patterns in performance issues or errors
  5. Correlate with Logs: Use trace IDs to find related logs for more context (see the sketch after this list)
  6. Monitor Trends: Track performance over time to identify gradual degradation
  7. Set Baselines: Establish performance baselines to detect deviations
  8. Use Multiple Views: Combine trace visualization with metrics and logs for a complete picture
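
For best practice 5, correlating with logs, the key is to stamp the active trace ID onto every log line, as we did on Day 6. A small sketch using the standard library's slog (the field names are a convention, not a requirement):

package app

import (
	"context"
	"log/slog"

	"go.opentelemetry.io/otel/trace"
)

// logWithTrace attaches the active trace and span IDs to a log line, so you
// can jump from a log entry to the matching trace in Jaeger and back.
func logWithTrace(ctx context.Context, msg string) {
	sc := trace.SpanContextFromContext(ctx)
	slog.InfoContext(ctx, msg,
		slog.String("trace_id", sc.TraceID().String()),
		slog.String("span_id", sc.SpanID().String()),
	)
}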

Building a Complete Observability Pipeline

Now that we’ve explored all the components of OpenTelemetry, let’s discuss how to build a complete observability pipeline for production use.

Components of a Production Pipeline

A production-ready observability pipeline typically includes:

  1. Instrumentation: OpenTelemetry SDKs and auto-instrumentation in your applications
  2. Collection: OpenTelemetry Collectors deployed as agents and gateways
  3. Processing: Filtering, sampling, and enrichment of telemetry data
  4. Storage: Backends for storing traces, metrics, and logs
  5. Visualization: Tools for exploring and analyzing telemetry data
  6. Alerting: Notifications for performance issues or errors

Deployment Patterns

There are several common deployment patterns for OpenTelemetry:

  1. Sidecar Pattern: A Collector runs alongside each application as a sidecar container
  2. Agent Pattern: A Collector runs on each host, collecting data from multiple applications
  3. Gateway Pattern: Collectors run as agents, forwarding data to a central gateway
  4. Hybrid Pattern: A combination of the above patterns based on specific needs

The best pattern depends on your infrastructure, scale, and requirements.

Scaling Considerations

As you scale your observability pipeline, consider:

  1. Sampling: Use head-based or tail-based sampling to reduce data volume (see the sketch after this list)
  2. Resource Usage: Monitor the resource usage of your Collectors
  3. High Availability: Deploy redundant Collectors for reliability
  4. Load Balancing: Distribute telemetry data across multiple backends
  5. Cost Management: Balance data retention with cost considerations
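
For head-based sampling specifically, the decision is made in the SDK when a root span starts. A minimal Go sketch, assuming the TracerProvider setup from earlier days (the 10% ratio is an arbitrary example); tail-based sampling, by contrast, is configured in the Collector rather than in application code:

package app

import (
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// newSampledProvider keeps roughly 10% of new root traces (head-based
// sampling) while always honouring the sampling decision of a parent span.
func newSampledProvider(exporter sdktrace.SpanExporter) *sdktrace.TracerProvider {
	sampler := sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.10))
	return sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sampler),
		sdktrace.WithBatcher(exporter),
	)
}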

A Brief Introduction to Metrics Visualization

While our challenge has focused primarily on tracing, OpenTelemetry also supports metrics. Let’s briefly look at how metrics visualization works.

Metrics are typically visualized as time-series graphs, showing how values change over time. Common visualizations include:

  1. Line Charts: Show trends over time
  2. Gauges: Display current values
  3. Histograms: Show the distribution of values
  4. Heatmaps: Visualize high-cardinality data

Tools like Grafana, Prometheus, and Datadog provide powerful metrics visualization capabilities. When combined with traces and logs, metrics provide a complete observability picture.
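
To give a flavour of the application side, here is a rough Go sketch of a counter that such tools could graph as a request rate. The meter name, instrument name, and attributes are illustrative, and in real code you would create the instrument once at startup rather than on every call:

package app

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordRequest increments a counter that a metrics backend can graph as a
// rate over time, broken down by route and status code.
func recordRequest(ctx context.Context, route string, status int) error {
	meter := otel.Meter("otel-context-frontend")

	// In real code, create the instrument once and reuse it.
	counter, err := meter.Int64Counter("http.server.requests",
		metric.WithDescription("Number of HTTP requests handled"),
	)
	if err != nil {
		return err
	}

	counter.Add(ctx, 1, metric.WithAttributes(
		attribute.String("http.route", route),
		attribute.Int("http.status_code", status),
	))
	return nil
}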

Next Steps Beyond the Challenge

Congratulations on completing the “7 Days of OpenTelemetry” challenge! You now have a solid foundation in OpenTelemetry and observability. Here are some suggestions for next steps:

  1. Instrument Your Applications: Apply what you’ve learned to your own applications
  2. Explore Advanced Features: Dive deeper into sampling, processors, and other advanced topics
  3. Contribute to OpenTelemetry: Join the community and contribute to the project
  4. Explore Other Signals: Learn more about metrics and logs in OpenTelemetry
  5. Build Custom Components: Develop custom processors, exporters, or instrumentation
  6. Integrate with CI/CD: Automate the deployment of your observability pipeline
  7. Implement SLOs: Use telemetry data to define and monitor Service Level Objectives

Resources for Continued Learning

To continue your OpenTelemetry journey, here are some valuable resources:

  1. Official Documentation: https://opentelemetry.io/docs/
  2. GitHub Repository: https://github.com/open-telemetry
  3. Community Meetings: https://opentelemetry.io/community/
  4. Slack Channel: https://cloud-native.slack.com/archives/C01NPAXACKT
  5. CNCF Landscape: https://landscape.cncf.io/
  6. OpenTelemetry Blog: https://opentelemetry.io/blog/
  7. Jaeger Documentation: https://www.jaegertracing.io/docs/

Conclusion

Over the past seven days, we’ve taken a comprehensive journey through OpenTelemetry, from understanding the basic concepts to implementing a complete observability pipeline. We’ve learned how to:

  1. Understand Observability: Grasp the fundamentals of observability and distributed tracing
  2. Set Up Infrastructure: Configure the OpenTelemetry Collector and visualization tools
  3. Instrument Applications: Implement both manual and automatic instrumentation
  4. Connect Services: Propagate context across service boundaries
  5. Correlate Data: Link traces with logs for a complete picture
  6. Visualize and Analyze: Explore and derive insights from telemetry data

OpenTelemetry provides a powerful, vendor-neutral approach to observability that works across languages, frameworks, and backends. By adopting OpenTelemetry, you gain flexibility, standardization, and a future-proof observability strategy.

Remember, observability is not just about collecting data—it’s about gaining insights that help you understand, troubleshoot, and optimize your systems. With the knowledge and skills you’ve gained in this challenge, you’re well-equipped to implement effective observability in your own applications.

Thank you for joining me on this journey, and happy observing!