Config Management Camp 2023 Ghent

From Monitoring to Observability: eBPF Chaos
2023-02-07, 15:55–16:45, B.1.015

Collecting Observability data with eBPF aims to help Dev, Ops, and SREs debug and troubleshoot incidents. The collected data requires storage, visualization, and verification: Do the Service Level Objectives (SLOs) match? Do dashboards visualize useful data correlations? Do network service maps make sense? And what about security policies?

Simulating a production incident is challenging. Chaos engineering enables teams to break things in a controlled environment and verify alerts, SLOs, and data accuracy. Which data retention cycle is best, and which dashboards reduce the mean time to response? Anomaly detection and forecasting would be great too.

This talk dives into the first learning steps with eBPF and discusses traditional metrics monitoring alongside future Observability data collection, storage, and visualization. Learn from hands-on examples with chaos experiments that attempt to break eBPF probes, data collection, and policies in unexpected ways … and bring new perspectives to cloud-native reliability.

Getting started with new cloud-native technologies can be overwhelming. This talk shares the experience of getting started with eBPF from a developer's point of view, then switches sides to Ops and reliability use cases.

Established chaos engineering workflows meet the kernel with eBPF. It is crucial to understand the possibilities and risks alongside the presented advantages.

The stories highlight potential traps and successful learning paths to inspire the community to follow the same path. The feedback and conversations during and after the talk will help improve the project documentation and use case references. The research on integrating different technologies (SLOs, eBPF, Chaos Engineering, dashboards), and on how they fit into the big (cloud-native) picture, can help lay the foundation for future ideas and new projects in the cloud-native ecosystem.

Everything shown and developed in this talk is available as Open Source, ensuring everyone can contribute.

Michael Friedrich is a Senior Developer Evangelist at GitLab, focussing on Observability, SRE, and Ops. He loves to help educate everyone and regularly speaks at events and meetups. Michael co-founded the #EveryoneCanContribute cafe meetup group to learn cloud-native & DevOps. Michael is a Polynaut advisor at Polywork, created as a learning platform for Observability, and shares insights in the newsletter.