wachy

A New Approach to Performance Debugging

Trace arbitrary compiled binaries and functions on Linux at runtime, with 0 modifications.

Low overhead dynamic instrumentation: Wachy uses the magic of eBPF to dynamically instrument binaries with minimal overhead. This also means there is 0 overhead for untraced functions.
Deep code integration: eBPF on its own can be difficult and time-consuming to use. The goal of wachy is to make userspace eBPF tracing 10-100x faster and easier by connecting it back to your source code.
Understand real latencies: Stack sampling profilers only provide part of the picture, because they usually show how active CPU cycles are distributed rather than how long a call actually takes. With wachy, you get accurate function latencies, including time spent in common blocking calls like waiting on network, IO or mutexes. It can also gather latency histograms.
Powerful runtime filtering: Add filters for the conditions you want to trace, at runtime and with no code changes. eBPF truly is magic.
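
To make the point about real latencies concrete, here is a minimal sketch of why CPU-based measurements can miss blocking time. It does not use wachy or eBPF. It only uses Python's standard `time` module, and the helper names `measure` and `blocking_call` are made up for this illustration. The sleep stands in for a call that waits on network, IO or a mutex.

```python
import time


def measure(fn):
    """Run fn once and return (wall_clock_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer, includes time spent blocked
    cpu_start = time.process_time()    # CPU time of this process, excludes sleeping
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def blocking_call():
    # Hypothetical stand-in for waiting on network, IO or a mutex:
    # the thread is off-CPU for roughly 50 ms.
    time.sleep(0.05)


if __name__ == "__main__":
    wall, cpu = measure(blocking_call)
    print(f"wall-clock latency: {wall * 1e3:.1f} ms")
    print(f"CPU time:           {cpu * 1e3:.1f} ms")
```

The wall-clock figure reflects what a caller experiences, while the CPU figure stays close to zero because the process was mostly off-CPU. A profiler that only samples on-CPU stacks sees the second number, which is why latency tracing and sampling answer different questions.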

What is it good for?

Frequently executed functions: The wachy interface displays average latency of each tracepoint, or the latency histogram of a function. If you have something like an RPC or web server with frequent requests, this works great for understanding latency, down to the level of individual functions.
Interactive debugging with filtering: Wachy maintains a stack of functions being traced, which lends itself well to iterative exploration of nested functions. You can also specify custom filters. Want to only see the latency of function B called from function A where A's first argument is 0? No problem.
Understanding tail latencies: Wachy allows specifying runtime filters to understand program behavior under various conditions. For example, where is the time spent inside a function when it takes longer than 100ms to execute?
Debugging in production: There's often some performance issue which only occurs in production. And yeah, sure, you follow all the best practices, but sometimes the best way to debug it is just to get in there and look at what's happening live. eBPF guarantees that any tracing you do is completely safe (I'm looking at you, gdb), with the only side effect being minor tracing overhead. Wachy's TUI is designed with this use case in mind – there's no need to forward ports, all you need is an SSH connection to the machine you want to debug on.
Debugging on arbitrary platforms: Not supported. The necessary eBPF features are only available on Linux kernels 4.6 or later.
Debugging arbitrary languages: Not supported. Wachy relies on eBPF uprobes and debugging symbols, so it only works with compiled languages. C++ symbol demangling, for displaying human-readable function names, is supported.
Debugging extremely latency-sensitive code: Not a good fit. While eBPF overhead is fairly low, there is some overhead – in my measurements, about 3μs per traced function call. For functions that take less time than that and are frequently called, this may be unacceptable and wachy's precision will not be good enough.
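
As a rough back-of-the-envelope check on the overhead caveat, the sketch below takes the roughly 3μs per traced call quoted above and relates it to a function's own latency and call rate. The function names and the example numbers are hypothetical, the 3μs figure is a single measurement rather than a guarantee, and the model simply assumes the overhead is added once per traced call.

```python
# Approximate per-call tracing overhead quoted in the text (microseconds).
# Treat it as an assumption: actual overhead depends on hardware and kernel.
PER_CALL_OVERHEAD_US = 3.0


def relative_overhead(function_latency_us, overhead_us=PER_CALL_OVERHEAD_US):
    """Overhead as a fraction of the function's own latency (3.0 means 300%)."""
    return overhead_us / function_latency_us


def added_time_per_second(calls_per_second, overhead_us=PER_CALL_OVERHEAD_US):
    """Seconds of extra time added per second of wall time, summed over all calls."""
    return calls_per_second * overhead_us / 1_000_000


if __name__ == "__main__":
    # Hypothetical functions: a 1 μs helper and a 1 ms request handler.
    for name, latency_us in [("1 us helper", 1.0), ("1 ms handler", 1000.0)]:
        print(f"{name}: overhead is {relative_overhead(latency_us):.1%} of its latency")
    # Hypothetical call rate of 100,000 traced calls per second.
    print(f"added time at 100k calls/s: {added_time_per_second(100_000):.2f} s per second")
```

Under these assumptions, tracing a millisecond-scale handler barely changes its measured latency, while tracing a microsecond-scale helper in a hot loop makes the overhead larger than the thing being measured.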