In the last post in our Profiling in Production series we covered instrumentation and sampling profilers. We talked about “resources of interest”, but which ones are the most useful for diagnosing performance bottlenecks? The most commonly used resource is time. We’ve been deliberately vague about the concept of timing so far because, when it comes to computers, there are actually two types of timing.
The first is CPU Time. This is the time that your code spends actually executing on a CPU. The other type of time that we’re interested in measuring is Wallclock time. Wallclock time is the total elapsed time between starting an operation and it completing.
The way that we like to think of the difference between the two types of time is that it's a bit like buying a coffee. If you go to any good coffee shop, or a Starbucks at lunchtime, you'll find it tends to be pretty busy, with a queue forming at the bar. In order to drink the delightful caffeinated beverage you desire you need to both get through the queue to be served and have the barista make your coffee. CPU Time is the time that the barista spends making coffee - it's the actual time spent performing work. Wallclock time is the total time it takes for you to get a coffee, including the wait to be served.
Now you may be wondering why CPU Time and Wallclock time aren’t the same number: after all, your Java process doesn't have to wait in line at Starbucks. Code may not be running on a CPU for a variety of reasons. For example, it might be blocked on a lock, descheduled because too many threads are running, or simply asleep.
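To make this concrete, here's a minimal sketch using the standard `ThreadMXBean` API (the class and method names are our own) that measures both times for a task that first sleeps and then spins. The sleep accrues wallclock time but almost no CPU time, so the two numbers come out very different:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class TimingDemo {

    // Returns {wallclockNanos, cpuNanos} for a task that sleeps, then spins.
    static long[] measure() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();

        try {
            Thread.sleep(200); // blocked: accrues wallclock time but almost no CPU time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        long sum = 0;
        for (int i = 0; i < 20_000_000; i++) {
            sum += i; // busy loop: accrues both CPU time and wallclock time
        }
        if (sum < 0) throw new AssertionError(); // use the result so the loop isn't optimised away

        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        return new long[] { wallNanos, cpuNanos };
    }

    public static void main(String[] args) {
        long[] t = measure();
        System.out.printf("wallclock: %d ms, CPU: %d ms%n",
                t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

Note that `getCurrentThreadCpuTime` can return -1 on JVMs where thread CPU time measurement is unsupported or disabled; a production tool would check `isCurrentThreadCpuTimeSupported()` first.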
So when should you use the different types of time?
CPU Time is more useful for problems where a few threads show high CPU usage, as is common in event-loop style applications. You may see plenty of time being spent in other threads that are waiting around performing background work but aren't actually limiting the throughput of your application. So CPU Time is useful when you know you’re bottlenecked on processing capacity. This is often the case in applications where significant processing needs to happen for each request or event; machine learning systems are a common example.
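One quick way to check whether a handful of threads are doing all the work is to snapshot per-thread CPU time via `ThreadMXBean`. This is a rough sketch rather than a replacement for a profiler - the class and method names are ours, and `getThreadCpuTime` returns -1 for threads that have died or on JVMs where the feature is unavailable:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class CpuTimeByThread {

    // Snapshot of cumulative CPU nanoseconds for each live thread.
    static Map<Long, Long> cpuTimes() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Long> times = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if dead or unsupported
            if (cpuNanos >= 0) {
                times.put(id, cpuNanos);
            }
        }
        return times;
    }

    public static void main(String[] args) {
        // Two snapshots taken an interval apart would show which threads are hot.
        cpuTimes().forEach((id, nanos) ->
                System.out.printf("thread %d: %.1f ms CPU%n", id, nanos / 1e6));
    }
}
```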
Wallclock time is useful for problems where CPU utilisation seems low but application throughput or latency doesn't meet expectations. This usually means threads are blocked on something: multiple threads contending on a lock, waits for downstream network IO, or so many threads running that some of them don't get scheduled. The trigger for looking at wallclock time would be low CPU usage during your period of performance issues. This tends to happen in modern microservices deployments when services spend time blocked on other services, which limits overall throughput.
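A crude way to see what a wallclock profiler sees is to sample the state of every thread: lots of `BLOCKED` or `WAITING` threads alongside low CPU usage points at lock contention or IO waits. Here's a sketch (class and method names are ours) that takes one such sample, demonstrated against a thread we deliberately block on a lock:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class StateSampler {

    // One sample of every thread's state, roughly what a wallclock profiler records.
    static Map<Thread.State, Integer> sampleStates() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        Thread waiter;
        synchronized (lock) {
            waiter = new Thread(() -> {
                synchronized (lock) { } // blocks until main releases the lock
            }, "contended-thread");
            waiter.start();
            Thread.sleep(100); // give the waiter time to block on the lock
            System.out.println(sampleStates()); // expect at least one BLOCKED thread
        }
        waiter.join();
    }
}
```

Sampling like this repeatedly, and recording stack traces alongside the states, is essentially how wallclock profilers attribute the waiting time.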
In summary, so far in this series we’ve learnt about the relative merits of instrumentation and sampling profilers and the most common timing types and where they should be used. In the next post we’re going to cover how sampling Java profilers specifically work and how to pick one that’s suitable for production.