May 26, 2022

WhyProfiler: The World's First Hybrid Profiler, Now for Jupyter Notebook and Python

This post is by Natan Yellin, the founder of Robusta.dev - an open source Kubernetes observability platform. Natan is also a long-time Semgrep fan.

I built a CPU profiler for Python and Jupyter Notebook that not only identifies hotspots but also suggests faster alternatives. It is powered by Semgrep, a powerful and easy-to-use static analysis tool. To the best of my knowledge, it is the world’s first hybrid CPU profiler that combines dynamic profiling with general-purpose static analysis. As far as I know, it is also the only Python profiler that both identifies hotspots and recommends equivalent code that can fix them.

I’ll explain why I wrote it and how it works.

An Introduction to Hybrid Profiling

A traditional dynamic profiler can identify hotspots in your code. It is then up to you to fix them.

A static analysis tool, on the other hand, can identify potential optimizations but cannot prioritize them. (Security use cases are different: there, prioritizing static analysis output is easier.) There can be thousands or even millions of ways to optimize a codebase, but many of them have little impact on actual runtime performance.

By combining dynamic profiling and static analysis, we get the best of both worlds: targeted suggestions on how to fix your code in the places where it matters.

How it works

  1. Dynamic Profiling: You run your code - in this case, Python cells inside a Jupyter notebook. In the background, a profiler (here, yappi) records what your code is doing and when. It builds a heatmap of the most CPU-intensive lines and colors them accordingly.
  2. Static Analysis: Once your code finishes running, Semgrep analyzes it against a database of performance-related rules. Each rule contains a pattern to look for and a way to rewrite matching code to be more performant.
  3. Hybrid Recommendations: The outputs from the profiler and Semgrep are now compared. Suggested rewrites from Semgrep are thrown away if they don’t match a hotspot in the code. The user is presented with the potential fixes that matter most.
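
To make step 3 concrete, here is a minimal sketch of the filtering idea in plain Python. It is not WhyProfiler's actual code: the `Suggestion` class, the `find_hotspots` and `hybrid_recommendations` functions, the 10% threshold, and the sample data are all hypothetical, made up for illustration. It assumes the profiler output has already been reduced to CPU seconds per line and the static analysis output to one suggestion per line number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """A rewrite proposed by static analysis (hypothetical structure)."""
    line: int
    message: str


def find_hotspots(line_times, threshold=0.10):
    """Return the line numbers that use at least `threshold` of total CPU time."""
    total = sum(line_times.values())
    if total == 0:
        return set()
    return {line for line, secs in line_times.items() if secs / total >= threshold}


def hybrid_recommendations(line_times, suggestions, threshold=0.10):
    """Keep only suggestions that land on a hotspot, most expensive line first."""
    hotspots = find_hotspots(line_times, threshold)
    kept = [s for s in suggestions if s.line in hotspots]
    return sorted(kept, key=lambda s: line_times[s.line], reverse=True)


# Made-up profiler output: line number -> CPU seconds spent on that line.
line_times = {3: 0.02, 7: 1.40, 12: 0.05, 15: 0.53}

# Made-up static analysis output: one candidate rewrite per matching line.
suggestions = [
    Suggestion(3, "candidate rewrite on a cold line"),
    Suggestion(7, "candidate rewrite on the hottest line"),
    Suggestion(15, "candidate rewrite on a warm line"),
]

recommended = hybrid_recommendations(line_times, suggestions)
for s in recommended:
    print(f"line {s.line}: {s.message}")
```

With this sample data, the suggestion for line 3 is dropped because that line accounts for only a small share of the total time, while the suggestions for lines 7 and 15 are kept and ordered by cost. A real implementation would also have to map profiler samples and Semgrep matches onto the same notebook cells and line ranges, which this sketch ignores.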

Getting Started

WhyProfiler is available on GitHub. Try it out!

Questions? Comments?

Yes, please. I'm on LinkedIn and Twitter.

If you use Kubernetes and Prometheus, see our other open source projects.
