I built a CPU profiler for Python and Jupyter notebooks that not only identifies hotspots but can also suggest faster alternatives. It is powered by Semgrep, a powerful and easy-to-use static analysis tool. To the best of my knowledge, it is the world’s first hybrid CPU profiler that combines dynamic profiling with general-purpose static analysis. It is also the only Python profiler that both identifies hotspots and recommends equivalent code that can fix them.
An Introduction to Hybrid Profiling
A traditional dynamic profiler can identify hotspots in your code. It is then up to you to fix them.
On the other hand, a static analysis tool can identify potential optimizations but cannot prioritize them. There can be thousands or even millions of ways to optimize a codebase, but many of them have little impact on actual runtime performance.
By combining dynamic profiling and static analysis, we can get the best of both worlds: targeted suggestions on how to fix your code in the places where it matters.
How it works
- Dynamic Profiling: You run your code, in this case Python cells inside a Jupyter notebook. In the background, a profiler records what your code is doing and when. It builds a heatmap of the most CPU-intensive lines and colors them accordingly.
- Static Analysis: Once your code finishes running, Semgrep analyzes it against a database of performance-related rules. Each rule contains a pattern to look for and a way to rewrite matching code to be more performant.
- Hybrid Recommendations: The outputs of the profiler and Semgrep are then compared. Suggested rewrites from Semgrep are discarded if they don’t match a hotspot in the code, so the user is presented only with the potential fixes that matter most.
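
The hybrid step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: it assumes the profiler output has been reduced to a map from line number to sample count, and that each static-analysis finding covers a range of lines. The names `Finding`, `hot_lines`, and `hybrid_recommendations`, as well as the 10% threshold, are hypothetical and do not reflect Semgrep's real output format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical static-analysis result: a rule matched lines start..end."""
    rule_id: str
    start_line: int
    end_line: int
    suggestion: str


def hot_lines(samples: dict[int, int], threshold: float = 0.10) -> set[int]:
    """Return the lines whose share of all profiler samples is >= threshold."""
    total = sum(samples.values())
    if total == 0:
        return set()
    return {line for line, count in samples.items() if count / total >= threshold}


def hybrid_recommendations(
    samples: dict[int, int],
    findings: list[Finding],
    threshold: float = 0.10,
) -> list[Finding]:
    """Keep only findings that overlap a hot line, hottest first."""
    hot = hot_lines(samples, threshold)

    def weight(finding: Finding) -> int:
        # Total samples recorded on the lines this finding covers.
        lines = range(finding.start_line, finding.end_line + 1)
        return sum(samples.get(line, 0) for line in lines)

    kept = [
        f for f in findings
        if any(line in hot for line in range(f.start_line, f.end_line + 1))
    ]
    return sorted(kept, key=weight, reverse=True)


if __name__ == "__main__":
    # Made-up data: line 3 dominates the runtime, line 12 barely registers.
    demo_samples = {3: 80, 7: 15, 12: 5}
    demo_findings = [
        Finding("cold-rule", 12, 12, "rewrite A"),
        Finding("hot-rule", 3, 4, "rewrite B"),
        Finding("warm-rule", 7, 7, "rewrite C"),
    ]
    for f in hybrid_recommendations(demo_samples, demo_findings):
        print(f.rule_id, "->", f.suggestion)
```

With this made-up data, the finding on line 12 is dropped because that line accounts for less than 10% of the samples, and the remaining findings are ordered by how much CPU time their lines consumed.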
