Our platform, anecdotes.ai, is an Angular Single Page Application (SPA) with several main components. When you log in to the platform, Angular triggers three API requests to the anecdotes backend, each representing a component. While waiting for the responses, the UI shows a loading page, so it’s clear why our goal, as backend developers, is to reduce this time as much as we can.
As you can see, we have:
- Framework endpoint: ~6 seconds
- Control endpoint: ~4 seconds
- Service endpoint: ~3 seconds
Summing it up, I’ll use the words of Chernobyl’s Dyatlov:
One evening, Tal Borenstein and I started to bounce around some ideas on how to reduce the loading time.
Here I should describe the stack that our backend is built on:
- Programming language — Python
- Database — Snowflake
- Web stack — Gunicorn + Flask
- ORM — SQLAlchemy
Each of these parts could potentially cause performance problems.
To take the discussion to a more professional level, we decided to use a profiler so we would have some idea of what was going on. I’ll focus on one specific endpoint (/control) but the general idea is the same for the other two.
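As a rough illustration (not our exact setup), here is how a single call can be profiled with Python's built-in `cProfile` and summarized with `pstats`. The function `get_controls` is a hypothetical stand-in for the real handler behind /control, and `profile_call` is a helper made up for this sketch.

```python
import cProfile
import io
import pstats


def get_controls():
    # Hypothetical stand-in for the real /control handler.
    return [sum(range(1000)) for _ in range(200)]


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Render the ten most expensive entries, sorted by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


result, report = profile_call(get_controls)
print(report)
```

The `ncalls` and `cumtime` columns in the report are the ones to look at first: a function that is cheap per call but invoked a huge number of times stands out there.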
Let’s review the first profiler iteration:
At first sight, it didn’t tell us a lot, but there was one thing that caught our attention:
The function “get_specific_global_control” was invoked 3,586 times, and it triggered another function (“get”) 1,084,112 times (!), taking 40% of the processing time!
We jumped to the code and reviewed the implementation:
We could immediately see that although we used a cache so that we wouldn’t query the database for each control, each invocation of “get_specific_global_control” took O(N) time (where N is the number of controls in our system) to find a specific control!
This could have been immediately reduced just by keeping the cache as a dictionary and not as a list.
That makes get_specific_global_control run in O(1) on average instead of O(N).
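The difference between the two cache shapes can be sketched as follows. This is a simplified illustration, not our production code: the `Control` class, its fields, and the helper names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    # Hypothetical, simplified shape of a cached control.
    control_id: str
    name: str


controls = [Control(f"ctrl-{i}", f"Control {i}") for i in range(5)]


def get_from_list(cache, control_id):
    # Before: the cache is a list, so every lookup scans it (O(N)).
    for control in cache:
        if control.control_id == control_id:
            return control
    return None


# After: the cache is keyed by id, so a lookup is a hash probe (average O(1)).
controls_by_id = {control.control_id: control for control in controls}


def get_from_dict(cache, control_id):
    return cache.get(control_id)


# Both lookups return the same control; only the cost per call differs.
print(get_from_list(controls, "ctrl-3") == get_from_dict(controls_by_id, "ctrl-3"))
```

When the lookup is called thousands of times per request, the per-call scan is what adds up, which matches the call counts the profiler reported.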
Now that we made this change, let’s review our second iteration of profiling and see if it helped:
You can see that now, control_facade:control_related_controls takes only 4%(!) of the processing time. It had previously taken 58% of the processing time.
So what’s next?
The next thing that caught our eye was the “get_services_that_automate” function, which now, after the second iteration, took 40% of the time.
This function returns a list of services that automate a specific control. This is a new property that was added recently, and we didn’t expect it to cause such significant performance degradation. The root cause of “get_services_that_automates” taking so long was that it queried the database for every control instead of using already-queried data. We worked around this by adding the property as a column on the already-queried object (control), so we now get it “for free”.
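To make the cost difference concrete, here is a small sketch that counts database round trips for both approaches. Everything in it is hypothetical: `FakeDatabase`, its methods, and the `automating_services` column name are invented for the illustration and are not our actual schema or SQLAlchemy models.

```python
class FakeDatabase:
    """Hypothetical stand-in for the real database; counts round trips."""

    def __init__(self):
        self.query_count = 0
        self._controls = {"c1": ["aws", "github"], "c2": ["okta"], "c3": []}

    def fetch_controls(self):
        # One query returns every control, including its services column.
        self.query_count += 1
        return [
            {"id": control_id, "automating_services": list(services)}
            for control_id, services in self._controls.items()
        ]

    def fetch_services_for(self, control_id):
        # One extra query per control.
        self.query_count += 1
        return list(self._controls[control_id])


def services_per_control_slow(db):
    # Before: a separate query for each control (1 + N round trips).
    controls = db.fetch_controls()
    return {c["id"]: db.fetch_services_for(c["id"]) for c in controls}


def services_per_control_fast(db):
    # After: read the property from the already-queried control (1 round trip).
    controls = db.fetch_controls()
    return {c["id"]: c["automating_services"] for c in controls}


slow_db, fast_db = FakeDatabase(), FakeDatabase()
slow = services_per_control_slow(slow_db)
fast = services_per_control_fast(fast_db)
print(slow == fast, slow_db.query_count, fast_db.query_count)  # True 4 1
```

With three controls the gap is only four queries versus one, but the slow path grows linearly with the number of controls, and every extra query pays the full round trip to the database.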
So here’s our third iteration of profiling. After holding the controls cache as a dictionary instead of a list and refactoring “get_services_that_automates” to use an already-queried object instead of querying the database, we cut the response time of our APIs by roughly 50%.
Let’s see now how the platform behaves and compare the results:
Summary
Finally, we can review the “performance blitz” results in a simple table, showing a performance enhancement of 67%.
So there you have it: how we improved performance by A LOT in one day. It involved a bit of playing around and experimentation, but that’s what we love doing here at anecdotes, seeing how already-decent processes can be made even better and optimizing what’s already good.