bii internals: C vs Python efficiency

In our bii internals series we’ve walked through how to convert our Python code into C code and compile it as a Python native extension that we can distribute for different platforms. One major drawback of native code is that we can’t support every system; on the other hand, we gain efficiency and more control over the environments where the app runs. We’ve run some benchmarks to see how much faster biicode processes projects with the native extensions than with pure Python code.

First of all, let’s explain a bit what biicode does on every operation. It starts with a check-in: it reads files from the hard disk, checks whether they’ve changed and caches them. Then, if any files have changed, it processes your project, which means parsing the source code to search for dependencies and analyzing those dependencies, the configuration, etc. The final step is the checkout, which writes to disk the file changes and the external dependencies that were already in the local cache.
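To make the check-in step more concrete, here is a minimal sketch of change detection with a content-hash cache. It is not biicode’s actual code: the `check_in` function, the `cache` dictionary and the choice of SHA-1 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def check_in(root, cache):
    """Read every file under `root` and return the paths whose content changed.

    `cache` (hypothetical) maps a relative path to the SHA-1 digest of the
    content seen on the previous run; it is updated in place.
    """
    changed = set()
    seen = set()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        seen.add(rel)
        digest = hashlib.sha1(path.read_bytes()).hexdigest()
        if cache.get(rel) != digest:
            cache[rel] = digest
            changed.add(rel)
    # Files that disappeared from disk also count as changes.
    for rel in set(cache) - seen:
        del cache[rel]
        changed.add(rel)
    return changed
```

If `check_in` returns an empty set, the expensive processing step can be skipped, which is roughly the situation the reprocess measurement below captures.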

We’ve tested how biicode processes different libraries, running Python code vs. cythonized code. We measured the following times:

  • Check-In: Time to read all files and cache them in memory.
  • Process: Time to parse code, and analyze dependencies.
  • Reprocess: Time to reprocess files without changes.
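Per-phase times like these can be collected with a small timing helper. The sketch below is one possible way to do it, not how biicode measures internally; the `timed` helper is hypothetical and the three workloads are trivial stand-ins for the real phases.

```python
import time
from contextlib import contextmanager

timings = {}  # phase name -> elapsed wall-clock seconds


@contextmanager
def timed(phase):
    """Record the wall-clock duration of the enclosed block under `phase`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = time.perf_counter() - start


# Stand-in workloads; a real run would call the tool's own phases here.
with timed("Check-In"):
    files = {f"file{i}.c": f"int x{i};" for i in range(1000)}
with timed("Process"):
    parsed = {name: text.split() for name, text in files.items()}
with timed("Reprocess"):
    unchanged = all(name in parsed for name in files)

for phase, seconds in timings.items():
    print(f"{phase}: {seconds:.6f} s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock, so the result is not affected by system clock adjustments during the run.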

These are the results (in seconds) for the SDL library, which contains 2130 files:

             Python             Native extensions
Check-In     0.26206111908 s    0.262398004532 s
Process      9.54270887375 s    6.47844004631 s
Reprocess    1.45510792732 s    1.36480784416 s

As you can see, check-in time is the same in both cases: it involves reading tons of files from disk, so it’s IO bound, not processor bound.
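A rough way to tell the two cases apart is to compare the CPU time a step consumes with the wall-clock time it takes. The sketch below assumes a hypothetical `cpu_fraction` helper and uses `time.sleep` as a stand-in for waiting on the disk; real file reads also consume some CPU, so treat the number as an indication only.

```python
import time


def cpu_fraction(func):
    """Return CPU time / wall-clock time for one call of `func`.

    A value near 1 suggests processor-bound work; a value near 0 suggests
    the time is mostly spent waiting (for example on IO).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def waiting():
    time.sleep(0.2)  # stand-in for blocking on the disk


def computing():
    sum(i * i for i in range(2_000_000))  # stand-in for parsing/analysis


print(f"waiting:   {cpu_fraction(waiting):.2f}")
print(f"computing: {cpu_fraction(computing):.2f}")
```

Compiling to native code can only shrink the CPU part, so a step with a low fraction has little to gain from it.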

Reprocess time is also very similar, with a slight improvement with native extensions. Reprocessing only makes sure there’s no need to calculate anything new.

The major improvement is in processing time, where native extensions take 68% of the time of plain Python code. Three seconds less waiting means a lot in terms of user experience.
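These figures follow directly from the measured Process times for SDL. A short sketch of the arithmetic (the `compare` helper is just an illustrative name):

```python
def compare(python_s, native_s):
    """Return (native time as a fraction of Python time, seconds saved)."""
    return native_s / python_s, python_s - native_s


# Process times for SDL, taken from the results above.
ratio, saved = compare(9.54270887375, 6.47844004631)
print(f"native / python: {ratio:.1%}")   # 67.9%
print(f"time saved:      {saved:.2f} s")  # 3.06 s
```

A ratio of about 68% is the same thing as a gain of about 32%.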

The performance gain is not constant across libraries; it increases with the number of files and relations being processed. For projects smaller than 500 files the gain is between 7% and 8%; for larger projects it rises to 32%, as observed in the SDL case.

Same code running on python or python native extension


So, is it worth compiling to C code? Well, it depends on your project, of course. If your program is IO bound then it probably isn’t worth it, but if you need to do tons of processing you might consider it: setting up the compile and package process is very easy.

You can check all the posts in the series and, if you have any doubts, contact me; I’ll be happy to help.

Stay tuned
