Atoms 2 Bits

Applied Analytics in Retail and Supply Chain Management

Numba vs Cython

For a more up-to-date comparison of Numba and Cython, see the newer post on this subject.

Often I'll tell people that I use Python for computational analysis, and they look at me inquisitively. "Isn't Python pretty slow?" They have a point. Python is an interpreted language, and as such cannot natively perform many operations as quickly as a compiled language such as C or Fortran. There is also the issue of the oft-misunderstood and much-maligned GIL (the global interpreter lock), which calls into question Python's ability to support true parallel computing.

Many solutions have been proposed: PyPy is an alternative implementation of Python whose just-in-time compiler can run pure-Python code much faster; numexpr provides optimized performance on certain classes of array operations from within Python; weave allows inline inclusion of compiled C/C++ code; Cython provides extra markup that allows Python and/or Python-like code to be compiled into C for fast operations. But a naysayer might point out that many of these "Python" solutions are in practice not really Python at all, but clever hacks into Fortran or C.

Matplotlib Animation Tutorial

Matplotlib version 1.1 added some really slick tools for creating animations. You can find some good example animations on the matplotlib examples page. I thought I'd share here some of the things I've learned while playing around with these tools.

Basic Animation

The animation tools center around the matplotlib.animation.Animation base class, which provides a framework around which the animation functionality is built. The main interfaces are TimedAnimation and FuncAnimation, which you can read more about in the documentation. Here I'll explore using the FuncAnimation tool, which I have found to be the most useful.
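As a rough illustration of the FuncAnimation pattern, here is a minimal sketch that animates a travelling sine wave. The function names `init` and `animate`, the axis limits, the frame count, and the wave itself are all assumptions chosen for the example, not anything prescribed by matplotlib. The Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen use

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Set up the figure, the axes, and the (initially empty) line to animate.
fig, ax = plt.subplots()
ax.set_xlim(0, 2)
ax.set_ylim(-2, 2)
line, = ax.plot([], [], lw=2)


def init():
    """Draw the empty background for each animation cycle."""
    line.set_data([], [])
    return (line,)


def animate(i):
    """Update the line for frame number i (a sine wave shifted by i)."""
    x = np.linspace(0, 2, 200)
    y = np.sin(2 * np.pi * (x - 0.01 * i))
    line.set_data(x, y)
    return (line,)


# blit=True asks matplotlib to redraw only the artists returned by animate().
anim = FuncAnimation(fig, animate, init_func=init,
                     frames=100, interval=20, blit=True)
```

To view the result, call `plt.show()` with an interactive backend, or write it to disk with `anim.save(...)`, which needs a movie writer such as ffmpeg to be installed. Keep a reference to `anim` for as long as the animation should run, since the object drives the frame updates.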

Memoryview Benchmarks 2

In the previous post, I explored how cython typed memoryviews can be used to speed up repeated array operations. It became clear that typed memoryviews are superior to the ndarray syntax for slicing, and as fast as raw pointers for single element access. In the comments, Mathieu brought up an interesting question: is the ndarray syntax as good as typed memoryviews if you're not doing slicing?

The answer turns out to be yes, unless the compiler tries to inline your function.

Memoryview Benchmarks

There was recently a thread on cython-users which caught my eye. It has to do with memoryviews, a new way of working with memory buffers in cython.

I've been thinking recently about how to do fast and flexible memory buffer access in cython. I contributed the BallTree implementation for nearest-neighbor searching in scikit-learn, and have been actively thinking about how to make it faster and more flexible, including adding the ability to specify distance metrics other than Euclidean and Minkowski.

In order to accomplish this, I'd like to have a set of distance metric functions which each take two vectors and compute a distance. There would be many functions with similar call signatures, which could then be plugged into code that iterates over a set of vectors and computes the appropriate distances.
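The idea of interchangeable metrics can be sketched in plain Python. This is not the cython or BallTree code, just an illustration of the shared call signature. The names `euclidean`, `manhattan`, `chebyshev`, and `pairwise_distances` are hypothetical helpers made up for this example.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Vector, b: Vector) -> float:
    """Largest absolute coordinate difference (L-infinity distance)."""
    return max(abs(x - y) for x, y in zip(a, b))


def pairwise_distances(vectors: Sequence[Vector], metric: Metric) -> List[List[float]]:
    """Apply any metric with the shared signature to every pair of vectors."""
    n = len(vectors)
    return [[metric(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for metric in (euclidean, manhattan, chebyshev):
    print(metric.__name__, pairwise_distances(points, metric)[0])
```

Because every metric takes two vectors and returns a float, the loop in `pairwise_distances` never needs to know which one it was handed. The difficulty in cython is getting this same flexibility without paying for Python-level function calls and object overhead on every pair.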