Atoms 2 Bits

Applied Analytics in Retail and Supply Chain Management

Migrating from Octopress to Pelican

After nine months on Octopress, I've decided to move on.

I should start by saying that Octopress is a great platform for static blogging: it's powerful, flexible, well-supported, well-integrated with GitHub Pages, and has tools and plugins to do just about anything you might imagine. There's only one problem:

It's written in Ruby.

Now I don't have anything against Ruby per se. However, it was starting to seem a bit awkward that a blog called Pythonic Perambulations was built with Ruby, especially given the availability of so many excellent Python-based static site generators (Hyde, Nikola, and Pelican in particular).

Additionally, a few things with Octopress were starting to become difficult:

Benchmarking Nearest Neighbor Searches in Python

I recently submitted a scikit-learn pull request containing a brand new ball tree and kd-tree for fast nearest neighbor searches in Python. In this post I want to highlight some of the features of the new ball tree and kd-tree code that's part of this pull request, compare it to what's available in the scipy.spatial.cKDTree implementation, and run a few benchmarks showing the performance of these methods on various data sets.
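For readers who haven't met a kd-tree before, here is a minimal pure-Python sketch of the idea. It is not the scikit-learn or scipy code, and it is far simpler and slower than either; the names build_kdtree, sq_dist, and nearest are made up for this illustration. The tree splits the points on one coordinate per level, and a query skips any branch whose splitting plane is farther away than the best match found so far.

```python
def build_kdtree(points, depth=0):
    """Recursively build a kd-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # the median point becomes the node
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, depth=0, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, query) < sq_dist(best, query):
        best = point
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer
    # than the best match found so far.
    if diff ** 2 < sq_dist(best, query):
        best = nearest(far, query, depth + 1, best)
    return best


points = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0),
          (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build_kdtree(points)
closest = nearest(tree, (9.0, 2.0))
```

The pruning test is what makes a tree query cheaper than comparing the query against every point, and it is also why performance depends so much on the structure of the data set.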

Code Golf in Python: Sudoku

Edit: based on suggestions from readers, the best solution is down to 162 characters! Read to the end to see how.

A highlight of PyCon each year for me is working on the little coding challenges offered by companies in the expo center. I love testing my Python prowess against the problems they pose, and being rewarded with a branded mug or T-shirt! This year, several of the challenges involved what's become known as code golf: writing a solution with minimal keystrokes.

By way of example, take a look at this function definition:

In [1]:
def S(p):i=p.find('0');return[(s for v in
set(`5**18`)-{(i-j)%9*(i/9^j/9)*(i/27^j/27|i%9/3^j%9/3)or
p[j]for j in range(81)}for s in S(p[:i]+v+p[i+1:])),[p]][i<0]

This is a valid function definition (in Python 2.7) which executes a particular task. I'll give more information on the workings of this script later on, but for now I'll leave it to the reader to ponder over what it might do.
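For anyone who would rather read than ponder, here is an un-golfed Python 3 sketch of one reading of that function, assuming (as the post title suggests) that it is a Sudoku solver taking an 81-character string with '0' marking empty cells. The name solve and the layout are mine, not part of the golfed original. It finds the first empty cell, collects the digits already used in that cell's row, column, and 3x3 box, and recurses on each remaining candidate.

```python
def solve(puzzle):
    """Yield every solution of an 81-character Sudoku string ('0' = empty)."""
    i = puzzle.find('0')
    if i < 0:
        # No empty cells left: the string is a complete solution.
        yield puzzle
        return
    row, col = divmod(i, 9)
    # Collect the characters in every cell that shares a row, column,
    # or 3x3 box with cell i (any '0' collected here is harmless).
    used = set()
    for j in range(81):
        r, c = divmod(j, 9)
        same_box = (r // 3 == row // 3) and (c // 3 == col // 3)
        if r == row or c == col or same_box:
            used.add(puzzle[j])
    # Try each digit not ruled out by a peer, and recurse.
    for digit in sorted(set('123456789') - used):
        yield from solve(puzzle[:i] + digit + puzzle[i + 1:])
```

Calling next(solve(p)) on a puzzle string p returns the first solution found. The golfed version packs the same peer test into a single arithmetic expression and relies on Python 2's integer division and backtick repr, which is why it will not run unchanged on Python 3.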

Matplotlib and the Future of Visualization in Python

Last week, I had the privilege of attending and speaking at the PyCon and PyData conferences in Santa Clara, CA. As usual, there were some amazing and inspiring talks throughout: I would highly recommend browsing through the videos as they are put up on pyvideo.

One thing I spent a lot of time thinking, talking, and learning about during these two conferences was the topic of data visualization in Python. Data visualization seemed to be everywhere: PyData had two tutorials on matplotlib (the second given by yours truly), as well as a talk about NodeBox OpenGL and a keynote by Fernando Perez about IPython, including the notebook and the nice interactive data-visualization it allows. PyCon had a tutorial on network visualization, a talk on generating art in Python, and a talk on visualizing GitHub.

Animating the Lorenz System in 3D

One of the things I really enjoy about Python is how easy it makes it to solve interesting problems and visualize those solutions in a compelling way. I've done several posts on creating animations using matplotlib's relatively new animation toolkit: some examples are a chaotic double pendulum, the collisions of particles in a box, the time-evolution of a quantum-mechanical wavefunction, and even a scene from the classic video game, Super Mario Bros.

Recently, a reader commented asking whether I might do a 3D animation example. Matplotlib has a decent 3D toolkit called mplot3d, and though I haven't previously seen it used in conjunction with the animation tools, there's nothing fundamental that prevents it.

At the commenter's suggestion, I decided to try this out with a simple example of a chaotic system: the Lorenz equations.
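As a rough sketch of what "the Lorenz equations" means in code, the snippet below integrates the system with a hand-written fourth-order Runge-Kutta step in plain Python. The parameter values (sigma=10, beta=8/3, rho=28) are the classic textbook choice and are an assumption here, as are the function names, the step size, and the starting point; the full post may well integrate the system differently.

```python
def lorenz_deriv(state, sigma=10.0, beta=8.0 / 3.0, rho=28.0):
    """Right-hand side of the Lorenz equations.

    dx/dt = sigma * (y - x)
    dy/dt = x * (rho - z) - y
    dz/dt = x * y - beta * z
    """
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = lorenz_deriv(state)
    k2 = lorenz_deriv(shifted(state, k1, dt / 2))
    k3 = lorenz_deriv(shifted(state, k2, dt / 2))
    k4 = lorenz_deriv(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def lorenz_trajectory(start, n_steps, dt=0.01):
    """Return a list of n_steps + 1 (x, y, z) points beginning at start."""
    points = [tuple(start)]
    for _ in range(n_steps):
        points.append(rk4_step(points[-1], dt))
    return points


# Assumed starting point and step count, chosen only for illustration.
trajectory = lorenz_trajectory((1.0, 1.0, 1.0), n_steps=2000)
```

The resulting list of (x, y, z) points is the kind of data a 3D line plot can draw, one frame at a time.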

Setting Up a Mac for Python Development

Edit, August 2013: my current favorite way to set up a Python installation on a Mac (and any other system) is to use the Anaconda package offered by Continuum Analytics. It's free, full-featured, and extremely easy to use.

A few weeks ago, after years of using Linux exclusively for all my computing, I started a research fellowship in a new department and found a brand new MacBook Pro on my desk. Naturally, my first instinct was to set up the system for efficient Python development. In order to help others who might find themselves in a similar situation, I took some notes on the process, and I'll summarize what I learned below.

Hacking Super Mario Bros. with Python

This weekend I was coming home from the meeting of the LSST Dark Energy Science Collaboration, and found myself with a few extra hours in the airport. I started passing the time by poking around on the imgur gallery, and saw a couple animated gifs based on one of my all-time favorite games, Super Mario Bros. It got me wondering: could I use matplotlib's animation tools to create these sorts of gifs in Python? Over a few beers at an SFO bar, I started to try to figure it out. To spoil the punchline a bit, I managed to do it, and the result looks like this:

This animation was created entirely in Python and matplotlib, by scraping the image data directly from the Super Mario Bros. ROM. Below I'll explain how I managed to do it.

Will Scientists Ever Move to Python 3?

It's been just over four years since the introduction of Python 3, and there are still about as many opinions on it as there are Python users. For those who haven't been following, Python 3 is a release which offers several nice improvements over the 2.x series (summarized here) with the distinct disadvantage that it broke backward compatibility: though Python 3.x (often referred to as "Py3k" for short) is true to the spirit of earlier Python versions, there are a few valid 2.x constructions which will not parse under 3.x.
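To make "will not parse" concrete, here is a small sketch meant to be run under Python 3. It asks the interpreter to compile a few snippets without executing them and reports which ones are rejected; the helper name parses and the snippets themselves are chosen just for illustration.

```python
def parses(source):
    """Return True if source compiles under the running interpreter."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


snippets = [
    'print "hello"',     # print statement: valid in 2.x only
    "x = `5**18`",       # backtick repr: valid in 2.x only
    'exec "x = 1"',      # exec statement: valid in 2.x only
    'print("hello")',    # function-call form: valid in both
]
for snippet in snippets:
    print(snippet, "->", "parses" if parses(snippet) else "SyntaxError")
```

Some other differences never show up at parse time at all: an expression like 7 / 2 parses in both series but gives 3 in Python 2 and 3.5 in Python 3.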

Breaking backward compatibility was controversial, to say the least. I think of the debate as one between the pragmatists -- those who see Python as an extremely useful tool, which should not be unnecessarily tampered with -- and the idealists -- those who view the Python language as a living, breathing entity, which should be allowed to grow into the fullest and most Pythonic possible version of itself.

Sparse SVDs in Python

After Fabian's post on the topic, I have recently returned to thinking about the subject of sparse singular value decompositions (SVDs) in Python.

For those who haven't used it, the SVD is an extremely powerful technique. It is the core routine of many applications, from filtering to dimensionality reduction to graph analysis to supervised classification and much, much more.

I first came across the need for a fast sparse SVD when applying a technique called Locally Linear Embedding (LLE) to astronomy spectra: it was the first astronomy paper I published, and you can read it here. In LLE, one visualizes the nonlinear relationship between high-dimensional observations. The computational cost is extreme: for N objects, one must compute the null space (intimately related to the SVD) of an N by N matrix. Using direct methods (e.g. LAPACK), this can scale as badly as $\mathcal{O}[N^3]$ in both memory and speed!
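For readers who want the connection between the null space and the SVD spelled out, here is the standard statement, written for a real $N \times N$ matrix $M$ (the symbols $U$, $\Sigma$, $V$, $u_i$, $v_i$ and $\sigma_i$ are notation introduced here for the illustration):

$$M = U \Sigma V^T, \qquad \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_N), \qquad \sigma_1 \ge \cdots \ge \sigma_N \ge 0,$$

where $U$ and $V$ are orthogonal matrices with columns $u_i$ and $v_i$. Multiplying on the right by $V$ gives

$$M v_i = \sigma_i u_i,$$

so the null space of $M$ is spanned by exactly those right singular vectors $v_i$ whose singular values $\sigma_i$ are zero. Finding a null space therefore amounts to finding the smallest singular values and their vectors, and a sparse solver that computes only those few can avoid a full decomposition.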

Minesweeper in Matplotlib

Lately I've been playing around with interactivity in matplotlib. A couple weeks ago, I discussed briefly how to use event callbacks to implement simple 3D visualization and later used this as a base for creating a working 3D Rubik's cube entirely in matplotlib.

Today I have a different goal: re-create minesweeper, that ubiquitous single-player puzzle game that most of us will admit to having binged on at least once or twice in our lives. In minesweeper, the goal is to discover and avoid hidden mines within a gridded minefield, and the process takes some logic and quick thinking.