Last week Michael Lerner posted a nice explanation of the relationship between histograms and kernel density estimation (KDE). I've made some attempts in this direction before (both in the scikit-learn documentation and in our upcoming textbook), but Michael's use of interactive JavaScript widgets makes the relationship extremely intuitive. I had been planning to write a similar post on the theory behind KDE and why it's useful, but Michael took care of that part. Instead, I'm going to focus here on comparing the actual implementations of KDE currently available in Python. If you're unsure what kernel density estimation is, read Michael's post and then come back here.
There are several options available for computing kernel density estimates in Python. The question of the optimal KDE implementation for any situation, however, is not entirely straightforward, and depends a lot on what your particular goals are. Here are the four KDE implementations I'm aware of in the SciPy/Scikits stack (a quick usage sketch of each follows the list):
- In SciPy: gaussian_kde.
- In Statsmodels: KDEUnivariate and KDEMultivariate (see an example here).
- In Scikit-learn: KernelDensity (see further examples here).
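
To make the comparison concrete, here is a minimal sketch showing how each of these four estimators can be invoked on the same one-dimensional dataset. The toy data and the bandwidth of 0.2 are arbitrary choices for illustration, not recommendations:

```python
import numpy as np
from scipy.stats import gaussian_kde
from statsmodels.nonparametric.kde import KDEUnivariate
from statsmodels.nonparametric.kernel_density import KDEMultivariate
from sklearn.neighbors import KernelDensity

# A toy one-dimensional dataset and a grid on which to evaluate the density
np.random.seed(0)
x = np.random.normal(size=100)
x_grid = np.linspace(-4, 4, 1000)
bandwidth = 0.2  # arbitrary value, just for illustration

# SciPy: a numeric bw_method is a scale factor on the sample std. dev.,
# so divide by the std. dev. to get an absolute bandwidth like the others
pdf_scipy = gaussian_kde(x, bw_method=bandwidth / x.std(ddof=1)).evaluate(x_grid)

# Statsmodels KDEUnivariate: accepts the bandwidth directly in fit()
kde_sm = KDEUnivariate(x)
kde_sm.fit(bw=bandwidth)
pdf_sm_u = kde_sm.evaluate(x_grid)

# Statsmodels KDEMultivariate: one bandwidth per variable (here just one,
# of type 'c' for continuous)
pdf_sm_m = KDEMultivariate(x, var_type='c', bw=[bandwidth]).pdf(x_grid)

# Scikit-learn KernelDensity: expects 2D input and returns the log-density
kde_skl = KernelDensity(kernel='gaussian', bandwidth=bandwidth)
kde_skl.fit(x[:, np.newaxis])
pdf_skl = np.exp(kde_skl.score_samples(x_grid[:, np.newaxis]))
```

Even in this small sketch the API differences are visible: SciPy parametrizes the bandwidth relative to the standard deviation of the data, Statsmodels accepts an absolute bandwidth directly, and Scikit-learn operates on two-dimensional arrays and returns the log of the density rather than the density itself.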