"Reposting what I wrote over on [Hacker News](http://news.ycombinator.com/item...), where this story originated: Facebook engineer here, working on this problem with Joshua. What this comes down to is that git uses a lot of essentially O(*n*) data structures, and when *n* gets big, that can be painful. A few examples: * There's no secondary index from file or path name to commit hash. This is what slows down operations like `git blame`: they have to search every commit to see if it touched a file. * Since git uses `lstat` to see if files have been changed, the sheer number of system calls on a large filesystem becomes an issue. If the dentry and inode caches aren't warm, you spend a ton of time waiting on disk I/O. An inotify daemon could help, but it's not perfect: it needs a long time to warm up in the case of a reboot or crash. Also, inotify is an incredibly tricky interface to use efficiently and reliably. (I wrote the inotify support in Mercurial, FWIW.) * The index is..."
- Bryan O'Sullivan
"New place open where the old Tower Bar was. It's fairly small and cosy, with a decent bar and menu, and prices that you'll definitely like if you're used to typical ski resort gouging. I especially…"
- Bryan O'Sullivan
"I'm really lookin forward to seeing where the conduit work goes. Very much liking the first few posts in this series. Even the terminology is muh easier to follow than the enumerator business."
- Bryan O'Sullivan
"Er, no? Firstly, the flot plotting library doesn't support axis labels (I know, right?). But I managed to fudge a way to get units on the x axis; whew! But the y axis has very little meaning on both charts, so it doesn't make sense for it to be labeled. The heights I spikes in the KDE chart are visually useful, but they do *not* correspond to probabilities or any other number you could do anything with."
- Bryan O'Sullivan
bos on [Newbie]: Haskell code is really slow compared to same code in Python. Can anyone help me find what the problem is ? - http://www.reddit.com/r...
"Independent of the bugs in your code, I think that the radix-2 DIT FFT (which is what this is) is one of those algorithms where the in-place version is both way more efficient and easier to follow. So you can see what I mean, here's an implementation of that algorithm that I wrote a few weeks ago: https://github.com/bos..."
- Bryan O'Sullivan
Efficient use of Kernel density estimation (KDE) requires the optimal selection of the smoothing parameter called the bandwidth h of the kernel. For a practical implementation of KDE the choice of the bandwidth h is very important. Small h leads to an estimator with small bias and large variance. Large h leads to a small variance at the expense of increase in bias. The bandwidth h has to be chosen optimally. The most successful among all the current methods, both empirically and theoretically, is the solve-the-equation plug-in method. However this is computationally very expensive and scales as O(N^2). The code given here computes the same in O(N) time at the expense of reduced precision, which can be arbitrary.
- Bryan O'Sullivan
Reliable and extremely fast kernel density estimator for one-dimensional data. Gaussian kernel is assumed and the bandwidth is chosen automatically. Unlike many other implementations, this one is immune to problems caused by multimodal densities with widely separated modes.
- Bryan O'Sullivan