Once upon a time I got mixed up in a project that had a lot
of timestamped data.
The company that sold the data acquisition instrument also
provided some fancy software to analyze the data, including
convolutions and correlation functions. The problem was that
for each of the data sets in question, the analysis took
five days on a cluster of 8 supercomputers. That's a pain
in the neck, because you have to ask permission and get on
a schedule and then wait.....
What's worse, the output of the analysis program didn't look
entirely plausible. Nobody knew for sure, because just running
calibration vectors through the system was so costly that nobody
was eager to do it.
So I decided to write my own correlation software. I reported
to the principal investigator:
++ It works! You know that calculation that heretofore took 5
days on 8 supercomputers? I got it down to three.
−− Three what? Three days? Three computers?
++ Three SI units.
On a laptop.
The inner loop of my program, the part that does the dot product,
is about 20 lines of software. The rest is mostly just plumbing,
i.e. reading the files and formatting the data. Plus about 10
pages of documentation to explain what's going on, which is not trivial.
I have no idea what the commercial software was doing. The code
is proprietary and I never got to see it. I suspect it was taking
the sparse data and unpacking it into bins — even though 99.9999%
of the bins were empty — and then performing convolutions on the
unpacked representation. My code acts directly on the timestamped
data, with no binning, no unpacking.
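I never published the code, but the core idea is easy to sketch. Suppose each series is stored as a sorted list of timestamps with an associated value at each timestamp (the names and data layout here are my illustration, not the actual program). Then the dot product is a merge of two sorted lists: you only ever touch timestamps where something actually happened, and the overwhelmingly empty "bins" simply never exist.

```python
def sparse_dot(ts_a, vals_a, ts_b, vals_b):
    """Dot product of two sparse timestamped series.

    ts_a, ts_b   : sorted sequences of timestamps where events occurred
    vals_a, vals_b: the value recorded at each corresponding timestamp

    Walk both lists with two pointers, merge-style. Only timestamps
    present in both series contribute a term, so work is proportional
    to the number of events, not the number of bins.
    """
    i = j = 0
    total = 0.0
    while i < len(ts_a) and j < len(ts_b):
        if ts_a[i] == ts_b[j]:
            total += vals_a[i] * vals_b[j]
            i += 1
            j += 1
        elif ts_a[i] < ts_b[j]:
            i += 1          # event in A with no partner in B
        else:
            j += 1          # event in B with no partner in A
    return total
```

A binned implementation of the same dot product would allocate an array spanning the full time range and multiply it elementwise, doing work (and burning memory) on every empty slot; the merge does the identical arithmetic in time linear in the event count.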
And yeah, the commercial software was in fact getting the wrong answer.
We like timestamped data. We don't like bins. We don't like windows.