[Phys-L] timestamped events: funny story



Once upon a time I got mixed up in a project that had a lot
of timestamped data.

The company that sold the data acquisition instrument also
provided some fancy software to analyze the data, including
convolutions and correlation functions. The problem was that
for each of the data sets in question, the analysis took
five days on a cluster of 8 supercomputers. That's a pain
in the neck, because you have to ask permission and get on
a schedule and then wait...

What's worse, the output of the analysis program didn't look
entirely plausible. Nobody knew for sure, because just running
calibration vectors through the system was so costly that nobody
was eager to do it.

So I decided to write my own correlation software. I reported
to the principal investigator:

++ It works! You know that calculation that heretofore took 5
days on 8 supercomputers? I got it down to three.
−− Three what? Three days? Three computers?
++ Three SI units.
Three seconds.
On a laptop.





The inner loop of my program, the part that does the dot product,
is about 20 lines of software. The rest is mostly just plumbing,
i.e. reading the files and formatting the data. Plus about 10
pages of documentation to explain what's going on, which is not
exactly obvious.

I have no idea what the commercial software was doing. The code
is proprietary and I never got to see it. I suspect it was taking
the sparse data and unpacking it into bins — even though 99.9999%
of the bins were empty — and then performing convolutions on the
unpacked representation. My code acts directly on the timestamped
data, with no binning, no unpacking.
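
To illustrate the general idea (this is not the original program, which
is not shown here), below is a minimal sketch of a "dot product" taken
directly on two sorted lists of timestamps. The function name sparse_dot
and the parameters lag and tol are hypothetical, and integer timestamps
(e.g. clock ticks) are assumed. It counts the pairs of events that
coincide at a given lag, walking both lists once, so the cost scales with
the number of events rather than the number of bins.

```python
def sparse_dot(a, b, lag=0, tol=0):
    """Count pairs (i, j) with |a[i] + lag - b[j]| <= tol.

    a and b are sorted sequences of integer event timestamps.
    This is an illustrative sketch, not the code described in the post.
    """
    count = 0
    lo = 0  # first index in b not earlier than the current window
    hi = 0  # first index in b later than the current window
    for t in a:
        target = t + lag
        # Skip events in b that are too early to match this or any later t.
        while lo < len(b) and b[lo] < target - tol:
            lo += 1
        # The window may have jumped past the old upper pointer.
        if hi < lo:
            hi = lo
        # Extend the upper pointer over every event still inside the window.
        while hi < len(b) and b[hi] <= target + tol:
            hi += 1
        count += hi - lo
    return count


# Six events spread over five billion ticks.
a = [10, 1_000_000, 5_000_000_000]
b = [13, 1_000_003, 4_999_999_000]

print(sparse_dot(a, b, lag=3))      # two events in b trail a by exactly 3
print(sparse_dot(a, b, lag=-1000))  # one event in b leads a by exactly 1000
```

Evaluating this at each lag of interest gives one point of the
correlation function. A binned version of the same example, at one bin
per tick, would allocate billions of bins to hold six events.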

And yeah, the commercial software was in fact getting the wrong
answer.

We like timestamped data. We don't like bins. We don't like windows.