
[Phys-L] Re: Average earlier or average later?



Michael Edmiston wrote:

I am inclined to agree with Alvin Bachman that averaging first makes the
most sense for this kind of data.

I don't know what the antecedent of "this" is. Sometimes you want to
average early in the process. Or later. Or both. Or neither.

I also agree with John Denker that
proper data analysis can be difficult, and it certainly depends on what
you are trying to accomplish.

Performing repeated trials of a particular experiment and then performing
some type of averaging was not done in any experiments I participated in
as a professional scientist. This means that in my professional
research career I don't think I ever took or analyzed data the way many
student labs are organized.

That is a scary thought, and suggests that there is room for
improving the curriculum.

Actually, as an issue of principle, there is always "some" averaging
going on very near the front end of the measurement. For instance,
an ordinary voltmeter looks at the voltage for some entirely nontrivial
time (typically a goodly fraction of a second) before completing the
measurement. A lock-in amplifier typically has an explicit "bandwidth"
knob. As a rule, you want enough bandwidth so that the time-dependence
of your signal gets through, but then you want to minimize the
bandwidth, to minimize the amount of noise that gets through.

Having "no" front-end averaging would imply infinite noise bandwidth
and hence infinite noise, which would be a Bad Thing.
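
Here is a rough numerical illustration of that bandwidth/noise tradeoff
(just a sketch, assuming white Gaussian noise and a simple boxcar average;
the numbers, and the use of numpy, are made up for illustration):

  import numpy as np

  rng = np.random.default_rng(0)
  noise = rng.normal(0.0, 1.0, 100_000)   # white noise, 1 unit RMS per raw sample

  for n_avg in (1, 10, 100, 1000):
      # Boxcar-averaging n_avg consecutive samples cuts the noise bandwidth
      # by a factor of n_avg; for white noise the residual RMS drops
      # like 1/sqrt(n_avg).
      averaged = noise.reshape(-1, n_avg).mean(axis=1)
      print(n_avg, averaged.std())

The printed RMS shrinks roughly as 1/sqrt(n_avg), which is the same story
as turning down the bandwidth knob.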

If an experiment is repeated multiple times to get "an average value,"
it seems to me the raw data should be averaged. The raw value is what
has been measured; it is what we assume has the normal distribution; and
as I pointed out by example, others stated, and Alvin showed by
analysis... the average of a function of the data does not necessarily
approach the function of the average of the data.

This was advertised as a "formal" analysis ... but in fact it
only analyzed an example. Counterexamples abound. Nobody has come
close to stating necessary and/or sufficient conditions for averaging
to make sense. Averaging is a way of estimating the mean. Sometimes
that's what you want, and sometimes it's not. Or perhaps a better way
of saying it is that sometimes you want minimal front-end averaging (i.e.
large bandwidth) and sometimes you want a lot of front-end averaging
(i.e. small bandwidth).

The right answer is sensitive to the nature of the data and the nature
of the noise. Any pat answer of the form "averaging is good" or "averaging
is bad" is guaranteed to be wrong.


In the example I gave we might
say the goal was to determine the average velocity,

A better phraseology would be that we want to determine a good
estimate of the velocity. Averaging is sometimes a good estimation
strategy, and sometimes not.

Here is a specific description of what I mean. In my particular
example, the average delta-t for a 20-cm delta-x was 0.563 s yielding an
average velocity of 35.55 cm/s. For this data set of 15 trials, if each
individual velocity is calculated then these are averaged, the result is
37.23 cm/s. If we report the 37.23 cm/s as the measured velocity, then
the reverse calculation implies we measured an average delta-t of 0.537
s, which is clearly incorrect. That is, our actual data do not yield an
average time of 0.537 s; rather, our data yield an average time of 0.563
s. Thus, averaging later would give a false impression of the average
value of the actual data for the experiment.

This evidently assumes that the dominant source of noise is in the
measurement of delta_t. One can easily imagine a situation where
the uncertainty in delta_t is negligible, and the dominant source of
noise is elsewhere ... perhaps because the launcher produces a
distribution of velocities, or perhaps because of a distribution
of stray air currents that affect the amount of drag.

I say again, deciding how to analyze the data is tricky, and requires
knowing about your data and about your noise.
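
A sketch of what that looks like numerically (both noise models are invented
for illustration; the "true" velocity and the spreads are arbitrary):

  import numpy as np

  rng = np.random.default_rng(2)
  dx = 20.0                # cm
  v_true = 35.5            # cm/s, hypothetical

  # Case 1: the velocity is really constant and the timing is noisy.
  dt_noisy = dx / v_true + rng.normal(0.0, 0.05, 10_000)
  print(dx / dt_noisy.mean(),        # averaging delta_t first: close to v_true
        (dx / dt_noisy).mean())      # averaging velocities: biased upward

  # Case 2: the timing is essentially exact, but the launcher produces
  # a spread of velocities and we want the mean of that spread.
  v_spread = rng.normal(v_true, 3.0, 10_000)
  dt_exact = dx / v_spread
  print(dx / dt_exact.mean(),        # averaging delta_t first: low (harmonic-mean-like)
        (dx / dt_exact).mean())      # averaging velocities: recovers <v>

Same arithmetic, opposite conclusions, depending on where the noise lives.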

SUGGESTION: Make a scatter plot of the data. Do not assume the noise
is normally distributed. Check whether it is or not.
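
A minimal sketch of that check, assuming the raw delta-t values are sitting
in a list and that matplotlib and scipy are handy (the numbers below are
placeholders, not anybody's real data):

  import matplotlib.pyplot as plt
  from scipy import stats

  # Hypothetical raw timing data for 15 trials, in seconds.
  dt = [0.55, 0.61, 0.52, 0.58, 0.60, 0.54, 0.57, 0.59,
        0.53, 0.56, 0.62, 0.51, 0.58, 0.55, 0.57]

  # Scatter plot: trial number versus measured delta-t.
  plt.plot(range(1, len(dt) + 1), dt, 'o')
  plt.xlabel('trial number')
  plt.ylabel('delta-t (s)')
  plt.show()

  # Shapiro-Wilk test: a small p-value says the scatter is unlikely
  # to have come from a normal distribution.
  print(stats.shapiro(dt))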

I would also like to repeat that I think this is somewhat of a new
problem for student labs that involve this type of repetitive
measurement. In "the old days" before spreadsheets we would ...

Naaah.

Students have been mishandling their data for untold generations.
Computers allow them to mishandle it more quickly ... but we should
also keep in mind that computers -- if used properly -- make it
possible to do things far better than could be done otherwise.

The power of a computer for storing large numbers of data points,
plotting them, performing nonlinear analysis, et cetera ... is
immense.
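
For instance, a nonlinear least-squares fit is a few lines with standard
tools (a sketch; the model and the data are invented purely for illustration):

  import numpy as np
  from scipy.optimize import curve_fit

  def model(t, a, tau):                 # fit an exponential decay
      return a * np.exp(-t / tau)

  rng = np.random.default_rng(3)
  t = np.linspace(0.0, 10.0, 50)
  y = model(t, 2.0, 3.0) + rng.normal(0.0, 0.05, t.size)

  popt, pcov = curve_fit(model, t, y, p0=(1.0, 1.0))
  print(popt)                           # best-fit a, tau
  print(np.sqrt(np.diag(pcov)))         # rough 1-sigma uncertainties

Doing that by hand in "the old days" would have been a much bigger project.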

It reminds me (roughly) of the advent of word processors. Computers
weren't nearly so much of a time-saver as had been expected. What
happened is that the amount of time spent writing stayed about the
same, but the standards went way up.