
Re: purposefully modifying a distribution to get...?



Is the data removed in a non-random manner?

E.g., a finite resolving time causes Poisson timing data to be skewed
(I wrote this up a few days ago, including the error functions). If I
understand your apparatus (pulse pile-up is eliminated by removing
coincidences): if the pulse heights and the arrival times of the
pulses are independent, I don't think it will matter. Otherwise your
PHA will be incorrect.
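
To make that concrete, here is a minimal sketch (Python/numpy; the
rate, thinning fraction, and resolving time below are made-up numbers,
not anything from your apparatus). Thinning a Poisson stream at
random, independently of the arrival times, leaves the inter-arrival
statistics exponential; a dead-time-style rejection that depends on
the arrival times does not.

    import numpy as np

    rng = np.random.default_rng(0)
    rate = 1000.0                        # mean events per second (assumed)
    t = np.cumsum(rng.exponential(1.0 / rate, 100000))  # Poisson arrival times

    # (a) random thinning: keep each event with probability 0.9,
    #     independent of when it arrived
    t_thinned = t[rng.random(t.size) < 0.9]

    # (b) non-random removal: drop any event that follows its predecessor
    #     by less than a resolving time tau (crude dead-time rejection)
    tau = 0.2e-3
    t_deadtime = t[np.diff(t, prepend=-np.inf) > tau]

    for label, times in (("original", t), ("thinned", t_thinned),
                         ("dead time", t_deadtime)):
        gaps = np.diff(times)
        # exponential gaps have std/mean == 1; a departure flags a skew
        print(label, gaps.mean(), gaps.std() / gaps.mean())

The thinned survivors are again Poisson (at a reduced rate); in the
dead-time case the std/mean ratio of the gaps falls visibly below 1.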

bc



Stefan Jeglinski wrote:

Recent discussions lead me to ask about a minor physics quandary I have.
I am working on software that simulates the acquisition and analysis
of an x-ray pulse processor data stream such as occurs in x-ray
microanalysis using an electron microscope (EDS, Energy Dispersive
Spectroscopy). The software will ultimately operate with a -real-
stream of data taking the place of the simulated stream.

The x-ray emission events in time are described by a Poisson process.
This is straightforward to simulate using a computer. However,
practicalities (limits on memory, etc) require that the pulse height
analysis data be recorded into finite arrays. At this point in the
development of the software, it is much easier not to create a
sophisticated method of "splicing" arrays to ensure the inclusion of
all pulses. And so, pulses whose peaks, heads, or tails are cut off
at either the start or end of a data array are "malformed" and must
be rejected.
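
As a rough sketch of that rejection step (Python/numpy; the rate,
pulse width, and array length are assumed numbers, not the actual
software's), one can chop a Poisson stream into fixed-length arrays
and drop any pulse whose extent crosses an array border:

    import numpy as np

    rng = np.random.default_rng(1)
    rate = 5000.0          # mean pulses per second (assumed)
    pulse_width = 10e-6    # time a pulse occupies in the record (assumed)
    array_length = 1e-3    # time span covered by one data array (assumed)

    arrivals = np.cumsum(rng.exponential(1.0 / rate, 50000))

    # array index in which each pulse starts, and the one in which it ends
    start_idx = np.floor(arrivals / array_length)
    end_idx = np.floor((arrivals + pulse_width) / array_length)

    # "malformed" pulses straddle a border; keep only the clean ones
    clean = arrivals[start_idx == end_idx]
    print("kept", clean.size, "of", arrivals.size, "pulses")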

Currently, pulses that overlap an array border are simply excised by
hand (by computer, of course) after the fact. My concern is whether,
in doing so, one interferes with the simulated Poisson process.
Clearly, if one lets the process run for a time and then goes back and
removes half the pulses, the simulated physics would be seriously
altered (or would it?). At the other end of the spectrum, if one
allowed thousands of pulses to accumulate and then removed 1% of them,
could this even be distinguished from an "untouched" Poisson stream?
It seems there must be some measure of how much one can alter the
distribution by removing data before the departure becomes harmful
from a physics standpoint.
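
One candidate for such a measure would be a goodness-of-fit test of
the surviving inter-arrival times against the exponential law a
Poisson process must obey, e.g. a Kolmogorov-Smirnov test (a sketch in
Python/numpy/scipy with assumed numbers; removing 1% of the pulses at
random, independently of arrival time, should pass it):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    rate = 1000.0
    t = np.cumsum(rng.exponential(1.0 / rate, 20000))

    # drop 1% of the pulses at random, independent of arrival time
    survivors = t[rng.random(t.size) >= 0.01]
    gaps = np.diff(survivors)

    # compare against an exponential with the observed mean gap
    # (estimating the scale from the data makes the p-value approximate)
    d, p = stats.kstest(gaps, "expon", args=(0.0, gaps.mean()))
    print("KS statistic:", d, "p-value:", p)  # large p: no detectable departure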

I note that in the real world of EDS x-ray pulse processors, such
"interference" in the Poisson process is commonplace. Any time a
one-detector-processor system (virtually every electron-microscope
EDS system in existence) detects that two pulses have arrived within
a time interval that will cause an incorrect pulse height analysis,
all such contributing pulses are rejected with prejudice. "Holes in
the resulting spectrum" are just filled in at a later time by
acquiring more data. This suggests that pulses can be rejected at
will, as long as they are "made up for" at a later time. While this
sort of reasoning seems to make sense, it doesn't appear rigorous,
and I am especially concerned with the ramifications of doing it in
simulation. I'm trying to avoid the commonplace software engineer's
approach ("we just figured it wouldn't matter"). But it -seems- as if
it really -wouldn't- matter :-)
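
For reference, that coincidence rejection amounts to something like
the following (Python/numpy; the rate and resolving time are assumed
numbers): whenever two pulses arrive within a resolving time tau of
each other, both are rejected, since neither can be height-analyzed
correctly.

    import numpy as np

    rng = np.random.default_rng(2)
    rate = 2000.0        # mean pulses per second (assumed)
    tau = 50e-6          # pulse-processor resolving time (assumed)

    t = np.cumsum(rng.exponential(1.0 / rate, 100000))
    gap_before = np.diff(t, prepend=-np.inf)  # gap to the previous pulse
    gap_after = np.diff(t, append=np.inf)     # gap to the next pulse
    good = (gap_before > tau) & (gap_after > tau)

    print("accepted fraction:", good.mean())
    print("expected fraction:", np.exp(-2 * rate * tau))

For a Poisson stream the accepted fraction comes out near
exp(-2*rate*tau), and as long as pulse heights are independent of
arrival times the survivors are just a less intense record of the same
spectrum.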

If it really doesn't matter (because the data will be filled in
later), what does this say about manipulation of measurement
processes which are statistical in nature?

To compare this to dice-throwing, for example, let's say I want to
measure the distribution of rolls of two dice. Over a long period of
time, I will see values like 6-7-8 form a peak and values like 2 and
12 form tails.

But let's say that along the way I throw away results whenever I
want. With very few rolls, this would affect the distribution greatly.
But with few rolls, I would also not expect the distribution to
approximate the final distribution very well anyway. So how can I tell
whether any data has been thrown away?

Likewise with many rolls: I would be "asymptotically" closing in on
the final distribution, and throwing away a few results would seem to
make no difference in any statistical measure of the distribution.

By this reasoning, I can throw away as much data as I want, as long
as I take more to replace it. I suspect, by training, that this is
specious, but I can't put my finger on the specific fault.
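
A quick numerical check of this reasoning (Python/numpy; the roll
count and discard fraction are arbitrary choices): discard a fraction
of the rolls at random, roll replacements, and compare the histogram
to an untouched run of the same size.

    import numpy as np

    rng = np.random.default_rng(4)

    def two_dice(n):
        return rng.integers(1, 7, n) + rng.integers(1, 7, n)

    n_final = 100000
    rolls = two_dice(n_final)

    # throw away 1% of the results (here: at random, with no regard to
    # their values), then make them up with fresh rolls
    keep = rng.random(n_final) >= 0.01
    patched = np.concatenate([rolls[keep], two_dice(n_final - keep.sum())])

    untouched = two_dice(n_final)

    for total in range(2, 13):
        print(total, (patched == total).mean(), (untouched == total).mean())

The two histograms agree to within sampling noise, but only because
the discards ignore the values of the rolls; a discard rule that looks
at the outcome ("whenever I want" taken literally) would bias the
distribution no matter how many replacement rolls are taken.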

Pedagogical or instrumentation discussions are welcomed.

Stefan Jeglinski