
[Phys-l] NASA versus Bayes





FORGET CIVIL LIBERTIES, JUST CHECK NSA'S MATH

BRUCE SCHNEIER - What is the probability that people are terrorists
given that NSA's mass surveillance identifies them as terrorists? If
the probability is zero (p=0.00), then they certainly are not
terrorists, and NSA was wasting resources and damaging the lives of
innocent citizens. If the probability is one (p=1.00), then they
definitely are terrorists, and NSA has saved the day. If the
probability is fifty-fifty (p=0.50), that is the same as guessing the
flip of a coin. The conditional probability that people are terrorists,
given that the NSA surveillance system says they are, had better
be very near one (p=1.00) and very far from zero (p=0.00). The
mathematics of conditional probability was figured out by the
English minister and mathematician Thomas Bayes. If you Google "Bayes' Theorem", you
will get more than a million hits. Bayes' Theorem is taught in all
elementary statistics classes. Everyone at NSA certainly knows Bayes'
Theorem. . .
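
For reference, the theorem can be written out for this problem. The symbols are labels introduced here, not the author's: T stands for "is a terrorist" and F for "is flagged by the surveillance system".

```latex
P(T \mid F) = \frac{P(F \mid T)\,P(T)}{P(F \mid T)\,P(T) + P(F \mid \neg T)\,\bigl(1 - P(T)\bigr)}
```

Here P(T) is the base rate, P(F | T) is the rate at which real terrorists are flagged, and P(F | not T) is the rate at which innocent people are flagged.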

No matter how sophisticated and super-duper are NSA's methods for
identifying terrorists, no matter how big and fast are NSA's
computers, NSA's accuracy rate will never be 100% and their
misidentification rate will never be 0%. That fact, plus the extremely
low base-rate for terrorists, means it is logically impossible for
mass surveillance to be an effective way to find terrorists. . .

The US Census shows that there are about 300 million people living in the USA.

Suppose that there are 1,000 terrorists there as well, which is
probably a high estimate. The base-rate would be 1 terrorist per
300,000 people. As a percentage, that is 0.00033%, which is far less
than 1%. Suppose that NSA surveillance has an accuracy rate of 0.40,
which means that 40% of real terrorists in the USA will be identified
by NSA's monitoring of everyone's email and phone calls. This is
probably a high estimate, considering that terrorists are doing their
best to avoid detection. There is no evidence thus far that NSA has
been so successful at finding terrorists. And suppose NSA's
misidentification rate is 0.0001, which means that 0.01% of innocent
people will be misidentified as terrorists, at least until they are
investigated, detained and interrogated. Note that 0.01% of the US
population is 30,000 people. With these suppositions, then the
probability that people are terrorists given that NSA's system of
surveillance identifies them as terrorists is only p=0.0132, which is
near zero, very far from one. Ergo, NSA's surveillance system is
useless for finding terrorists.
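
The arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch under the suppositions stated above; the function and parameter names (posterior, hit_rate, false_alarm_rate) are labels chosen here, with "accuracy rate" read as the fraction of real terrorists flagged and "misidentification rate" as the fraction of innocent people flagged.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(terrorist | flagged), computed with Bayes' theorem."""
    true_flags = hit_rate * base_rate                    # flagged and guilty
    false_flags = false_alarm_rate * (1.0 - base_rate)   # flagged but innocent
    return true_flags / (true_flags + false_flags)


population = 300_000_000   # supposition from the text
terrorists = 1_000         # supposition from the text
base_rate = terrorists / population  # 1 in 300,000

p = posterior(base_rate, hit_rate=0.40, false_alarm_rate=0.0001)
print(f"{p:.4f}")  # 0.0132
```

Changing hit_rate to 0.70 in the same call gives the figure for a more accurate system.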

Suppose that NSA's system is more accurate than 0.40, let's say 0.70,
which means that 70% of terrorists in the USA will be found by mass
monitoring of phone calls and email messages. Then, by Bayes' Theorem,
the probability that a person is a terrorist if targeted by NSA is
still only p=0.0228, which is near zero, far from one, and useless.

Suppose that NSA's system is really, really good,
with an accuracy rate of 0.90 and a misidentification
rate of 0.00001, which means that only 3,000 innocent people are
misidentified as terrorists. With these suppositions, then the
probability that people are terrorists given that NSA's system of
surveillance identifies them as terrorists is only p=0.2308, which is
far from one and well below flipping a coin. NSA's domestic monitoring
of everyone's email and phone calls is useless for finding terrorists.
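
The same three scenarios can also be laid out as head counts, which makes the imbalance easier to see: in each case the innocent people flagged far outnumber the terrorists caught. The sketch below uses only the suppositions in the text; the names (scenarios, flagged_counts) are illustrative.

```python
POPULATION = 300_000_000   # supposition from the text
TERRORISTS = 1_000         # supposition from the text

# (accuracy rate, misidentification rate) for the three scenarios above
scenarios = [(0.40, 0.0001), (0.70, 0.0001), (0.90, 0.00001)]


def flagged_counts(hit_rate, false_alarm_rate):
    """Return (real terrorists flagged, innocent people flagged)."""
    innocents = POPULATION - TERRORISTS
    return hit_rate * TERRORISTS, false_alarm_rate * innocents


results = []
for hit_rate, false_alarm_rate in scenarios:
    caught, wrongly_flagged = flagged_counts(hit_rate, false_alarm_rate)
    # Share of flagged people who really are terrorists
    share = caught / (caught + wrongly_flagged)
    results.append(share)
    print(f"accuracy={hit_rate:.2f} misid={false_alarm_rate:.5f} "
          f"caught={caught:.0f} innocent={wrongly_flagged:.0f} p={share:.4f}")
```

The p column reproduces the three probabilities quoted in the text.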

As an exercise to the reader, you can use the same analysis to show
that data mining is an excellent tool for finding stolen credit cards,
or stolen cell phones. Data mining is by no means useless; it's just
useless for this particular application.
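
One way to start on that exercise, without assuming any real fraud or theft figures, is to ask how common the target would have to be before a flag is right half the time. Setting the Bayes posterior equal to 0.5 and solving for the base rate gives misidentification rate / (accuracy rate + misidentification rate). The sketch below applies this to the first scenario's rates; the function names are illustrative.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(target | flagged), computed with Bayes' theorem."""
    true_flags = hit_rate * base_rate
    false_flags = false_alarm_rate * (1.0 - base_rate)
    return true_flags / (true_flags + false_flags)


def break_even_base_rate(hit_rate, false_alarm_rate):
    """Base rate at which a flag is correct exactly half the time."""
    return false_alarm_rate / (hit_rate + false_alarm_rate)


hit_rate, false_alarm_rate = 0.40, 0.0001  # first scenario in the text
b = break_even_base_rate(hit_rate, false_alarm_rate)
print(f"break-even base rate: about 1 in {1 / b:.0f}")
print(f"posterior at that base rate: {posterior(b, hit_rate, false_alarm_rate):.2f}")
```

With those rates the break-even point is about 1 in 4,000, so a target as rare as 1 in 300,000 is roughly 75 times too rare, while anything more common than about 1 in 4,000 would make a flag better than a coin flip.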



http://www.schneier.com/blog/archives/2006/07/terrorists_data.html


Wiki's exposition:


http://en.wikipedia.org/wiki/Bayes'_theorem


bc, too lazy to read all carefully, and willing to accept bets on the likelihood of a JD comment.