
Re: [Phys-L] From a Math Prof (physics BS major) at my institution ( math challenge)

From: brian whatcott <betwys1@sbcglobal.net>
Date: Tue, 18 Feb 2014 21:19:50 -0600

In a matrix math package called Matlab we can quickly compute correlations (among many other operations). Specifically:
R = corrcoef(X) returns a matrix R of correlation coefficients calculated from an input matrix X whose rows are observations and whose columns are variables.
[R,P]=corrcoef(...) also returns P, a matrix of p-values for testing the hypothesis of no correlation. Each p-value is the probability of getting a correlation as large as the observed value by random chance, when the true correlation is zero. If P(i,j) is small, say less than 0.05, then the correlation R(i,j) is significant.

The p-value is computed by transforming the correlation to create a t statistic having n-2 degrees of freedom, where n is the number of rows of X. The confidence bounds are based on an asymptotic normal distribution of 0.5*log((1+R)/(1-R)), with an approximate variance equal to 1/(n-3). These bounds are accurate for large samples when X has a multivariate normal distribution. The 'pairwise' option can produce an R matrix that is not positive definite.
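
As a rough check on the description above, one p-value can be recomputed by hand. The sketch below takes the value r(1,2) = 0.6782 reported further down for the 21-row data array, forms the t statistic with n-2 degrees of freedom, and gets the two-sided p-value from the regularized incomplete beta function (betainc, which is in base Matlab). The 95% level and the hard-coded 1.96 are my assumptions for illustration, and the variable names are hypothetical; this is not necessarily how corrcoef is implemented internally.

```matlab
% Sketch: recompute one corrcoef p-value by hand (base Matlab only).
r  = 0.6782;   % r(1,2) as printed for the data array in this post
n  = 21;       % number of rows (observations) in data
nu = n - 2;    % degrees of freedom of the t statistic

% Transform the correlation into a t statistic.
t = r * sqrt(nu / (1 - r^2));

% Two-sided p-value of Student's t, written with the regularized
% incomplete beta function: p = I_x(nu/2, 1/2), x = nu/(nu + t^2).
p = betainc(nu / (nu + t^2), nu/2, 0.5);

% Approximate bounds from the transform 0.5*log((1+R)/(1-R)),
% whose variance is taken as 1/(n-3).
z     = 0.5 * log((1 + r) / (1 - r));
se    = 1 / sqrt(n - 3);
zcrit = 1.96;                 % assumed: normal quantile for 95% coverage
rlo   = tanh(z - zcrit * se); % tanh inverts the transform
rhi   = tanh(z + zcrit * se);

fprintf('t = %.4f, p = %.4f, bounds = [%.4f, %.4f]\n', t, p, rlo, rhi);
```

The p-value should round to the 0.0007 that corrcoef reports for that pair, which is a useful sanity check that the t transformation is the one being used.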

As an example, we can generate random data having correlation between column 4 and the other columns:
x = randn(30,4); % Uncorrelated data
x(:,4) = sum(x,2); % Introduce correlation.
[r,p] = corrcoef(x) % Compute sample correlation and p-values.
[i,j] = find(p<0.05); % Find significant correlations.
[i,j] % Display their (row,col) indices.

r =
1.0000 -0.3566 0.1929 0.3457
-0.3566 1.0000 -0.1429 0.4461
0.1929 -0.1429 1.0000 0.5183
0.3457 0.4461 0.5183 1.0000

p =
1.0000 0.0531 0.3072 0.0613
0.0531 1.0000 0.4511 0.0135
0.3072 0.4511 1.0000 0.0033
0.0613 0.0135 0.0033 1.0000

ans =
4 2
4 3
2 4
3 4
***************************

Now in the particular data sets of interest, we have
data =

2 6 7 25 34
3 9 12 15 34
6 16 21 28 32
6 10 13 21 23
4 18 26 27 34
3 6 17 27 32
3 11 21 22 35
1 2 8 17 27
7 12 14 24 31
3 7 14 18 27
7 13 22 25 31
7 12 23 31 32
4 17 18 22 35
8 15 17 20 25
12 16 18 29 34
2 7 11 16 21
8 23 24 32 35
17 19 23 29 31
9 16 27 28 32
6 15 19 26 32
6 13 15 23 31

and data2 =

11 17 19 28 31
3 11 29 32 35
14 21 24 28 33
9 14 22 23 31
3 21 26 30 31
5 15 20 27 29
2 23 24 25 26
7 13 20 24 25
3 23 26 27 28
6 20 21 26 29
1 10 14 19 35
12 18 27 32 35
2 6 24 27 28
3 8 11 21 30
9 14 20 25 31
4 13 19 21 28
10 11 12 21 31
2 7 11 20 24
6 17 25 29 30
13 23 24 26 34
9 17 21 25 26

We then compute correlations....

[r,p]=corrcoef(data)

r =

1.0000 0.6782 0.5378 0.5900 0.1331
0.6782 1.0000 0.7823 0.6477 0.4311
0.5378 0.7823 1.0000 0.7075 0.4356
0.5900 0.6477 0.7075 1.0000 0.5493
0.1331 0.4311 0.4356 0.5493 1.0000

p =

1.0000 0.0007 0.0119 0.0049 0.5652
0.0007 1.0000 0.0000 0.0015 0.0510
0.0119 0.0000 1.0000 0.0003 0.0484
0.0049 0.0015 0.0003 1.0000 0.0099
0.5652 0.0510 0.0484 0.0099 1.0000

[i,j] = find(p<0.005);
[i,j]

ans =

2 1
4 1
1 2
3 2
4 2
2 3
4 3
1 4
2 4
3 4
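
Because p is symmetric, every pair shows up twice in the list above. A minimal sketch of one way to list each pair only once is to mask everything except the part strictly above the diagonal with triu before calling find. The p matrix here is copied from the printed (rounded) output above rather than recomputed, and the names alpha and pairs are mine.

```matlab
% p-values as printed above for corrcoef(data) (rounded to 4 places)
p = [1.0000 0.0007 0.0119 0.0049 0.5652
     0.0007 1.0000 0.0000 0.0015 0.0510
     0.0119 0.0000 1.0000 0.0003 0.0484
     0.0049 0.0015 0.0003 1.0000 0.0099
     0.5652 0.0510 0.0484 0.0099 1.0000];

alpha = 0.005;                     % same threshold as in the post

% Keep only entries strictly above the diagonal so that each
% column pair is reported once.
[i,j] = find(triu(p < alpha, 1));
pairs = [i,j];

disp(pairs)                        % one (row,col) pair per line
```

With these values the ten index pairs collapse to five distinct column pairs.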

We repeat the process for the second data set data2, in this way:

[r2,p2]=corrcoef(data2)
r2 =

1.0000 0.3869 0.1578 0.2600 0.3635
0.3869 1.0000 0.5874 0.4726 0.1181
0.1578 0.5874 1.0000 0.8598 0.2710
0.2600 0.4726 0.8598 1.0000 0.3576
0.3635 0.1181 0.2710 0.3576 1.0000

p2 =

1.0000 0.0832 0.4944 0.2550 0.1053
0.0832 1.0000 0.0051 0.0305 0.6102
0.4944 0.0051 1.0000 0.0000 0.2348
0.2550 0.0305 0.0000 1.0000 0.1115
0.1053 0.6102 0.2348 0.1115 1.0000

Next we check for significant correlation at p < 0.005

>> [i,j] = find(p2<0.005);
>> [i,j]

ans =

4 3
3 4

You will notice that Matlab flags more significant correlations in the data array than in the data2 array (five distinct column pairs versus one at p < 0.005), so you might conclude that the process that generated data2 was much closer to random than the process that generated the data array. This agrees with Joel's identification of the student-generated list (given here as data).
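
To see whether that comparison depends on the cutoff, the sketch below counts distinct significant column pairs in both printed p-value matrices at 0.05 and at 0.005. The matrices are copied from the rounded output above, so a value printed as 0.0000 is treated as zero; the counts array and loop are my own illustration.

```matlab
% Printed p-values for data (p) and data2 (p2), rounded to 4 places
p  = [1.0000 0.0007 0.0119 0.0049 0.5652
      0.0007 1.0000 0.0000 0.0015 0.0510
      0.0119 0.0000 1.0000 0.0003 0.0484
      0.0049 0.0015 0.0003 1.0000 0.0099
      0.5652 0.0510 0.0484 0.0099 1.0000];
p2 = [1.0000 0.0832 0.4944 0.2550 0.1053
      0.0832 1.0000 0.0051 0.0305 0.6102
      0.4944 0.0051 1.0000 0.0000 0.2348
      0.2550 0.0305 0.0000 1.0000 0.1115
      0.1053 0.6102 0.2348 0.1115 1.0000];

alphas = [0.05 0.005];
counts = zeros(numel(alphas), 2);   % rows: thresholds, cols: data, data2

for k = 1:numel(alphas)
    % Count entries strictly above the diagonal (10 possible pairs).
    counts(k,1) = nnz(triu(p  < alphas(k), 1));
    counts(k,2) = nnz(triu(p2 < alphas(k), 1));
    fprintf('alpha = %.3f: data %d of 10 pairs, data2 %d of 10 pairs\n', ...
            alphas(k), counts(k,1), counts(k,2));
end
```

At both thresholds the data array shows more significant pairs than data2, so the ranking of the two lists does not hinge on the choice of 0.005.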

Brian Whatcott Altus OK

On 2/18/2014 9:46 AM, Rauber, Joel wrote:

The second list was the random list. As noted, one cannot prove which one was the random list; you can only make a probabilistic guess.

I looked at two factors. The first is the number of times consecutive numbers appear, which suggests that the 2nd list is random.
The second is the number of times numbers in the range [30-35] appear compared to the other decade ranges, which also lends evidence that the second list was the random one.
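
The two counts described here can be sketched in Matlab as follows. This is only one reading of the two factors (adjacent entries in a row differing by exactly 1, and entries between 30 and 35), not necessarily the exact tally that was used; the names consec, consec2, high and high2 are made up for the illustration. The arrays are the two lists as given in the reply above.

```matlab
data  = [ 2  6  7 25 34;  3  9 12 15 34;  6 16 21 28 32
          6 10 13 21 23;  4 18 26 27 34;  3  6 17 27 32
          3 11 21 22 35;  1  2  8 17 27;  7 12 14 24 31
          3  7 14 18 27;  7 13 22 25 31;  7 12 23 31 32
          4 17 18 22 35;  8 15 17 20 25; 12 16 18 29 34
          2  7 11 16 21;  8 23 24 32 35; 17 19 23 29 31
          9 16 27 28 32;  6 15 19 26 32;  6 13 15 23 31];
data2 = [11 17 19 28 31;  3 11 29 32 35; 14 21 24 28 33
          9 14 22 23 31;  3 21 26 30 31;  5 15 20 27 29
          2 23 24 25 26;  7 13 20 24 25;  3 23 26 27 28
          6 20 21 26 29;  1 10 14 19 35; 12 18 27 32 35
          2  6 24 27 28;  3  8 11 21 30;  9 14 20 25 31
          4 13 19 21 28; 10 11 12 21 31;  2  7 11 20 24
          6 17 25 29 30; 13 23 24 26 34;  9 17 21 25 26];

% Rows are already in ascending order, so a difference of 1 between
% neighbouring columns marks a pair of consecutive numbers.
consec  = nnz(diff(data,  1, 2) == 1);
consec2 = nnz(diff(data2, 1, 2) == 1);

% Entries falling in the range 30 to 35.
high  = nnz(data  >= 30 & data  <= 35);
high2 = nnz(data2 >= 30 & data2 <= 35);

fprintf('consecutive pairs: data %d, data2 %d\n', consec, consec2);
fprintf('entries in 30-35:  data %d, data2 %d\n', high, high2);
```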

/snip/

