Chacko’s ordering process

We begin with a list of kk values {x1,,xk}\{x_1, \ldots, x_k\} with associated weights t1==tk=1t_1 = \ldots = t_k = 1.

If for any 1i(k1)1 \leq i \leq (k - 1) we have that xk>xk+1x_k > x_{k + 1}, then we replace both values xkx_k and xk+1x_{k + 1} by a single value which is their weighted average (using the weights tkt_k and tk+1)t_{k + 1}). This new value takes the combined weight of the two values it replaces, tk+tk+1t_k + t_{k + 1}. The list is now one shorter, so kk becomes k1k - 1.

We repeat this process until either k=1k = 1 or we have a monotone increasing sequence of numbers.

We can recreate the two ordering process examples shown on pages 187 (section 3) and 189 (section 5) of Chacko (1966) in this package.

## This is permChacko 1.0.1
## To learn more about the package, please read the documentation available in the commands below:
## help("permChacko-package")
## browseVignettes("permChacko")
reduceVector(chacko66_sec3)
## 
## Original vector: 10  16  14  12  18
## Reduced vector : 10  14  18
reduceVector(chacko66_sec5)
## 
## Original vector: 12  14  18  16  22  20  18  24  26  30
## Reduced vector : 12  14  17  20  24  26  30

Notice that Chacko was entirely comfortable with this ordering process ending with a single value. If you look at their table on page 188 then he suggests that under the null hypothesis, if you start with a list of 5 values, then you have a 20% chance that the order process results in a single value.

The aforementioned table is available both in Chacko (1963) and Chacko (1966), as well as in this package:

chacko63_tab1
##           k = 3    k = 4    k = 5    k = 6    k = 7    k = 8    k = 9   k = 10
## m = 1  0.333333 0.250000 0.200000 0.166667 0.142857 0.125000 0.111111 0.100000
## m = 2  0.500000 0.458333 0.416667 0.380556 0.350000 0.324107 0.301984 0.282897
## m = 3  0.166667 0.250000 0.291667 0.312500 0.322222 0.325694 0.325519 0.323165
## m = 4        NA 0.041667 0.083333 0.118055 0.145833 0.167882 0.185417 0.199427
## m = 5        NA       NA 0.008333 0.020833 0.034722 0.048611 0.061863 0.074219
## m = 6        NA       NA       NA 0.001389 0.004167 0.007986 0.012500 0.017436
## m = 7        NA       NA       NA       NA 0.000198 0.000694 0.001505 0.002604
## m = 8        NA       NA       NA       NA       NA 0.000025 0.000099 0.000240
## m = 9        NA       NA       NA       NA       NA       NA 0.000003 0.000012
## m = 10       NA       NA       NA       NA       NA       NA       NA 0.000000

Even if the outcome is a single value, the test statistic can be calculated from equation 5 on page 188, i.e.:

χ2=knj=1mtj[xtjnk]2 \bar\chi^2 = \frac{k}{n} \sum_{j = 1}^m t_j \left[ \bar x_{t_j} - \frac{n}{k}\right]^2

In the equation above, xtj\bar x_{t_j} are the values in the reduced vector, n=i=1kxin = \sum_{i = 1}^k x_i, mm is the number of elements in the final reduced vector x\bar x. The statistic χ2\bar\chi^2 is asymptotically distributed as χ2(m1)\chi^2(m - 1).

The question then becomes how to obtain a pp-value associated with this test statistic.

Calculating pp-values

Given that χ2χ2(m1)\bar\chi^2 \sim \chi^2(m - 1), an analytical pp-value can be calculated for the test statistic for any m>1m > 1. Moreover, for certain combinations of mm and kk the pp-value can be calculated using the table shown below. Finally, a numeric pp-value can be calculated through computer simulation, which will be explained later. All these p-values are calculated using the permChacko() function.

permChacko(chacko66_sec3)
## 
##         Chacko Test for Order-restriction with Permutation Test
## 
## Null hypothesis           : p1 == p2 == p3 == p4 == p5
## Alternative hypothesis    : p1 <= p2 <= p3 <= p4 <= p5
## 
## Test statistic (chisq_bar): 2.285714
## p-values:
##   Analytic p-value        : 0.318907
##   Numeric p-value         : 0.195000 (1000 permutations)
##   Numeric mid-p value     : 0.192500 (1000 permutations)
##   Tabular p-value         : 0.196052
permChacko(chacko66_sec5)
## 
##         Chacko Test for Order-restriction with Permutation Test
## 
## Null hypothesis           : p1 == p2 == ... == p9 == p10
## Alternative hypothesis    : p1 <= p2 <= ... <= p9 <= p10
## 
## Test statistic (chisq_bar): 13.500000
## p-values:
##   Analytic p-value        : 0.035748
##   Numeric p-value         : 0.003000 (1000 permutations)
##   Numeric mid-p value     : 0.003000 (1000 permutations)
##   Tabular p-value         : 0.002294

One omission from Chacko (1966) is the event where m=1m = 1. As discussed above, they clearly expect that this will happen sometimes, but provide no explanation as to how to calculate the pp-value in this case. In this package, we suggest an approach to calculate this value using a permutation test.

If you look at Chacko’s final example on how they evaluate significance, they say that you reject the null hypothesis if the observed calculated value obtained from the equation for χ2\bar\chi^2 above is greater than the value cc obtained from equation (6) in the paper (reproduced below).

α=m=2kpm,kp[χ(m1)2c] \alpha = \sum_{m = 2}^k p_{m, k} \cdot p\left[\chi^2_{(m - 1)} \geq c\right]

This approach does work when m=1m = 1, but seems very cumbersome, requiring calculation of the probability values pm,kp_{m, k}, which in turn requires consultation or partial recreation of the Chacko table. With the computing power available to us, 60 years into the future from Chacko’s seminal work, we can simply obtain a close-enough pp-value using a permutation test.

Permutation test

Imagine that the sum of our original kk values is nn. Let a permutation be a stochastic distribution of nn objects (independently) across the kk categories (with each one being equally likely under the null hypothesis). For each such permutation we can go through Chacko’s ordering procedure and calculate the test statistic according to equation 5.

The pp-value in this case is simply the fraction of such permutations that yield a test statistic equal to or greater than the one we originally observed.