Improving the Chacko (1966) test

Chacko’s ordering process

We begin with a list of $k$ values $\{x_1, \ldots, x_k\}$ with associated weights $t_1 = \ldots = t_k = 1$ .

If for any $1 \leq i \leq (k - 1)$ we have that $x_k > x_{k + 1}$ , then we replace both values $x_k$ and $x_{k + 1}$ by a single value which is their weighted average (using the weights $t_k$ and $t_{k + 1})$ . This new value takes the combined weight of the two values it replaces, $t_k + t_{k + 1}$ . The list is now one shorter, so $k$ becomes $k - 1$ .

We repeat this process until either $k = 1$ or we have a monotone increasing sequence of numbers.

We can recreate the two ordering process examples shown on pages 187 (section 3) and 189 (section 5) of Chacko (1966) in this package.

library(permChacko)

## This is permChacko 1.0.1
## To learn more about the package, please read the documentation available in the commands below:
## help("permChacko-package")
## browseVignettes("permChacko")

reduceVector(chacko66_sec3)

## 
## Original vector: 10  16  14  12  18
## Reduced vector : 10  14  18

reduceVector(chacko66_sec5)

## 
## Original vector: 12  14  18  16  22  20  18  24  26  30
## Reduced vector : 12  14  17  20  24  26  30

Notice that Chacko was entirely comfortable with this ordering process ending with a single value. If you look at their table on page 188 then he suggests that under the null hypothesis, if you start with a list of 5 values, then you have a 20% chance that the order process results in a single value.

The aforementioned table is available both in Chacko (1963) and Chacko (1966), as well as in this package:

chacko63_tab1

##           k = 3    k = 4    k = 5    k = 6    k = 7    k = 8    k = 9   k = 10
## m = 1  0.333333 0.250000 0.200000 0.166667 0.142857 0.125000 0.111111 0.100000
## m = 2  0.500000 0.458333 0.416667 0.380556 0.350000 0.324107 0.301984 0.282897
## m = 3  0.166667 0.250000 0.291667 0.312500 0.322222 0.325694 0.325519 0.323165
## m = 4        NA 0.041667 0.083333 0.118055 0.145833 0.167882 0.185417 0.199427
## m = 5        NA       NA 0.008333 0.020833 0.034722 0.048611 0.061863 0.074219
## m = 6        NA       NA       NA 0.001389 0.004167 0.007986 0.012500 0.017436
## m = 7        NA       NA       NA       NA 0.000198 0.000694 0.001505 0.002604
## m = 8        NA       NA       NA       NA       NA 0.000025 0.000099 0.000240
## m = 9        NA       NA       NA       NA       NA       NA 0.000003 0.000012
## m = 10       NA       NA       NA       NA       NA       NA       NA 0.000000

Even if the outcome is a single value, the test statistic can be calculated from equation 5 on page 188, i.e.:

$\bar\chi^2 = \frac{k}{n} \sum_{j = 1}^m t_j \left[ \bar x_{t_j} - \frac{n}{k}\right]^2$

In the equation above, $\bar x_{t_j}$ are the values in the reduced vector, $n = \sum_{i = 1}^k x_i$ , $m$ is the number of elements in the final reduced vector $\bar x$ . The statistic $\bar\chi^2$ is asymptotically distributed as $\chi^2(m - 1)$ .

The question then becomes how to obtain a $p$ -value associated with this test statistic.

Calculating $p$ -values

Given that $\bar\chi^2 \sim \chi^2(m - 1)$ , an analytical $p$ -value can be calculated for the test statistic for any $m > 1$ . Moreover, for certain combinations of $m$ and $k$ the $p$ -value can be calculated using the table shown below. Finally, a numeric $p$ -value can be calculated through computer simulation, which will be explained later. All these p-values are calculated using the permChacko() function.

permChacko(chacko66_sec3)

## 
##         Chacko Test for Order-restriction with Permutation Test
## 
## Null hypothesis           : p1 == p2 == p3 == p4 == p5
## Alternative hypothesis    : p1 <= p2 <= p3 <= p4 <= p5
## 
## Test statistic (chisq_bar): 2.285714
## p-values:
##   Analytic p-value        : 0.318907
##   Numeric p-value         : 0.195000 (1000 permutations)
##   Numeric mid-p value     : 0.192500 (1000 permutations)
##   Tabular p-value         : 0.196052

permChacko(chacko66_sec5)

## 
##         Chacko Test for Order-restriction with Permutation Test
## 
## Null hypothesis           : p1 == p2 == ... == p9 == p10
## Alternative hypothesis    : p1 <= p2 <= ... <= p9 <= p10
## 
## Test statistic (chisq_bar): 13.500000
## p-values:
##   Analytic p-value        : 0.035748
##   Numeric p-value         : 0.003000 (1000 permutations)
##   Numeric mid-p value     : 0.003000 (1000 permutations)
##   Tabular p-value         : 0.002294

One omission from Chacko (1966) is the event where $m = 1$ . As discussed above, they clearly expect that this will happen sometimes, but provide no explanation as to how to calculate the $p$ -value in this case. In this package, we suggest an approach to calculate this value using a permutation test.

If you look at Chacko’s final example on how they evaluate significance, they say that you reject the null hypothesis if the observed calculated value obtained from the equation for $\bar\chi^2$ above is greater than the value $c$ obtained from equation (6) in the paper (reproduced below).

$\alpha = \sum_{m = 2}^k p_{m, k} \cdot p\left[\chi^2_{(m - 1)} \geq c\right]$

This approach does work when $m = 1$ , but seems very cumbersome, requiring calculation of the probability values $p_{m, k}$ , which in turn requires consultation or partial recreation of the Chacko table. With the computing power available to us, 60 years into the future from Chacko’s seminal work, we can simply obtain a close-enough $p$ -value using a permutation test.

Permutation test

Imagine that the sum of our original $k$ values is $n$ . Let a permutation be a stochastic distribution of $n$ objects (independently) across the $k$ categories (with each one being equally likely under the null hypothesis). For each such permutation we can go through Chacko’s ordering procedure and calculate the test statistic according to equation 5.

The $p$ -value in this case is simply the fraction of such permutations that yield a test statistic equal to or greater than the one we originally observed.

Chacko’s ordering process

Calculating pp-values

Permutation test

Calculating $p$ -values