Cochran-Armitage Test for Trend with R
Suppose you randomly sample entities from a population and record two categorical variables for each one. Suppose further that one variable can take on two levels, or values, and the other can take on k levels. Given such data, you may wonder whether the two variables are associated. A chi-square test can answer this question and is appropriate in most situations. However, if the k levels of the second variable have a natural ordering, and you suspect that the proportion falling into one level of the two-level variable rises or falls along that ordering, you might want to use the Cochran-Armitage test instead.
Perhaps the best way to understand the Cochran-Armitage test is through an example. Table 1 below is a 2 × k contingency table, with k = 4, that shows screening mammography attendance by time since last visit to a general practitioner for N = 278 patients [1]:
| Attendance / time since last visit | <6 mo. | 6-12 mo. | 1-2 yr. | >2 yr. |
|---|---|---|---|---|
| No | 97 | 31 | 36 | 28 |
| Yes | 59 | 10 | 12 | 5 |
From this table it is easy to speculate that the proportion of patients attending screening mammography decreases as the time since their last visit increases. The putative trend is easiest to see in the conditional relative frequencies below, obtained by dividing each count by its column total (156, 41, 48, and 33 patients, respectively):
| Attendance / time since last visit | <6 mo. | 6-12 mo. | 1-2 yr. | >2 yr. |
|---|---|---|---|---|
| No | 0.622 | 0.756 | 0.750 | 0.848 |
| Yes | 0.378 | 0.244 | 0.250 | 0.152 |
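The conditional relative frequencies above can be reproduced directly in R. This is a minimal sketch that enters the Table 1 counts as a matrix instead of rebuilding the patient-level data; the names `table.1`, `col.totals`, and `cond.freq` are illustrative. `prop.table()` with `margin = 2` divides each column by its own total.

```r
# Table 1 counts entered directly as a 2 x 4 matrix
# (rows: attendance; columns: time since last visit).
table.1 <- matrix(
  c(97, 31, 36, 28,
    59, 10, 12,  5),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Attendance = c("No", "Yes"),
    Last.Visit = c("<6 mo", "6-12 mo", "1-2 yr", ">2 yr")))

# Column totals: number of patients in each time-since-last-visit group.
col.totals <- colSums(table.1)
print(col.totals)

# Divide each column by its total (margin = 2), so every column sums to 1.
cond.freq <- prop.table(table.1, margin = 2)
print(round(cond.freq, 3))
```

Conditioning on the columns, rather than on the rows or the grand total, is what makes the attendance proportions comparable across groups of very different sizes (156 patients versus 33).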
Now all that is left to do is test for a linear trend in proportions. Using R, a statistical programming language [2], this is done as follows:
```r
#
# Create patient observations.
#
patient.data <- data.frame(
  Attendance=factor(c(rep("No", 192), rep("Yes", 86))),
  Last.Visit=ordered(c(
    rep("<6 mo", 97), rep("6-12 mo", 31), rep("1-2 yr", 36), rep(">2 yr", 28),
    rep("<6 mo", 59), rep("6-12 mo", 10), rep("1-2 yr", 12), rep(">2 yr", 5)),
    levels=c("<6 mo", "6-12 mo", "1-2 yr", ">2 yr")))

#
# Make a contingency table just like Table 1.
#
table.1 <- table(patient.data)
print(table.1)

#
# Do a Cochran-Armitage test for a linear trend in proportions.
#
prop.trend.test(
  x=table.1["Yes", ],
  n=margin.table(table.1, margin=2),  # sum by column
  score=c(3, 2, 1, 0))
```
Running the R code reveals that the observed linear trend is unlikely to be due to chance alone (P=0.004).
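To see where that P value comes from, the statistic can be recomputed by hand. With attended counts x_i, column totals n_i, scores s_i, N = Σ n_i and p = Σ x_i / N, the sketch below uses T = Σ s_i (x_i − n_i p), Var(T) = p(1 − p)[Σ n_i s_i² − (Σ n_i s_i)² / N], and compares T² / Var(T) with a chi-square distribution on 1 degree of freedom. This is the form that matches `prop.trend.test()`, which obtains the same number as a weighted regression of the column proportions on the scores; some presentations scale the variance by N/(N − 1), which changes the statistic only slightly for N = 278. The variable names are illustrative.

```r
# Counts from Table 1, in column order <6 mo, 6-12 mo, 1-2 yr, >2 yr.
x <- c(59, 10, 12, 5)       # "Yes" (attended) counts per column
n <- c(156, 41, 48, 33)     # column totals
s <- c(3, 2, 1, 0)          # scores, as in the prop.trend.test() call above

N <- sum(n)
p <- sum(x) / N             # overall proportion attending

# Score-weighted difference between observed and expected "Yes" counts
# under the null hypothesis of equal proportions.
T.stat <- sum(s * (x - n * p))

# Null variance of T.stat.
V <- p * (1 - p) * (sum(n * s^2) - sum(n * s)^2 / N)

chisq <- T.stat^2 / V
p.value <- pchisq(chisq, df = 1, lower.tail = FALSE)
print(c(chisq = chisq, p.value = p.value))

# Same counts passed straight to the built-in function, for comparison.
builtin <- prop.trend.test(x, n, score = s)
print(builtin)
```

Both routes give a chi-square statistic of about 8.18 on 1 degree of freedom, which corresponds to the reported P = 0.004. Reversing the scores (0, 1, 2, 3) flips the sign of T.stat but leaves T.stat² and the P value unchanged.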
One nice thing about the Cochran-Armitage test is that it can be tuned to detect different types of trends. In R, tuning is done by setting the score argument of the prop.trend.test() function. For instance, score=c(0, 1, 1, 0) would be suitable for detecting a non-monotonic, umbrella-shaped trend, in which the two middle categories differ from the two outer ones. The score argument is also useful when analyzing a disease-by-genotype table, since it allows one to test for association under a hypothesized mode of inheritance for an allele (e.g. recessive, additive/co-dominant, or dominant).
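The sketch below shows score tuning in practice. It runs `prop.trend.test()` on the Table 1 counts with the linear scores from the main example and with umbrella scores, then lists the score vectors commonly used for a 2 × 3 disease-by-genotype table. The column order aa, Aa, AA (with A the hypothesized risk allele) and the helper `genotype.trend.tests()` are assumptions made for illustration, not part of R.

```r
# Counts from Table 1 (column order <6 mo, 6-12 mo, 1-2 yr, >2 yr).
x <- c(59, 10, 12, 5)       # attended
n <- c(156, 41, 48, 33)     # column totals

# Linear scores versus umbrella scores that contrast the two middle
# categories with the two outer ones.
linear   <- prop.trend.test(x, n, score = c(3, 2, 1, 0))
umbrella <- prop.trend.test(x, n, score = c(0, 1, 1, 0))
print(c(linear = linear$p.value, umbrella = umbrella$p.value))

# Common genotype scores, assuming columns are ordered aa, Aa, AA
# with A the hypothesized risk allele.
genotype.scores <- list(
  additive  = c(0, 1, 2),   # each copy of A shifts risk by the same amount
  dominant  = c(0, 1, 1),   # one copy of A is enough
  recessive = c(0, 0, 1))   # two copies of A are needed

# Hypothetical helper: one trend-test P value per inheritance model.
# Reporting only the smallest of the three inflates the false-positive
# rate, so choose the model in advance or correct for multiple testing.
genotype.trend.tests <- function(cases, totals, scores = genotype.scores) {
  sapply(scores, function(s) prop.trend.test(cases, totals, score = s)$p.value)
}
```

On Table 1, the linear scores give P ≈ 0.004, while the umbrella scores give P ≈ 0.12, so the data support a steady decline more than a middle-versus-ends contrast.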
In general, the Cochran-Armitage test is preferable to the chi-square test when a trend is suspected, because it has more power to detect that trend if it exists. Its additional power comes from focusing on a single pattern: the 1-degree-of-freedom trend test looks only for proportions that change in step with the scores, whereas the chi-square test, with k - 1 degrees of freedom, is sensitive to every departure from the null hypothesis of equal proportions. Thus, be aware that the Cochran-Armitage test can miss an association between two variables when the proportions do not follow the scored pattern.
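Both sides of this trade-off can be checked with a short sketch. It first compares `chisq.test()` and `prop.trend.test()` on the Table 1 counts, then repeats the comparison on a clearly hypothetical set of counts (invented for illustration only) whose proportions rise and fall symmetrically, so that there is an association but no linear trend.

```r
# Table 1 counts.
x <- c(59, 10, 12, 5)
n <- c(156, 41, 48, 33)

# Pearson chi-square test of equal proportions (k - 1 = 3 df)
# versus the 1-df trend test with the scores used earlier.
chisq.result <- chisq.test(rbind(Yes = x, No = n - x))
trend.result <- prop.trend.test(x, n, score = c(3, 2, 1, 0))
print(c(chi.square = chisq.result$p.value, trend = trend.result$p.value))

# Hypothetical counts (illustration only): proportions 0.2, 0.5, 0.5, 0.2.
# The association is strong, but linear scores see no trend at all.
x.hyp <- c(10, 25, 25, 10)
n.hyp <- c(50, 50, 50, 50)
hyp.chisq <- chisq.test(rbind(Yes = x.hyp, No = n.hyp - x.hyp))
hyp.trend <- prop.trend.test(x.hyp, n.hyp)   # default scores 1, 2, 3, 4
print(c(chi.square = hyp.chisq$p.value, trend = hyp.trend$p.value))
```

On Table 1 the trend test gives the smaller P value (about 0.004 versus about 0.03 for the 3-df chi-square test). For the hypothetical counts the ordering flips: the chi-square P value is below 0.001, while the linear-trend P value is essentially 1.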
- [1] Armitage, P., Berry, G., & Matthews, J. N. S. (2002). Statistical Methods in Medical Research (4th ed.). Oxford: Blackwell Science. See p. 506.
- [2] R Development Core Team. (2008). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.