Darwin Airport Temperature Data in Context

(download)

Previously, I attempted to recreate Willis Eschenbach's analysis of Darwin, Australia's historical temperature record.1 This recreation confirmed that a slight cooling trend in Darwin's combined temperature record became a pronounced warming trend after the Global Historical Climate Network (GHCN) statistically adjusted a subset of Darwin's individual records. What was left unanswered was whether such adjustments, originally intended to control for changes in land use and measurement practices, were, en masse, introducing or inflating a global warming trend.

In order to quantify the impact of the aforementioned adjustments, I analyze trends in temperature records taken from thousands of weather stations. More specifically, I examine the change in each record's trend due to the GHCN's adjustment process. I then use the results to place the adjustments made to Darwin's temperature records in context. As usual, I rely on R, a statistical programming language, for its ability to automate statistical operations. Follow along, step-by-step, and draw your own conclusions about the GHCN's adjustment process.

Computing the Trends

We begin our analysis with the assumption that raw and adjusted datasets, each containing yearly GHCN mean temperature data, have already been loaded into R.2 Next, we define a function capable of computing a trend for each temperature record present within a specified dataset.

> ComputeTrendsByRecord <- function(data) {
    # Returns a dataset containing the linear trend of each temperature
    # record in the input dataset.
    #
    # data: A dataset containing yearly GHCN mean temperature data.
    #
    trend.data <- data.frame(Id=character(), Rec=factor(), Trend=numeric(),
        Length=numeric())
    for (station.id in unique(data$Id)) {
        station.data <- subset(data, Id == station.id)
        station.recs <- split(station.data, f=station.data$Rec, drop=TRUE)
        for (rec in station.recs) {
            y <- ts(rec$MeanTemp, start=rec[1, "Year"])
            x <- time(y)
            b <- coef(lm(y ~ x))["x"]
            names(b) <- NULL
            trend.data <- rbind(trend.data,
                data.frame(Id=station.id, Rec=rec[1, "Rec"], Trend=b,
                    Length=length(y)))
        }
    }
    return(trend.data)
}

Once the above function is defined, we start computing trends.

> raw.trend.data <- ComputeTrendsByRecord(raw.data)
> adj.trend.data <- ComputeTrendsByRecord(adj.data)

With R's system.time() function, we can measure how long it takes to compute the trends. Doing so leads to the realization that computing thousands of linear regressions is very resource intensive. For instance, on an AMD Turion64 ML-32 CPU at 1.8 GHz with 384 MB system RAM running Ubuntu 8.04 it takes an elapsed time of 40 minutes to compute the raw trends and 18 minutes to compute the adjusted trends. It is faster to compute the adjusted trends because every temperature record has a raw version, but not every record has a corresponding adjusted version.3 Our aim now is to pair the trends by record via a merge.

> paired.trend.data <- merge(raw.trend.data, adj.trend.data,
    by=c("Id", "Rec"))

The head() function gives us a glimpse of what the resultant dataset looks like, for example:

> head(paired.trend.data)
           Id Rec      Trend.x Length.x      Trend.y Length.y
1 10160390000   0 -0.001393259      119  0.015645415      119
2 10160390000   1  0.013603567       64  0.043426557       64
3 10160390000   2  0.021083753       56 -0.006408558       56
4 10160390000   3  0.013808714       42  0.026670411       42
5 10160395001   0  0.038976221       56 -0.001259622       56
6 10160419000   0 -0.018078076       49 -0.018078076       49

Here, columns with .x and .y suffixes denote temperature record attributes calculated on raw and adjusted data, respectively. Our last manipulation is to add a new column that captures how much each record's trend has changed going from a pre-adjusted state to a post-adjusted state.

> paired.trend.data$DeltaTrend <- (paired.trend.data$Trend.y -
    paired.trend.data$Trend.x) * 10 # degC/decade

It should be noted that each change (delta) in trend is multiplied by a factor of 10 so that it is expressed in degrees Celsius per decade. Finally, our approach to computing trends is obviously inefficient. For instance, we expend a lot of wasted effort calculating trends for raw temperature records that have no corresponding adjusted version. Improving the code is left as an exercise for the reader.

Graphing the Trends

The best way to visualize the deltas that we have just calculated is via a histogram.

> hist(paired.trend.data$DeltaTrend,
    las=1,
    main=paste("Histogram of Changes in the Trends of",
        nrow(paired.trend.data), "Records\n",
        "due to the GHCN's Adjustment Process"),
    nclass=200,
    xlab="Degrees Celsius per Decade")

Our histogram is not complete, though. We need to draw a series of vertical lines through it, each line marking the location of a delta associated with one of Darwin Airport's temperature records. At this point, it is important to remember that Darwin Airport's station identifier is "50194120000".

> darwin.paired.trend.data <- subset(paired.trend.data,
    Id == "50194120000")
> abline(v=darwin.paired.trend.data$DeltaTrend,
    col=c("blue", "purple", "red"))
> legend("right",
    fill=c("blue", "purple", "red"),
    inset=0.1,
    legend=darwin.paired.trend.data$Rec,
    title="Darwin Record")

The final product is shown in Fig. 1 above.

Results

From Fig. 1, it is clear that the change in the trend of Darwin Record Zero (blue), Eschenbach's "smoking gun",4 is extreme; however, it is not extremal. There are 574 records with deltas greater than Darwin Record Zero, including Darwin Record Two (red). There are also 453 records that have had their trends adjusted downward more than Darwin Record Zero has had its trend adjusted upward. Of the 6,736 deltas calculated, 3,317 are greater than zero and 2,574 are less than zero. There is evidence that this difference is not due to chance alone (sign test, P<0.001).5 The mean change in trend is +0.017°C per decade (t-test, P<0.001).6 The mean change in trend weighted by record length in years is +0.018°C per decade. Finally, Darwin's temperature records are summarized as follows:

Darwin's Deterministic Trends (°C/dec)
Record Raw Adjusted Delta Years
0 -0.07150741 0.163448332 0.23495574 110
1 0.02683280 0.004114076 -0.02271872 69
2 0.15437369 0.414611679 0.26023798 42
3 0.31134085 NA NA 20
4 -0.41069865 NA NA 9

Our results replicate those found elsewhere.7, 8

An Optimistic Interpretation

An optimistic interpretation of our results is that no "smoking gun" exits. In other words, even if the adjustments to Darwin Record Zero seem bizarre, the fact is that "most adjustment[s] hardly modify the reading, and the warming and cooling adjustments end up compensating each other."9 Furthermore, the whole point of the GHCN's adjustment process is to eliminate biases in the temperature data. In this regard, the mean change in trend of +0.017°C per decade is a sign that the GHCN's adjustment process is doing what it is supposed to do: separate the signal of global warming from the noise of local, non-climatic effects.

A Skeptical Interpretation

A skeptical interpretation of our results is that the GHCN's adjustment process is not just offsetting existing biases, but creating new ones. Thus, the "smoking gun" is not the upward adjustment of Darwin Record Zero's temperature trend but the incongruous treatment of Darwin Record One and Darwin Record Two. As shown in Fig. 2, the raw version of Darwin Record Two (red) is practically a subsequence of Darwin Record One (purple). Yet, after going through the GHCN's adjustment process, the two records barely overlap. If the purpose of the GHCN's adjustment process is to correct for local changes affecting a temperature station, then, in the absence of any station metadata to the contrary, similar records should be adjusted similarly. Moreover, an effect size of +0.017°C per decade is not insignificant as it represents 31% of the +0.054°C per decade global temperature trend from 1850-2005.10

Errata

In working with the GHCN Version 2 database, I came across the following problems:

Gaps/Deletions
Some temperature records have gaps in which whole years are missing. Arguably, when such a gap is present, the offending record should probably be split into two separate records. Also, it should be noted that gaps subvert R's time series facilities since time series objects are supposed to represent observations over consecutive years.
Insertions/Repetitions
Some adjusted temperature records have extraneous observations that do not appear in their corresponding raw versions. See records from stations "30187860001", "40371826000", "40878662001", "41476726000", and "80099910001" for more details.

  1. Markus, M. (2010-01-12). Darwin Airport Temperature Data Revisited. Personal blog.
  2. Markus, M. (2010-01-12). See the section on "The Data Reloaded".
  3. This statement should be true; however, it appears that Kuwait International Airport (i.e. "21240582000") has a ghost. That is, there is a temperature record, Record Three to be exact, for which there is an adjusted version but no original version.
  4. Eschenbach, W. (2009-12-08). The Smoking Gun At Darwin Zero. Watts Up With That?
  5. Whitely, E., & Ball, J. (2002). Statistics review 6: Nonparametric methods. Critical Care, 6, 509-513. [PMID: 12493072]. Here we use a nonparametric sign test (with zeros discarded) lest someone accuse us of completely ignoring the distribution's deviation from normality. In R, use the binom.test() function.
  6. Conducting a Student's t-Test is reasonable given the paired nature of the data, the relative symmetry of the distribution, and the large sample size. In R, use the t.test() function.
  7. Gilestro, G. (2009-12-12). Lots of smoke, hardly any gun. Do climatologists falsify data? Personal blog. See also the accompanying results.txt file.
  8. Stokes, N. (2009-12-21). Darwin and GHCN Adjustments. Moyhu.
  9. Gilestro, G. (2009-12-12).
  10. Trenberth, K. E., et al. (2007). Observations: Surface and Atmospheric Climate Change. In S. Solomon, et al. (Eds.), IPCC Fourth Assessment Report (pp. 235-336). New York: Cambridge University Press. See Table 3.2 on p. 243.