Darwin Airport Temperature Data Revisited

 

(download)

Climatological organizations often statistically adjust temperature records in order to offset local, non-climatic effects. Herein, I use R, a statistical programming language, to recreate Willis Eschenbach's analysis of some questionable adjustments made to the historical temperature record of Darwin, Australia by the Global Historical Climate Network (GHCN).1 This post builds upon my previous work with the GHCN Version 2 database;2 however, I have tried to make it relatively self-contained so that one can proceed without too much backtracking.

The Data Reloaded

Assuming that we have the relevant temperature data files (i.e. v2.mean and v2.mean_adj) in our working directory,3 we begin a new R session. Next, we copy and paste the following function into the R environment:

> LoadGHCN <- function(filename, na.rm=FALSE) {
    # Returns a dataset containing yearly GHCN mean temperature data.
    #
    # filename: The name of a file containing monthly GHCN mean
    #           temperature data.
    #    na.rm: Strip missing values before calculating yearly means?
    #   
    months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep",
        "Oct", "Nov", "Dec")
    colw <- c(11, 1, 4, rep(5, 12))
    coln <- c("Id", "Rec", "Year", months)
    colc <- c("character", "factor", "numeric", rep("numeric", 12))
    data <- read.fwf(filename, colw, col.names=coln, na.strings="-9999",
        colClasses=colc)
    data$MeanTemp <- rowMeans(data[months], na.rm=na.rm)/10 # Celsius
    return(data[c("Id", "Rec", "Year", "MeanTemp")])
}

This function is just some refactored code taken from my previous post on Darwin and put into a reusable form. It is designed to read monthly GHCN mean temperature data from a file and transform said data into a more utilitarian representation. One important aspect of this function is that it allows one to specify, via the na.rm parameter, how missing monthly values should be handled when computing each yearly mean. Normally, in statistical programming languages like R and SAS, a calculation on a missing value (i.e. NA) results in a missing value. Such a conservative approach is usually desired. In our case, though, such an approach is too conservative since the monthly GHCN mean temperature data is interspersed with missing values. Thus, we load the raw and adjusted temperature data like so:

> raw.data <- LoadGHCN("v2.mean", na.rm=TRUE)
> adj.data <- LoadGHCN("v2.mean_adj", na.rm=TRUE)

It should be noted that there are more intelligent ways to handle missing values. For instance, one could use the complete.cases() function to determine the pattern of missing values per row. One could then opt to assign a missing value to a yearly mean if the row of monthly temperature values upon which it is based contains a sequence of missing values. Improving the LoadGHCN() function is left as an exercise for the reader.

Eschenbach's Darwin

Willis Eschenbach, a climate change skeptic, has created a plot, shown in Fig. 1 above, that depicts the effects of GHCN adjustments to Darwin Airport's combined temperature record.4 Our goal is to use R to recreate this plot. The first obstacle we encounter is the fact that there are several distinct temperature records for Darwin Airport. More specifically, Darwin's raw temperature data consists of five records, three of which are present in homogenized form in its adjusted dataset. We need to compute an average temperature record for Darwin on a per dataset basis. To help us do this, we define the following function:

> AverageTemperatureRecords <- function(data, station.id) {
    # Returns a time series that is the average of all of the temperature
    # records for the given temperature station.
    #
    #       data: A dataset containing yearly GHCN mean temperature data.
    # station.id: An eleven-character temperature station identifier.
    #
    station.data <- subset(data, Id == station.id)
    station.recs <- split(station.data, f=station.data$Rec, drop=TRUE)
    station.ts <- NULL
    for (rec in station.recs) {
        station.ts <- ts.union(station.ts,
            ts(rec$MeanTemp, start=rec[1, "Year"]))
    }
    if (!is.null(dim(station.ts))) {
        station.ts <- ts(rowMeans(station.ts, na.rm=TRUE),
            start=start(station.ts))
    }
    return(station.ts)
}

Now, recalling that Darwin Airport's station identifier is "50194120000", we average the temperature records in each dataset to create two separate time series.

> raw.darwin.ts <- AverageTemperatureRecords(raw.data, "50194120000")
> adj.darwin.ts <- AverageTemperatureRecords(adj.data, "50194120000")

The next problem we must overcome is that the temperature time series in Fig. 1 are presented in terms of temperature anomalies. Temperature anomalies are departures from a common baseline. In climatology, the baseline is usually a global mean temperature calculated over some localized time period (e.g. 1971–2000).5 Nevertheless, in Fig. 1, the time series are normalized with respect to two different baselines so that they both can start from zero. Eschenbach claims that this is done to highlight the effect of the adjustments.

> eschenbach.raw.ts <- raw.darwin.ts - raw.darwin.ts[1]
> eschenbach.adj.ts <- adj.darwin.ts - adj.darwin.ts[1]

Once we have the normalized time series, we can plot them.

> ts.plot(eschenbach.adj.ts, eschenbach.raw.ts,
    col=c("red", "blue"),
    gpars=list(pch=20),
    main="GHCN Raw & Adj Temperatures Darwin Airport",
    type="o",
    ylab="Temperature Anomaly (C)")

The result forms the foundation of Fig. 2 above. We are not done yet, though. We need to add trend lines. To start, we use the lm() function to fit a straight-line model to each time series. In other words, we are just doing some simple linear regressions with time as the independent variable and temperature anomaly as the dependent variable. Note that the lm() function's na.action parameter must be set to NULL to ensure that a regression's residuals and fitted values are returned as time series.

> eschenbach.raw.reg <- lm(eschenbach.raw.ts ~ time(eschenbach.raw.ts),
    na.action=NULL)
> eschenbach.adj.reg <- lm(eschenbach.adj.ts ~ time(eschenbach.adj.ts),
    na.action=NULL)

Using the coef() function on the output of the first regression, we get an estimated slope for the straight-line model of -0.007124129, which translates into a cooling of 0.007° C per year or 0.7° C per century. This agrees with the trend that Eschenbach reports for the raw temperature time series in Fig. 1. The second regression yields an estimated slope of 0.01905713, indicative of a 1.9° C per century warming. Eschenbach, on the other hand, reports a 1.2° C per century warming for the adjusted temperature time series in Fig. 1. Despite this discrepancy, we plot the regressions' fitted values so as to produce the trend lines shown in Fig. 2.

> lines(fitted(eschenbach.adj.reg), col="red")
> lines(fitted(eschenbach.raw.reg), col="blue")

The last thing we have to do is overlay the amount of adjustment.

> lines((eschenbach.adj.ts - eschenbach.raw.ts), col="black")

We finally have Fig. 2, which looks very similar to Fig. 1, Eschenbach's original plot. The biggest difference between the two figures is the fact that, in Fig. 1, the black line tracking the amount of adjustment is plotted against a scale on the right-hand side. One gripe is that Eschenbach should have calculated the amount of adjustment as (adj.darwin.ts - raw.darwin.ts). We can confirm this by looking at Fig. 3, a plot of the original time series, which shows us that the adjusted time series starts off 2.34° C below the raw time series and that the series eventually converge.

A Smoking Gun?

So, are the GHCN's adjustments inappropriately introducing or inflating a global warming trend? Is Darwin a smoking gun? The climate change skeptic may view the adjustments made to Darwin with some suspicion. Yet, there really is no way to contextualize the adjustments made to Darwin without looking at other temperature stations.


  1. Eschenbach, W. (2009-12-08). The Smoking Gun At Darwin Zero. Watts Up With That?
  2. Markus, M. (2009-12-23). Raw Darwin Airport Temperature Data with R. Personal blog.
  3. Markus, M. (2009-12-23). See the section on "Getting the Data".
  4. Eschenbach, W. (2009-12-08). Figure 7.
  5. Global Surface Temperature Anomalies. (2009-12-11). NOAA.