Previously, I attempted to recreate Willis Eschenbach's analysis of Darwin, Australia's historical temperature record.1 This recreation confirmed that a slight cooling trend in Darwin's combined temperature record became a pronounced warming trend after the Global Historical Climate Network (GHCN) statistically adjusted a subset of Darwin's individual records. What was left unanswered was whether such adjustments, originally intended to control for changes in land use and measurement practices, were, en masse, introducing or inflating a global warming trend.
In order to quantify the impact of the aforementioned adjustments, I analyze trends in temperature records taken from thousands of weather stations. More specifically, I examine the change in each record's trend due to the GHCN's adjustment process. I then use the results to place the adjustments made to Darwin's temperature records in context. As usual, I rely on R, a statistical programming language, for its ability to automate statistical operations. Follow along, step-by-step, and draw your own conclusions about the GHCN's adjustment process.
Computing the Trends
We begin our analysis with the assumption that raw and adjusted datasets, each containing yearly GHCN mean temperature data, have already been loaded into R.2 Next, we define a function capable of computing a trend for each temperature record present within a specified dataset.
> ComputeTrendsByRecord <- function(data) {
# Returns a dataset containing the linear trend of each temperature
# record in the input dataset.
#
# data: A dataset containing yearly GHCN mean temperature data.
#
trend.data <- data.frame(Id=character(), Rec=factor(), Trend=numeric(),
Length=numeric())
for (station.id in unique(data$Id)) {
station.data <- subset(data, Id == station.id)
station.recs <- split(station.data, f=station.data$Rec, drop=TRUE)
for (rec in station.recs) {
y <- ts(rec$MeanTemp, start=rec[1, "Year"])
x <- time(y)
b <- coef(lm(y ~ x))["x"]
names(b) <- NULL
trend.data <- rbind(trend.data,
data.frame(Id=station.id, Rec=rec[1, "Rec"], Trend=b,
Length=length(y)))
}
}
return(trend.data)
}
Once the above function is defined, we start computing trends.
> raw.trend.data <- ComputeTrendsByRecord(raw.data)
> adj.trend.data <- ComputeTrendsByRecord(adj.data)
With R's system.time() function, we can measure how long it takes to compute the trends. Doing so leads to the realization that computing thousands of linear regressions is very resource intensive. For instance, on an AMD Turion64 ML-32 CPU at 1.8 GHz with 384 MB system RAM running Ubuntu 8.04 it takes an elapsed time of 40 minutes to compute the raw trends and 18 minutes to compute the adjusted trends. It is faster to compute the adjusted trends because every temperature record has a raw version, but not every record has a corresponding adjusted version.3 Our aim now is to pair the trends by record via a merge.
> paired.trend.data <- merge(raw.trend.data, adj.trend.data,
by=c("Id", "Rec"))
The head() function gives us a glimpse of what the resultant dataset looks like, for example:
> head(paired.trend.data)
Id Rec Trend.x Length.x Trend.y Length.y
1 10160390000 0 -0.001393259 119 0.015645415 119
2 10160390000 1 0.013603567 64 0.043426557 64
3 10160390000 2 0.021083753 56 -0.006408558 56
4 10160390000 3 0.013808714 42 0.026670411 42
5 10160395001 0 0.038976221 56 -0.001259622 56
6 10160419000 0 -0.018078076 49 -0.018078076 49
Here, columns with .x and .y suffixes denote temperature record attributes calculated on raw and adjusted data, respectively. Our last manipulation is to add a new column that captures how much each record's trend has changed going from a pre-adjusted state to a post-adjusted state.
> paired.trend.data$DeltaTrend <- (paired.trend.data$Trend.y -
paired.trend.data$Trend.x) * 10 # degC/decade
It should be noted that each change (delta) in trend is multiplied by a factor of 10 so that it is expressed in degrees Celsius per decade. Finally, our approach to computing trends is obviously inefficient. For instance, we expend a lot of wasted effort calculating trends for raw temperature records that have no corresponding adjusted version. Improving the code is left as an exercise for the reader.
Graphing the Trends
The best way to visualize the deltas that we have just calculated is via a histogram.
> hist(paired.trend.data$DeltaTrend,
las=1,
main=paste("Histogram of Changes in the Trends of",
nrow(paired.trend.data), "Records\n",
"due to the GHCN's Adjustment Process"),
nclass=200,
xlab="Degrees Celsius per Decade")
Our histogram is not complete, though. We need to draw a series of vertical lines through it, each line marking the location of a delta associated with one of Darwin Airport's temperature records. At this point, it is important to remember that Darwin Airport's station identifier is "50194120000".
> darwin.paired.trend.data <- subset(paired.trend.data,
Id == "50194120000")
> abline(v=darwin.paired.trend.data$DeltaTrend,
col=c("blue", "purple", "red"))
> legend("right",
fill=c("blue", "purple", "red"),
inset=0.1,
legend=darwin.paired.trend.data$Rec,
title="Darwin Record")
The final product is shown in Fig. 1 above.
Results
From Fig. 1, it is clear that the change in the trend of Darwin Record Zero (blue), Eschenbach's "smoking gun",4 is extreme; however, it is not extremal. There are 574 records with deltas greater than Darwin Record Zero, including Darwin Record Two (red). There are also 453 records that have had their trends adjusted downward more than Darwin Record Zero has had its trend adjusted upward. Of the 6,736 deltas calculated, 3,317 are greater than zero and 2,574 are less than zero. There is evidence that this difference is not due to chance alone (sign test, P<0.001).5 The mean change in trend is +0.017°C per decade (t-test, P<0.001).6 The mean change in trend weighted by record length in years is +0.018°C per decade. Finally, Darwin's temperature records are summarized as follows:
Darwin's Deterministic Trends (°C/dec)
| Record |
Raw |
Adjusted |
Delta |
Years |
| 0 |
-0.07150741 |
0.163448332 |
0.23495574 |
110 |
| 1 |
0.02683280 |
0.004114076 |
-0.02271872 |
69 |
| 2 |
0.15437369 |
0.414611679 |
0.26023798 |
42 |
| 3 |
0.31134085 |
NA |
NA |
20 |
| 4 |
-0.41069865 |
NA |
NA |
9 |
Our results replicate those found elsewhere.7, 8
An Optimistic Interpretation
An optimistic interpretation of our results is that no "smoking gun" exits. In other words, even if the adjustments to Darwin Record Zero seem bizarre, the fact is that "most adjustment[s] hardly modify the reading, and the warming and cooling adjustments end up compensating each other."9 Furthermore, the whole point of the GHCN's adjustment process is to eliminate biases in the temperature data. In this regard, the mean change in trend of +0.017°C per decade is a sign that the GHCN's adjustment process is doing what it is supposed to do: separate the signal of global warming from the noise of local, non-climatic effects.
A Skeptical Interpretation
A skeptical interpretation of our results is that the GHCN's adjustment process is not just offsetting existing biases, but creating new ones. Thus, the "smoking gun" is not the upward adjustment of Darwin Record Zero's temperature trend but the incongruous treatment of Darwin Record One and Darwin Record Two. As shown in Fig. 2, the raw version of Darwin Record Two (red) is practically a subsequence of Darwin Record One (purple). Yet, after going through the GHCN's adjustment process, the two records barely overlap. If the purpose of the GHCN's adjustment process is to correct for local changes affecting a temperature station, then, in the absence of any station metadata to the contrary, similar records should be adjusted similarly. Moreover, an effect size of +0.017°C per decade is not insignificant as it represents 31% of the +0.054°C per decade global temperature trend from 1850-2005.10
Errata
In working with the GHCN Version 2 database, I came across the following problems:
- Gaps/Deletions
- Some temperature records have gaps in which whole years are missing. Arguably, when such a gap is present, the offending record should probably be split into two separate records. Also, it should be noted that gaps subvert R's time series facilities since time series objects are supposed to represent observations over consecutive years.
- Insertions/Repetitions
- Some adjusted temperature records have extraneous observations that do not appear in their corresponding raw versions. See records from stations "30187860001", "40371826000", "40878662001", "41476726000", and "80099910001" for more details.
- Markus, M. (2010-01-12). Darwin Airport Temperature Data Revisited. Personal blog.
- Markus, M. (2010-01-12). See the section on "The Data Reloaded".
- This statement should be true; however, it appears that Kuwait International Airport (i.e. "21240582000") has a ghost. That is, there is a temperature record, Record Three to be exact, for which there is an adjusted version but no original version.
- Eschenbach, W. (2009-12-08). The Smoking Gun At Darwin Zero. Watts Up With That?
- Whitely, E., & Ball, J. (2002). Statistics review 6: Nonparametric methods. Critical Care, 6, 509-513. [PMID: 12493072]. Here we use a nonparametric sign test (with zeros discarded) lest someone accuse us of completely ignoring the distribution's deviation from normality. In R, use the
binom.test() function.
- Conducting a Student's t-Test is reasonable given the paired nature of the data, the relative symmetry of the distribution, and the large sample size. In R, use the
t.test() function.
- Gilestro, G. (2009-12-12). Lots of smoke, hardly any gun. Do climatologists falsify data? Personal blog. See also the accompanying
results.txt file.
- Stokes, N. (2009-12-21). Darwin and GHCN Adjustments. Moyhu.
- Gilestro, G. (2009-12-12).
- Trenberth, K. E., et al. (2007). Observations: Surface and Atmospheric Climate Change. In S. Solomon, et al. (Eds.), IPCC Fourth Assessment Report (pp. 235-336). New York: Cambridge University Press. See Table 3.2 on p. 243.