Raw Darwin Airport Temperature Data with R

 

(download)

Traditionally, I have done statistical analyses using SAS. SAS provides an analyst with a pretty good environment; however, the system is expensive and programming via the SAS Macro Language leaves much to be desired. R, a freely available statistical programming language, is the open source response to SAS. Up until now, I have not done much programming in R. One reason for my lack of programming in R is that I don't have any interesting data to analyze. Forutnately, a series of tit-for-tat blog posts1,2,3,4 discussing the temperature record for Australia's Darwin Airport has changed this situation. My goal here is to use R to recreate Fig. 1,5 as shown above. I hope that you will follow along.

Getting the Data

First, we need to get the temperature record for Darwin Aiport. This data is contained in the Global Historical Climate Network (GHCN) Version 2 database, which is available for download from the National Climatic Data Center (NCDC). According to the NCDC, the GHCN database "contains historical temperature, precipitation, and pressure data for thousands of land stations worldwide."6 Since we are just interested in the temperature data, all we have to do is download the following files:

v2.mean.Z
A compressed file containing monthly mean station temperatures.
v2.mean_adj.Z
A compressed file containing monthly mean station temperatures statistically adjusted to offset "nonclimatic discontinuities" such as changes in instrument shelters or local station environments.
v2.temperature.inv
An inventory file containing station metadata.

It should be noted that I downloaded all of the above files on December 18th, 2009.

Preparing the Data

Now that we have the temperature data, we need to uncompress it. On UNIX, this can be done using the uncompress command.

> uncompress v2.mean.Z
> uncompress v2.mean_adj.Z

After uncompressing the files, each file should be larger and no longer have the ".Z" ending. Next, we need to find the identifier for the Darwin Airport station by searching the inventory file using the grep command.

> grep "DARWIN" v2.temperature.inv
50194120000 DARWIN AIRPOR                  -12.40  130.87   30   17U   56FLxxCO 3A 1WATER           C

Here, the identifier is the eleven-digit number at the beginning of the row (i.e. "50194120000"). It consists of a country code (3 digits), station number (5 digits), and a modifier (3 digits). We can also confirm that we have the correct station by noting that the Darwin Airport is indeed located at 12°40'S latitude and 130°87'E longitude.

Loading the Data into R

Once we have the station identifier, we can start up our R environment7 and then load the raw temperature data from the v2.mean file into it via the read.fwf command. We can learn more about that command, or any other R command, by prefixing it with a question mark.

> ?read.fwf

Since the v2.mean file has a fixed-width format, we must specify the width, name, and class of each of its columns. Additionally, we need to tell R that missing monthly temperature values are coded as -9999.

> months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep",
    "Oct", "Nov", "Dec")
> colw <- c(11, 1, 4, rep(5, 12))
> coln <- c("Id", "Rec", "Year", months)
> colc <- c("character", "factor", "numeric", rep("numeric", 12))
> raw <- read.fwf("v2.mean", colw, col.names=coln, na.strings="-9999",
    colClasses=colc)

It may take awhile for R to load all of the data. After R finishes loading the data, we can use the summary and attributes commands to verify that the data was loaded correctly. R reports that the dataset that I am using has 595,759 rows.

Plotting the Data

Before we can actually plot the raw Darwin Airport temperature data, we must first calculate yearly mean station temperatures. We can accomplish this by using the rowMeans command to average all of the monthly temperatures on a per row (i.e. year) basis. It should be noted that each row mean is divided by a factor of 10 in order to convert it into degrees Celsius.

> raw$RawMeanTemp <- rowMeans(raw[, months], na.rm=TRUE)/10
> raw.globe <- raw[, c("Id", "Rec", "Year", "RawMeanTemp")]

Next, we must subset out the temperature data for Darwin Airport, split this data into the five individual records that make up the overall Darwin Airport temperature record, and create time series8 out of each these records.

> raw.darwin <- subset(raw.globe, Id == "50194120000")
> raw.darwin.recs <- split(raw.darwin, f=raw.darwin[, "Rec"], drop=TRUE)
> for (i in 1:length(raw.darwin.recs)) {
    varname <- paste("rec", i, sep=".")
    assign(varname, ts(raw.darwin.recs[[i]][, "RawMeanTemp"],
        start=raw.darwin.recs[[i]][1, "Year"]))
}

Finally, all that's left for us to do is to plot the time series.

> ts.plot(rec.1, rec.2, rec.3, rec.4, rec.5,
    col=c("blue", "purple", "red", "pink", "lightblue"),
    main="Fig.2 - Five Raw Darwin Temperature Records from GHCN",
    type="o",
    ylab="Temp (C)")

The result is shown in Fig. 2 above. It's not pretty, but it looks like it agrees with Fig. 1 and it should provide us with a good starting point for future analyses.


  1. Eschenbach, W. (2009-12-08). The Smoking Gun At Darwin Zero. Watts Up With That?
  2. Scepticism's limits. (2009-12-11). The Economist.
  3. Eschenbach, W. (2009-12-12). Willis: Reply to the Economist. Watts Up With That?
  4. Climate manipulation gun still notably smoke-free. (2009-12-14). The Economist.
  5. Eschenbach (2009-12-08). Figure 5.
  6. GHCN-Monthly Version 2. (2008-08-20). NOAA.
  7. R version 2.6.2 (2008-02-08). On Ubuntu, you can install R via the apt-get utility. You will want to install the r-base and r-base-dev packages.
  8. Shumway, R. H., & Stoffer, D. S. (2006). An R Time Series Tutorial. For, Time Series Analysis and Its Applications: With R Examples (2nd Ed.). New York: Springer-Verlag.