
It's been a while since I uploaded a chart of global temperature data. Not since I made this graph in 2011 and then before that was this graph from 2010. So it's about time for some graphs. Especially since 2016 was the world's warmest year as well as New Zealand's warmest year.
When I made those charts, I had to do some 'data cleaning' to convert the raw data to tidy data (Wickham, H. 2014 Sept 12. Tidy Data. Journal of Statistical Software. [Online] 59:10), where each variable is a column, each observation is a row, and each type of observational unit is a table. And to convert that table from text format to comma separated values format.
I would have used a spreadsheet program to manually edit and 'tidy' the data files so I could easily use them with the R language. As Roger Peng says, the one rule of reproducible research is "Dont do things by hand! Editing spreadsheet data manually is not reproducible".
There is no 'audit trail' left of how I manipulated the data and created the chart. So after a few years even I can't remember the steps I made back then to clean the data! That then can be a disincentive to update and improve the charts.
However, I have found a couple of cool open and 'tidy' data packages of global temperatures that solve the reproducibility problem. The non-profit Open Knowledge International provides these packages as as part of their core data sets.
One package is the Global Temperature Time Series. From it's web page you can download two temperature data series at monthly or annual intervals in 'tidy' csv format. It's almost up to date with October 2016 the most recent data point. So that's a pretty good head start for my R charts.
But it is better than that. The data is held in a Github repository. From there the data package can be downloaded as a zip file. After unzipping, this includes the csv data files, an open data licence, a read-me file, a .json file and a cool Python script that updates the data from source! I can run the script file on my laptop and it goes off by itself and gets the latest data to November 2016 and formats it into 'tidy' csv format files. This just seems like magic at first! Very cool! No manual data cleaning! Very reproducible!
Here is a screen shot of the Python script running in a an X-terminal window on my Debian Jessie MX-16 operating system on my Dell Inspiron 6000 laptop.
The file "monthly.csv" includes two data series; the NOAA National Climatic Data Center (NCDC), global component of Climate at a Glance (GCAG) and the perhaps more well-known NASA Goddard Institute for Space Studies (GISS) Surface Temperature Analysis, Global Land-Ocean Temperature Index.
I just want to use the NASA GISTEMP data, so there is some R code to separate it out into its own dataframe. The annual data stops at 2015, so I am going to make a new annual data vector with 2016 as the mean of the eleven months to November 2016. And 2016 is surprise surprise the warmest year.
# read csv data file into R | |
temps <- read.csv("/home/simon/R/global-temp-master/data/monthly.csv", skip = 0,header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE,colClasses = c("character","character","numeric")) | |
# examine dataframe | |
str(temps) | |
'data.frame': 3285 obs. of 3 variables: | |
$ Source: chr "GISTEMP" "GCAG" "GISTEMP" "GCAG" ... | |
$ Date : chr "2016-11" "2016-10" "2016-10" "2016-09" ... | |
$ Mean : num 0.950 0.730 0.880 0.874 0.900 ... | |
# create new dataframe of GISTEMP data only | |
gistemps<-temps[temps[["Source"]]=="GISTEMP",] | |
# replace dashes in "Date" column with decimal point | |
gistemps[["Date"]] <- gsub("([-])", "\\.", gistemps[["Date"]]) | |
# convert "Date" variable from character to numeric format | |
gistemps[["Date"]] <- as.numeric(gistemps[["Date"]]) | |
# add "Year" variable to dataframe | |
gistemps[["Year"]] <- round(gistemps[["Date"]],0) | |
# create new dataframe of updated annual means from monthly data | |
gistempsyearly <- aggregate(Mean ~ Year, gistemps, mean) | |
# examine the dataframe | |
str(gistempsyearly) | |
'data.frame': 137 obs. of 2 variables: | |
$ Year: num 1880 1881 1882 1883 1884 ... | |
$ Mean: num -0.203 -0.117 -0.102 -0.203 -0.278 ... |
Here is a simple line chart of the annual means.
Here is a another line chart of the annual means with an additional data series, an eleven-year lowess-smoothed data series.
Here is the R code for the two graphs.
# create Plot 1, a line chart | |
png("/home/simon/R/global-temp-master/gistempsyearly-560by420-2016-12.png", bg="white", width=560, height=420,pointsize = 14) | |
par(mar=c(2.7,2.7,1,1)+0.1) | |
plot(gistempsyearly,ylim=c(-0.6,1.2), xlim=c(1878,2016),tck=0.01,axes=FALSE,ann=FALSE, type="n",las=1) | |
axis(side=1, tck=0.01, las=0,tick=TRUE) | |
axis(side=2, tck=0.01, las=1, at = c(-0.6,-0.4,-0.2, 0,0.2,0.4,0.6,0.8,1), labels = TRUE, tick = TRUE) | |
axis(side=4, tck=0.01, at = c(-0.6,-0.4,-0.2, 0,0.2,0.4,0.6,0.8,1), labels = FALSE, tick = TRUE) | |
lines(gistempsyearly,col="#F32424",lwd=3) | |
mtext(side=1,cex=1,line=-1.2,"Data: NASA Goddard Institute for Space Studies\nhttps://github.com/datasets/global-temp") | |
mtext(side=3,cex=1.5, line=-4,expression(paste("GISTEMP Global Land-Ocean Temperature \nAnomaly 1880 2016")) ) | |
mtext(side=2,cex=1, line=-1.3,"Anomaly from 1951-1980 mean (C)") | |
mtext(side=4,cex=0.75, line=0.05,R.version.string) | |
text(2013,1,adj=1,"2016",cex=1) | |
points(gistempsyearly[["Year"]][137],gistempsyearly[["Mean"]][137],col="#000099",pch=19) | |
box() | |
dev.off() | |
# create Plot 2, a line chart with a lowess smoother | |
png("/home/simon/R/global-temp-master/gistempsyearlylowess-560by420-2016.png", bg="white", width=560, height=420,pointsize = 14) | |
par(mar=c(2.7,2.7,1,1)+0.1) | |
plot(gistempsyearly,ylim=c(-0.6,1.2), xlim=c(1878,2016),tck=0.01,axes=FALSE,ann=FALSE, type="n",las=1) | |
axis(side=1, tck=0.01, las=0,tick=TRUE) | |
axis(side=2, tck=0.01, las=1, at = c(-0.6,-0.4,-0.2, 0,0.2,0.4,0.6,0.8,1), labels = TRUE, tick = TRUE) | |
axis(side=4, tck=0.01, at = c(-0.6,-0.4,-0.2, 0,0.2,0.4,0.6,0.8,1), labels = FALSE, tick = TRUE) | |
box() | |
lines(gistempsyearly,col="1",lwd=1) | |
points(gistempsyearly,col="#000099",pch=19) | |
lines(lowess(gistempsyearly[["Year"]],gistempsyearly[["Mean"]],f=0.1),lwd=3,col="#CC0000") | |
mtext(side=1,cex=0.9,line=-1.1,"Data: NASA Goddard Institute for Space Studies\nhttps://github.com/datasets/global-temp") | |
mtext(side=3,cex=1.5, line=-4,expression(paste("GISTEMP Global Land-Ocean Temperature \nAnomaly 1880 2016")) ) | |
mtext(side=2,cex=1, line=-1.3,"Degrees Celsius 0 = 1951 to 1980 mean") | |
legend(1890, 1,bty='n',bg="white", c(paste("Mean", c("annual anomaly", "lowess smoothed \nanomaly 11 years f =0.1"))),pch=c(19,NA),lty=c(1,1),lwd=c(1,3),col=c("#000099","#CC0000")) | |
mtext(side=4,cex=0.75, line=0.05,R.version.string) | |
text(2013,1,adj=1,"2016",col=1,cex=1) | |
dev.off() |
No comments:
Post a Comment