Open Science and climategate: The IPCC/CRU needs to take a leaf out of CERN's book

This is not the place to debate the immense subject of climate science, but it is necessary to say something about "climategate" in order to explain what happens when scientists and politicians collude to distort, hide and even destroy critical (raw) data and methodologies which, unlike the output of CERN, have absolutely colossal financial implications for every man, woman and child on this planet.

Last year, when CERN was on the eve of launching the Large Hadron Collider (LHC), I wrote an article about how they used free and open source software and applied the philosophy of making their data, experiments, journals, etc. freely available. I write this article on the first day of a two-week jamboree (sorry, conference) on "global warming" taking place in Copenhagen, a conference which might never have come about if its evangelical camp followers had adhered to the principles and practices of conducting scientific research in an open and free manner. In short, they needed to imitate their colleagues at CERN. Of course, they won't. The scientists and politicians have simply invested too much time, money, pride and reputation to back down now. The loss of face, and the humility and integrity required to admit error, would be psychologically unbearable. They would implode. But here is why the IPCC and the CRU at the University of East Anglia should have done so.

The background

Despite the failure of a cabal of global news outlets to cover this issue openly and honestly (especially the BBC and ITV here in the United Kingdom), the internet, as usual, has stepped in to fill the information void, and this has been a red-hot issue. White-hot, in fact, and the facts are these: the Climatic Research Unit (CRU) at the University of East Anglia, in my mother's native Norfolk, has been a major contributor to and collator of climate data, and its "findings" have been critical to the work and publications of the Intergovernmental Panel on Climate Change (IPCC).

Many of the main contributors would be considered household names in climate studies: Briffa, Jones, Mann, Santer et al. Now, the CRU has been the subject of a Freedom of Information Act (FOI) request in respect of the raw data and methods it has used to support the IPCC's claims of global warming. It has stalled and prevaricated on this FOI request, which was made, incidentally, by Stephen McIntyre, who demonstrated, in conjunction with Ross McKitrick, the flawed nature of Mann's infamous "Hockey Stick" graph. That graph was once the poster boy of the IPCC's main reports and their Summary for Policymakers (SPM), but it has now been relegated deep into the documents.

Now, the material that was the subject of this FOI request has been "hacked" and put up on a Russian server, and I, like many others, found the files on Wikileaks--another subject I covered a while back. I say hacked, but there is a growing body of opinion that this was a whistleblower's leak. (It is deeply gratifying to note that much of the detective work behind this view has been done by a Canadian Unix admin using Unix tools like grep and Bash.) Proof of insider leaking is critically important because, if that analysis is correct, it destroys the McCarthyite smears that the files were hacked by unethical criminals with ulterior motives.

Either way, leaked or hacked, the files found their way onto the open, porous frontiers of the internet (a deliciously appropriate irony, considering the internet's origins in the military's ARPANET). The nature of the internet itself ensured that the leak spread faster than the Bubonic plague. Its suppression was beyond the control of any government on the planet. The genie was well and truly out of the bottle.

That said, there have been suggestions that Google may have been censoring search results for "climategate": comparing the number of results each search engine returned for that term, Microsoft's Bing reportedly came out ahead by a ratio of five to one. However, I could not reproduce that ratio. In fact, I found the opposite, by a much bigger margin. Perhaps someone reminded Google of its "ten things" promise and they got a bit windy--but I'm not betting my pension on that!

What's been revealed?

Once the compressed files are extracted, the end result is 120MB of files in two folders, one for documents and one for e-mails. It is a Herculean task to wade through them, but the community out there in cyberspace is nothing if not resourceful, and it is already possible to search the material from a web interface because some kind soul has put it into a searchable database. If you want to see the gobsmacking contents of some of these e-mails without having to do a time-consuming search, then see this ready-made list. The content of many of these e-mails is deeply disturbing. They give an insight into what the scientists were doing to massage and delete data, and into the very effective campaign they mounted to marginalize or silence criticism--even to the extent of getting hostile reviewers sacked, effectively blackballing scientific journals and redefining the very meaning of peer review itself. What the leaks have revealed is a peer review process that was and is terminally dysfunctional, to the point of being quite broken. Much of this was directly attributable to the relative secrecy in which the research was done (and not just by the CRU).

Part of the problem is that, unlike CERN's work, this research has global and therefore immense implications. If the science underpinning the policies is found wanting because of a lack of due diligence, a corrupted peer review process (amounting to a cosy clique of mutually supporting scientists) and a driven political agenda, then the room for admitting basic error and flawed process is deeply compromised. Even scientists have their pride and don't like to be humiliated, but the stakes are simply too high to consider their finer feelings.

So far, so general. I don't want to get into too many specifics at this point, as this website is not the place for a detailed discussion, but I will mention a few to illustrate why and where applying open standards might have saved the scientists (and possibly the politicians too, although they are probably beyond the pale) from their own hubris and, best of all, might have prevented them from behaving just like... politicians.

Their first instinct, when faced with an FOI request, was to hunker down and go into denial mode, followed by deleting the data which was the subject of the original request. For the benefit of non-British readers, this is a criminal offence under the FOI Act, and if the Information Commissioner is doing their job, investigation and possible prosecutions should follow, where appropriate. Behind this illegal deletion was the CRU scientists' view that the FOI requester was malicious in intent: that he only wanted to access the data in order to disprove it! That was, quite simply, an extraordinary accusation.

Even if the request was maliciously intended, had the information been out in the open, anyone else could have taken it up, tested it to destruction too, and either confirmed or refuted it. For God's sake, that's how proper science functions. It cuts through all cultural and ethnic differences, personal prejudices and ideologies. I may as well claim that I've done research on the climate proving that the global mean surface temperature has dropped by five degrees Centigrade, but that I will not release the data and methods because the persons requesting them only want to prove me wrong. If it's right, it's right. It will withstand the severest scrutiny. Imagine for a moment, if you will, that a GNU/Linux user had developed a piece of mission-critical software which didn't function properly, and that the developer refused to admit there was a fault at all, then, when pressed further, refused to reveal his programming language or methods and, finally, declined to release the source code. If the software had been distributed under the GPL, the last would of course violate the licence, but, that notwithstanding, such behaviour would not long survive inspection and criticism.

When is peer review not peer review?

One of the mantras repeated ad nauseam by the CRU and the IPCC is that everything is beyond reproach because all the science is peer reviewed. Superficially, that seems plausible until you actually examine what they mean by the phrase. Even before the leaked files and e-mails, grave concerns were being raised about the way the science was being done. The leaks have confirmed the worst suspicions. Social network analysis reveals that the whole process was in fact thoroughly incestuous, with CRU/IPCC scientists peer reviewing each other's papers and ensuring the exclusion of anything critical of the orthodox consensus. This simply cannot happen in the open source community because all information is free and freely available. Attempts to collude or exclude lead only to projects forking and taking new and potentially creative directions. Yes, it can be a bit of a fractious jumble, but freedom is a happy mess. Exasperating as it can be, fragmentation can often be freedom's best defence.

What is required is the wholesale adoption of the standard applied in pharmaceutical drug trials: the double-blind trial, or in this case double-blind peer review. The author is anonymous to the reviewer and the reviewer is anonymous to the author. That leaves no opportunity for authors and reviewers to collude in reinforcing each other, and at the same time it guards against the exclusion of any other climate research which does not fit the "consensus". That's the way to do it. As things stand, though, the average member of the public hasn't the faintest idea what peer-reviewed science is or how it operates, and the CRU and the IPCC exploit that massive ignorance. Even if the process were wholly open and transparent, it would not benefit the proverbial passenger on the Clapham omnibus--but it would benefit those with the necessary experience, expertise and training to bench-test the claims, methods and data sources of climate scientists.

Spaghetti and code

What many people do not realize is that the CRU and the IPCC have relied heavily on contentious temperature proxies and have combined various data sets in even more contentious computer models. (The computer that has given us Mann's deceased Hockey Stick graph was the same computer that was used to predict a barbecue summer in the UK this year. It never materialised.) The leaked files make for very interesting reading, and new material is emerging all the time.

McIntyre and McKitrick did a devastating demolition job on the infamous graph, showing it to be fundamentally flawed: it was possible to reproduce its shape using stock market data or just noise. With the leaks we now know more, and what they reveal is that insular scientists do not make the best computer programmers or software engineers. Once you've looked at a few code snippets, you begin to understand that the guys at the CRU would have benefited from contracting the work out to the free software community. We all know the mantra from the culture of the bazaar: given enough eyeballs, all bugs are shallow. It's a pity the scientists' hubris stopped them from putting out a call to the open source community for freely given expertise. It could have saved them grief.

Instead of grabbing all the help they could get, they squandered vital time, energy and resources finding reasons to withhold data--on the grounds that it was confidential. (The same thing happened with the land-based surface temperature stations. When the data was exposed as flawed, the station data was withdrawn and requests for it were refused on the grounds that it was confidential and that releasing it would damage international relations!)

I'm not a fully fledged programmer yet, if ever (I'm on a journey with Python), but my attention was caught by the awful pickle "Harry" got himself into when coding the data. The coding was done in Fortran and IDL. If you view any of the code, the first thing you notice is that there's a hell of a lot of it (about 15,000 lines), with a lot of commenting of the sort you really wouldn't expect. (The IDL code doesn't mark its comments with Basic-style REM or the # used in many other languages; it uses a semicolon.)

To date the emphasis has been on the contents of the leaked e-mails, but as recently as last Friday the normally IPCC-compliant BBC carried a feature on Newsnight by British computer programmer John Graham-Cumming, who described the quality of computer coding at the CRU as "below the standards in any commercial software". It looks like a spaghetti tangle. Every good programmer documents their code, to jog their memory three months down the road and for the benefit of other readers and users of that code, but the now infamous HARRY_READ_ME.TXT has so much "documentation" in it that it gives the lie to the canard that writing code can't be funny.

First, pick your coding language

As I said earlier, it's written in Fortran and IDL, both of which are used in scientific work, but what strikes you when you look at the actual code is that calculation is minimal and the bulk of the code is concerned with processing text. If they wanted a language better suited to manipulating text, Python might have been a better choice--or even those good old scripting standbys awk, sed and grep. Some of the coding is so dubious that some people will doubtless feel compelled to submit it here. Even storing the raw data as XML would have been an improvement: it's free, it has open standards, and an XML parser is required to stop on malformed input rather than let a program carry on regardless. In short, "Public data should be in standard public formats." If Harry had followed this mantra, his porting problems and the resulting code tangle might never have caused him twelve months of commented anguish.
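To make the point about standard formats concrete, here is a minimal Python sketch. It assumes a made-up, whitespace-separated record layout (station ID, year, month, temperature in tenths of a degree, with -9999 marking a missing value); that layout is hypothetical and is not the CRU's actual format. The script converts the records to CSV with Python's standard csv module and stops, reporting the line number, the moment it meets a malformed record, rather than carrying on regardless.

```python
import csv
import io

# Hypothetical raw layout, for illustration only (not the CRU's format):
# station_id  year  month  temperature_in_tenths_of_a_degree_C
RAW = """\
037720 1998 01 42
037720 1998 02 51
037720 1998 03 -9999
"""

MISSING = -9999  # hypothetical sentinel marking a missing reading
FIELDS = ["station", "year", "month", "temp_c"]


def parse_line(line: str, lineno: int) -> dict:
    """Parse one raw record; raise instead of guessing if it is malformed."""
    parts = line.split()
    if len(parts) != 4:
        raise ValueError(f"line {lineno}: expected 4 fields, got {len(parts)}: {line!r}")
    station, year, month, tenths = parts
    value = int(tenths)  # a non-numeric value raises ValueError here too
    return {
        "station": station,
        "year": int(year),
        "month": int(month),
        # An empty cell says "missing" plainly; no magic number leaks into the output.
        "temp_c": "" if value == MISSING else value / 10,
    }


def to_csv(raw: str) -> str:
    """Convert raw records to CSV text, stopping at the first bad line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for lineno, line in enumerate(raw.splitlines(), start=1):
        if line.strip():  # skip blank lines only
            writer.writerow(parse_line(line, lineno))
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(RAW))
```

The output can be opened by any spreadsheet, database or statistics package, which is rather the point of a standard public format.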

How it should be done has been beautifully summarised by Dude, with Keyboard: secure and traceable data flow; demonstrable source code control; demonstrable change management, with a document and source code audit trail; version control and user history; some sort of electronic signature control mechanism; and all work processes fully documented with regard to system access and system usage.
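One small piece of that "secure and traceable data flow" can be sketched in a few lines of Python: record a SHA-256 checksum for every raw data file, so that any later alteration, addition or deletion shows up when two manifests are compared. This is an illustrative sketch only. The function names and the station file name are my own hypothetical inventions, and a real audit trail would also need version control, signatures and access logs.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file under data_dir (by relative path) to its SHA-256 digest."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict, new: dict) -> list:
    """Return files that were added, removed or altered between two manifests."""
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        station = root / "station_037720.csv"  # hypothetical file name
        station.write_text("year,temp_c\n1998,4.2\n")
        before = build_manifest(root)
        station.write_text("year,temp_c\n1998,4.3\n")  # a quiet edit...
        after = build_manifest(root)
        print(json.dumps(after, indent=2))
        print("changed:", changed_files(before, after))  # ...is caught here
```

Publishing the manifest alongside the data would let anyone confirm that they are working from exactly the same files as everybody else.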

There also seems to be no evidence of code reviews or of anything resembling versioning. Other bits of code continued blithely on their way despite error messages which ought to have stopped further execution and dumped the error output for debugging. Also, if the programmer's comments are to be believed, much of the data was processed with manual and semi-automated intervention, which seems very odd indeed unless a predetermined outcome was being sought. Welcome to the world of sub-prime coding.
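By way of contrast with code that ploughs on regardless, here is a minimal Python sketch of the fail-fast habit. A check raises an exception on the first implausible value, and a driver logs the full traceback and returns a non-zero status instead of producing output. The plausibility bounds and the DataError class are assumptions made for illustration. They are not quality-control rules taken from the CRU code.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("station_check")

# Hypothetical plausibility bounds for a monthly mean temperature in degrees C.
# Real quality-control limits would be set per station and documented.
PLAUSIBLE_RANGE = (-90.0, 60.0)


class DataError(Exception):
    """Raised when input fails a check; the run must not continue past it."""


def check_series(station: str, temps: list) -> None:
    """Raise DataError on the first value outside PLAUSIBLE_RANGE."""
    lo, hi = PLAUSIBLE_RANGE
    for i, t in enumerate(temps):
        if t is None:
            continue  # an explicitly missing value is allowed and stays missing
        if not lo <= t <= hi:
            raise DataError(f"{station}: value {t} at index {i} is outside {PLAUSIBLE_RANGE}")


def main(series: dict) -> int:
    """Check every series; return 0 on success, 1 if any check fails."""
    try:
        for station, temps in series.items():
            check_series(station, temps)
    except DataError:
        log.exception("Aborting run")  # logs the message plus full traceback to stderr
        return 1
    log.info("All %d series passed", len(series))
    return 0


if __name__ == "__main__":
    status = main({"037720": [4.2, 5.1, None], "999999": [3.9, 999.9]})
    print("exit status:", status)
```

A calling script can pass the return value of main() to sys.exit(), so that a shell pipeline or scheduler also stops when the data fails a check.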

Instead of allowing the data to form the basis of Carbon Emissions Trading, which will make the unaccountable fraud of the European Union's Common Agricultural Policy (CAP) and Common Fisheries Policy look like a poor man's version of Papal indulgences, CRU's coders should perhaps consider offsetting their digital sins by purchasing a Bad Code Offset. That way they could offset their bad code footprint. As the good folk at CodeOffsets put it, "Bad code weakens the utility delivered by these applications causing business loss, user dissatisfaction, accidents, disasters and, in general, sucks." Just remember to keep your tongue firmly in your cheek.

Alright, it's unfair to slate scientists for not being software engineers, but for the love of Mike, why did it never occur to them to bring in professional programmers, especially as they could have called upon a pool of distributed free talent out there in Unixland? But that is probably being naive, as the whole process has been tainted by politics and ideology. Like proprietary software, where control and profit are the final arbiters, the CRU/IPCC are fatally holed below the waterline (by a non-melting iceberg?) because they are working to an agenda set by and for politicians. If you are starting to experience traumatic flashbacks at this point, that is because you've probably been here before with Microsoft's FUD, except that this time we're dealing with a taxpayer-funded scientific body withholding, distorting and deleting data. This is not so much FUD as "waterboarding the data". Well, at least Obama pledged to close down Guantanamo.

It is one of the most extraordinary anomalies of our time that the missing code analysis and the critical scientific peer review have been left to unpaid bloggers and retired scientists (who are beyond the reach of threats and sackings). The internet, much of it founded on the principles and practices of free and open methodology and software, has become the last, best and only defence against corporate shills, closed methodologies and vested interests. It's far from perfect, but it's the best we've got. Yes, you'll have to wade through a lot of dross to find the nuggets of truth, expertise and wisdom, but they are there.


Whether it's bad science or worse coding, larded with poor control processes and a lack of understanding of open source methodologies, Climategate, as it has been dubbed (nul points for originality), represents the mother of all messes. Normally, when I'm in a hole I stop digging, but the more I dig into this the worse it gets. Climate science looks more and more like a reductio ad ignorantum and a cross between the pseudo-science of Sociology and a cargo cult. Big science is virtually impossible to decouple from big government, and he who pays the piper calls the tune. I love science, the scientific enterprise and computer technology. As Michio Kaku asked rhetorically: what has science done for us? Well, just about everything. I am not, as Gordon Brown has alleged, an anti-science flat-Earther, so having to witness the corruption of science, the peer review process and computer modelling is all the more painful. I am not attacking science; the IPCC is attacking science. True science values scepticism; religion hates it and punishes it as heresy with the rack and the Inquisition.

In 1841 Charles Mackay wrote a book called "Extraordinary Popular Delusions and the Madness of Crowds". He observed: "Men, it has been well said, think in herds; it will be seen that they go mad in herds, while they only recover their senses slowly, and one by one." Is this a description of what will happen at the CRU and the IPCC?

Self-serving, incestuous cliques are not unique to proprietary software or big science. Spats in open source projects are not unknown, but unlike at the CRU or the IPCC, the data and code are free and open, and no exclusive, corrupt peer review process can hide the facts for long. This article has only touched the tip of the iceberg (no pun intended). This matter is properly the subject of whole books and websites. My intention has been to examine what is wrong with the process from the perspective of the culture and practice of open source methodology. I suspect, though, that matters have degenerated so far that the best practice of free software geeks would not have saved the CRU from itself. What a mess. What a bloody mess.


Verbatim copying and distribution of this entire article are permitted worldwide, without royalty, in any medium, provided this notice is preserved.