Visualizing Climate Data – A Better Approach
My internship/research this summer involves a great deal of plotting data from giant netCDF files, generated from runs of NCAR’s CCSM3. Initially, I used legacy scripts written in IDL to create the necessary figures for my professor’s analysis. Those scripts had been collected over the years, and while they did the job adequately, I was unhappy to find them poorly commented and documented. So I persuaded my professor to let me explore alternative ways to meet our plotting needs.
IDL is a great language for plotting meteorological/climatological data. Its native netCDF and PostScript support makes for easy I/O, and there is a great catalogue of code snippets available on the internet. It is especially useful for designing pages with multiple plots. But IDL has a huge flaw: it’s not free, and licensing can be expensive.
Having been weaned on free, open-source programming tools, I found working in IDL to be, well, traumatic. The language itself wasn’t terrible; at times it can be cumbersome, but it’s relatively easy to script in. However, I knew that there had to be a better way of accomplishing my scripting needs. So, I began researching alternative solutions.
My initial search turned up a few tools and scripting languages. The NCAR Command Language (NCL) came up first, and while powerful, it seemed difficult to learn and to use, even with the copious documentation provided. Another tool, VCDAT, looked promising as well. Its GUI was easy to work with, but my research requires building many, many plots; I need to be able to write scripts to automate my tasks, and the CDAT scripting interface supplied with VCDAT was just as cumbersome as NCL.
I noticed, though, that VCDAT and its associated scripting language were based on Python. Now, I really, really love Python; it is an incredibly easy language to work with, and it is extremely powerful as a ‘glue’ to hook up all sorts of modules for completing nearly any task. It made sense to explore the options that Python afforded for my work.
At first, I explored a solution I was already familiar with: RPy. RPy is a Python interface to the R statistical language. When it comes to performing statistical analyses, RPy is great. But I find its plotting functions difficult to control from within Python. It is easy to make pretty plots, but my resulting code is always convoluted, and even with strong documentation, I feel that it reads poorly.
Finally, I stumbled upon Matplotlib. This library is used as the backend for a myriad of plotting and visualization tools, and it has incredibly rich documentation. Matplotlib and its toolkits afforded me the power to do everything I wanted, and then some, with the help of SciPy, a suite of scientific tools.
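As a rough illustration of the kind of script this enables, here is a minimal sketch. The data are synthetic (a made-up temperature-like field on a regular latitude/longitude grid), and the function names `make_synthetic_field` and `plot_field`, the grid size, and the output file name are all hypothetical, chosen only for this example. A real workflow would read the field from a netCDF file instead of generating it.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt


def make_synthetic_field(n_lat=73, n_lon=144):
    """Build a made-up temperature-like field on a regular lat/lon grid.

    This stands in for data that would normally come from a netCDF file.
    """
    lat = np.linspace(-90.0, 90.0, n_lat)
    lon = np.linspace(0.0, 360.0, n_lon, endpoint=False)
    lon2d, lat2d = np.meshgrid(lon, lat)
    # Warm at the equator, cold at the poles, plus a small zonal wave.
    field = (
        300.0
        - 50.0 * np.sin(np.radians(lat2d)) ** 2
        + 3.0 * np.cos(np.radians(3.0 * lon2d))
    )
    return lat, lon, field


def plot_field(lat, lon, field, out_path="synthetic_field.png"):
    """Draw a filled contour plot of the field and save it to out_path."""
    fig, ax = plt.subplots(figsize=(8, 4))
    contours = ax.contourf(lon, lat, field, levels=20, cmap="coolwarm")
    fig.colorbar(contours, ax=ax, label="Temperature (K)")
    ax.set_xlabel("Longitude (degrees east)")
    ax.set_ylabel("Latitude (degrees north)")
    ax.set_title("Synthetic temperature field")
    fig.savefig(out_path, dpi=150)
    plt.close(fig)  # free the figure when generating many plots in a loop
    return out_path


if __name__ == "__main__":
    lat, lon, field = make_synthetic_field()
    print("Saved", plot_field(lat, lon, field))
```

Because the plotting lives in an ordinary function, producing many figures is just a matter of calling `plot_field` in a loop over variables or time steps, which is the sort of automation a GUI tool makes awkward.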
I plan on writing a series of posts in the near future documenting how easy it is to use Matplotlib and SciPy to generate concise, publication-quality plots of meteorological/climatological data. Furthermore, if time permits, I would like to compare Matplotlib to other popular solutions such as GrADS, NCL, and CDAT. Although each of these offers a unique set of tools for accomplishing certain tasks, I believe that by building on Matplotlib, SciPy, and other libraries, a climate scientist can have better and more direct control over their plotting and data analysis environment, and that this control comes in the easy-to-learn, easy-to-use Python language.