Visualising scientific data
   
    
   Friends of tinySG,
   
    
   
    
As the last project in 2013, a visualisation plugin for NetCDF data
	made it into the plugin library of the tinySG scenegraph editor. If you are
	searching for large datasets to explore, this one is for you! Have you ever
	asked yourself what Earth would look like if the sea level rose by 1 meter?
	How deep the ocean is at the North Pole, or what Antarctica would look like
	without ice? Did you ever try to navigate along Red Route One from
	"The Hunt for Red October" yourself? Read on!
	 	
	There is a 
	vast number of free, interesting scientific datasets on the web - you only 
	need to find them (which is probably not hard if you are a member of a 
	geo-science research project. But what if you are not?).
     
    
    
	
  
     The NetCDF file format
The NetCDF format is used by scientists to store arbitrary array data. It 
	includes a 
	header that describes the structure of the array data blocks (dimensions, 
	data types, physical units, etc.) - the format is extremely flexible. 
   To understand the basics, it is probably easiest to look at two example 
   headers, decoded with the help of the ncdump tool (ncdump -h file.nc 
   prints just the header):
   
	
   netcdf ETOPO1_Ice_g_gdal {
   dimensions:
     side = 2 ;
     xysize = 233312401 ;
   variables:
     double x_range(side) ;
         x_range:units = "Longitude" ;
     double y_range(side) ;
         y_range:units = "Latitude" ;
     double z_range(side) ;
         z_range:units = "meters" ;
     double spacing(side) ;
     int dimension(side) ;
     int z(xysize) ;
         z:scale_factor = 1. ;
         z:add_offset = 0. ;
         z:_FillValue = -2147483648 ;
         z:node_offset = 0 ;
   // global attributes:
         :title = "ETOPO1_Ice_g_gmt4.grd" ;
         :source = "grdreformat -V ETOPO1_Ice_g_gmt4.grd 
                   ETOPO1_Ice_g_gdal.grd=ci" ;
   }
    
   [Figure: Monterey Bay at 1.8 km per grid point (y coordinates are
   stretched). Relief data taken from the ETOPO1 NetCDF dataset, texture
   extracted from World Wind.]
    
    
    
	This header has been extracted from the ETOPO1 relief dataset, which
   contains high-resolution relief data of the entire planet. It contains 
   two dimension definitions (sizes of arrays in a particular dimension) and
   six variables (data arrays). While the first four variables are small
   1D arrays with just two elements each, the variable z is a huge 1D 
   array with over 233 million (xysize) elements. Its values are heights
   above or below sea level, measured at grid points. The 
	variables dimension and spacing describe the 2D 
	positions (longitude and latitude) of each height value: dimension
	tells you how to transform the 1D z array into a 2D grid layout, and
   the values of spacing report the angle between two adjacent heights in 
   longitude and latitude direction.
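
   That 1D->2D mapping can be sketched in a few lines of Python. The grid
   sizes below follow from the header (21601 x 10801 points at 1 arc-minute
   spacing multiply out to exactly xysize); the row-major, north-west-origin
   layout is an assumption typical for GMT-derived grids, not something the
   header itself states:

   ```python
   # Illustrative sketch: recover the 2D position of a z sample in ETOPO1.
   # Grid sizes follow from the header above; the NW-origin, row-major
   # layout is an assumption (common for GMT-derived grids).
   NCOLS, NROWS = 21601, 10801          # values of the 'dimension' variable
   SPACING = 1.0 / 60.0                 # 1 arc-minute, in degrees
   LON0, LAT0 = -180.0, 90.0            # assumed north-west corner

   def index_to_lonlat(i):
       """Map a flat index into z(xysize) to (longitude, latitude)."""
       row, col = divmod(i, NCOLS)
       return LON0 + col * SPACING, LAT0 - row * SPACING

   # The two grid dimensions multiply out to exactly xysize:
   assert NCOLS * NROWS == 233312401
   ```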
	
	Although this data layout avoids storing the lon/lat coordinates for each 
   height, it is a bit complicated to handle because of the necessary 
   1D->2D mapping. The next example shows an organisation that encodes a 
   2D array directly and is thus easier to work with:
    
	
   netcdf oscar_vel2012 {
   dimensions:
      time = 61 ;
      year = 61 ;
      depth = 1 ;
      latitude = 481 ;
      longitude = 1201 ;
   variables:
      int time(time) ;
         time:units = "day since 1992-10-05 00:00:00" ;
         time:long_name = "Day since 1992-10-05 00:00:00" ;
      float year(year) ;
         year:units = "time in years" ;
         year:long_name = "Time in fractional year" ;
      float depth(depth) ;
         depth:units = "meter" ;
         depth:long_name = "Depth" ;
      double latitude(latitude) ;
         latitude:units = "degrees-north" ;
         latitude:long_name = "Latitude" ;
      double longitude(longitude) ;
         longitude:units = "degrees-east" ;
         longitude:long_name = "Longitude" ;
      double u(time, depth, latitude, longitude) ;
         u:units = "meter/sec" ;
         u:long_name = "Ocean Surface Zonal Currents" ;
         u:missing_value = 0. ;
      double v(time, depth, latitude, longitude) ;
         v:units = "meter/sec" ;
         v:long_name = "Ocean Surface Meridional Currents" ;
         v:missing_value = 0. ;
      double um(time, depth, latitude, longitude) ;
         um:units = "meter/sec" ;
         um:long_name = "Ocean Surface Zonal Currents Maximum Mask" ;
         um:missing_value = 0. ;
      double vm(time, depth, latitude, longitude) ;
         vm:units = "meter/sec" ;
         vm:long_name = "Ocean Surface Meridional Currents Maximum Mask" ;
         vm:missing_value = 0. ;
   // global attributes:
         :NC_GLOBAL.VARIABLE = "Ocean Surface Currents" ;
         :NC_GLOBAL.DATATYPE = "1/72 YEAR Interval" ;
         ...
   }
	
	
	
	Each current sample vector is split into two coordinates, u
   and v (NetCDF stores only arrays of scalar values, so 
   vector arrays are split into multiple scalar arrays). However, 
   as opposed to the previous example, both u and v are 
   indexed in four dimensions: time, depth, latitude and longitude. 
   It is pretty intuitive to gather all components for 
   a given point in time, depth and location:
	
	
      double coord[2];    // lat, lon
      double veloVec[2];  // velocity at coord[]
      size_t latIdx=20, lonIdx=317;
      size_t indices[] = { 23 /*time*/, 0/*depth*/, latIdx, lonIdx };
      // As stated in the dimensions section, latIdx is valid in 0..480
      // and lonIdx is valid in 0..1200. Use the API for value retrieval
      // (variable IDs 3..6 are latitude, longitude, u and v in this file;
      // nc_inq_varid() can look them up by name instead):
      nc_get_var1_double (netCDFHandle, 3, &latIdx, &coord[0]);
      nc_get_var1_double (netCDFHandle, 4, &lonIdx, &coord[1]);
      nc_get_var1_double (netCDFHandle, 5, indices, &veloVec[0]);
      nc_get_var1_double (netCDFHandle, 6, indices, &veloVec[1]);
      
      // ... veloVec now holds the current vector at coord[] (the geographic location)
	
	
	
	So, time- and depth-varying values can be fetched for a given location 
   using 4D indices, and the real sphere coordinates are fetched using 
   the same index values. 
	 
	As with MPI, the NetCDF API is pretty large and powerful, but 
	for basic use cases only a handful of functions is really needed.
	
	
	
	
	
 
  
    A NetCDF importer for tinySceneGraph
   
    
   
	The examples in the previous section have shown that data organisation 
	in NetCDF files can differ quite significantly from one file to another. 
   The only common ground
	of NetCDF data is the array as the central data structure.
	
	The semantics of NetCDF variables usually differ as well: ocean currents, 
	height maps, temperatures, gradients of entities, time-varying data, etc.
    Each individual entity may be a scalar (like a pressure value) or a vector
    (like a current vector). 
      
	To visualise these datasets, a number of techniques come to mind:
     
        -  Use textures to color-code global measurements, like temperature 
		     distributions, ocean salinity or currents.
        -  Encode values in textures to process them arbitrarily in a 
		     shader, using the parallel power of modern GPUs.
        -  Create geometry grids from relief data.
        -  Use particle animations to trace currents over time.
        -  Visualise vector fields using illuminated lines.
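
    The first of these techniques, color-coding a scalar per texel, boils
    down to a transfer function from a measured value to an RGB triple. A
    minimal sketch in plain Python (colorsys is from the standard library;
    the value range and the blue-to-red ramp are illustrative choices, not
    taken from the plugin):

    ```python
    import colorsys

    def scalar_to_rgb(value, vmin, vmax):
        """Map a scalar into a blue (low) .. red (high) HSV ramp.
        Returns an (r, g, b) tuple with components in 0..1."""
        t = max(0.0, min(1.0, (value - vmin) / (vmax - vmin)))
        hue = (240.0 - 240.0 * t) / 360.0   # 240 deg blue down to 0 deg red
        return colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    ```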
    
  
    
    
   Import workflow:
   
	Once the user has selected a NetCDF file, the plugin presents
   an overview of its internal structure. Thus the user can check what 
   data is provided and how it is organised.
   
   The transformation of NetCDF variable(s) into tinySG scenegraph nodes involves 
   the selection of a destination node type and, most often, some sort of dimension 
   mapping, color encoding and scaling. For example, you may want to map the two
   components of a velocity vector to the three color channels of an RGB texture.
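
   Such a dimension mapping is typically just a couple of lines. A hedged
   sketch of packing a (u, v) velocity into an RGB texel; the +/-2 m/s
   normalisation range is an assumption for illustration, not taken from
   the dataset:

   ```python
   def velocity_to_rgb(u, v, vmax=2.0):
       """Pack two velocity components into the red and green channels;
       blue stays unused. Components are clamped to [-vmax, vmax] and
       remapped to [0, 1], so zero velocity maps to (0.5, 0.5, 0.0)."""
       def chan(c):
           c = max(-vmax, min(vmax, c))   # clamp to the assumed range
           return (c + vmax) / (2.0 * vmax)
       return chan(u), chan(v), 0.0
   ```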
    
   Instead of providing a full-featured 
   user interface, the NetCDF plugin trades a polished UI for maximum flexibility: the 
	user is required to select a node type (mesh, texture, volume, 
   lineset or particle node)
   and to write a small callback function in Python that reads data from the NetCDF 
   file's variable(s) she's interested in. To this end, the plugin establishes a 
   Python wrapper around the native C API of NetCDF.
	 
   The code editor provides a skeleton right from the start, so the user just
   has to set the correct variable indices, read the variables' values and process 
   them (scaling, color encoding, etc.).
   When the callback function is complete, the Execute button sends it to 
   a Python interpreter (another plugin...) that runs the script and feeds the 
   created node with the callback's return data.
    
   As an example, the SetTexel() callback function in the following 
   Python script extracts velocity vectors from 
   an OSCAR dataset. The Python framework takes its return values to set the 
   texels of a 2D texture. Speeds are mapped to HSV color space: high current 
   speeds are colored red/bright, low speeds blue/dark:
    
   
   
   
   
   import math                       # needed by the callback below
   
   idx      = netCDF.SizeTArray(4)
   varID    = 5
   texWidth = 1200
   texHeight= 480
   numChan  = 3
   # Called for every texel of texture:
   def SetTexel (tex, x,y):
      global idx
      idx[0] = 18
      idx[1] = 0
      idx[2] = y
      idx[3] = texWidth-x
      sat = val = 0.0
      
      u = gFile.GetValAsDouble (varID, idx)
      v = gFile.GetValAsDouble (varID+1, idx)
      
      if not (math.isnan(u) or math.isnan(v)):
         sat = 1.0
      else:
         u = v = 0
         
      speed = math.sqrt (u*u + v*v)
      val = 0.1 + min (speed, 0.9)
      hue = 260-240*speed
      hue = max(min(hue, 300), 0)
      xShifted = (x+540)%1080   # wrap one full revolution (360 deg at 3 samples/deg)
      tex.SetPixelHSV (xShifted,y,0, hue, sat, val, 1.0)
   
   
   [Figure: Image composed of a NASA earth texture and (ocean) texels taken
   from color-coded NetCDF data. The Gulf Stream and the equatorial counter
   current are clearly visible in this texture extracted from the April
   2011 OSCAR dataset.]
    
    
	The lines in grey italic font are created automatically as part of 
   the code skeleton: the first global variable block is initialised according 
   to user input when creating the texture. The function definition is to be 
   filled in with the actual data mapping (gFile.GetValAsDouble() is one of the 
   Python wrapper functions for accessing NetCDF data). 
    
   Lines in typewriter font are filled in by the user, assigning and 
   interpreting NetCDF data for the given texture node. The texture node in turn 
   is a Python proxy object providing an sdkITexture interface to the 
   real texture scenegraph node.
    
   The SetTexel() function is called repeatedly by a hidden Python 
   framework looping over the texture's dimensions.
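
   That driver loop can be pictured roughly as follows; this is a sketch,
   not the plugin's actual code, and the texture object is mocked here:

   ```python
   def fill_texture(tex, width, height, set_texel):
       """Call the user's callback once per texel, the way the hidden
       framework iterates over the texture's dimensions."""
       for y in range(height):
           for x in range(width):
               set_texel(tex, x, y)

   # Tiny usage example with a mocked texture object (a plain list):
   calls = []
   fill_texture(calls, 3, 2, lambda tex, x, y: tex.append((x, y)))
   ```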
    
	
	The images below were created from the Reynolds sea surface temperature
   dataset, which provides data at one-quarter degree resolution over the 
   globe.
    
    
      
         
            
             
            
      [Figure: Temperature in the North Atlantic ocean, January 2013.
      Texture created with the tinySG netCDF plugin.]

      [Figure: Moving forward in time to the end of August 2013. The
      northern hemisphere is heating up.]
    
   
 	
	Keep rendering, 
	Christian
   
     
     
     
      
    Acknowledgements and links:
    
     
      -  Oscar
		(Ocean Surface Current Analysis Real-time) contains near-surface 
		ocean current estimates derived from various data sources. It is a 
		time-varying dataset of currents at the ocean surface, with either 
		one or three samples per degree latitude/longitude.
		NetCDF data is available at JPL.
      -  ETOPO1
	    is a 1 arc-minute global relief model of Earth's surface. This sums 
       up to about 233.3 million height points.
      -  Textures are exported from 
       World Wind 
       by courtesy of NASA.
      -  Argo is an international project
       using thousands of floats to collect data on ocean temperature, salinity
       and currents from approximately 2000 m depth up to the surface. It 
       contributes to monitoring global climate change.
      -  The high resolution sea surface temperature (Reynolds-SST) dataset is 
       available at podaac-ftp and at the climate data center, 
       University of Hamburg, Germany.
    
  
        
     
     
     Copyright by Christian Marten, 2009-2014 
    Last change: 11.04.2014 
     
    