Visualising scientific data
   
    
   Friends of tinySG,
   
    
   
    
As the last project in 2013, a visualisation plugin for NetCDF data
	made it into the plugin library of the tinySG scenegraph editor. If you are
	searching for large datasets to explore, this one is for you! Have you ever
	asked yourself what Earth would look like if the sea level rose by 1 meter?
	How deep the ocean is at the North Pole, or what Antarctica would look like
	without ice? Did you ever try to navigate along Red Route One from
	"The Hunt for Red October" yourself? Read on!
	 	
	There is a 
	vast number of free, interesting scientific datasets on the web - you only 
	need to find them (which is probably not hard if you are a member of a 
	geo-science research project. But what if you are not?).
     
    
    
	
  
     The NetCDF file format
The NetCDF format is used by scientists to store arbitrary array data. It 
	includes a 
	header that describes the structure of the array data blocks (dimensions, 
	data types, physical units, etc.) - the format is extremely flexible. 
   To understand the basics, it is probably easiest to look at two example 
   headers, decoded with the help of the ncdump tool (ncdump -h file.nc 
   prints just the header):
   
	
   netcdf ETOPO1_Ice_g_gdal {
   dimensions:
     side = 2 ;
     xysize = 233312401 ;
   variables:
     double x_range(side) ;
         x_range:units = "Longitude" ;
     double y_range(side) ;
         y_range:units = "Latitude" ;
     double z_range(side) ;
         z_range:units = "meters" ;
     double spacing(side) ;
     int dimension(side) ;
     int z(xysize) ;
         z:scale_factor = 1. ;
         z:add_offset = 0. ;
         z:_FillValue = -2147483648 ;
         z:node_offset = 0 ;
   // global attributes:
         :title = "ETOPO1_Ice_g_gmt4.grd" ;
         :source = "grdreformat -V ETOPO1_Ice_g_gmt4.grd 
                   ETOPO1_Ice_g_gdal.grd=ci" ;
   }
    
   [Figure: Monterey Bay at 1.8 km per grid point (y coordinates are
   stretched). Relief data taken from the ETOPO1 NetCDF dataset, texture
   extracted from World Wind.]
    
    
    
	This header has been extracted from the ETOPO1 relief dataset, which
   contains high-resolution relief data of the entire planet. It contains 
   two dimension definitions (sizes of arrays in a particular dimension) and
   six variables (data arrays). While the first four variables are small
   1D arrays with just two elements each, the variable z is a huge 1D 
   array with over 233 million (xysize) elements. Its values are heights
   above or below sea level, measured at grid points. The 
	variables dimension and spacing describe the 2D 
	positions (longitude and latitude) of each height value: dimension
	tells you how to transform the 1D z array into a 2D grid layout, and
   the values of spacing report the angle between two adjacent heights in 
   longitude and latitude direction.
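
   That 1D->2D mapping can be sketched in a few lines of Python. The grid
   sizes below follow from the header (21601 x 10801 points at 1 arc-minute
   spacing multiply out to exactly xysize); the row-major, north-west-origin
   layout is an assumption typical for GMT-derived grids, not something the
   header itself states:

   ```python
   # Illustrative sketch: recover the 2D position of a z sample in ETOPO1.
   # Grid sizes follow from the header above; the NW-origin, row-major
   # layout is an assumption (common for GMT-derived grids).
   NCOLS, NROWS = 21601, 10801          # values of the 'dimension' variable
   SPACING = 1.0 / 60.0                 # 1 arc-minute, in degrees
   LON0, LAT0 = -180.0, 90.0            # assumed north-west corner

   def index_to_lonlat(i):
       """Map a flat index into z(xysize) to (longitude, latitude)."""
       row, col = divmod(i, NCOLS)
       return LON0 + col * SPACING, LAT0 - row * SPACING

   # The two grid dimensions multiply out to exactly xysize:
   assert NCOLS * NROWS == 233312401
   ```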
	
	Although this data layout avoids storing the lon/lat coordinates for each 
   height, it is a bit complicated to handle because of the necessary 
   1D->2D mapping. The next example shows an organisation that encodes a 
   2D array directly and is thus easier to work with:
    
	
   netcdf oscar_vel2012 {
   dimensions:
      time = 61 ;
      year = 61 ;
      depth = 1 ;
      latitude = 481 ;
      longitude = 1201 ;
   variables:
      int time(time) ;
         time:units = "day since 1992-10-05 00:00:00" ;
         time:long_name = "Day since 1992-10-05 00:00:00" ;
      float year(year) ;
         year:units = "time in years" ;
         year:long_name = "Time in fractional year" ;
      float depth(depth) ;
         depth:units = "meter" ;
         depth:long_name = "Depth" ;
      double latitude(latitude) ;
         latitude:units = "degrees-north" ;
         latitude:long_name = "Latitude" ;
      double longitude(longitude) ;
         longitude:units = "degrees-east" ;
         longitude:long_name = "Longitude" ;
      double u(time, depth, latitude, longitude) ;
         u:units = "meter/sec" ;
         u:long_name = "Ocean Surface Zonal Currents" ;
         u:missing_value = 0. ;
      double v(time, depth, latitude, longitude) ;
         v:units = "meter/sec" ;
         v:long_name = "Ocean Surface Meridional Currents" ;
         v:missing_value = 0. ;
      double um(time, depth, latitude, longitude) ;
         um:units = "meter/sec" ;
         um:long_name = "Ocean Surface Zonal Currents Maximum Mask" ;
         um:missing_value = 0. ;
      double vm(time, depth, latitude, longitude) ;
         vm:units = "meter/sec" ;
         vm:long_name = "Ocean Surface Meridional Currents Maximum Mask" ;
         vm:missing_value = 0. ;
   // global attributes:
         :NC_GLOBAL.VARIABLE = "Ocean Surface Currents" ;
         :NC_GLOBAL.DATATYPE = "1/72 YEAR Interval" ;
         ...
   }
	
	
	
	Each current sample vector is split into two coordinates, u
   and v (NetCDF stores only arrays of scalar values, so 
   vector arrays are split into multiple scalar arrays). However, 
   as opposed to the previous example, both u and v are 
   indexed in four dimensions: time, depth, latitude and longitude. 
   It is pretty intuitive to gather all components for 
   a given point in time, depth and location:
	
	
      double coord[2];    // lat, lon
      double veloVec[2];  // velocity at coord[]
      size_t latIdx=20, lonIdx=317;
      size_t indices[] = { 23 /*time*/, 0/*depth*/, latIdx, lonIdx };
      // As stated in the dimensions section, latIdx is valid in 0..480
      // and lonIdx is valid in 0..1200. Use the API for value retrieval
      // (variable IDs 3..6 are latitude, longitude, u and v in this file;
      // nc_inq_varid() can look them up by name instead):
      nc_get_var1_double (netCDFHandle, 3, &latIdx, &coord[0]);
      nc_get_var1_double (netCDFHandle, 4, &lonIdx, &coord[1]);
      nc_get_var1_double (netCDFHandle, 5, indices, &veloVec[0]);
      nc_get_var1_double (netCDFHandle, 6, indices, &veloVec[1]);
      
      // ... veloVec now holds the current vector at coord[] (the geographic location)
	
	
	
	So, time- and depth-varying values can be fetched for a given location 
   using 4D indices, and the real sphere coordinates are fetched using 
   the same index values. 
	 
	As with MPI, the NetCDF API is pretty large and powerful, but 
	for basic use cases only a handful of functions is really needed.
	
	
	
	
	
 
  
    A NetCDF importer for tinySceneGraph
   
    
   
	The examples in the previous section have shown that data organisation 
	in NetCDF files can differ quite significantly from one file to another. 
   The only common ground
	of NetCDF data is the array as the central data structure.
	
	The semantics of NetCDF variables usually differ as well: ocean currents, 
	height maps, temperatures, gradients of entities, time-varying data, etc.
    Each individual entity may be a scalar (like a pressure value) or a vector
    (like a current vector). 
      
	To visualise these datasets, a number of techniques come to mind:
     
        -  Use textures to color-code global measurements, like temperature 
		     distributions, ocean salinity or currents.
        -  Encode values in textures to process them arbitrarily in a 
		     shader, using the parallel power of modern GPUs.
        -  Create geometry grids from relief data.
        -  Use particle animations to trace currents over time.
        -  Visualise vector fields using illuminated lines.
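
    The first of these techniques, color-coding a scalar per texel, boils
    down to a transfer function from a measured value to an RGB triple. A
    minimal sketch in plain Python (colorsys is from the standard library;
    the value range and the blue-to-red ramp are illustrative choices, not
    taken from the plugin):

    ```python
    import colorsys

    def scalar_to_rgb(value, vmin, vmax):
        """Map a scalar into a blue (low) .. red (high) HSV ramp.
        Returns an (r, g, b) tuple with components in 0..1."""
        t = max(0.0, min(1.0, (value - vmin) / (vmax - vmin)))
        hue = (240.0 - 240.0 * t) / 360.0   # 240 deg blue down to 0 deg red
        return colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    ```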
    
  
    
    
   Import workflow:
   
	Once the user has selected a NetCDF file, the plugin presents
   an overview of its internal structure. Thus the user can check what 
   data is provided and how it is organised.
   
   The transformation of NetCDF variable(s) into tinySG scenegraph nodes involves 
   the selection of a destination node type and, most often, some sort of dimension 
   mapping, color encoding and scaling. For example, you may want to map the two
   components of a velocity vector to the three color channels of an RGB texture.
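
   Such a dimension mapping is typically just a couple of lines. A hedged
   sketch of packing a (u, v) velocity into an RGB texel; the +/-2 m/s
   normalisation range is an assumption for illustration, not taken from
   the dataset:

   ```python
   def velocity_to_rgb(u, v, vmax=2.0):
       """Pack two velocity components into the red and green channels;
       blue stays unused. Components are clamped to [-vmax, vmax] and
       remapped to [0, 1], so zero velocity maps to (0.5, 0.5, 0.0)."""
       def chan(c):
           c = max(-vmax, min(vmax, c))   # clamp to the assumed range
           return (c + vmax) / (2.0 * vmax)
       return chan(u), chan(v), 0.0
   ```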
    
   Instead of providing a full-featured 
   user interface, the NetCDF plugin trades a polished UI for maximum flexibility: the 
	user is required to select a node type (mesh, texture, volume, 
   lineset or particle node)
   and to write a small callback function in Python that reads data from the NetCDF 
   file's variable(s) she's interested in. To this end, the plugin establishes a 
   Python wrapper around the native C API of NetCDF.
	 
   The code editor provides a skeleton right from the start, so the user just
   has to set the correct variable indices, read the variables' values and process 
   them (scaling, color encoding, etc.).
   When the callback function is complete, the Execute button sends it to 
   a Python interpreter (another plugin...) that runs the script and feeds the 
   created node with the callback's return data.
    
   As an example, the SetTexel() callback function in the following 
   Python script extracts velocity vectors from 
   an OSCAR dataset. The Python framework takes its return values to set the 
   texels of a 2D texture. Speeds are mapped to HSV color space: high current 
   speeds are colored red/bright, low speeds blue/dark:
    
   
   
   
   
   import math                       # needed by the callback below
   
   idx      = netCDF.SizeTArray(4)
   varID    = 5
   texWidth = 1200
   texHeight= 480
   numChan  = 3
   # Called for every texel of texture:
   def SetTexel (tex, x,y):
      global idx
      idx[0] = 18
      idx[1] = 0
      idx[2] = y
      idx[3] = texWidth-x
      sat = val = 0.0
      
      u = gFile.GetValAsDouble (varID, idx)
      v = gFile.GetValAsDouble (varID+1, idx)
      
      if not (math.isnan(u) or math.isnan(v)):
         sat = 1.0
      else:
         u = v = 0
         
      speed = math.sqrt (u*u + v*v)
      val = 0.1 + min (speed, 0.9)
      hue = 260-240*speed
      hue = max(min(hue, 300), 0)
      xShifted = (x+540)%1080   # wrap one full revolution (360 deg at 3 samples/deg)
      tex.SetPixelHSV (xShifted,y,0, hue, sat, val, 1.0)
   
   
   [Figure: Image composed of a NASA earth texture and (ocean) texels taken
   from color-coded NetCDF data. The Gulf Stream and the equatorial counter
   current are clearly visible in this texture extracted from the April
   2011 OSCAR dataset.]
    
    
	The lines in grey italic font are created automatically as part of 
   the code skeleton: the first global variable block is initialised according 
   to user input when creating the texture. The function definition is to be 
   filled in with the actual data mapping (gFile.GetValAsDouble() is one of the 
   Python wrapper functions for accessing NetCDF data). 
    
   Lines in typewriter font are filled in by the user, assigning and 
   interpreting NetCDF data for the given texture node. The texture node in turn 
   is a Python proxy object providing an sdkITexture interface to the 
   real texture scenegraph node.
    
   The SetTexel() function is called repeatedly by a hidden Python 
   framework looping over the texture's dimensions.
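
   That driver loop can be pictured roughly as follows; this is a sketch,
   not the plugin's actual code, and the texture object is mocked here:

   ```python
   def fill_texture(tex, width, height, set_texel):
       """Call the user's callback once per texel, the way the hidden
       framework iterates over the texture's dimensions."""
       for y in range(height):
           for x in range(width):
               set_texel(tex, x, y)

   # Tiny usage example with a mocked texture object (a plain list):
   calls = []
   fill_texture(calls, 3, 2, lambda tex, x, y: tex.append((x, y)))
   ```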
    
	
	The images below were created from the Reynolds sea surface temperature
   dataset, which provides data at one-quarter degree resolution over the 
   globe.
    
    
      
         
            
             
            
      [Figure: Temperature in the North Atlantic ocean, January 2013.
      Texture created with the tinySG netCDF plugin.]

      [Figure: Moving forward in time to the end of August 2013. The
      northern hemisphere is heating up.]
    
   
 	
	Keep rendering, 
	Christian
   
     
     
     
      
    Acknowledgements and links:
    
     
      -  Oscar
		(Ocean Surface Current Analysis Real-time) contains near-surface 
		ocean current estimates derived from various data sources. It is a 
		time-varying dataset of currents at the ocean surface, with either 
		one or three samples per degree latitude/longitude.
		NetCDF data is available at JPL.
      -  ETOPO1
	    is a 1 arc-minute global relief model of Earth's surface. This sums 
       up to about 233.3 million height points.
      -  Textures are exported from 
       World Wind 
       by courtesy of NASA.
      -  Argo is an international project
       using thousands of floats to collect data on ocean temperature, salinity
       and currents from approximately 2000 m depth up to the surface. It 
       contributes to monitoring global climate change.
      -  The high resolution sea surface temperature (Reynolds-SST) dataset is 
       available at podaac-ftp and at the climate data center, 
       University of Hamburg, Germany.
    
  
        
     
     
     Copyright by Christian Marten, 2009-2014 
    Last change: 11.04.2014 
     
    