Visualising scientific data
Friends of tinySG,
As the last project in 2013, a visualisation plugin for NetCDF data
made it into the plugin library of the tinySG scenegraph editor. If you are
searching for large datasets to explore, this one is for you! Have you asked
yourself what earth would look like if sea level rises by 1 meter? How deep
the ocean is at the north pole or what Antarctica would look like without ice?
Did you ever try to navigate along red route one in "Hunt for Red October"
yourself? Read on!
There is a
vast number of free interesting, scientific datasets on the web - you only
need to find them (which is probably not hard if you are a member of a
geo-sience research project. But what if you are not?).
The NetCDF file format
The NetCDF format is used by scientists to store arbitrary array data. It
includes a
header that describes the structure of the array data blocks (dimensions,
data types, physical units, etc.) - the format is extremely flexible.
To understand the basics it is probably easiest to look at two example
headers, decoded with the help of the ncdump tool:
netcdf ETOPO1_Ice_g_gdal {
dimensions:
side = 2 ;
xysize = 233312401 ;
variables:
double x_range(side) ;
x_range:units = "Latitude" ;
double y_range(side) ;
y_range:units = "Latitude" ;
double z_range(side) ;
z_range:units = "meters" ;
double spacing(side) ;
int dimension(side) ;
int z(xysize) ;
z:scale_factor = 1. ;
z:add_offset = 0. ;
z:_FillValue = -2147483648 ;
z:node_offset = 0 ;
// global attributes:
:title = "ETOPO1_Ice_g_gmt4.grd" ;
:source = "grdreformat -V ETOPO1_Ice_g_gmt4.grd
ETOPO1_Ice_g_gdal.grd=ci" ;
}
|
Monterey bay at 1.8km per grid point (y coordinates are stretched).
Relief data taken from Etopo1 netCDF dataset, texture extracted from
World Wind (click to enlarge).
|
This header has been extracted from the ETOPO1 relief dataset, which
contains high resolution relief data of the entire planet. It contains
two dimension definitions (size of arrays in a particular dimension) and
six variables (data arrays). While the first four variables are small
1D arrays with just two elements, the variable z is a huge 1D
array with over two billion (xysize) elements. It's values are heights
above or below sea level, measured at grid points. The
variables dimension and spacing teach you about the 2D
positions (lattitude and longitude) of each height value: Dimensions
tells you how to transform the 1D z array into a 2D grid layout,
the values of spacing report the angle between two adjacent heights in
longitude and latitude direction.
Although this data layout avoids storing the lon/lat coordinates for each
height, it is a bit complicated to handle because of the necessary
1D->2D mapping. The next example shows an organisation which encodes a
2D array directly and thus is easier to work with:
netcdf oscar_vel2012 {
dimensions:
time = 61 ;
year = 61 ;
depth = 1 ;
latitude = 481 ;
longitude = 1201 ;
variables:
int time(time) ;
time:units = "day since 1992-10-05 00:00:00" ;
time:long_name = "Day since 1992-10-05 00:00:00" ;
float year(year) ;
year:units = "time in years" ;
year:long_name = "Time in fractional year" ;
float depth(depth) ;
depth:units = "meter" ;
depth:long_name = "Depth" ;
double latitude(latitude) ;
latitude:units = "degrees-north" ;
latitude:long_name = "Latitude" ;
double longitude(longitude) ;
longitude:units = "degrees-east" ;
longitude:long_name = "Longitude" ;
double u(time, depth, latitude, longitude) ;
u:units = "meter/sec" ;
u:long_name = "Ocean Surface Zonal Currents" ;
u:missing_value = 0. ;
double v(time, depth, latitude, longitude) ;
v:units = "meter/sec" ;
v:long_name = "Ocean Surface Meridional Currents" ;
v:missing_value = 0. ;
double um(time, depth, latitude, longitude) ;
um:units = "meter/sec" ;
um:long_name = "Ocean Surface Zonal Currents Maximum Mask" ;
um:missing_value = 0. ;
double vm(time, depth, latitude, longitude) ;
vm:units = "meter/sec" ;
vm:long_name = "Ocean Surface Meridional Currents Maximum Mask" ;
vm:missing_value = 0. ;
// global attributes:
:NC_GLOBAL.VARIABLE = "Ocean Surface Currents" ;
:NC_GLOBAL.DATATYPE = "1/72 YEAR Interval" ;
...
}
Each current sample vector is split into two coordinates, u
and v (NetCDF stores only arrays of scalar values, so
vector arrays are split into multiple scalar arrays). However,
as opposed to the previous example, both u and v use
indices in four dimensions: Time, depth, u- and v-component of a 2D
current vector. It is pretty intuitive to gather all components for
a given point in time, depth and location:
double coord[2]; // lat, lon;
double veloVec[2]; // velocity at coord[]
size_t latIdx=20, lonIdx=317;
size_t indices[] = { 23 /*time*/, 0/*depth*/, latIdx, lonIdx };
// as stated in the dimensions section, latIdx is valid in 0..480
// and lonIdx is valid in 0..1200. Use API for value retrieval:
nc_get_var1_double (netCDFHandle, 3, latIdx, &coord[0]);
nc_get_var1_double (netCDFHandle, 4, lonIdx, &coord[1]);
nc_get_var1_double (netCDFHandle, 5, indices, &vec[0]);
nc_get_var1_double (netCDFHandle, 6, indices, &vec[1]);
// ... veloVel now holds the current vector at coords[] (the geographic location)
So, time- and depth-varying values can be fetched for a given location
using 4D indices and the real sphere coordinates are also fetched using
the same index.
As with MPI, the NetCDF API is pretty large and powerful, but
for basic use cases only a hand full of functions are really needed.
A NetCDF importer for tinySceneGraph
The examples in the previous section have shown that data organisation
in NetCDF files can differ quite significantly from one file to the other.
The only common ground
of netCDF data is the array as the central data structure.
Semantics of NetCDF variables usually differ as well: Ocean currents,
height maps, temperature, gradients of entities, time varying data, etc.
Each individual entity may be a scalar (like pressure values) or a vector
entity (like a current vector).
To visualise these datasets, a number of techniques come to mind:
- Use textures to color-code global measurements, like temperature
distributions, ocean saline or currents.
- Encode values in textures to process them arbitrarily by a
shader using the parallel power of modern GPUs.
- Create geometry grids from relief data.
- Use particles animations to trace currents over time.
- Visualise vector fields using illuminated lines.
Import workflow:
Once the user has selected a netCDF file, the plugin presents
an overview of it's internal structure. Thus, the user can check what
data is provided and how it is organised.
The transformation of NetCDF variable(s) to tinySG scenegraph nodes involves
selection of a destination node type and, most often some sort of dimension mapping,
color encoding, and scaling. For example, you may want to map the two
components of a velocity vector to the three color channels of a rgb texture.
Instead of providing a full-featured
user interface, the NetCDF plugin trades simplicity for maximum flexibility: The
user is required to select a node (mesh, texture, volume,
lineset or particle node)
and write a small callback function in Python that reads data from the NetCDF
files variable(s) she's interested in. To do so, the plugin establishes a
Python wrapper around the native C API of NetCDF.
The code editor provides a skeleton right from the start, so the user just
has to set correct variable indices, read the variables value and process it
(scaling, color encoding, etc.)
When the callback function is complete, the Execute button sends it to
a Python interpreter (another plugin...) that runs the script and feeds the
created node with the callbacks return data.
As an example, the SetTexel() callback function in the following
Python script extracts velocity vectors from
an OSCAR dataset. The Python framework takes it's return values to set the
texels of a 2D texture. Mapping speeds to hsv color space, high current speeds
are colored red/bright, low currents are colored blue/dark:
idx = netCDF.SizeTArray(4)
varID = 5
texWidth = 1200
texHeight= 480
numChan = 3
# Called for every texel of texture:
def SetTexel (tex, x,y):
global idx
idx[0] = 18
idx[1] = 0
idx[2] = y
idx[3] = texWidth-x
sat = val = 0.0
u = gFile.GetValAsDouble (varID, idx)
v = gFile.GetValAsDouble (varID+1, idx)
if not (math.isnan(u) or math.isnan(u)):
sat = 1.0
else:
u = v = 0
speed = math.sqrt (u*u + v*v)
val = 0.1 + min (speed, 0.9)
hue = 260-240*speed
hue = max(min(hue, 300), 0)
xShifted = (x+540)%1080
tex.SetPixelHSV (xShifted,y,0, hue, sat, val, 1.0)
|
Image composed of a NASA earth texture and (ocean-) texels taken
from color-coded NetCDF data. The gulf stream and the equatorial counter
current are clearly visible in this texture extracted from the April
2011 OSCAR dataset (click on the image to enlarge).
|
The lines in grey italic font are created automatically as part of
the code skeleton: The first global variable block is initialised according
to user input when creating the texture. The function definition is to be
filled by the actual data mapping (gFile.GetValAsDouble() is one of the
Python wrapper functions to access NetCDF data).
Lines in typewriter font are filled in by the user, assigning and
interpreting netCDF data to the given texture node. The texture node again
is a Python proxy object providing an sdkITexture interface to the
real texture scenegraph node.
The SetTexel() function is repeatedly called by a hidden Python
framework looping over the textures dimensions.
The images below are created from the Reynolds sea surface temperature
dataset, which provides data at one quarter degree resolution over the
globe.
|
|
Temperature in the north Atlantic ocean, January 2013. Texture created
with the tinySG netCDF plugin. Click image to enlarge.
|
Moving forward in time to end of August 2013. The northern hemisphere is heating up.
Click image to enlarge.
|
Keep rendering,
Christian
Acknowledgements and links:
- Oscar
(Ocean Surface Current Analysis Real-time) contains near-surface
ocean current estimates derived from various data sources. It is a
time varying dataset on currents at the ocean surface, with either
one or three samples per degree latitude/longitude.
NetCDF data is available at JPL.
- ETOPO1
is a 1 arc-minute global relief model of Earth's surface. This sums
up to about 233.3 million height points.
- Textures are exported from
World Wind
by courtesy of NASA.
- Argo is an international project
using thousands of float to collect data on ocean temperature, salinity
and currents from approximately 2000m depth up to the surface. It
contributes on monitoring global climate changes.
- The high resolution sea surface temperature (Reynolds-SST) dataset is available at
podaac-ftp
and climate data center,
University of Hamburg, Germany.
Copyright by Christian Marten, 2009-2014
Last change: 11.04.2014
|