Partially Resident Textures

Introduction

Huge-resolution textures are becoming more and more common in games and other applications. id Software's id Tech 5 engine calls them MegaTextures, and Google Maps uses them to zoom into satellite imagery, just to mention two examples.

The Southern Islands graphics chips of Radeon 7000 series boards and FirePro W-series workstation cards support a new OpenGL extension called AMD_sparse_texture that allows an application to download only parts of a texture image (tiles), rather than providing the entire image all at once. The driver commits memory only for the downloaded tiles, and the application may decide to unload tiles again when they are no longer needed. This way, graphics memory can serve as a texture tile cache, indexing parts of vast textures in main memory or even on disk. The entire procedure is quite similar to allocating a buffer object and then filling only parts of it via glBufferSubData(). One difference, however, is that with sparse textures the tiles need to be aligned to virtual pages, and memory is only allocated for the tiles that have been specified.
The screenshot above shows an example from the FirePro SDK, adapted to tinySG:
The original shader code from the example has been time-warped back to
GLSL 1.0, stripped down to the bare minimum and is still working with PRT.
The sparse texture extension requires quite some effort on the application's side, and
documentation on the net is still poor. Let's have a look at some rarely documented
fast facts on the extension before we dive into code snippets:
The first challenge in the table above came as a major disappointment at first glance. However, if you take a closer look at the memory footprint of a 16k x 16k RGBA texture, you already end up at 1 GB. Handling much larger textures would probably require special care anyway, as main memory soon becomes a limiting factor as well on most machines. tinySG works around this limit by splitting larger textures into layers of a 2D array texture. The W8000 supports up to 8192 layers, so this is good for textures of up to 2,097,152 x 1,048,576 texels (which translates to 8 TB of uncompressed RGBA data!).

The Southern Islands GPUs maintain texture tiles in chunks of 64 KB in a page table. Each tile requires 32 bytes of management information, and the texture size needs to be a multiple of the tile size (so all tiles are of equal size).

Sparse Texture Fetches in a GLSL Shader

Let's start by having a look at the shader code needed to access a sparse texture. Note that the only major difference from what you usually do is the replacement of a regular texture2D() call by its PRT version, sparseTexture(). sparseTexture() takes an inout argument, texel, that receives the texel's value, and it returns a code indicating success or a cache miss. This new sampler function works with both PRTs and traditional textures (with the latter, it always succeeds). Even the usual texture sampler functions are supposed to work if the requested texel is resident.
The shader used in the introduction image is represented by a csgShaderProgam node and looks exactly like this:

    // **** Vertex Shader ****************************************
    void main ()
    {
        // just pass the required parameters on to the
        // fragment stage - nothing unusual here:
        gl_Position    = ftransform();
        gl_TexCoord[0] = gl_MultiTexCoord0;
    }

    // **** Fragment Shader **************************************
    #extension GL_AMD_sparse_texture : enable

    uniform sampler2D Diffuse;

    void main()
    {
        vec4 texel = vec4(0);

        // Use "sparseTexture" instead of the standard "vec4 texture2D (sampler, coord)". Note that
        // the return value is a result code. The texel to fetch ends up in the 3rd inout parameter:
        int result = sparseTexture(Diffuse, gl_TexCoord[0].st, texel);

        // The return code indicates cache misses. If it is flagged to be fine, then just use the
        // fetched texel. Otherwise, render a "fail color" for demo purposes:
        if (sparseTexelResident(result))
            gl_FragColor = texel;
        else
            // render texture lookup failure color.
            gl_FragColor = vec4(1.0, 0.5, 0.0, 1.0);
    }

The example shows the effect of cache misses very clearly (the example code from the FirePro SDK sets the texture up to specifically provoke lookup misses by committing only parts of the complete texture to OpenGL). In a real application, you would replace the else-branch with something more clever, like looking up a lower mipmap level and sending information about the miss back to the application (e.g. through another render target or an image from ARB_shader_image_load_store).

Setting up the Application

To get started, it is useful to look again at the bare minimum of what is required. This is a (working) code snippet from the very first experiment in tinySG. The code is called from OnRender() when the traverser encounters a csgSparseAMDTexture node marked dirty.
The texture object has already been generated and bound by the caller (glGenTextures (1, &pipeIDs [pipeID]); glBindTexture (target, glID);):

    void csgSparseAMDTexture::DownloadImage ()
    {
        // The texture object has already been bound by calling csgTexture::OnRender()!
        glPixelStorei (GL_UNPACK_ALIGNMENT, 1);

        GLenum       format;
        unsigned int width, height, depth, dim;

        // img is an object holding pixels (bmp), width, height, depth, dimension and
        // format of a 2D image. The following two lines fetch these parameters:
        img->GetImgParams (width, height, depth, dim);
        img->GetGLFormat (format);

        GLsizei const Size     = 4096;
        GLsizei const texDepth = 1;      // depth of the sparse texture ("depth" holds the image depth)
        GLsizei const layers   = 1;
        GLsizei Width  = Size / width;   // number of image-sized tiles per row
        GLsizei Height = Size / height;  // number of image-sized tiles per column

        // Specify the storage requirements for texture object: For demo purposes Size x Size
        // is used for width x height, which is larger than the actual texImage....
        glTexStorageSparseAMD(GL_TEXTURE_2D, format, Size, Size, texDepth, layers,
                              GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);

        // Page-in every second tile of our (Size x Size) texture (always using the complete
        // texImage as a tile for demo purposes):
        for(GLsizei j = 0; j < Height; ++j)
            for(GLsizei i = 0; i < Width; ++i)
                glTexSubImage2D (GL_TEXTURE_2D, 0,
                                 i * GLsizei(width), j * GLsizei(height),
                                 GLsizei(width), GLsizei(height),
                                 GL_BGRA, GL_UNSIGNED_BYTE,
                                 // DEMO: Only provide every other tile, leave others for intended misses:
                                 (i+j)%2 ? img->Data () : NULL);

        // Since mipmaps are required, we want to ensure using them: Specify index of
        // lowest defined mipmap level (GL default is 0):
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, 0);

        // Sets index of highest defined mipmap level (default would be "1000"):
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL, GLint(m_numLevels - 1));

        // Setup filtering to use mipmaps:
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);

        // Generate mipmaps based on parameters just set:
        glGenerateMipmap(GL_TEXTURE_2D);

        // Enforce mipmap filter for next render call:
        minFilter = GL_LINEAR_MIPMAP_LINEAR;
    }

Partially resident textures are represented in tinySG by the node type csgSparseAMDTexture, a class derived from csgTexture. Support is still somewhat experimental. Another option would be to support the feature directly in the texture nodes for 2D, 3D and cube textures (which may be a good idea once the extension is EXT or ARB...).
tinySG's goal is to support extremely large textures, like satellite images
of the earth. This data tends to be so huge that it does not even fit into main
memory. As a result, calculating mipmap levels becomes difficult and is not
supported by tinySG in the initial implementation.
The code snippet above was just meant to get you started and show something on the screen; it does not support dynamic loading of tiles. Whenever a texel fetch fails in the shader, it is up to the shader to report information on the missing texel back to the application. The application then has to upload the missing part of the texture and render the scene again if, like tinySG's traversers, it did not anticipate the required texture tiles in advance.

Paging Strategies

Splitting huge resolutions

tinySG sparseTexture nodes hide the maximum resolution limit enforced by OpenGL and allow addressing textures of up to 2M x 1M texels, coming from an arbitrary number of files. Internally, texture sizes beyond the limit are split recursively into layers of a 2D array texture until the limits are met.

2D texture coordinates have to be mapped onto 3D coordinates to provide the layer ID as well. As an example, if a texture has a horizontal resolution of 32k texels, it is split into two layers of 16k texels each. A texture s-coordinate of 0.75 would end up in the middle of the right layer. Mathematically, this simply means multiplying the texture coordinate by the number of layers and then using the integer part as the layer index and the fractional part as the offset into that layer (0.75 * 2 = 1.5, so the layer ID is 1 and the s-coordinate is 0.5). Doing the same for the t-coordinate gives another layer coordinate and offset. With the layers arranged row by row, the layer ID finally becomes layerID = (layerY*layersPerRow+layerX).
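The coordinate mapping described above can be sketched in a few lines of C++. This is only an illustration of the arithmetic, not tinySG's implementation: the names LayerCoord, MapToLayer and layersPerColumn are made up for this example, and the clamping of coordinates outside [0,1) is an assumption about how edge cases might be handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical result type: layer index plus coordinates inside that layer.
struct LayerCoord {
    int   layer;  // index into the 2D array texture
    float s;      // local s-coordinate within the layer
    float t;      // local t-coordinate within the layer
};

// Map a global (s,t) coordinate onto a grid of layersPerRow x layersPerColumn
// layers that are stored row by row in a 2D array texture.
inline LayerCoord MapToLayer(float s, float t, int layersPerRow, int layersPerColumn)
{
    assert(layersPerRow > 0 && layersPerColumn > 0);

    // Scale the coordinate by the number of layers in each direction.
    const float scaledS = s * float(layersPerRow);
    const float scaledT = t * float(layersPerColumn);

    // The integer part selects the layer. Clamping (an assumption of this
    // sketch) keeps s == 1.0 or slightly negative values inside the grid.
    const int layerX = std::clamp(int(std::floor(scaledS)), 0, layersPerRow - 1);
    const int layerY = std::clamp(int(std::floor(scaledT)), 0, layersPerColumn - 1);

    // The remainder is the offset into the selected layer.
    LayerCoord result;
    result.layer = layerY * layersPerRow + layerX;
    result.s     = scaledS - float(layerX);
    result.t     = scaledT - float(layerY);
    return result;
}
```

For the example from the text, MapToLayer(0.75f, 0.0f, 2, 1) selects layer 1 with a local s-coordinate of 0.5. In a shader, the same computation would feed the third component of the 2D array texture coordinate.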
Reporting texture fetch misses back to the application

tinySG uses image buffers from the ARB_shader_image_load_store extension to report texture misses back from the PRT shader to the traverser. Image buffers provide random access to memory, as well as atomic operations.

Since the GPU is a massively parallel device, it is useful to determine the missing texture tile in the shader already, rather than just passing back the failing texture coordinates. With the latter approach, the CPU would need to deal with hundreds of misses sequentially, most of them ending up in the same virtual page anyway.
Fortunately, modern OpenGL (4.2 and up) supports random access to buffers
and atomic read-modify-write operations on buffers. tinySG uses an image
buffer as a bitmask for missing tiles: Each tile ID corresponds to a bit
in that buffer. If a texture fetch miss is detected, the PRT shader
calculates the tile ID that texel should have come from and sets the
corresponding bit in the feedback buffer (find the entire shader here).
On the application side, the traverser reads the bitmask back, uploads the
missing tiles and sets up the shader for the next frame, like this:
    // Retrieve feedback on missing tiles from last frame:
    glBindBuffer(GL_TEXTURE_BUFFER, feedbackBufferID);
    buf = glMapBuffer(GL_TEXTURE_BUFFER, GL_READ_WRITE);

    // buf contains bits of missing tiles. Convert bits to std::vector of tile IDs:
    TileVec missingTiles;
    int numMissing = GetMissingTileIDs (buf, missingTiles);

    // Clear buffer for next renderings:
    long numTiles, tileSize, tpr;
    tileCache->GetTilingInfo (tileSize, numTiles, tpr);
    memset (buf, 0, numTiles/8);
    glUnmapBuffer(GL_TEXTURE_BUFFER);
    glBindBuffer(GL_TEXTURE_BUFFER, 0);

    // Upload missing tiles to sparse texture:
    if (numMissing)
    {
        // tiles missing -> request another frame:
        trav->SetRedraw(CSG_REDRAW_SHADER);

        // upload missing tiles, as indicated in bitmap:
        int numUploaded = UploadTiles (missingTiles);
    }

    // Setup texture state:
    GLint sparseTexID = m_sparseGLIDs [trav->pipeID];
    glActiveTexture (GL_TEXTURE0+sparseTexUnit);
    glBindTexture (GL_TEXTURE_2D_ARRAY, sparseTexID);

    GLint lowresTexID = m_loresGLIDs [trav->pipeID];
    glActiveTexture (GL_TEXTURE0+lowresTexUnit);
    glBindTexture (GL_TEXTURE_2D_ARRAY, lowresTexID);

    // Setup Shader (ShaderNode has been visited by the traverser before, shader is active!)
    GLuint H = shader->GetShaderProgID (pipeID);
    GLint loc = glGetUniformLocation(H, "data");
    glUniform1i(loc, bufIndex);
    glBindImageTextureEXT(bufIndex, bufTexture, 0, GL_FALSE, 0, GL_READ_WRITE, format);

    loc = glGetUniformLocation (H, "sparseTex");
    glUniform1i(loc, sparseTexUnit);
    loc = glGetUniformLocation (H, "lowResTex");
    // Sampler uniforms take the texture unit index, not a pointer:
    glUniform1i(loc, lowresTexUnit);
    loc = glGetUniformLocation (H, "numLayers");
    glUniform2fv(loc, 1, m_layerLayout);
    loc = glGetUniformLocation (H, "imageSize");
    glUniform2fv(loc, 1, iSize);
    loc = glGetUniformLocation (H, "tileSize");
    glUniform1f(loc, fTileSize);

The sparse texture implementation for tinySG was motivated by the wish to make use of textures from Google Earth or the fantastic images available from NASA, which provide earth imagery at resolutions of up to 500 m per pixel.
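The feedback bitmask used in this section can be illustrated on the CPU. tinySG's actual shader and its GetMissingTileIDs helper are not reproduced on this page, so the following is only a sketch under assumptions: tiles are numbered row by row, the bitmask is stored in 32-bit words, and all function names are hypothetical. MarkTileMissing stands in for the atomic OR that a shader would perform on the image buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Tile ID of the tile containing texel (x, y), assuming square tiles of
// tileSize texels that are numbered row by row.
inline std::uint32_t TileIdForTexel(std::uint32_t x, std::uint32_t y,
                                    std::uint32_t tileSize, std::uint32_t tilesPerRow)
{
    assert(tileSize > 0 && tilesPerRow > 0);
    return (y / tileSize) * tilesPerRow + (x / tileSize);
}

// CPU stand-in for the shader side: set the bit that belongs to tileId.
// On the GPU this would be an atomic OR on the feedback image buffer.
inline void MarkTileMissing(std::vector<std::uint32_t>& mask, std::uint32_t tileId)
{
    assert(tileId / 32u < mask.size());
    mask[tileId / 32u] |= (1u << (tileId % 32u));
}

// Application side: convert the bitmask into a list of tile IDs in
// ascending order. Many misses in the same tile collapse into one entry.
inline std::vector<std::uint32_t> CollectMissingTiles(const std::vector<std::uint32_t>& mask,
                                                      std::uint32_t numTiles)
{
    std::vector<std::uint32_t> ids;
    for (std::uint32_t id = 0; id < numTiles; ++id) {
        const std::uint32_t word = id / 32u;
        if (word < mask.size() && ((mask[word] >> (id % 32u)) & 1u) != 0u)
            ids.push_back(id);
    }
    return ids;
}
```

With 256-texel tiles and 8 tiles per row, misses at texels (300, 0) and (310, 10) both map to tile 1, so the application sees a single upload request for that tile rather than one per failing fragment.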
Update 27.06.2015: The notes above deal with an AMD-specific extension. In August 2013, the OpenGL ARB released a similar cross-vendor extension to add sparse texture support to OpenGL. Find more information on the ARB version in the tech guide on ARB_sparse_texture.
Keep rendering,
Acknowledgements:
Copyright by Christian Marten, 2012. Last change: 15.08.2015