Category Archives: Programming

A cumbersome bug in Catalyst 7.12

The latest Catalyst version is 7.12 (the internal number of Cat7.12 is 8.442.0.0). But exactly like Cat7.11, these drivers have a bug in the handling of dynamic lights in GLSL. This time I searched for the source of the bug, because it is rather cumbersome in Demoniak3D demos, and we can’t fall back to a previous version since Cat7.11+ is required to drive the Radeon HD 3000 series (HD 3870/3850). So I coded a small Demoniak3D script that exposes the bug. This script displays a mesh plane lit by a single dynamic light. The SPACE key switches the GLSL shader from the bug-ridden one to the fixed one and back.

– The following image shows the plane lit with the fixed shader:

– The following image shows the plane lit with the bug-ridden shader:

Okay, that’s cool, but where does the bug come from? After some time spent tweaking the shaders, my conclusion is that the bug lies in the value of the built-in uniform variable gl_LightSource[0].position. In the vertex shader, this variable should contain the light position in camera space. It’s OpenGL that does this transformation; we, poor developers, just have to specify the light position in world coordinates somewhere in the C++ app. In the vertex shader, gl_LightSource[0].position lets us compute the light direction (vVertex being the vertex position in camera space) that is used later in the pixel shader:

	lightDir = gl_LightSource[0].position.xyz - vVertex;

With Catalyst 7.11 and 7.12 (observed on a Radeon HD 3870), the value stored in gl_LightSource[0].position is wrong. One workaround, until the ATI driver team fixes the bug, is to compute the light position in camera space manually, by passing the camera view matrix and the light position in world coordinates to the vertex shader:

	vec3 lightPosEye = vec3(mv * vec4(-150.0, 50.0, 0.0, 1.0));
	lightDir = lightPosEye - vVertex;

mv is the 4×4 camera view matrix and vec4(-150.0, 50.0, 0.0, 1.0) is the hard-coded light position in world coordinates.
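
For a raw OpenGL/C++ app, the application-side part of this workaround could look like the sketch below; the GLEW usage and the function name are my assumptions, only the mv uniform name matches the shader above:

// Sketch: upload the camera view matrix to the "mv" uniform used by the
// fixed vertex shader. We read GL_MODELVIEW right after the camera
// transform has been set, before any model transform is applied.
#include <GL/glew.h>

void sendViewMatrixToShader(GLuint program)
{
  GLfloat viewMatrix[16];
  glGetFloatv(GL_MODELVIEW_MATRIX, viewMatrix); // camera view matrix only

  glUseProgram(program);
  GLint mvLoc = glGetUniformLocation(program, "mv");
  if(mvLoc != -1)
    glUniformMatrix4fv(mvLoc, 1, GL_FALSE, viewMatrix);
}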

In the fixed pipeline, dynamic lights are handled correctly, as shown in the next image:

In the Demoniak3D demo, the bug-ridden GLSL shader is called OneDynLightShader and the fixed one OneDynLightShader_Fixed. The demo source code is located in the OneDynLightTest.xml file. To start the demo, unzip the archive into a directory and launch DEMO_Catalyst_Bug.exe.

The demo is downloadable here: Demoniak3D Catalyst 7.11/7.12 Bug

This bug seems to affect all Radeons, but under Windows XP only. It looks as if ATI were forcing people to switch to Vista. Not cool… Or maybe ATI is starting to implement OpenGL 3.0 in the Windows XP drivers: do not forget that with OpenGL 3.0, as with DX10, the fixed functions of the 3D pipeline, such as the management of dynamic lights, will be removed.

I would like to thank the WorldPCSpecs community for the tests. Thanks guys!

GLSL: ATI vs NVIDIA

Today, two new differences between Radeon and GeForce GLSL support.

1 – float2 / vec2
vec2 is the GLSL type that holds a 2D vector; it is supported by both NVIDIA and ATI. float2 is also a 2D vector type, but it belongs to Direct3D HLSL and to Cg. GLSL compilation for GeForce is done via the NVIDIA Cg compiler (GPU Caps Viewer reports the GLSL version as: 1.20 NVIDIA via Cg compiler), which explains why a GLSL source that contains a float2 compiles on NVIDIA hardware. The ATI GLSL compiler, however, is strict and does not recognize the float2 type.

2 – the following line:

vec2 vec = texture2D( tex, gl_TexCoord[0].st );

is valid for the NVIDIA compiler but produces an error with the ATI compiler. Once again, the ATI GLSL compiler has done a good job: by default, texture2D() returns a 4D vector, so the result must be swizzled down to two components. The right syntax is:

vec2 vec = texture2D( tex, gl_TexCoord[0].st ).xy;
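
To recap both gotchas in one place, here is a tiny strict-GLSL fragment shader (my own recap, embedded in a C string the way it would ship in an app) that should compile on both vendors:

/* Strict GLSL: vec2 instead of float2, and an explicit .xy swizzle on
   the vec4 returned by texture2D(). */
const char *strictFS =
  "uniform sampler2D tex;\n"
  "void main()\n"
  "{\n"
  "  vec2 v = texture2D(tex, gl_TexCoord[0].st).xy;\n"
  "  gl_FragColor = vec4(v, 0.0, 1.0);\n"
  "}\n";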

Conclusion: always test your shaders on both ATI and NVIDIA platforms, unless you target one platform only.

R600 is VTF-capable

“All of the fetch and filtering capabilities are available to each thread type, making the samplers completely agnostic about what’s using them.”

This line from Beyond3D’s article on the R600 means that vertex, geometry and pixel shaders can all access the texture samplers. So Vertex Texture Fetching is now available with the Radeon HD 2000 series :thumbup: What’s more, the R600 can handle very large textures, up to 8192×8192, just like the G80.
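
For illustration, here is a minimal vertex shader using VTF, of the kind the R600 can now run (my own sketch, GLSL embedded in a C string; the heightMap uniform and the 10.0 displacement scale are arbitrary):

/* Minimal vertex texture fetch example: displace each vertex along its
   normal by a height read from a texture, inside the vertex shader. */
const char *vtfVS =
  "uniform sampler2D heightMap;\n"
  "void main()\n"
  "{\n"
  "  float h = texture2D(heightMap, gl_MultiTexCoord0.st).x;\n"
  "  vec4 pos = gl_Vertex + vec4(gl_Normal * h * 10.0, 0.0);\n"
  "  gl_Position = gl_ModelViewProjectionMatrix * pos;\n"
  "}\n";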

Embedding Your Shader Source Code In Your C/C++ Apps

The NVIDIA developer blog shows a way to embed your shader code in your Windows exe: blogs.nvidia.com/developers/2007/03/inlining_shader.html.

But this example is not fully operational. I slightly modified the code to make it totally operational (I compiled it with VC++ 6.0):

1) Add a define to your resource.h file:
#define IDF_SHADERFILE 1000

2) Add an entry in your resource.rc file:
IDF_SHADERFILE RCDATA DISCARDABLE "myShader.glsl"

3) Use the resource in your code:
HMODULE hModule = GetModuleHandle(NULL);
HRSRC hResource = FindResource(hModule, MAKEINTRESOURCE(IDF_SHADERFILE), RT_RCDATA);
if(hResource)
{
  DWORD dwSize = SizeofResource(hModule, hResource);
  HGLOBAL hGlobal = LoadResource(hModule, hResource);
  if(hGlobal)
  {
    LPVOID pData = LockResource(hGlobal);
    if(pData)
    {
      // Cast pData to a char* and you have your shader.
      char *shader_code = (char *)pData;

      // Now do whatever you want with the shader_code pointer.
      // Do not forget that shader_code is not a zero-terminated string!
      // Use dwSize to handle that.
    }
  }
}
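
To go one step further, the same code can be wrapped in a helper that also solves the zero-termination issue (my own variant, not part of the original post; the caller frees the returned string):

#include <windows.h>
#include <stdlib.h>
#include <string.h>

// Returns a malloc'ed, null-terminated copy of the RCDATA resource
// (or NULL on failure), ready to be passed to glShaderSource().
// The caller must free() the returned pointer.
char *LoadShaderResource(int resourceId)
{
  HMODULE hModule = GetModuleHandle(NULL);
  HRSRC hResource = FindResource(hModule, MAKEINTRESOURCE(resourceId), RT_RCDATA);
  if(!hResource)
    return NULL;

  DWORD dwSize = SizeofResource(hModule, hResource);
  HGLOBAL hGlobal = LoadResource(hModule, hResource);
  if(!hGlobal)
    return NULL;

  LPVOID pData = LockResource(hGlobal);
  if(!pData)
    return NULL;

  char *shader_code = (char *)malloc(dwSize + 1);
  if(!shader_code)
    return NULL;
  memcpy(shader_code, pData, dwSize);
  shader_code[dwSize] = '\0'; // now it IS a zero-terminated string
  return shader_code;
}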

Ageia PhysX SDK for free

Ageia has announced new licensing terms, allowing its PhysX SDK to be used and its runtime components distributed in all commercial and non-commercial PC projects for free.

This is really good news for the community and for Hyperion! I filled in the registration form and now I hope to receive the download link quickly.

Depth Map Filtering – ATI vs NVIDIA

ATI really has some problems with OpenGL. I’m currently working on soft shadows, and my temporary devstation has a Radeon X700 (not top-notch, I know, but a powerful enough graphics card). With my X700 (Catalyst 6.6) the soft shadow edges are rendered as follows:

And on my second graphics card, an NVIDIA 6600 GT (ForceWare 91.31), the soft shadows look as follows:

The GLSL shaders are the same: a 5×5 blurring kernel applied to a 1024×1024 shadow map (or depth map, as you like) rendered via an FBO, with linear filtering. Now if I set the nearest filtering mode, I get the following result for the X700:

and for the 6600 GT:

It seems as if the Radeon GPU has a bug in its filtering module when it has to apply a linear filter to a depth map. Very strange. I’m not satisfied with this explanation, but it’s the only one I can see for the moment.
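
For reference, the filtering mode under test is selected when the FBO-attached depth texture is created; a minimal sketch (assuming a GLEW-based app, function name is mine):

#include <GL/glew.h>

// Creation of the depth map used as shadow map. Pass GL_LINEAR or
// GL_NEAREST as 'filter' to reproduce the two test cases above.
GLuint CreateDepthMap(int size, GLenum filter)
{
  GLuint tex = 0;
  glGenTextures(1, &tex);
  glBindTexture(GL_TEXTURE_2D, tex);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, filter);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, filter);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, size, size, 0,
               GL_DEPTH_COMPONENT, GL_UNSIGNED_INT, NULL);
  return tex;
}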

This kind of problem shows how important it is for a graphics developer to have at least two workstations, one with an NVIDIA board and the other with an ATI graphics card. I tell you, realtime 3D is made of blood, sweat and screams! :winkhappy:

NPOT Textures

It’s nice to get back to coding!

I’m currently working on a new and simple framework for my OpenGL experiments, before implementing the algorithms in the oZone3D Engine. RaptorGL is a little too heavy for simple tests, so for the moment I’ve dropped it. This new framework, which I call XPGL (eXPerimental Graphics Library), allows me to quickly test the new algorithms I’m working on. Every time I have to code a small but fully operational 3D demo in C++/OpenGL, I spend a lot of time for a small result. In those moments, I tell myself that Hyperion is a very cool tool.

Okay, let’s look at a weird behavior of Radeon GPUs. At the moment, my graphics card is a Radeon X700. With the latest Catalyst drivers (6.6), this board should be OpenGL 2.0 compliant, and a quick check of GL_VERSION confirms that the X700 is GL2 compliant. The X700 should therefore handle non-power-of-two textures, since this feature is part of the OpenGL 2.0 core. But the GL_ARB_texture_non_power_of_two string is not found in GL_EXTENSIONS; maybe ATI does not list the extensions that are part of the core. Anyway, I loaded a 600×445 NPOT texture on a mesh plane, and the X700 seems to support it, but with a ridiculous framerate of 1 fps… Software codepath? I think so! So I loaded the same texture with power-of-two dimensions (512×512), and the framerate became decent again. With my GeForce 6600 GT (ForceWare 91.31) I never noticed this effect/bug, because its GL2 support is better and NVIDIA GPUs correctly handle non-power-of-two textures. You can download the demo with the NPOT and POT textures (the one mapped onto the mesh plane) hereafter and run the test for yourself. Feel free to drop me some feedback if you wish.
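
For the record, a minimal version of the GL2/NPOT check described above could look like this (my own sketch, assuming GLEW and a current GL context):

#include <GL/glew.h>
#include <stdio.h>
#include <string.h>

// Check NPOT support: either the extension string is listed, or the
// feature is implied by a GL >= 2.0 version (possibly a software path!).
void CheckNPOTSupport(void)
{
  const char *version = (const char *)glGetString(GL_VERSION);
  const char *extensions = (const char *)glGetString(GL_EXTENSIONS);

  printf("GL_VERSION: %s\n", version);
  if(strstr(extensions, "GL_ARB_texture_non_power_of_two"))
    printf("NPOT textures listed in GL_EXTENSIONS\n");
  else if(version[0] >= '2') // crude: relies on the "major.minor" format
    printf("NPOT implied by the GL 2.0 core... but maybe a software path!\n");
}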

NPOT_Demo.zip (659k)

But keep in mind that graphics hardware is optimized for POT textures. Try to use POT textures in order to maximize the chances of seeing your demo run everywhere.
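
And if you need to convert dimensions, a trivial helper does the rounding (my own sketch); for instance it maps 445 to 512 and 600 to 1024 (I chose instead to rescale the image down to 512×512):

// Round a texture dimension up to the next power of two.
unsigned int NextPowerOfTwo(unsigned int n)
{
  unsigned int p = 1;
  while(p < n)
    p <<= 1;
  return p;
}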

Compilation of Ogre3D v1.2.1

Just for fun (and for benchmarking too), I’ve compiled the latest Ogre3D engine (v1.2.1).

For the test I used VC2005 and the two following files, available on www.ogre3d.org:
– OGRE 1.2.1 Source for Windows
– Dependencies 1.2.0 for Visual C++ 2005 (8.0)

Unpack both archives and put the content of the Dependencies archive into the ogre folder. Now you’re ready to load the ogre_vc8.sln solution. Once done, you have to enable the build timer with:
[i]Tools->Options->Projects->VC++ Build and setting Build Timing to Yes[/i].

Okay, everything is ready for the compilation of the OgreMain project.

On my workstation (P4 3.6 GHz, 2 GB DDR2-533, WD 200 GB SATA hard drive), the compilation and linking took: Build Time 12:04 (read: 12 min 04 sec!) :raspberry:

Blue Screen of Death

The famous Blue Screen of Death is back:

It’s been a long time since I last saw it. I’m working on VBOs in a new eXPerimental 3D engine, and I most certainly passed a wrong face offset to the index buffer. A little bug on my side, no doubt. As for the bug on the NVIDIA ForceWare side: I don’t know how drivers are supposed to behave when wrong parameters are sent to them, but I don’t think they should freeze your devstation!
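
A cheap guard against this kind of mistake is to validate the index array on the CPU before it reaches the driver; a minimal sketch (names are mine):

// Sanity-check an index array before glDrawElements(): catch
// out-of-range indices before they reach the driver.
int ValidateIndices(const unsigned int *indices, int indexCount,
                    unsigned int vertexCount)
{
  int i;
  for(i = 0; i < indexCount; i++)
  {
    if(indices[i] >= vertexCount)
      return 0; // bad index found
  }
  return 1; // all indices are in range
}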

More on this bug later…

ATI X1900XTX and VTF

I’ve just received an email from a user saying that he wasn’t able to run the demos of the Vertex Displacement Mapping Tutorial on his brand new Radeon X1900XTX. VTF, or Vertex Texture Fetching, is a cool feature of high-end graphics chipsets and is part of Shader Model 3.0. The X1900 series is based on the R580 chipset (R500 family), which is an SM3.0 compliant GPU. But on the OpenGL side, and especially in GLSL, VTF is not supported: the OpenGL query done with GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB always returns 0. That means that no texture units are available in the vertex shader.
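
The query in question is straightforward; here is a minimal sketch (assuming a GLEW-based app with a current OpenGL context):

#include <GL/glew.h>
#include <stdio.h>

// The VTF capability query mentioned above: 0 on the X1900 (R580),
// typically 4 on a GeForce 6/7.
void CheckVTF(void)
{
  GLint units = 0;
  glGetIntegerv(GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB, &units);
  printf("GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB = %d %s\n", units,
         (units > 0) ? "(VTF available)" : "(no vertex texture fetch)");
}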

ATI confirms this fact in one of the whitepapers shipped with the ATI SDK (ATI OpenGL Programming and Optimization Guide.pdf). On page 11, we can read: [i]”All ATI graphics HW have a few items that deserve special consideration when using GLSL. The first major item of note is the absence of vertex texture units. This means that vertex texturing is never available, and all shaders attempting to use texture functions in the vertex shader will fail to link.”[/i] I know, this is a harsh reality. The R580 GPU is really powerful, and it’s a pity that ATI does not support VTF in its chipsets. I don’t know how the R580 behaves on the D3D side, but I suppose the GPU has the same limitation there. VTF is currently supported by GeForce chipsets from the 6600 to the 7900. Conclusion: if you wish to play with VTF, use an NVIDIA board.

Maybe all these problems will be solved with SM4.0. I hope so! :winkhappy: