Mathematics of the depth metric when generating shadow maps and rendering with shadows

This article keeps track of how the depth metric is computed when generating shadow maps and rendering shadows in Babylon.js. Understanding how we ended up with the formulas described below can be hard at times, especially once you add support for the reverse depth buffer and for the reduced NDC Z range that WebGPU uses.

Generating the shadow map

Shadow maps are generated by the ShadowGenerator class for standard shadows, and by the CascadedShadowGenerator class for cascaded shadow maps.

The idea is to generate a texture that contains the depth of the geometry closest to the light when the scene is rendered from the light's point of view. This depth is the z coordinate of the 3D point once transformed into the view space of the light (in this space, the Z axis points forward). So, given two points A and B, if zA < zB in this space, A is closer to the light than B and zA is the value written to the shadow map.

Babylon.js is not doing anything fancy here and is simply using the transformation matrix (view x projection) of the light to render the shadow casters and generate the shadow map. However, there are several cases to consider.

PCF and PCSS filtering

When using PCF (Percentage Closer Filtering) and PCSS (Percentage Closer Soft Shadows) to render actual shadows, we are using as our shadow map the depth texture generated by the GPU when rendering the shadow casters. So, this texture is automatically generated as part of the rendering and we have nothing specific to do, except applying the bias value defined in the shadow generator. The shader code looks like this (in the shadowMapVertexMetric.fx file):

#if SM_DEPTHTEXTURE == 1
#ifdef IS_NDC_HALF_ZRANGE
#define BIASFACTOR 0.5
#else
#define BIASFACTOR 1.0
#endif
#if SM_USE_REVERSE_DEPTHBUFFER == 1
gl_Position.z -= biasAndScaleSM.x * gl_Position.w * BIASFACTOR;
#else
gl_Position.z += biasAndScaleSM.x * gl_Position.w * BIASFACTOR;
#endif
#endif

SM_DEPTHTEXTURE is set to 1 only when using PCF/PCSS filtering. biasAndScaleSM.x is the bias value (note that the normal bias is applied earlier and modifies the world position of the 3D point).

We are multiplying by gl_Position.w because the GPU, as part of its computations, will do gl_Position.z / gl_Position.w before writing the value to the depth texture: by pre-multiplying by gl_Position.w, we make sure the final result is simply biased by a constant biasAndScaleSM.x * BIASFACTOR value.

When the NDC space has a 0..1 Z range (meaning IS_NDC_HALF_ZRANGE is defined), we use a bias factor of 0.5 so that the final bias applied to the position has the same scale as when the range is -1..1.

Note that in the standard case (when not using the reverse depth buffer), we add the bias to the position, which pushes the stored depth (and so the geometry) slightly farther from the light. Another strategy would be to apply the bias at the shadow rendering stage instead of in the shadow map. In that case, we would subtract the bias from the current depth (as seen from the light) of the pixel to achieve the same result.

Finally, when using the reverse depth buffer, we simply negate the bias offset, because larger z values now mean nearer geometry.
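The effect of the pre-multiplication by w can be checked with a small CPU-side sketch. The function and parameter names below are illustrative only (they are not part of the Babylon.js API), and the perspective divide is modeled as a plain division.

```typescript
// Sketch of the bias applied to the clip-space position in the shader above.
// All names are hypothetical; this is not Babylon.js code.
function biasedNdcDepth(
    clipZ: number,
    clipW: number,
    bias: number,          // plays the role of biasAndScaleSM.x
    halfZRange: boolean,   // true when IS_NDC_HALF_ZRANGE would be defined
    reverseDepth: boolean  // true when SM_USE_REVERSE_DEPTHBUFFER == 1
): number {
    const biasFactor = halfZRange ? 0.5 : 1.0;
    const sign = reverseDepth ? -1 : 1;
    const biasedClipZ = clipZ + sign * bias * clipW * biasFactor;
    // The GPU divides z by w before writing the value to the depth texture.
    return biasedClipZ / clipW;
}

// Offset introduced by the bias, measured after the perspective divide.
function ndcOffset(clipZ: number, clipW: number, bias: number, halfZRange: boolean, reverseDepth: boolean): number {
    return biasedNdcDepth(clipZ, clipW, bias, halfZRange, reverseDepth) - clipZ / clipW;
}
```

Whatever the value of w, ndcOffset returns bias * BIASFACTOR (negated in the reverse depth buffer case), which is the constant offset described above.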

Generating a depth metric

When not using the PCF / PCSS modes (and, in fact, in PCSS mode as well, which also needs the depth metric described here), we need to generate a depth metric: the depth value used in the depth comparisons that compute the shadow level of a given pixel.

The computation we are doing to generate this value is (in the shadowMapVertexMetric.fx file):

#if SM_USE_REVERSE_DEPTHBUFFER == 1
vDepthMetricSM = (-gl_Position.z + depthValuesSM.x) / depthValuesSM.y + biasAndScaleSM.x;
#else
vDepthMetricSM = (gl_Position.z + depthValuesSM.x) / depthValuesSM.y + biasAndScaleSM.x;
#endif

The aim is to generate a normalized value between 0..1 from the gl_Position.z value, which is the z component of the 3D vertex after the transformation matrix (view x projection) has been applied.
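As a reading aid, the shader computation above can be transcribed into a small function. This is only a sketch that mirrors the two shader branches; the function name and its parameters are hypothetical.

```typescript
// CPU-side transcription of the vDepthMetricSM computation (illustrative names).
function depthMetricSM(
    clipZ: number,         // gl_Position.z
    depthValuesX: number,  // depthValuesSM.x
    depthValuesY: number,  // depthValuesSM.y
    bias: number,          // biasAndScaleSM.x
    reverseDepth: boolean  // SM_USE_REVERSE_DEPTHBUFFER == 1
): number {
    const z = reverseDepth ? -clipZ : clipZ;
    return (z + depthValuesX) / depthValuesY + bias;
}
```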

In the next sections, we are going to explain which values to set in depthValuesSM.x and depthValuesSM.y to achieve this goal.

As a preamble, we will only focus on the projection matrix because we are only interested in how the projection remaps the z values to the NDC space and the view matrix does not come into play in this computation.

Notes:

  • we could have used the depth texture described in the previous section in all cases to retrieve the depth values we need and avoid having to deal with this depth metric, but for historical reasons and because WebGL1 does not support depth textures without an extension, we need this depth metric.
  • this section assumes the NDC Z range is -1..1. We will handle the 0..1 range later.
  • the reverse depth buffer case is handled simply by swapping the near and far planes of the light in the projection matrix
  • the projection matrices we are dealing with are for a left handed coordinate system but the results are the same for a right handed system
  • the depth renderer is also using the same computation to generate the depth texture, so what we are describing below for the spotlight (perspective projection) is applicable to the depth renderer (the light being replaced by the camera).

Directional light

Directional lights are using an orthographic projection to transform points to NDC space (clip space to be precise). This projection is:

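$$
\begin{pmatrix} x & y & z & 1 \end{pmatrix}
\begin{pmatrix}
a & 0 & 0 & 0 \\
0 & b & 0 & 0 \\
0 & 0 & \frac{2}{f-n} & 0 \\
i_0 & i_1 & -\frac{f+n}{f-n} & 1
\end{pmatrix}
$$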

n and f are the near and far planes of the light (light.shadowMinZ / light.shadowMaxZ if defined, camera.minZ / camera.maxZ if not) respectively. Note that we are only interested in the transformation of the z coordinate, so we don't need the a, b, i0 and i1 values:

$$
z_{ortho} = \frac{2}{f-n}\,z - \frac{f+n}{f-n}
$$

$$
z_{ortho} = \frac{2z - f - n}{f-n}
$$

It's a linear function of z (which is something we want), but the range is not 0..1 when z takes values between n and f:

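$$
z = n \;\Rightarrow\; z_{ortho} = \frac{2n - f - n}{f-n} = \frac{n-f}{f-n} = -1
$$

$$
z = f \;\Rightarrow\; z_{ortho} = \frac{2f - f - n}{f-n} = \frac{f-n}{f-n} = 1
$$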

So the range is -1..1. To remap this range to 0..1 we can simply add 1 to z and divide everything by 2.

Looking back at how vDepthMetricSM is defined:

vDepthMetricSM = (gl_Position.z + depthValuesSM.x) / depthValuesSM.y + biasAndScaleSM.x;

(don't forget that z_ortho = gl_Position.z)

We simply need to have depthValuesSM.x = 1 and depthValuesSM.y = 2.

In the JavaScript code, the depthValuesSM shader variable is set like this:

effect.setFloat2("depthValuesSM", this.getLight().getDepthMinZ(scene.activeCamera), this.getLight().getDepthMinZ(scene.activeCamera) + this.getLight().getDepthMaxZ(scene.activeCamera));

So:

depthValuesSM.x = this.getLight().getDepthMinZ(scene.activeCamera);
depthValuesSM.y = this.getLight().getDepthMinZ(scene.activeCamera) + this.getLight().getDepthMaxZ(scene.activeCamera);

Which means that for directional lights, getDepthMinZ must return 1 and getDepthMaxZ must also return 1.
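The remapping for directional lights can be verified numerically. The sketch below assumes the standard orthographic z mapping for a -1..1 NDC Z range; the function names are illustrative and the bias is left out.

```typescript
// z produced by the orthographic projection for a view-space depth z
// (NDC Z range -1..1). Hypothetical helper, not Babylon.js API.
function orthoClipZ(z: number, n: number, f: number): number {
    return (2 * z) / (f - n) - (f + n) / (f - n);
}

// Depth metric for a directional light, without bias.
function directionalDepthMetric(z: number, n: number, f: number): number {
    const minZ = 1; // value expected from getDepthMinZ
    const maxZ = 1; // value expected from getDepthMaxZ
    const depthValuesX = minZ;
    const depthValuesY = minZ + maxZ;
    return (orthoClipZ(z, n, f) + depthValuesX) / depthValuesY;
}
```

With n = 1 and f = 101, a point on the near plane maps to 0, a point on the far plane to 1, and the midpoint z = 51 to 0.5, which confirms that the metric is linear in z.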

In the reverse depth buffer case:

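$$
z = n \;\Rightarrow\; z_{ortho} = \frac{2n - n - f}{n-f} = \frac{n-f}{n-f} = 1
$$

$$
z = f \;\Rightarrow\; z_{ortho} = \frac{2f - n - f}{n-f} = \frac{f-n}{n-f} = -1
$$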

This time the range is 1..-1. However, in the shader, for the reverse depth buffer case we have:

vDepthMetricSM = (-gl_Position.z + depthValuesSM.x) / depthValuesSM.y + biasAndScaleSM.x;

which means z_ortho is multiplied by -1 before being added to depthValuesSM.x. So 1..-1 becomes -1..1 and we are back to the same case as previously: we need the same values in depthValuesSM.x and depthValuesSM.y (that is, 1 and 2 respectively).
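The same check can be done for the reverse depth buffer case. The sketch below assumes, as stated in the notes above, that the reverse case only swaps the near and far planes in the projection; the names are illustrative.

```typescript
// Orthographic z mapping with the near and far planes swapped
// (reverse depth buffer, NDC Z range -1..1). Hypothetical helper.
function reversedOrthoClipZ(z: number, n: number, f: number): number {
    return (2 * z) / (n - f) - (n + f) / (n - f);
}

// Depth metric for a directional light in reverse mode, without bias.
// Note the minus sign in front of the projected z, as in the shader.
function reversedDirectionalDepthMetric(z: number, n: number, f: number): number {
    const depthValuesX = 1;
    const depthValuesY = 2;
    return (-reversedOrthoClipZ(z, n, f) + depthValuesX) / depthValuesY;
}
```

The projected z now runs from 1 (near) to -1 (far), but the resulting metric still goes from 0 on the near plane to 1 on the far plane.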

Spot light

Spot lights are using a perspective projection to transform points to NDC space (clip space to be precise). This projection is:

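$$
\begin{pmatrix} x & y & z & 1 \end{pmatrix}
\begin{pmatrix}
a & 0 & 0 & 0 \\
0 & b & 0 & 0 \\
0 & 0 & \frac{f+n}{f-n} & 1 \\
0 & 0 & -\frac{2fn}{f-n} & 0
\end{pmatrix}
$$

$$
z_{persp} = \frac{f+n}{f-n}\,z - \frac{2fn}{f-n}
$$

$$
w_{persp} = z
$$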

Regarding the range when z takes values between n and f:

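$$
z = n \;\Rightarrow\; z_{persp} = \frac{(f+n)\,n - 2fn}{f-n} = \frac{n\,(n-f)}{f-n} = -n
$$

$$
z = f \;\Rightarrow\; z_{persp} = \frac{(f+n)\,f - 2fn}{f-n} = \frac{f\,(f-n)}{f-n} = f
$$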

The range is -n..f, which means that for spot lights we need getDepthMinZ to return n and getDepthMaxZ to return f to remap this range to 0..1 once we apply the computation (recall that depthValuesSM.x = light.getDepthMinZ() and depthValuesSM.y = light.getDepthMinZ() + light.getDepthMaxZ()):

vDepthMetricSM = (gl_Position.z + depthValuesSM.x) / depthValuesSM.y + biasAndScaleSM.x;

As in the directional light case, the reverse depth buffer range is negated compared to the normal case, but because of the minus sign in front of gl_Position.z in the vDepthMetricSM formula, getDepthMinZ and getDepthMaxZ must return the same values.
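The spot light case can be checked the same way. This sketch assumes the standard perspective z mapping for a -1..1 NDC Z range (clip-space z, before the division by w) and models the reverse depth buffer by swapping n and f; the function names are illustrative.

```typescript
// Clip-space z produced by the perspective projection for a view-space depth z
// (NDC Z range -1..1). Hypothetical helper, not Babylon.js API.
function perspectiveClipZ(z: number, n: number, f: number): number {
    return ((f + n) / (f - n)) * z - (2 * f * n) / (f - n);
}

// Depth metric for a spot light, without bias.
function spotDepthMetric(z: number, n: number, f: number, reverseDepth: boolean): number {
    // Reverse depth buffer: n and f are swapped in the projection
    // and the shader negates the projected z.
    const clipZ = reverseDepth ? -perspectiveClipZ(z, f, n) : perspectiveClipZ(z, n, f);
    const depthValuesX = n;     // getDepthMinZ
    const depthValuesY = n + f; // getDepthMinZ + getDepthMaxZ
    return (clipZ + depthValuesX) / depthValuesY;
}
```

In both modes, the metric is 0 on the near plane and 1 on the far plane.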

Point light

Point lights are using shadow maps that are storing the distance of the geometry to the light. This distance is computed as length(position - lightPosition), which is then remapped to the 0..1 range (in the shadowMapFragment.fx file):

depthSM = (length(vPositionWSM - lightDataSM) + depthValuesSM.x) / depthValuesSM.y + biasAndScaleSM.x;

It's the same computation as previously described, except that we are using the distance to the light instead of the depth. vPositionWSM is the world position of the point and lightDataSM the world position of the light. There's no specific case for the reverse depth buffer mode as it is irrelevant: we are computing a distance, not a depth.

Note that in reality we are not remapping to 0..1 with this formula because length(vPositionWSM - lightDataSM) has no upper bound and can go to +infinity: vPositionWSM is constrained to be in the view frustum but the light can be positioned anywhere in the world. So, to simplify things, we set up the getDepthMinZ and getDepthMaxZ functions to return the same values as in the spot light case, meaning n and f respectively. It's not really important that we are not remapping strictly to 0..1 as long as we use the same computation when rendering shadows, so that both values can be compared.

Notes:

  • Even if length(position - lightPosition) can go to +infinity in theory, the point light is generally not too far from the geometry currently in the view frustum: a light that is too far away would contribute very little (or nothing) and would not cast shadows anyway (every point light has a maximum distance beyond which its intensity falls to 0)
  • we could remove the remapping altogether and simply use length(position - lightPosition) as the depth metric, but that would require using a float texture in all cases. When in WebGL1 mode and if the float texture extension is not supported, we are using a UNORM 8 bits texture, so we need a 0..1 remapping
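A minimal sketch of the point light metric, with n and f used for getDepthMinZ and getDepthMaxZ as described above, shows that the result is not strictly confined to 0..1. The type and function names are hypothetical.

```typescript
// Illustrative 3D vector type; not a Babylon.js type.
type Vec3 = readonly [number, number, number];

function distanceBetween(a: Vec3, b: Vec3): number {
    return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Distance-based depth metric for a point light.
function pointLightDepthMetric(positionW: Vec3, lightPositionW: Vec3, n: number, f: number, bias = 0): number {
    const depthValuesX = n;     // getDepthMinZ
    const depthValuesY = n + f; // getDepthMinZ + getDepthMaxZ
    return (distanceBetween(positionW, lightPositionW) + depthValuesX) / depthValuesY + bias;
}
```

With n = 1 and f = 9, a point at distance 9 from the light gives 1, a point at distance 5 gives 0.6, and a point at distance 18 gives 1.9, which is above 1. This is harmless as long as the shadow rendering pass uses the same computation.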

Generating a depth metric (NDC 0..1 Z range)

When using an NDC space where the z coordinate is in the 0..1 range, the orthographic and perspective projection matrices change. Let's see how this affects the results from the previous section.

Directional light

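$$
\begin{pmatrix} x & y & z & 1 \end{pmatrix}
\begin{pmatrix}
a & 0 & 0 & 0 \\
0 & b & 0 & 0 \\
0 & 0 & \frac{1}{f-n} & 0 \\
i_0 & i_1 & -\frac{n}{f-n} & 1
\end{pmatrix}
$$

$$
z_{ortho} = \frac{1}{f-n}\,z - \frac{n}{f-n} = \frac{z-n}{f-n}
$$

$$
z = n \;\Rightarrow\; z_{ortho} = \frac{n-n}{f-n} = 0
$$

$$
z = f \;\Rightarrow\; z_{ortho} = \frac{f-n}{f-n} = 1
$$

With the near and far planes swapped (reverse depth buffer case):

$$
z = n \;\Rightarrow\; z_{ortho} = \frac{n-f}{n-f} = 1
$$

$$
z = f \;\Rightarrow\; z_{ortho} = \frac{f-f}{n-f} = 0
$$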

We can see that in the non reverse depth buffer case the remapping is already 0..1, so getDepthMinZ should return 0 and getDepthMaxZ should return 1.

In the reverse depth buffer case, as we have a negation of z in the vDepthMetricSM formula, the z_ortho range is -1..0. We need to add 1 to remap to 0..1. To do that, we can simply have getDepthMinZ return 1 and getDepthMaxZ return 0.
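Both directional cases for the 0..1 NDC Z range can be checked with the sketch below. It assumes the orthographic z mapping (z - n) / (f - n) for that range and models the reverse depth buffer by swapping n and f; the names are illustrative.

```typescript
// Orthographic z mapping when the NDC Z range is 0..1. Hypothetical helper.
function orthoClipZHalfRange(z: number, n: number, f: number): number {
    return (z - n) / (f - n);
}

// Depth metric for a directional light with a 0..1 NDC Z range, without bias.
function directionalDepthMetricHalfRange(z: number, n: number, f: number, reverseDepth: boolean): number {
    if (reverseDepth) {
        const minZ = 1; // getDepthMinZ
        const maxZ = 0; // getDepthMaxZ
        const clipZ = orthoClipZHalfRange(z, f, n); // near and far swapped
        return (-clipZ + minZ) / (minZ + maxZ);
    }
    const minZ = 0; // getDepthMinZ
    const maxZ = 1; // getDepthMaxZ
    return (orthoClipZHalfRange(z, n, f) + minZ) / (minZ + maxZ);
}
```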

Spot light

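$$
\begin{pmatrix} x & y & z & 1 \end{pmatrix}
\begin{pmatrix}
a & 0 & 0 & 0 \\
0 & b & 0 & 0 \\
0 & 0 & \frac{f}{f-n} & 1 \\
0 & 0 & -\frac{fn}{f-n} & 0
\end{pmatrix}
$$

$$
z_{persp} = \frac{f}{f-n}\,z - \frac{fn}{f-n} = \frac{f\,(z-n)}{f-n}
$$

$$
z = n \;\Rightarrow\; z_{persp} = \frac{f\,(n-n)}{f-n} = 0
$$

$$
z = f \;\Rightarrow\; z_{persp} = \frac{f\,(f-n)}{f-n} = f
$$

With the near and far planes swapped (reverse depth buffer case):

$$
z = n \;\Rightarrow\; z_{persp} = \frac{n\,(n-f)}{n-f} = n
$$

$$
z = f \;\Rightarrow\; z_{persp} = \frac{n\,(f-f)}{n-f} = 0
$$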

In the non reverse depth buffer case, we need to remap 0..f to 0..1: we need to divide by f. To do that, getDepthMinZ should return 0 and getDepthMaxZ should return f.

In the reverse depth buffer case, we need to remap -n..0 (don't forget that when using the reverse depth buffer we have -gl_Position.z in the vDepthMetricSM formula, not gl_Position.z) to 0..1: we need to add n and divide by n. To do that, getDepthMinZ should return n and getDepthMaxZ should return 0.
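The two spot light cases for the 0..1 NDC Z range can be checked in the same way. The sketch assumes the perspective z mapping f (z - n) / (f - n) for that range (clip-space z, before the division by w), with n and f swapped in reverse mode; the names are illustrative.

```typescript
// Clip-space z of the perspective projection when the NDC Z range is 0..1.
// Hypothetical helper, not Babylon.js API.
function perspectiveClipZHalfRange(z: number, n: number, f: number): number {
    return (f * (z - n)) / (f - n);
}

// Depth metric for a spot light with a 0..1 NDC Z range, without bias.
function spotDepthMetricHalfRange(z: number, n: number, f: number, reverseDepth: boolean): number {
    if (reverseDepth) {
        const minZ = n; // getDepthMinZ
        const maxZ = 0; // getDepthMaxZ
        const clipZ = perspectiveClipZHalfRange(z, f, n); // near and far swapped
        return (-clipZ + minZ) / (minZ + maxZ);
    }
    const minZ = 0; // getDepthMinZ
    const maxZ = f; // getDepthMaxZ
    return (perspectiveClipZHalfRange(z, n, f) + minZ) / (minZ + maxZ);
}
```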

Point light

Point lights behave exactly as in the NDC -1..1 case: because we are dealing exclusively with distances, the NDC Z range is irrelevant.

Shadow rendering

There's not much to say regarding the shadow rendering part: we simply have to make sure that the formula used to compute the depth metric of the current pixel is exactly the same as the one used to generate the shadow maps. The shader code used to compute the vDepthMetric value is in this case (in the shadowsVertex.fx file):

#if USE_REVERSE_DEPTHBUFFER
vDepthMetric{X} = (-vPositionFromLight{X}.z + light{X}.depthValues.x) / light{X}.depthValues.y;
#else
vDepthMetric{X} = (vPositionFromLight{X}.z + light{X}.depthValues.x) / light{X}.depthValues.y;
#endif

So, we must pass in light{X}.depthValues.x and light{X}.depthValues.y the same values that we passed in the depthValuesSM.x and depthValuesSM.y parameters when generating the shadow maps.
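To illustrate why both passes must agree, here is a deliberately simplified sketch of a hard shadow test. It is not the actual Babylon.js shader logic (which also handles filtering, darkness and other parameters); all names are hypothetical, and the comparison assumes that a larger metric means farther from the light.

```typescript
// Metric written to the shadow map (generation pass), bias included.
function shadowMapDepthMetric(clipZ: number, depthValuesX: number, depthValuesY: number, bias: number, reverseDepth: boolean): number {
    return ((reverseDepth ? -clipZ : clipZ) + depthValuesX) / depthValuesY + bias;
}

// Metric of the pixel being shaded (rendering pass): same formula, no bias.
function receiverDepthMetric(clipZ: number, depthValuesX: number, depthValuesY: number, reverseDepth: boolean): number {
    return ((reverseDepth ? -clipZ : clipZ) + depthValuesX) / depthValuesY;
}

// Simplified hard shadow test: the pixel is shadowed when it is farther
// from the light than the closest geometry recorded in the shadow map.
function isInShadow(receiverMetric: number, storedMetric: number): boolean {
    return receiverMetric > storedMetric;
}
```

For a directional light with depth values (1, 2), an occluder at projected z = -0.5 stored with a bias of 0.01 yields 0.26. A receiver at z = 0.2 yields 0.6 and is shadowed, while the occluder itself yields 0.25 and stays lit thanks to the bias. If the two passes used different depth values, the two metrics would no longer be comparable.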

To sum up

First recall that:

depthValues.x = light.getDepthMinZ(camera);
depthValues.y = light.getDepthMinZ(camera) + light.getDepthMaxZ(camera);

and that n is the near plane distance and f the far plane distance (light.shadowMinZ / light.shadowMaxZ if defined, camera.minZ / camera.maxZ otherwise).

|                                  | Directional light | Spot light     | Point light    |
|----------------------------------|-------------------|----------------|----------------|
| NDC -1..1                        | minZ=1, maxZ=1    | minZ=n, maxZ=f | minZ=n, maxZ=f |
| NDC -1..1 + reverse depth buffer | minZ=1, maxZ=1    | minZ=n, maxZ=f | minZ=n, maxZ=f |
| NDC 0..1                         | minZ=0, maxZ=1    | minZ=0, maxZ=f | minZ=n, maxZ=f |
| NDC 0..1 + reverse depth buffer  | minZ=1, maxZ=0    | minZ=n, maxZ=0 | minZ=n, maxZ=f |

In this table, minZ is for getDepthMinZ and maxZ is for getDepthMaxZ.
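The table can also be written as a small lookup function. This is only a sketch of the summary above: the type names, the function names and the parameter layout are hypothetical and do not reflect how getDepthMinZ and getDepthMaxZ are actually implemented in Babylon.js.

```typescript
type LightKind = "directional" | "spot" | "point";

interface DepthRange {
    minZ: number; // value expected from getDepthMinZ
    maxZ: number; // value expected from getDepthMaxZ
}

// Returns the minZ / maxZ pair from the summary table.
function depthRangeFor(kind: LightKind, halfZRange: boolean, reverseDepth: boolean, n: number, f: number): DepthRange {
    if (kind === "point") {
        // Distances only: neither the NDC Z range nor the reverse mode matters.
        return { minZ: n, maxZ: f };
    }
    if (!halfZRange) {
        // NDC -1..1: same values with or without the reverse depth buffer.
        return kind === "directional" ? { minZ: 1, maxZ: 1 } : { minZ: n, maxZ: f };
    }
    if (kind === "directional") {
        return reverseDepth ? { minZ: 1, maxZ: 0 } : { minZ: 0, maxZ: 1 };
    }
    return reverseDepth ? { minZ: n, maxZ: 0 } : { minZ: 0, maxZ: f };
}

// depthValues.x and depthValues.y as passed to the shaders.
function depthValuesFor(range: DepthRange): [number, number] {
    return [range.minZ, range.minZ + range.maxZ];
}
```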