GPU Instancing in Unity


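GPU instancing draws many objects that share the same mesh and the same material in a single call, which reduces the number of batches needed to render them. To use it in Unity, first add #pragma multi_compile_instancing to at least one pass of the shader, and tick Enable GPU Instancing on the material. Because each instance may need different per-draw data, the shader also has to receive an instance ID through its vertex input: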

struct VertexData {
	UNITY_VERTEX_INPUT_INSTANCE_ID
	float4 vertex : POSITION;
	…
};

The UNITY_VERTEX_INPUT_INSTANCE_ID macro is defined as follows:

// - UNITY_VERTEX_INPUT_INSTANCE_ID     Declare instance ID field in vertex shader input / output struct.
#   define UNITY_VERTEX_INPUT_INSTANCE_ID DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID

#if defined(UNITY_INSTANCING_ENABLED) || defined(UNITY_PROCEDURAL_INSTANCING_ENABLED) || defined(UNITY_STEREO_INSTANCING_ENABLED)
    #ifdef SHADER_API_PSSL
        #define DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID uint instanceID;
    #else
        #define DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID uint instanceID : SV_InstanceID;
    #endif

#else
    #define DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID
#endif

In other words, it declares an instanceID field when GPU instancing is enabled and expands to nothing otherwise.

In addition, the UNITY_SETUP_INSTANCE_ID macro must be invoked at the very beginning of the vertex program:

InterpolatorsVertex MyVertexProgram (VertexData v) {
	InterpolatorsVertex i;
	UNITY_INITIALIZE_OUTPUT(InterpolatorsVertex, i);
	UNITY_SETUP_INSTANCE_ID(v);
	i.pos = UnityObjectToClipPos(v.vertex);
	…
}

UNITY_SETUP_INSTANCE_ID expands as follows:

// - UNITY_SETUP_INSTANCE_ID        Should be used at the very beginning of the vertex shader / fragment shader,
//                                  so that succeeding code can have access to the global unity_InstanceID.
//                                  Also procedural function is called to setup instance data.
#   define UNITY_SETUP_INSTANCE_ID(input) DEFAULT_UNITY_SETUP_INSTANCE_ID(input)

#define DEFAULT_UNITY_SETUP_INSTANCE_ID(input)          { UnitySetupInstanceID(UNITY_GET_INSTANCE_ID(input)); UnitySetupCompoundMatrices(); }

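This macro does two main things. First, it sets the global unity_InstanceID variable, which is used to index the arrays of built-in matrices that the shader relies on (for example, object-to-world):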

void UnitySetupInstanceID(uint inputInstanceID)
    {
        #ifdef UNITY_STEREO_INSTANCING_ENABLED
            #if defined(SHADER_API_GLES3)
                // We must calculate the stereo eye index differently for GLES3
                // because otherwise,  the unity shader compiler will emit a bitfieldInsert function.
                // bitfieldInsert requires support for glsl version 400 or later.  Therefore the
                // generated glsl code will fail to compile on lower end devices.  By changing the
                // way we calculate the stereo eye index,  we can help the shader compiler to avoid
                // emitting the bitfieldInsert function and thereby increase the number of devices we
                // can run stereo instancing on.
                unity_StereoEyeIndex = round(fmod(inputInstanceID, 2.0));
                unity_InstanceID = unity_BaseInstanceID + (inputInstanceID >> 1);
            #else
                // stereo eye index is automatically figured out from the instance ID
                unity_StereoEyeIndex = inputInstanceID & 0x01;
                unity_InstanceID = unity_BaseInstanceID + (inputInstanceID >> 1);
            #endif
        #else
            unity_InstanceID = inputInstanceID + unity_BaseInstanceID;
        #endif
    }

Second, it recomputes the commonly used compound matrices:

void UnitySetupCompoundMatrices()
        {
            unity_MatrixMVP_Instanced = mul(unity_MatrixVP, unity_ObjectToWorld);
            unity_MatrixMV_Instanced = mul(unity_MatrixV, unity_ObjectToWorld);
            unity_MatrixTMV_Instanced = transpose(unity_MatrixMV_Instanced);
            unity_MatrixITMV_Instanced = transpose(mul(unity_WorldToObject, unity_MatrixInvV));
        }

Note that unity_ObjectToWorld and unity_WorldToObject have themselves been redefined here:

#define unity_ObjectToWorld     UNITY_ACCESS_INSTANCED_PROP(unity_Builtins0, unity_ObjectToWorldArray)
        #define MERGE_UNITY_BUILTINS_INDEX(X) unity_Builtins##X
        #define unity_WorldToObject     UNITY_ACCESS_INSTANCED_PROP(MERGE_UNITY_BUILTINS_INDEX(UNITY_WORLDTOOBJECTARRAY_CB), unity_WorldToObjectArray)

        inline float4 UnityObjectToClipPosInstanced(in float3 pos)
        {
            return mul(UNITY_MATRIX_VP, mul(unity_ObjectToWorld, float4(pos, 1.0)));
        }
        inline float4 UnityObjectToClipPosInstanced(float4 pos)
        {
            return UnityObjectToClipPosInstanced(pos.xyz);
        }
        #define UnityObjectToClipPos UnityObjectToClipPosInstanced

With GPU instancing enabled, these definitions simply use the instance ID to index into the corresponding matrix arrays.
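To make the indexing concrete, here is a hand-expanded sketch of what unity_ObjectToWorld resolves to once instancing is on. It applies the UNITY_ACCESS_INSTANCED_PROP definition quoted later in this article (arr##Array[unity_InstanceID].var) to the #define above. The function name TransformInstancedToClip is hypothetical, the snippet only compiles inside a Unity shader variant where instancing is enabled, and the exact buffer layout may differ between Unity versions.

```hlsl
// Sketch only: assumes UnityCG.cginc is included and UNITY_INSTANCING_ENABLED is
// defined, so that unity_Builtins0Array and unity_InstanceID exist.
// TransformInstancedToClip is a hypothetical name used for illustration.
float4 TransformInstancedToClip(float3 positionOS)
{
    // unity_ObjectToWorld
    //   -> UNITY_ACCESS_INSTANCED_PROP(unity_Builtins0, unity_ObjectToWorldArray)
    //   -> unity_Builtins0Array[unity_InstanceID].unity_ObjectToWorldArray
    float4x4 objectToWorld =
        unity_Builtins0Array[unity_InstanceID].unity_ObjectToWorldArray;

    // Same math as UnityObjectToClipPosInstanced, with the macro written out.
    float4 positionWS = mul(objectToWorld, float4(positionOS, 1.0));
    return mul(UNITY_MATRIX_VP, positionWS);
}
```

This is why UNITY_SETUP_INSTANCE_ID has to run first: every later matrix access silently depends on unity_InstanceID.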

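Precisely because each batch has to upload arrays of matrices to the GPU rather than a single matrix, the batch size must be limited: only a finite number of objects can be merged into one instanced batch. Unity defines the UNITY_INSTANCED_ARRAY_SIZE macro to express this maximum.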
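A rough way to see where such a limit comes from is to count bytes. The sketch below assumes a 64 KB constant-buffer cap (the Direct3D 11 limit of 4096 float4 registers; other graphics APIs have different caps) and a per-instance payload of just the two built-in matrices. All SKETCH_ and sketch names are hypothetical and are not part of Unity; the actual value of UNITY_INSTANCED_ARRAY_SIZE depends on the Unity version and platform.

```hlsl
// Illustration only: a hand-written stand-in for the built-in per-instance buffer.
#define SKETCH_CBUFFER_BYTES   65536   // 4096 float4 registers * 16 bytes each
#define SKETCH_INSTANCE_BYTES  128     // two float4x4 matrices = 2 * 64 bytes
#define SKETCH_MAX_INSTANCES   (SKETCH_CBUFFER_BYTES / SKETCH_INSTANCE_BYTES) // 512

cbuffer SketchPerDraw
{
    struct
    {
        float4x4 objectToWorld;   // 64 bytes
        float4x4 worldToObject;   // 64 bytes
    } sketchInstances[SKETCH_MAX_INSTANCES];
};
```

Under these assumptions at most 512 instances fit into one buffer, so larger groups of objects have to be split across several batches.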

GPU instancing also works with shadows and with multiple lights. For shadows, it is enough to add the same instancing declarations to the shadow caster pass:

#pragma multi_compile_shadowcaster
#pragma multi_compile_instancing

struct VertexData {
	UNITY_VERTEX_INPUT_INSTANCE_ID
	// position and other vertex inputs omitted
};

InterpolatorsVertex MyShadowVertexProgram (VertexData v) {
	InterpolatorsVertex i;
	UNITY_SETUP_INSTANCE_ID(v);
	// rest of the shadow vertex program omitted
}
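For reference, here is a minimal sketch of a complete instanced shadow caster pass. It does not reproduce the MyShadowVertexProgram above; instead it assumes Unity's built-in helpers from UnityCG.cginc (appdata_base, V2F_SHADOW_CASTER, TRANSFER_SHADOW_CASTER_NORMALOFFSET, SHADOW_CASTER_FRAGMENT). The shader name is hypothetical.

```hlsl
Shader "Hypothetical/InstancedShadowCasterOnly"
{
    SubShader
    {
        Pass
        {
            Tags { "LightMode" = "ShadowCaster" }

            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #pragma multi_compile_shadowcaster
            #pragma multi_compile_instancing   // generates the instanced variant
            #include "UnityCG.cginc"

            struct v2f
            {
                V2F_SHADOW_CASTER;
            };

            // appdata_base already contains UNITY_VERTEX_INPUT_INSTANCE_ID.
            v2f vert(appdata_base v)
            {
                v2f o;
                // Must come first so the matrices used below are per-instance.
                UNITY_SETUP_INSTANCE_ID(v);
                TRANSFER_SHADOW_CASTER_NORMALOFFSET(o)
                return o;
            }

            float4 frag(v2f i) : SV_Target
            {
                SHADOW_CASTER_FRAGMENT(i)
            }
            ENDCG
        }
    }
}
```

The only instancing-specific lines are the extra pragma and the UNITY_SETUP_INSTANCE_ID call; everything else is an ordinary shadow caster.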

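For multiple lights, note that in the forward rendering path only the base pass benefits from instancing; the additive per-light passes are not instanced. To keep instancing effective with many lights, use the deferred rendering path.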

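However, by default GPU instancing only batches objects that share the same material, which is inconvenient in practice. Sometimes we only want to change one property of the material, for example to give each sphere a different color. Doing that by assigning a separate material to each object breaks instancing.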

We can use a MaterialPropertyBlock to change the color without creating a new material:

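// `t` is the Transform of the object being recolored (declared elsewhere in the script).
MaterialPropertyBlock properties = new MaterialPropertyBlock();
properties.SetColor(
	"_Color", new Color(Random.value, Random.value, Random.value)
);
// Overrides _Color for this renderer only; the shared material is left untouched.
t.GetComponent<MeshRenderer>().SetPropertyBlock(properties);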

To read this property in the shader, it has to be declared inside an instancing buffer:

UNITY_INSTANCING_BUFFER_START(InstanceProperties)
	UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
#define _Color_arr InstanceProperties
UNITY_INSTANCING_BUFFER_END(InstanceProperties)

Expanding the macros shows that this declares a cbuffer holding an array of structs, where the struct contains the property we just added:

#define UNITY_INSTANCING_BUFFER_START(buf)      UNITY_INSTANCING_CBUFFER_SCOPE_BEGIN(UnityInstancing_##buf) struct {
#define UNITY_INSTANCING_BUFFER_END(arr)        } arr##Array[UNITY_INSTANCED_ARRAY_SIZE]; UNITY_INSTANCING_CBUFFER_SCOPE_END
#define UNITY_DEFINE_INSTANCED_PROP(type, var)  type var;
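Applied by hand to the InstanceProperties buffer declared earlier, the three macros produce roughly the following. This is an illustration rather than Unity source: it assumes that UNITY_INSTANCING_CBUFFER_SCOPE_BEGIN and UNITY_INSTANCING_CBUFFER_SCOPE_END expand to a plain cbuffer scope on the target platform, and that UNITY_INSTANCED_ARRAY_SIZE has already been defined by Unity's include files.

```hlsl
// Hand-expanded form of:
//   UNITY_INSTANCING_BUFFER_START(InstanceProperties)
//       UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
//   UNITY_INSTANCING_BUFFER_END(InstanceProperties)
cbuffer UnityInstancing_InstanceProperties
{
    struct
    {
        float4 _Color;   // one member per UNITY_DEFINE_INSTANCED_PROP
    } InstancePropertiesArray[UNITY_INSTANCED_ARRAY_SIZE];
}
```

Each instance therefore owns one element of InstancePropertiesArray, and every additional instanced property makes each element larger.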

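To pass the instance ID used in the vertex shader on to the fragment shader, use Unity's UNITY_TRANSFER_INSTANCE_ID macro. The interpolator struct must also declare UNITY_VERTEX_INPUT_INSTANCE_ID, so that it has an instanceID field to receive the value: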

InterpolatorsVertex MyVertexProgram (VertexData v) {
	InterpolatorsVertex i;
	UNITY_INITIALIZE_OUTPUT(InterpolatorsVertex, i);
	UNITY_SETUP_INSTANCE_ID(v);
	UNITY_TRANSFER_INSTANCE_ID(v, i);
	…
}

The macro definition is simple:

#define UNITY_TRANSFER_INSTANCE_ID(input, output)   output.instanceID = UNITY_GET_INSTANCE_ID(input)

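So how do we actually read the property from this cbuffer? Unity provides a matching macro, UNITY_ACCESS_INSTANCED_PROP. Because it indexes with the global unity_InstanceID, the fragment program has to call UNITY_SETUP_INSTANCE_ID(i) first, before a helper such as GetAlbedo runs: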

float3 GetAlbedo (Interpolators i) {
	float3 albedo =
		tex2D(_MainTex, i.uv.xy).rgb * UNITY_ACCESS_INSTANCED_PROP(_Color_arr, _Color).rgb;
	...
}

This macro is also simple: it indexes the struct array declared earlier with the instance ID and reads the corresponding member:

#define UNITY_ACCESS_INSTANCED_PROP(arr, var)   arr##Array[unity_InstanceID].var

With these changes in place, running the scene again shows that the batch count has dropped and instancing is working again.
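Putting the pieces together, a minimal unlit shader with a per-instance color might look like the sketch below. It is not the lit shader discussed in this article: the shader name and the appdata, v2f, vert, and frag names are placeholders, and it passes the buffer name directly to UNITY_ACCESS_INSTANCED_PROP instead of going through a _Color_arr alias. It assumes a Unity version whose UNITY_ACCESS_INSTANCED_PROP takes (buffer, property), as in the definition quoted above.

```hlsl
Shader "Hypothetical/InstancedColor"
{
    Properties
    {
        _Color ("Color", Color) = (1, 1, 1, 1)
    }
    SubShader
    {
        Tags { "RenderType" = "Opaque" }
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #pragma multi_compile_instancing
            #include "UnityCG.cginc"

            struct appdata
            {
                float4 vertex : POSITION;
                UNITY_VERTEX_INPUT_INSTANCE_ID
            };

            struct v2f
            {
                float4 pos : SV_POSITION;
                // Required so the fragment program can recover the instance ID.
                UNITY_VERTEX_INPUT_INSTANCE_ID
            };

            // Per-instance data, filled from each renderer's MaterialPropertyBlock.
            UNITY_INSTANCING_BUFFER_START(InstanceProperties)
                UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
            UNITY_INSTANCING_BUFFER_END(InstanceProperties)

            v2f vert(appdata v)
            {
                v2f o;
                UNITY_INITIALIZE_OUTPUT(v2f, o);
                UNITY_SETUP_INSTANCE_ID(v);        // sets unity_InstanceID
                UNITY_TRANSFER_INSTANCE_ID(v, o);  // copies the ID to the output
                o.pos = UnityObjectToClipPos(v.vertex);
                return o;
            }

            fixed4 frag(v2f i) : SV_Target
            {
                UNITY_SETUP_INSTANCE_ID(i);        // needed before reading _Color
                return UNITY_ACCESS_INSTANCED_PROP(InstanceProperties, _Color);
            }
            ENDCG
        }
    }
}
```

With Enable GPU Instancing ticked on a material that uses this shader, objects whose colors are set through SetPropertyBlock can still be drawn in the same instanced batch.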