GPU Instancing in Unity
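GPU instancing draws many objects that share the same mesh and the same material together, which reduces the number of batches needed to render them. To use it in Unity, first add #pragma multi_compile_instancing to at least one pass of the shader. Because each instance may need different per-object data, the shader also has to receive an instance ID through the vertex input struct: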
struct VertexData {
UNITY_VERTEX_INPUT_INSTANCE_ID
float4 vertex : POSITION;
…
};
The UNITY_VERTEX_INPUT_INSTANCE_ID macro is defined as follows:
// - UNITY_VERTEX_INPUT_INSTANCE_ID Declare instance ID field in vertex shader input / output struct.
# define UNITY_VERTEX_INPUT_INSTANCE_ID DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID
#if defined(UNITY_INSTANCING_ENABLED) || defined(UNITY_PROCEDURAL_INSTANCING_ENABLED) || defined(UNITY_STEREO_INSTANCING_ENABLED)
#ifdef SHADER_API_PSSL
#define DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID uint instanceID;
#else
#define DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID uint instanceID : SV_InstanceID;
#endif
#else
#define DEFAULT_UNITY_VERTEX_INPUT_INSTANCE_ID
#endif
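In other words, it simply declares an instanceID field when GPU instancing is enabled, and expands to nothing otherwise.
In addition, the UNITY_SETUP_INSTANCE_ID macro has to be invoked at the very beginning of the vertex program (and of the fragment program, if it reads per-instance data):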
InterpolatorsVertex MyVertexProgram (VertexData v) {
InterpolatorsVertex i;
UNITY_INITIALIZE_OUTPUT(Interpolators, i);
UNITY_SETUP_INSTANCE_ID(v);
i.pos = UnityObjectToClipPos(v.vertex);
…
}
The UNITY_SETUP_INSTANCE_ID macro expands as follows:
// - UNITY_SETUP_INSTANCE_ID Should be used at the very beginning of the vertex shader / fragment shader,
// so that succeeding code can have access to the global unity_InstanceID.
// Also procedural function is called to setup instance data.
# define UNITY_SETUP_INSTANCE_ID(input) DEFAULT_UNITY_SETUP_INSTANCE_ID(input)
#define DEFAULT_UNITY_SETUP_INSTANCE_ID(input) { UnitySetupInstanceID(UNITY_GET_INSTANCE_ID(input)); UnitySetupCompoundMatrices(); }
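This macro does two things. First, it sets the global unity_InstanceID variable, which is used to index the arrays holding the built-in matrices the shader relies on (for example, object-to-world):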
void UnitySetupInstanceID(uint inputInstanceID)
{
#ifdef UNITY_STEREO_INSTANCING_ENABLED
#if defined(SHADER_API_GLES3)
// We must calculate the stereo eye index differently for GLES3
// because otherwise, the unity shader compiler will emit a bitfieldInsert function.
// bitfieldInsert requires support for glsl version 400 or later. Therefore the
// generated glsl code will fail to compile on lower end devices. By changing the
// way we calculate the stereo eye index, we can help the shader compiler to avoid
// emitting the bitfieldInsert function and thereby increase the number of devices we
// can run stereo instancing on.
unity_StereoEyeIndex = round(fmod(inputInstanceID, 2.0));
unity_InstanceID = unity_BaseInstanceID + (inputInstanceID >> 1);
#else
// stereo eye index is automatically figured out from the instance ID
unity_StereoEyeIndex = inputInstanceID & 0x01;
unity_InstanceID = unity_BaseInstanceID + (inputInstanceID >> 1);
#endif
#else
unity_InstanceID = inputInstanceID + unity_BaseInstanceID;
#endif
}
Second, it recomputes the commonly used compound matrices:
void UnitySetupCompoundMatrices()
{
unity_MatrixMVP_Instanced = mul(unity_MatrixVP, unity_ObjectToWorld);
unity_MatrixMV_Instanced = mul(unity_MatrixV, unity_ObjectToWorld);
unity_MatrixTMV_Instanced = transpose(unity_MatrixMV_Instanced);
unity_MatrixITMV_Instanced = transpose(mul(unity_WorldToObject, unity_MatrixInvV));
}
Note that unity_ObjectToWorld and unity_WorldToObject have themselves already been redefined at this point:
#define unity_ObjectToWorld UNITY_ACCESS_INSTANCED_PROP(unity_Builtins0, unity_ObjectToWorldArray)
#define MERGE_UNITY_BUILTINS_INDEX(X) unity_Builtins##X
#define unity_WorldToObject UNITY_ACCESS_INSTANCED_PROP(MERGE_UNITY_BUILTINS_INDEX(UNITY_WORLDTOOBJECTARRAY_CB), unity_WorldToObjectArray)
inline float4 UnityObjectToClipPosInstanced(in float3 pos)
{
return mul(UNITY_MATRIX_VP, mul(unity_ObjectToWorld, float4(pos, 1.0)));
}
inline float4 UnityObjectToClipPosInstanced(float4 pos)
{
return UnityObjectToClipPosInstanced(pos.xyz);
}
#define UnityObjectToClipPos UnityObjectToClipPosInstanced
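With GPU instancing enabled, these definitions simply use the instance ID to index into the corresponding matrix arrays.
Because every batch has to upload an array of matrices to the GPU rather than a single matrix, the batch size must be limited: only a bounded number of meshes can be merged into one instanced batch. Unity defines the UNITY_INSTANCED_ARRAY_SIZE macro to express this upper limit.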
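As a rough illustration of why such a limit exists, the arithmetic can be sketched under two assumptions that are not stated in the text above: a constant buffer budget of 64 KiB (a figure typical of Direct3D 11), and instances that store only the two built-in float4x4 matrices (object-to-world and world-to-object), each made of 16 four-byte floats:

```latex
\[
N_{\max}
  = \left\lfloor \frac{65536\ \text{bytes}}{2 \times 16 \times 4\ \text{bytes}} \right\rfloor
  = \left\lfloor \frac{65536}{128} \right\rfloor
  = 512
\]
```

Per-instance properties added through an instancing buffer take additional space, and other graphics APIs may allow smaller buffers, so the value Unity actually picks for UNITY_INSTANCED_ARRAY_SIZE can be lower than this estimate.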
GPU instancing also works with shadows and with multiple lights. For shadows, it is enough to add the corresponding instancing declarations to the shadow caster pass:
#pragma multi_compile_shadowcaster
#pragma multi_compile_instancing
struct VertexData {
UNITY_VERTEX_INPUT_INSTANCE_ID
};
InterpolatorsVertex MyShadowVertexProgram (VertexData v) {
InterpolatorsVertex i;
UNITY_SETUP_INSTANCE_ID(v);
}
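The fragment above omits most of the pass. A more complete sketch of an instanced shadow caster pass might look like the following. It assumes the built-in render pipeline and the helper macros from UnityCG.cginc (V2F_SHADOW_CASTER, TRANSFER_SHADOW_CASTER_NORMALOFFSET, SHADOW_CASTER_FRAGMENT); the shader name and the function names vert and frag are hypothetical and not taken from the original project.

```hlsl
// Sketch only: a shader containing just an instanced shadow caster pass.
Shader "Hypothetical/InstancedShadowCasterSketch"
{
    SubShader
    {
        Pass
        {
            Tags { "LightMode" = "ShadowCaster" }

            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #pragma multi_compile_shadowcaster
            #pragma multi_compile_instancing   // compile the instanced variant of this pass
            #include "UnityCG.cginc"

            struct VertexData
            {
                UNITY_VERTEX_INPUT_INSTANCE_ID  // adds instanceID when instancing is enabled
                float4 vertex : POSITION;
                float3 normal : NORMAL;
            };

            struct Interpolators
            {
                V2F_SHADOW_CASTER;              // position (plus light vector for point lights)
            };

            // The input parameter must be named v: the transfer macro reads
            // v.vertex and v.normal by name.
            Interpolators vert (VertexData v)
            {
                Interpolators o;
                UNITY_SETUP_INSTANCE_ID(v);     // must run before any matrix is used
                TRANSFER_SHADOW_CASTER_NORMALOFFSET(o)
                return o;
            }

            float4 frag (Interpolators i) : SV_Target
            {
                SHADOW_CASTER_FRAGMENT(i)
            }
            ENDCG
        }
    }
}
```

The only instancing-specific lines are the pragma, the instance ID field in the input struct, and the setup call; everything else is the usual shadow caster boilerplate.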
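For multiple lights, the deferred rendering path is required. In forward rendering only the base pass can make effective use of instancing, not the additive per-light passes.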
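However, GPU instancing by default only works for objects that share the same material, which is inconvenient in practice. Sometimes you only want to vary a single material property, for example giving each sphere a different color, and changing it on the material makes instancing stop working.
We can use a MaterialPropertyBlock to change the color without creating a new material: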
MaterialPropertyBlock properties = new MaterialPropertyBlock();
properties.SetColor(
"_Color", new Color(Random.value, Random.value, Random.value)
);
t.GetComponent<MeshRenderer>().SetPropertyBlock(properties);
To use this property in shader code, it has to be declared inside an instancing buffer:
UNITY_INSTANCING_BUFFER_START(InstanceProperties)
UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
#define _Color_arr InstanceProperties
UNITY_INSTANCING_BUFFER_END(InstanceProperties)
Expanding the macros shows that they define a cbuffer containing an array of structs, where the struct holds the property we just added:
#define UNITY_INSTANCING_BUFFER_START(buf) UNITY_INSTANCING_CBUFFER_SCOPE_BEGIN(UnityInstancing_##buf) struct {
#define UNITY_INSTANCING_BUFFER_END(arr) } arr##Array[UNITY_INSTANCED_ARRAY_SIZE]; UNITY_INSTANCING_CBUFFER_SCOPE_END
#define UNITY_DEFINE_INSTANCED_PROP(type, var) type var;
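To make this concrete, here is a hand expansion of the InstanceProperties declaration shown earlier. It is a sketch under one assumption: that UNITY_INSTANCING_CBUFFER_SCOPE_BEGIN and UNITY_INSTANCING_CBUFFER_SCOPE_END expand to a plain cbuffer block, which may differ by platform and Unity version.

```hlsl
// Hand-expanded form of:
//   UNITY_INSTANCING_BUFFER_START(InstanceProperties)
//       UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
//   UNITY_INSTANCING_BUFFER_END(InstanceProperties)
// Assumption: the CBUFFER_SCOPE macros become "cbuffer <name> {" and "}".
cbuffer UnityInstancing_InstanceProperties
{
    struct
    {
        float4 _Color;  // one entry per instance
    } InstancePropertiesArray[UNITY_INSTANCED_ARRAY_SIZE];
}
```

Each element of InstancePropertiesArray holds the properties of one instance in the batch, which is why the array is sized by UNITY_INSTANCED_ARRAY_SIZE.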
To pass the instance ID used in the vertex shader on to the fragment shader, use the UNITY_TRANSFER_INSTANCE_ID macro that Unity provides:
InterpolatorsVertex MyVertexProgram (VertexData v) {
InterpolatorsVertex i;
UNITY_INITIALIZE_OUTPUT(Interpolators, i);
UNITY_SETUP_INSTANCE_ID(v);
UNITY_TRANSFER_INSTANCE_ID(v, i);
…
}
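The definition of this macro is simple. Since it assigns to output.instanceID, the interpolator struct must also declare the field with UNITY_VERTEX_INPUT_INSTANCE_ID: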
#define UNITY_TRANSFER_INSTANCE_ID(input, output) output.instanceID = UNITY_GET_INSTANCE_ID(input)
So how do we finally read the property from this cbuffer correctly? Unity provides a matching macro for that as well:
float3 GetAlbedo (Interpolators i) {
float3 albedo =
tex2D(_MainTex, i.uv.xy).rgb * UNITY_ACCESS_INSTANCED_PROP(_Color_arr, _Color).rgb;
...
}
This macro is also simple: it indexes the struct array declared earlier with the instance ID and reads the requested field:
#define UNITY_ACCESS_INSTANCED_PROP(arr, var) arr##Array[unity_InstanceID].var
After these changes, running the scene again shows that the batch count has dropped and instancing is working again.
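Putting the pieces together, a minimal unlit shader with a per-instance color could be sketched as below. It assumes the built-in render pipeline and UnityCG.cginc. The shader name, the struct names appdata and v2f, and the buffer name Props are hypothetical choices for illustration, and the buffer name is passed directly to UNITY_ACCESS_INSTANCED_PROP instead of going through a _Color_arr define.

```hlsl
// Sketch only: unlit shader that reads a per-instance _Color.
Shader "Hypothetical/InstancedColorSketch"
{
    Properties
    {
        _Color ("Color", Color) = (1, 1, 1, 1)
    }
    SubShader
    {
        Tags { "RenderType" = "Opaque" }

        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #pragma multi_compile_instancing   // compile the instanced variant
            #include "UnityCG.cginc"

            struct appdata
            {
                float4 vertex : POSITION;
                UNITY_VERTEX_INPUT_INSTANCE_ID  // instance ID coming from the draw call
            };

            struct v2f
            {
                float4 pos : SV_POSITION;
                UNITY_VERTEX_INPUT_INSTANCE_ID  // needed so the fragment program can index per-instance data
            };

            // Per-instance properties live in an instancing buffer,
            // not in ordinary material uniforms.
            UNITY_INSTANCING_BUFFER_START(Props)
                UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
            UNITY_INSTANCING_BUFFER_END(Props)

            v2f vert (appdata v)
            {
                v2f o;
                UNITY_SETUP_INSTANCE_ID(v);        // sets unity_InstanceID for the matrix lookups
                UNITY_TRANSFER_INSTANCE_ID(v, o);  // copies the ID into the interpolators
                o.pos = UnityObjectToClipPos(v.vertex);
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                UNITY_SETUP_INSTANCE_ID(i);        // required before reading instanced properties here
                return UNITY_ACCESS_INSTANCED_PROP(Props, _Color);
            }
            ENDCG
        }
    }
}
```

A color set per renderer through a MaterialPropertyBlock, as in the C# snippet earlier, is what ends up in the Props array, so objects with different colors can still share one instanced batch.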