whoimi

A geek blog

View on GitHub

UE4 GPU profile

This post describe Pipeline and Render Bottlenecks of UE4.

Frame

To display a frame both calculations on the GPU and CPU need to be finished.

After, All the game code finished and all the pixel shaded, we can dispatch a frame to the screen.

The time of cost is the bigger number between GPU and CPU.

GPU cost

On the GPU, we have parallelism and a pipeline.

Parallel - many core work at same time. Core are best utilized when working on same task.

Pipeline- frame divided into steps. Pixel Shader can’t continue before it’s fed with data from Vertex Shader.

Vertex Shader -> Tesselation -> Geometry -> Pixel Shader.

Draw call

CPU sends commands - “draw calls “ to control the GPU.

more mesh. more materials == More draw calls.

If you want to draw a triangle set with a different material, you have to dispatch a GPU command from CPU to the GPU first, and goes through the driver , and then is translated, and only the transmitted to the GPU.

Pixel- bound sources of trouble

Resolution of screen.

Pixel most slowest part of the pipeline.

Depend resolution of screen.

Heavy shader, post processing, translucency

Quad overDraw

Quad means a block of 4 pixel(2*2)

Many operations on the GPU are done on full quads or even bigger tiles-not single pixels

Small polygon waste GPU time. More small or more thin then triangle the bigger the problem.

Triangle cunt is rarly a problem.

Avoiding small and thin triangle : Use Lods. Keep polygons bigger and even.

Triangle Count

explode after tessllation.

shadow casting need copy a mesh for drawing shadow mask.

Hard edges ,UV splits,Smooth group

add vertex count.

Texture Sample

Too many texture sampler use up the bandwidth.

Texture packing and Compression.

Texture Cache

fetch data from nearby cache. Need UV continue.

UV noise is bad for cache.

Streaming cache

Good for memory. It’s not related to the GPU cache.