Optimisation & Performance
The following materials are derived from the Optimisation & Performance lecture. The video lecture is included at the bottom of the page.
- Explain the role of optimisation in the development of software
- Identify the role of hardware in performance
- Assess various techniques to optimise your code
- Apply these methods using specific tools
Introduction
One of the most important aspects of programming is optimising for performance. We need to understand the hardware our products will be deployed onto, the programming languages we use, the frameworks, platforms or game engines we develop for, and finally the tools we can use to tune performance.
Memory
The most fundamental factor in fine-tuning our projects is understanding the role played by memory.
Memory in most modern programming languages is allocated in two spaces:
- Dynamic memory (allocated with new) is allocated on the Heap and will grow in size
- Stack memory (everything that doesn’t use new) is allocated on the Stack and is fixed size
Let’s further clarify the differences between the two types:
| STACK | HEAP |
|---|---|
| Memory is allocated at compile time. | Memory is allocated at run time. |
| With static memory allocation, the memory cannot be changed while the program is executing. | With dynamic memory allocation, the memory can be changed while the program is executing. |
| Static memory allocation is preferred for arrays. | Dynamic memory allocation is preferred for linked lists. |
Stack vs Heap
fig.1 - Visualising Address Space
This is a visualisation of the two types of memory. However, you should be aware that it doesn’t reflect any real physical space: stack and heap are memory abstractions, and there is no physical difference between them. In order to manage the allocation of memory, the concept of an address space was created. Each type of memory has its own addresses and is divided into different segments, hence the layout of the diagram we are going to look at.
- The stack deals with the addition and removal of objects from the top, which is why it is referred to as a stack and why we have placed it at the bottom of the diagram. You will remember the principle of LIFO (last in, first out) from our discussion of data structures.
- Heap memory on the other hand is dynamic memory and it changes as the program runs. Its only limitation is the available free space.
- The heap is slower than the stack because a pointer, which is stored on the stack, has to be followed to locate the object stored on the heap.
- This is known as a reference type.
- An object on the stack holds its own value directly and is known as a value type.
- Finally, the diagram shows the program code itself (the code segment), which is not allocated or freed at runtime.
STACK - Impacts on Programming
- When you allocate value types (int, float, bool, short, char, etc.), these are allocated on the stack
- Values allocated on the stack are local; they are deallocated when they drop out of scope
- Values passed into functions are copied onto the stack
- The stack is of fixed size – 1MB for C#
STACK - Code Example
void Update() {
    int x = 10;
    int y = 10;
    Vector2 pos = new Vector2(x, y);
} //<-- x, y and pos drop out of scope here
HEAP - Impacts on Programming
- Heap memory is allocated dynamically
- Any reference type allocated using the new keyword is allocated on the heap
- We as programmers have responsibility for allocating on the heap
- But … in C# the heap memory is managed by the **Garbage Collector** – in C++ we have to allocate and deallocate on the heap ourselves!
HEAP - Code Example
public class MonsterStats {
    private int health;
    private int strength;

    public MonsterStats() {
        health = 100;
        strength = 10;
    }

    public void ChangeHealth(int h) {
        health += h;
    } //<- h drops out of scope here

    public void ChangeStrength(int s) {
        strength += s;
    } //<- s drops out of scope here
}

void Start() {
    //Create an instance of the class on the Heap
    MonsterStats stats = new MonsterStats();
    stats.ChangeHealth(10);
    stats.ChangeStrength(-2);
}
Data Types and Memory in C#
- Value types such as int, float, etc. are allocated on the Stack
- Structs are custom value types so are allocated on the Stack (except in a few cases)
- Reference types are allocated on the Heap and include class, interface and delegate types (see the short illustration below)
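As a short illustration of the difference (the PointStruct and PointClass types below are hypothetical, written only for this example), copying a value type copies the data itself, while copying a reference type only copies the reference to the same heap object:

//Hypothetical types used only to illustrate value vs reference semantics
public struct PointStruct { public int X; } //value type
public class PointClass { public int X; }   //reference type

void Example()
{
    PointStruct a = new PointStruct { X = 1 };
    PointStruct b = a;   //full copy of the value
    b.X = 99;            //a.X is still 1

    PointClass c = new PointClass { X = 1 };
    PointClass d = c;    //copies the reference only
    d.X = 99;            //c.X is now 99 - both names refer to the same heap object
}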
Strings
Strings act and look like value types but are actually reference types.
- This means we need to be careful when allocating new strings
- Each time we build a new string using concatenation (+), a fresh string is allocated on the heap (see the sketch below)
- If we are creating lots of new strings we should use the StringBuilder class
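To see why this matters, here is a small sketch (the loop and the strings are arbitrary examples) of building a string by repeated concatenation; because strings are immutable, every + produces a brand new string on the heap:

//BAD - each += allocates a new string on the heap and discards the old one
string report = "";
for (int i = 0; i < 100; i++)
{
    report += "Score: " + i + "\n";
}

The StringBuilder example below avoids this by appending into a single reusable buffer.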
String Builder - Code Example
//We need to use the namespace - System.Text
using System.Text;
//Create the string builder with an initial capacity of 1024 and a max capacity of 1024
StringBuilder sb = new StringBuilder(1024, 1024);
//Append some text
sb.Append("Name: ");
sb.Append("Meera");
sb.Append("Health: ");
sb.Append(100);
//Get the String from the String Builder
string s = sb.ToString();
Memory Management
Garbage Collection
C# uses garbage collection to clean up objects that have been allocated on the heap and are no longer referenced.
This is an automatic process and has been tuned for maximum performance.
However, you should understand how this process works and write code which ensures that garbage collection only runs when needed.
How Garbage Collection Works
When garbage collection is triggered, the garbage collector initially deems every object in the object graphs as garbage. It then recursively traverses each graph looking for reachable objects, and every time it visits an object it tags it as reachable. Because the graphs represent the relationships between clients and objects, when the garbage collector has finished traversing them it knows which objects were reachable and which were not. Reachable objects must be kept alive. Unreachable objects are considered garbage, and destroying them does no harm.
fig.2 - Visualising Garbage Collection
Next, the garbage collector scans the managed heap and disposes of the unreachable objects by compacting the heap: it moves reachable objects down the heap, writing over the garbage, and thus frees more space at the end for new object allocations. All unreachable objects are purged from the graphs.
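If you want to observe the managed heap yourself, a minimal sketch (using the standard System.GC API; the HeapWatcher class and the logging interval are placeholders) could look like this:

using System;
using UnityEngine;

public class HeapWatcher : MonoBehaviour //hypothetical helper for illustration
{
    void Update()
    {
        //Log occasionally rather than every frame to avoid spamming the console
        if (Time.frameCount % 300 == 0)
        {
            long used = GC.GetTotalMemory(false); //false = don't force a collection
            Debug.Log("Managed heap in use: " + used + " bytes");
        }
    }
}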
Caching
BAD Code Example
void Update() {
//Get Health Component and check health
Health health = GetComponent<Health>();
if (health.IsDead()) {
//Do Something
}
}
The above code allocates on the heap every update and the result is immediately discarded, causing not only unnecessary allocation but also extra work for the Garbage Collector when the object is cleaned up.
If our code repeatedly calls expensive functions that return a result and then discards those results, this may be an opportunity for optimization. Storing and reusing references to these results can be more efficient. This technique is known as caching.
If you call functions which allocate memory on the heap, such as Find(), GetComponent() or Object.FindObjectOfType(), consider moving these calls out of Update functions and retrieving the result once in the Start function.
GOOD Code Example
private Health health;
void Start() {
health = GetComponent<Health>();
}
void Update() {
if (health.IsDead()) {
//Do Something
}
}
Garbage Collection Tips
Allocation
Don’t allocate on the heap in Update functions (use caching)
Also consider calling functions on a timer if you need to allocate frequently; this will reduce the number of allocations per update.
Update vs Timer - Call only when needed
//Before: the expensive function runs every frame
void Update()
{
    ExampleExpensiveFunction();
}

//After: the expensive function runs only every few frames
private int interval = 3;

void Update()
{
    if (Time.frameCount % interval == 0)
    {
        ExampleExpensiveFunction();
    }
}
In the first example we call a potentially expensive function every frame even though we don't need its result that often. In the second we use the modulo operator (%) on Time.frameCount so the function only runs every third frame, which improves performance. If you need a time-based interval rather than a frame-based one, see the sketch below.
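A minimal sketch of a time-based interval using Time.time (the interval value and ExampleExpensiveFunction are placeholders) could look like this:

private float interval = 3f;   //seconds between calls
private float nextCallTime = 0f;

void Update()
{
    if (Time.time >= nextCallTime)
    {
        ExampleExpensiveFunction();
        nextCallTime = Time.time + interval;
    }
}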
Reuse Collections
Don’t initialise collections using the new keyword in the Update function.
- Initialise them in the Start function and call the collection's Clear function if you need to fill it with new data (see the sketch below)
- This also applies to Unity functions that return new arrays, such as FindGameObjectsWithTag
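A minimal sketch of this pattern (the list name and its contents are placeholders) might look like this:

using System.Collections.Generic;

private List<GameObject> nearbyEnemies;

void Start()
{
    nearbyEnemies = new List<GameObject>(); //single allocation, reused every frame
}

void Update()
{
    nearbyEnemies.Clear(); //empties the list but keeps its backing storage
    //...refill nearbyEnemies with this frame's data...
}

Clear empties the list but keeps its internal capacity, so refilling it does not allocate again unless the list needs to grow.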
Unity Performance Tips
Limit the use of Loops
void Update()
{
    for (int i = 0; i < myArray.Length; i++)
    {
        if (exampleBool)
        {
            ExampleFunction(myArray[i]);
        }
    }
}
In this example a for loop is being executed unnecessarily while a boolean remains false.
void Update()
{
    if (exampleBool)
    {
        for (int i = 0; i < myArray.Length; i++)
        {
            ExampleFunction(myArray[i]);
        }
    }
}
If we swap the for loop and the if statement, the loop is skipped entirely whenever the boolean is false, giving much more efficient code.
Object Pooling
Object pooling is a technique where, instead of creating and destroying instances of an object, objects are temporarily deactivated and then recycled and reactivated as needed. Although well known as a technique for managing memory usage, object pooling can also be useful as a technique for reducing excessive CPU usage.
Fig. 3 - Keep many objects in the pool and respawn
The object pool pattern is a software creational design pattern that uses a set of initialised objects kept ready to use – a “pool” – rather than allocating and destroying them on demand. In this example you can imitate the effect of infinite objects by moving objects that fall out of the camera's view into the pool and simply respawning them in a new location to fall back into the scene. A minimal sketch of a pool follows below.
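The SimplePool class below is illustrative only (the field names and pool size are placeholder choices, not a built-in Unity API); it keeps a list of deactivated copies of a prefab and reactivates them instead of calling Instantiate and Destroy:

using System.Collections.Generic;
using UnityEngine;

public class SimplePool : MonoBehaviour //illustrative pool, not a built-in Unity API
{
    public GameObject prefab;  //the object we want many copies of
    public int poolSize = 20;
    private List<GameObject> pool;

    void Start()
    {
        pool = new List<GameObject>(poolSize);
        for (int i = 0; i < poolSize; i++)
        {
            GameObject obj = Instantiate(prefab);
            obj.SetActive(false); //start deactivated, waiting in the pool
            pool.Add(obj);
        }
    }

    public GameObject Spawn(Vector3 position)
    {
        //Find an inactive object, move it and reactivate it instead of calling Instantiate()
        for (int i = 0; i < pool.Count; i++)
        {
            if (!pool[i].activeInHierarchy)
            {
                pool[i].transform.position = position;
                pool[i].SetActive(true);
                return pool[i];
            }
        }
        return null; //pool exhausted - the caller decides what to do
    }

    public void Despawn(GameObject obj)
    {
        obj.SetActive(false); //return to the pool instead of calling Destroy()
    }
}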
Culling
Unity contains code that checks whether objects are within the frustum of a camera. If they are not within the frustum of a camera, code related to rendering these objects does not run. The term for this is frustum culling.
We can take a similar approach to the code in our scripts.
void Update()
{
UpdateTransformPosition();
UpdateAnimations();
}
In the simplified example code above, we have a patrolling enemy. Every time Update() is called, the script controlling this enemy calls two example functions: one related to moving the enemy, and one related to its visual state.
private Renderer myRenderer;

void Start()
{
    myRenderer = GetComponent<Renderer>();
}

void Update()
{
    UpdateTransformPosition();
    if (myRenderer.isVisible)
    {
        UpdateAnimations();
    }
}
In the code above, we now check whether the enemy's renderer is within the frustum of any camera. The code related to the enemy's visual state runs only if the enemy is visible.
Level of Detail (LOD)
Fig. 4 - LOD system for graphics and models in Unity
Level of detail, also known as LOD, is another common rendering optimisation technique. Objects nearest to the player are rendered at full fidelity using detailed meshes and textures, while distant objects use less detailed meshes and textures. A similar approach can be used with our code. For example, we may have an enemy with an AI script that determines its behaviour. Part of this behaviour may involve costly operations for determining what it can see and hear, and how it should react to this input. We could use a level of detail system to enable and disable these expensive operations based on the enemy's distance from the player. In a Scene with many of these enemies, we could make a considerable performance saving if only the nearest enemies are performing the most expensive operations.
Unity's CullingGroup API allows us to hook into Unity's culling system and apply a similar LOD approach to our own code:
CullingGroup group = new CullingGroup();
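A rough sketch of how CullingGroup can drive a code LOD system is shown below; the EnemyLOD class, the distance bands and the sphere radius are placeholder choices, not values Unity requires:

using UnityEngine;

public class EnemyLOD : MonoBehaviour //illustrative use of CullingGroup
{
    private CullingGroup group;

    void Start()
    {
        group = new CullingGroup();
        group.targetCamera = Camera.main;

        //One bounding sphere per object we want to track (here just this enemy)
        BoundingSphere[] spheres = new BoundingSphere[1];
        spheres[0] = new BoundingSphere(transform.position, 2f);
        group.SetBoundingSpheres(spheres);
        group.SetBoundingSphereCount(1);

        //Distance bands: band 0 = within 10m, band 1 = within 30m, band 2 = beyond
        group.SetBoundingDistances(new float[] { 10f, 30f });
        group.SetDistanceReferencePoint(Camera.main.transform);
        group.onStateChanged += OnStateChanged;
    }

    void OnStateChanged(CullingGroupEvent ev)
    {
        //Band 0 is the nearest band; only run the expensive AI there and when visible
        bool runFullAI = ev.isVisible && ev.currentDistance == 0;
        enabled = runFullAI; //e.g. switch this script's Update on or off
    }

    void OnDestroy()
    {
        group.Dispose(); //CullingGroups must be disposed of when finished with
    }
}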
Profiler
Unity has its own Profiler tool you can use to get performance information about your application. You can connect it to devices on your network or devices connected to your machine to test how your application runs on your intended release platform. You can also run it in the Editor to get an overview of resource allocation, with specific reference to the CPU, memory, renderer and audio, while you're developing your application.
I won’t go through it here but it is recommended you explore it during the process of developing your projects. You can find more information about using it here.
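If you want to see how long your own functions take in the Profiler's CPU view, you can wrap them in profiler samples. A minimal sketch (the sample label and UpdateEnemyAI are placeholders) looks like this:

using UnityEngine.Profiling;

void Update()
{
    Profiler.BeginSample("Enemy AI update"); //this label appears in the Profiler's CPU view
    UpdateEnemyAI();                         //placeholder for your own expensive code
    Profiler.EndSample();
}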
Test Project with Profiler
I have also included an example test project repo that demonstrates the use of profiler: https://github.com/Falmouth-Games-Academy/COMP140-Profiler-Test
Optimisation in Arduino
Size of data types in Arduino
| Type | Size (bits) | Values |
|---|---|---|
| bool | 8 | 1 or 0 |
| char | 8 | -128 to 127 |
| unsigned char, byte | 8 | 0 to 255 |
| short, int | 16 | -32768 to 32767 |
| unsigned int, word | 16 | 0 to 65535 |
| long | 32 | -2147483648 to 2147483647 |
| unsigned long | 32 | 0 to 4294967295 |
| float, double | 32 | 1.175e-38 to 3.402e38 |
Even though Arduino is a simpler platform and we don't have to manage visual objects, there are still savings we can make. Managing our data types is really important: only use the smallest data type that can hold the values your variable needs.
Remember that the same relationship of data types to heap and stack applies in Arduino as well.
Use Functions
void setup()
{
pinMode(LED_BUILTIN, OUTPUT);
}
void loop()
{
dot(); dot(); dot();
dash(); dash(); dash();
dot(); dot(); dot();
delay(3000);
}
void dot()
{
digitalWrite(LED_BUILTIN, HIGH);
delay(250);
digitalWrite(LED_BUILTIN, LOW);
delay(250);
}
void dash()
{
digitalWrite(LED_BUILTIN, HIGH);
delay(1000);
digitalWrite(LED_BUILTIN, LOW);
delay(250); //gap after the dash, matching the gap in dot()
}
This may seem obvious, but try to move your digitalWrite calls into functions. Using functions in this way saves around 200 bytes of memory, mainly because only a single copy of the function's code exists in memory: when the function is called again, the CPU executes that same code from its location in memory, without having to re-create the variables.
Managing Strings in Arduino
Serial.print("Optimizing Code");
As with Unity, we can manage the deployment of strings more effectively. Printing a lot of strings to the serial monitor or to an LCD display consumes a lot of RAM. To save the precious RAM, such strings can be stored in flash memory instead. To achieve this, the Arduino employs the F() macro.
Serial.print(F("Optimizing Code"));
This simple, yet powerful solution forces the compiler to put the enclosed string in PROGMEM.
Including this macro around just these two words, "Optimizing Code", can save as much as 16 bytes.
PROGMEM
const int16_t chars[] = {200, 101, 521, 24, 892, 3012, 100};
In general, the Arduino stores variables in SRAM, where their values can change during program execution. To avoid running out of RAM, we need to control the data that goes into this memory block. To achieve this, we use the PROGMEM keyword to store data in program (flash) memory instead of RAM. It is particularly useful for data that never changes, such as constants. The drawback is that reading from program memory is slower, but the bigger picture is that we save RAM. Here is an example of a PROGMEM implementation.
const int16_t chars[] PROGMEM = {200, 101, 521, 24, 892, 3012, 100};
void ReadData() {
unsigned int displayInt;
for (byte k = 0; k < (sizeof(chars) / sizeof(chars[0])); k++) {
displayInt = pgm_read_word_near(chars + k);
Serial.println(displayInt);
}
Serial.println();
}
Conclusion
- The role of memory and data allocation based on stack and heap
- Using string managers can help with memory saving in both Arduino and Unity - C++ and C#
- How ordering the execution of your code and using caching can improve memory use
- How Garbage Collection works in Unity
- How using culling, object pooling and LOD can make the visual elements of a game run more efficiently
- I have also highlighted how Unity’s profiler can be used to manage and optimise your applications
Further Reading
- Optimisation in Unity - https://docs.unity3d.com/Manual/BestPracticeUnderstandingPerformanceInUnity.html
- UI - https://create.unity3d.com/Unity-UI-optimization-tips
- Optimising Graphics – https://create.unity3d.com/Unity-UI-optimization-tips
- Profiling in Unity - https://learn.unity.com/tutorial/profilingapplications-made-with-unity
- Practical Guide to Profiling in Unity - https://www.youtube.com/watch?v=OSlOwJP8Z14