Data Structures

The following materials are derived from the Data Structures lecture. The video lecture is included at the bottom of the document.

In programming we have the concept of reusable data structures which can be used to build applications
These can be used in order to build larger systems (e.g. Inventory Systems, AI Navigation etc)
Most programming languages have these built in
Before writing any system you should always examine these data structures and pick the appropriate one for your Use Case

Big ‘O’ Notation

“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” Bill Gates

This famous statement highlights How we want to define our code is based on two simple premises: Simplicity and Purpose.

What is Big ‘O’?

The efficiency of an algorithm can be gauged by how long it takes
This is known as Time Complexity
Big O Notation is used to describe this

One of the ways to make our code efficient is by optimising the search for data within a data set. Let’s consider how we search a data set of 100 items for a single data point. For example in a data set of 100 items, you are searching for an ‘0’ amongst ‘X’s.

Fig. 1 - 100 data points in a data set

The best case for finding the data is 1 iteration
The worst case is a 100 iterations

Big ‘O’ is always based on worse case scenario.

So 100 records would take 100 iterations
Therefore N records would take N iterations.
This can be written like this:

\[O(N)\]

Performance is represented by the O and the N represents execution time or space used - Hence the term Big ‘O’.

Different Notation in Big ‘O’

The previous example is known as:

Linear Notation - \(O(N)\)

Pseudo Code

FUNCTION  linearSearch(list, value)
	FOR EACH element IN list
		IF (element == value)
			RETURN true
		END IF
	NEXT
		RETURN false
END FUNCTION

Graph Fig. 2 - Linear Notation Graph

Constant Notation - \(O(1)\)

Pseudo Code

FUNCTION getFirstElement(list)
	RETURN list[0]
END FUNCTION

Graph Fig. 3 - Constant Notation Graph

The constant notation describes an algorithm that will always execute in the same execution time regardless of the size of the data set. For instance, an algorithm to retrieve the first value of a data set, will always be completed in one step, regardless of the number of values in the data set.

Hashing Tables

A hashing algorithm is an \(O(1)\) algorithm that can be used to very effectively locate/search a value/key when the data is stored using a hash table.

Fig. 4 - Hashing Tables

Source: https://it.wikipedia.org/wiki/Hash_table

A hash table locates the actual data at an address called an index that is always a whole number and therefore is always associated with that record.

A bit like a book in a library its location is always known because of it’s numbering system. It resides at a fixed location no matter if the book is replaced with a newer copy. No sorting is required.

Polynomial Notation - \(O(N^2)\)

Pseudo Code

PROCEDURE emptyChessboardGrid()
	FOR row FROM 0 to 7
		FOR col FROM 0 to 7
			grid[row][col] = 0
		NEXT col
	NEXT row
END PROCEDURE

Graph Fig. 5 - Polynomial Notation Graph

\(O(N2)\) represents an algorithm whose performance is directly proportional to the square of the size of the data set. Algorithms which are based on nested loops are more likely to have a quadratic \(O(N2)\) and so are 2 dimensional arrays like the one above. Where we are plotting the grid of a chessboard.

Exponential Notation - \(O(2^N)\)

Pseudo Code

FUNCTION  fibonacci(number)
	IF  (number <= 1)
		RETURN  number
	END  IF
	RETURN fibonacci(number - 2) + fibonacci(number -  1)
END FUNCTION

Graph Fig. 6 - Exponential Notation Graph

The exponential notation O(2N) describes an algorithm whose growth doubles with each addition to the data set. An example of an \(O(2^N)\) function is the recursive calculation of Fibonacci numbers:

Fibonacci Numbers Fig. 7 - Fibonacci Number Set

Logarithmic Notation - \(O(log(N))\)

Pseudo Code

? // Difficult to define in code. See explanation below.

Logarithmic notation is difficult to explain Graph Fig. 8 - Logarithmic Notation Graph

Logarithmic notation works on the basis of elimination. Getting rid of the things we know we are not looking for, that way we can arrive at the item we are searching for much quicker.

In a data set of 9 items, you are searching for a 6

Fig. 9 - Logarithmic Sample Set

Linear notation would take 9 iterations

First sort the data.
In this case position 4 (fifth item in the set) which has the value: 3

Logarithmic eliminates data that we don’t need by choosing the mid point in the data.

We then check to see if the value searched is higher than the midpoint.
If true we remove all values below the midpoint.
If false we remove all values above the midpoint.
Continue finding the midpoint and repeat until the value is found
In the example the search is completed in 3 iterations using logarithmic notation

Logarithmic Sample Set Fig. 10 - Logarithmic Sort and Search

We can better understand the process using a flow chart to explain the logic loop.

LogarithmicFlow Chart Fig. 11 - Flow of a Logarithmic Sort

In computer games and other realtime interactive systems there is high frequency tracking of moving objects and statistics on those objects. This requires that we add, update, search, sort and delete data in the scene in real time. As we have previously seen there is a cost in time and memory of all this collection processing.

Big Sort

Fig. 12 - A lot of sorting is requred in realtime applications

Data structures form the container for performing and managing these operations.

Types of Data Structure

We are going to look at Linear Data structures - a data structure is said to be linear if its elements combine to form any specific order.

Data Structure Types

Dynamic Arrays
Linked Lists
Queue
Stack
Associative Arrays
Operations on Collection

These are typical to C# but many are also found or have equivalents in other languages.

We are primarily using examples in Unity game engine using C#. We will look at specific problems and how a specifc class of data structure can provide a solution through explanation and example.

1. Dynamic Arrays

The Problem

Arrays in C# are fixed in size. During development you need to know exactly how many item are going to be in the array. If you need to add elements and you don’t have enough space, you will need to carry out the following:

Create a new array of the appropriate size
Copy elements from the old array into this new one
Destroy the old array
Add in the new element

The above process can be quite costly.

The Solution - Dynamic Arrays (Lists)

Luckily in most programming languages we have a Data Structure which grows in size when we require it. In C# we have the List class. These classes have the same properties as an array

Items are located contiguously in memory
We can randomly access elements using an index
We can iterate through each element

You should consider using a Dynamic Array over a normal array. One caveat - Dynamic Arrays are slightly more expensive!

Game Development Use Cases

Manage enemies as they are spawned into the scene
Keep track of players as they are added into the game
Manage inventory systems

Fig. 13 - Manage Spawning Enemies

C# List Example

List<int> scores = new List<int>();
scores.Add(100);
scores.Add(200);
foreach(int  score  in  scores)	{
	Debug.Log("Score is "+score.ToString());
}
int player1Score = scores[0];
scores.Remove(100);

Addtional Notes on Lists

Try to avoid insertion/deletion in the middle of the collection
Searching the collection is linear and will increase as more elements are added \(O(n)\)
Insertion/deleting at the end of the collection is constant in performance \(O(1)\)

Notes on Generic Types

Generic Programming is where you write one piece of code which operates on many different types. This uses a concept called Templates which act in proxy for the type. The Compiler then generates the code which uses the actual type

Applying Generic Types to List

In the previous section you would have noticed the following:

List<int>
These are known as generic parameters and you should insert the data type that the collection will handle (including your own data types aka classes and structs)
Example: TypeName<T> where T is a type parameter

You can write your own generic classes and functions but this is beyond the scope of this class. Here are some C# examples - http://www.tutorialsteacher. com/csharp/csharp-generics

Word of warning, it is often difficult to write generic code. If you have errors they are often difficult to isolate as the compiler messages are so cryptic

2. Linked Lists

The Problem

You have started using a dynamic array and you have noticed performance is poor on adding/removing. You then realise that you are adding/removing elements from the middle of the collection. You also realise that you don’t require random access to elements in the collection.

The Solution

In this case a Linked List would be a better choice. In C# we have the LinkedList class. Linked lists contain elements (called Nodes) which usually have a reference to the previous and next Node in the list. This means that there is a slight increase in memory needed when working with lists.

Use Case

If an AI character has to visit a series of waypoints, these could be stored in a list
Your Player has a number of quests they can try and complete
If the AI/Player carries an action and a number of systems need to be notified of the event

Fig. 14 - Visiting Waypoints

C# Linked List Example

LinkedList<Transform> waypoints=new  LinkedList<Transform>();

waypoints.AddLast(GameObject.Find("Waypoint1").Transform);
waypoints.AddLast(GameObject.Find("Waypoint2").Transform);
waypoints.AddLast(GameObject.Find(“Waypoint3”).Transform);

foreach(Transform  t  in  waypoints) {
	Debug.Log("Waypoint Locations “ + t.position.ToString());
}

waypoints.AddFirst(GameObject.Find("Waypoint0").Transform);
LinkedListNode<Transform> waypoint2Node = linked.Find(GameObject.Find("Waypoint2"));
waypoints.AddAfter(waypoint2Node,GameObject.Find("SpecialQuest"));

LinkedList give you control over where new items are placed in the list. For instance using methods like AddLast, AddFirst, AddBefore and AddAfter. You have a wide variety fo methods to control the placement of items.

Addtional Notes on Linked Lists

Linked Lists usually support constant time insertions and deletions in the collection \(O(1)\)
Also perform better than dynamic arrays for moving elements around the collection
This feature means that Linked Lists are a good data structure if you need to sort your data
Main drawback of Linked Lists is that you can’t have direct access to elements in the list, it takes linear time \(O(n)\) to access

3. Queue

The problem

If you need to visit items in a certain order (e.g front to back). Examples of this could be waypoints or commands to an AI character.

The Solution - Queue

In C# we have the Queue class
This is First-In-First-Out (FIFO) data structure
You add elements to the end of the queue and you remove elements from the start

Use Case

An Real Time Strategy game where you can add orders to a unit, these are then carried out in sequence
An RTS where you have a base which produces units
A spawning system, where you have to defeat enemies in a specific order

Fig. 15 - Orders to a Unit

C# Queue Example

Queue<GameObject> unitsToBuild = new  Queue<GameObject>();

unitsToBuild.Enqeue(soliderPrefab);
unitsToBuild.Enqeue(builderPrefab);
unitsToBuild.Enqeue(tankPrefab);

foreach(GameObject  i  in  unitsToBuild)
{
	Debug.Log("Units to build “ + i.name);
} 
GameObject  nextUnitToBuild = unitsToBuild.Peek();
unitsToBuild.Dequeue();

Enqueue() adds an object to the queue. Dequeue() is a method to remove from the beginning of the Queue. The trivial drawback of this system is writing variations on the word Queue with it’s proliferation of U’s and E’s. Typos are always a pain!

4. Stack

The Problem

If you need to manage the state of an AI character
If you need to implement a Undo system

The Solution - Stack

A Stack would be a good choice
In C# we have the Stack class
This is Last-In-First-Out data structure
You add elements to the top of the stack and you remove elements from the top

C# Stack Example

Stack<Command> issuedCommands=new  Stack<Command>();
issuedCommands.Push(new  Command(“Edit"));
issuedCommands.Push(new  Command("Create"));
issuedCommands.Push(new  Command(“Update”));  
Debug.Log(“First Command Added” + issuedCommands.Peek());

Stack has 3 principle methods

Push: add an object on the stack.
Pop: remove and return the first object on the stack.
Peek: get the first object without removing it.

5. Associative Arrays

The Problem

If you need to store one unique copy of an element
You want to access an element via a key
You are doing lots of searches for an element

The Solution

You should use an Associative array
In C# we have the Dictionary class
These data structures are structured as key-value pair eg. - Name & Age
It allows you to retrieve the items via the key
This makes it a good choice for looking up large data sets

Use Case

If you are creating a resource management system for handling textures, models or other assets
Localisation system, each language is stored in an Associative Array
Unit Manager, a class to manage units created in the game
Save Game System

Fig. 16 - Save Game System

C# Dictionary Example

Dictionary<string, int> highScoreTable = new  Dictionary<string, int>();
highScores.Add(“Laura",200);
highScores.Add(“Olu",2000);
highScores[Laura]= 4000;

foreach(KeyValuePair<string, int> pair  in highScoreTable)
{
	Debug.Log("High Score "+pair.Key+" “+pair.Value);
}

if (highScores.ContainsKey("Laura")) {
	int  score = highScores[“Laura”];
}
highScores.Remove(“Laura”);

Additional Notes

Iterating over a map has a slightly annoying syntax
Associative Arrays tend to have good performance for retrieval \(O(log(N))\)
If you add an item and its key already exists it may overwrite the value

6. Operations on Collections

Sorting

Sorting is where we order the items in a collection in a specific order
There are a whole bunch of sorting algorithms including; Insertion sort, Heap sort, Quick sort (please read about these!)
In C#, the best sorting algorithm will be picked depending on the size of the collection
Most of the common data types don’t need additional work
For custom classes, we have to write our own sorting algorithm

Sorting C#

There are few ways to sort a collection
1. Provide a custom delegate function for the sort
2. Provide a custom class which inherits from IComparer
3. Your own class has to inherit from IComparable
Often you will use option 3 as the default sort
Which can then be overridden by option 1

C# Example 1 - Sorting with Delegate

struct  Character {
	string  name;
	int  health;
	int  strength;
}
//Adding omitted!
List<Character> characters = new  List<Character>();

//Sort by health
characters.Sort(delegate (Character  c1, Character  c2) {
	return (c1.health.CompareTo(c2.health));
});

Note for those unfamiliar with structs. Structs can be used to hold small data values that do not require inheritance, e.g. coordinate points, key-value pairs. Here you can see that the delegate sorts the list of struct Characters by their health using CompareTo.

The following article helps to explains in more detail how to utilise delegates to sort lists: https://www.techiedelight.com/sort-list-of-objects-csharp

C# Example 2 - Sorting with IComparable

struct  Character:IComparable<Character>
{
	string  name;
	int  health;
	int  strength;
	// sort by name
	public  int  CompareTo(Character  compareCharacter)
	{
		return  name.CompareTo(compareCharacter.name);
	}
}
//Adding omitted!
List<Character> characters=new  List<Character>();

//Sort will use the CompareTo in the struct or class
characters.Sort()

Another approach is to use the interface of IComparable which imposes a natural order on things. In IComparable the CompareTo method is overridden

as in the method which compares the current object with the specified object, as we saw in the previous approach. In IComparable the value now returned by the compareTo() method decides the relative order of the objects in the sorted list. A negative, zero and a positive value represents if the compared object is less than, equal to or more than the specified object. More on that shortly.

In the following code, the Character class implements the IComparable interface and overrides its CompareTo() method. The List of Characters objects is then sorted using the no-arg Sort method.

Points to note in C#

The CompareTo function returns an int which can be the following

Less than zero: The instance precedes the one passed in
Zero: The objects are in the same order
Greater than zero: The instance follows the one passed in
- 1: Swap
- 0: Keep
- -1: Don’t swap

Another way to look at it is like a deck of cards that we shuffling into order based on higher lower principle.

ICompare vs IComparable

The principle reason for choosing the different options is based on access. If you don’t have access to modify a specific class then implementing through the ICompare interface is recommended.

Find out more: https://dev.to/digionix/icomparable-vs-icomparer-274f

Conclusion

I have outlined the mathematical theory of big O notation which is used to classify algorithms according to how their run time or space requirements grow as the input size grows. I have also outlined a range of example data structures with use cases and examples that you can deploy in your own code and projects.

Here is some addtitional reference material to support this lecture:

Lists and Dictionaries in Unity https://unity3d.com/learn/tutorials/modules/intermediate/scripting/lists-and-dictionaries

Understanding Big ‘O’ Notation https://www.101computing.net/big-o-notation http://bigocheatsheet.com/

Visualise different sorting methods in C# https://www.codeproject.com/Articles/132757/Visualization-and-Comparison-of-sorting-algorithms

Lecture 7

Data Structures

Big ‘O’ Notation

What is Big ‘O’?

Different Notation in Big ‘O’

Linear Notation - \(O(N)\)

Constant Notation - \(O(1)\)

Hashing Tables

Polynomial Notation - \(O(N^2)\)

Exponential Notation - \(O(2^N)\)

Logarithmic Notation - \(O(log(N))\)

Types of Data Structure

Data Structure Types

1. Dynamic Arrays

The Problem

The Solution - Dynamic Arrays (Lists)

Game Development Use Cases

C# List Example

Addtional Notes on Lists

Notes on Generic Types

2. Linked Lists

The Problem

The Solution

Use Case

C# Linked List Example

Addtional Notes on Linked Lists

3. Queue

The problem

The Solution - Queue

Use Case

C# Queue Example

4. Stack

The Problem

The Solution - Stack

C# Stack Example

5. Associative Arrays

The Problem

The Solution

Use Case

C# Dictionary Example

Additional Notes

6. Operations on Collections

Sorting

Sorting C#

C# Example 1 - Sorting with Delegate

C# Example 2 - Sorting with IComparable

Points to note in C#

ICompare vs IComparable

Conclusion

Video Lecture

Part 1

Part 2