Infinite Grids: Expandable Arrays

Grids are very common when it comes to computers. Take, for example, your computer screen. Did you know your screen is divided into tiny little squares called pixels? A pixel can only be a single color at a time, but they are so tiny that when put together, you can see things like curves and letters. Your screen, therefore, is just a giant grid of colors to a computer.

We can represent a grid using an array. Let's say we wanted to simulate a 4 x 3 grid of colors. On a computer, a color is just an integer value, and in a 4 x 3 grid, there are exactly 12 cells. Therefore, to represent this grid, we need an array of 12 integers. In Java, you can easily make an array of 12 ints with the following code:

int[] grid = new int[12];

**Extra Bits**

You might be wondering how we can use a one-dimensional structure (the array) to represent a two-dimensional structure (the grid). It's actually rather simple. In our 4 x 3 grid above, we have an array of 12 cells. The first set of three cells represents the first row, the second set of three cells represents the second row, and so forth.

Any finite grid can be represented with an array, or perhaps an array of arrays for really big grids. But what if you wanted an **infinite** grid?

So, the fundamental problem is that computers do not have infinite memory. For an infinite grid, there would be infinite cells, and this means we need an infinitely large array. But of course, that's infinitely impossible to have in a non-infinite world.

Therefore, we need to somehow **simulate** an infinite grid using only finite structures and a few rules. Let's take a look at three possibilities and weigh their pros and cons. For all of these examples, we are going to assume we want a grid of colors where by default, cells are black unless changed to be otherwise.

# The Expanding Array

One strategy is to simply use a finite grid and just make it bigger every time it needs to become bigger. For example, let's say that at the moment, every non-default cell resides in a rectangle formed by the points (0, 0) and (2, 1). To represent this, you only need a 6-cell array since this rectangle only has 6 cells. But what if the cell (2, 2) is accessed? At this point, we would need to expand the array to 9 cells in order to include it.

In the diagram below, we illustrate the above scenario. The grid on the left is the original array containing six cells. Three of the cells actually have a color assigned to them. If we want to assign a color to a cell outside of the current range, then the array needs to expand its size to 9 cells.

DIAGRAM

This looks pretty good. But to see just how good it is, we want to analyze how easy or hard it is to perform some basic operations, namely:

- Can we
**read**the color of a cell? Are we able to find out what a cell's color is? - Can we
**write**a value to a cell? Can we assign a color to a cell?

This is where we need to put on our computer science hats!

Reading a cell's color is very easy. Given a coordinate (x, y), you only need to do two steps. First, find out if the (x, y) pair is in the current array or not. If it isn't, then it must automatically be black (or whatever the default value is). If it is, then we simply need to convert the (x, y) pair into an index for the array. Since accessing an element in an array is effectively instant on a computer, reading a cell takes very few steps. In Big-O notation, reading is a $O(1)$ operation, the best it can possibly be. This is great!

What about writing? Unfortunately, assigning a color to a cell is not as straightforward. Again, consider an (x, y) pair. If the coordinate is already in the array, then writing is easy. For example, consider the diagram above. Let's say we wanted to change the blue cell's color to yellow. Since this cell is already in the array, we can convert the (x, y) pair into an array index and directly access the cell without any problems. This is, like reading, an $O(1)$ operation, meaning it's very fast.

However, what if the cell **doesn't** exist in the array yet? What if we need to expand the array, as in the diagram above? This is where things become slow. On a computer, we can't just make arrays bigger. To expand an array, we actually need to **create an entirely new array** which is bigger than the first, and then we copy the old values into the new array. Once the new array has been properly made and cloned from the old one, we can then assign our new cell its color.

Think about the diagram above again. This means that to assign the yellow square to coordinate (2, 2), we need to copy the old grid into the new grid. Since the old grid has 6 cells, it takes 6 steps to perform the copy. If the old grid had had a million cells, it would have taken a million steps to perform the clone. In other words, assigning a color to a new cell requires steps equivalent to the number of cells in the old array. In Big-O notation, we say this requires $O(N)$ operations, where $N$ is the number of cells in the old array.

So in summary, this means assigning a color can be $O(1)$ at best, but $O(N)$ at worst. This is not very desirable. Can we do better?

**Extra Bits**

Actually, the expanding array has other problems. Think about the following scenario: Cell (0, 0) is red, and cell (1000000, 1000000) is blue. Every other cell is black. There are only two colored cells, but in order to represent them in an array, we need at least a 1000000 x 1000000 grid, or basically a **terabyte** of data (a trillion cells). All just to represent two colored cells! In other words, the expanding array is very **memory-inefficient**.

# The Linked List

Another strategy is to use something called a linked list. If you are not sure what a linked list is in programming, you can think of it just like a chain. For more detail, I recommend watching this video about linked lists.

Here, instead of having an array represent the grid, we are going to use **only** a list of cells that are not the default color (black). In the list, each cell will be labeled with its color and its coordinate. Furthermore, the order of the list does not matter. In the diagram below, we have the same grids as in the previous section. However, instead of using an array, we use a linked list to represent which cells have a color.

In the grid on the left, there are only three cells with non-black color. Therefore, the linked list only has three cells in it. In order to know which coordinates those cells have in the grid, they need to be labeled as well. In the grid on the right, we decide to add another color. This time, rather than making a new array that has three additional cells, we only needed to add a single cell to the list and label it with the coordinate.

DIAGRAM

Immediately, we see one advantage over the expanding array. The linked list method does not waste any space! Note that in the previous array strategy, we needed slots for cells that weren't being used. In a linked list, though, we only need to memorize exactly the number colored cells, so we aren't wasting any of the computer's finite memory on unused cells. You can see this difference by analyzing the size of the array in the first diagram (9 cells) with the size of the linked list here (4 cells).

But what about our two operations? Recall that we need to analyze the following two operations in order to gauge just how good the linked list strategy is:

- Can we
**read**the color of a cell? Are we able to find out what a cell's color is? - Can we
**write**a value to a cell? Can we assign a color to a cell?

Finding the color of a particular cell is actually not very easy anymore! Recall that the order of our list does not matter. This means that for a given (x, y) coordinate, we do not know where its cell is in the list. In order to find it, we would have to check each and every cell in the list one at a time until we found the matching coordinate. If we cannot find the coordinate, then the cell is not in the list and hence must be black.

This means that to check the color of a cell, you would have to traverse $n$ cells, where $n$ is the number of cells in the list. In Big-O notation, this is $O(n)$, and this is considerably worse than $O(1)$. To put this into perspective, if we had a million colored cells in the list, we would need about a million steps to find the right cell. In the above array example, reading was in $O(1)$ time, meaning that regardless of the size of the array, we only need a single step to find the right cell. When you compare a million steps with a single step, the difference can be huge.

How about assigning a color? Is this any better? Unfortunately, it isn't. Note that if we need to add a new cell to the list, that's actually very easy. We can simply add a new cell to the end of the current linked list (like in the diagram above). The problem is that we don't already know whether or not a coordinate is in the list. We really do not want duplicate coordinates in the list, because that would imply that a single cell in the grid can have more than one color. Therefore, in order to assign a color, we first need to see if the coordinate is in the list. If it is, then we can change the color of that cell, and if it isn't, we can just add the cell to the end of the list.

Unfortunately, since we still end up having to look at every cell in the list to find if the (x, y) pair already exists, the write operation, like the read operation, takes $O(n)$ time.

This does not mean linked lists are useless though. Recall that the linked list method does not waste any space. If this is important, then this method may in fact be best.

So now we have a trade off. We can either use arrays which are fast but waste space, or we could use linked lists which are slow but don't waste any space. Is there a way we can combine the best of both methods?

# The Chunk List

What if instead of representing the infinite grid with a single array, we used a bunch of smaller arrays? Let's have the following rules:

- A
**chunk**is an array of cells. - Each chunk will be the same size, a square with length $w$
- We maintain a linked list of all the chunks currently in the grid
- Each chunk knows what range of coordinates it supports

This differs from the linked list method above since now, instead of linking individual cells, we are now linking **chunks** of cells. This kind of chunk list structure looks like the below diagram:

DIAGRAM

The goal in using this structure is to achieve some of the speed of arrays while not suffering from their space inefficiency. If you recall, the previous two structures had the following strengths and weaknesses:

Expanding Array | Linked List | |
---|---|---|

Strength | Very fast access time | Slow access time |

Weakness | Can have a ton of unused cells | No unused cells |

SOMETHIGN HERE

So like we did with the other two strategies, we need to analyze how well the chunk list performs when reading the color of a cell and writing a color to a cell. What we find may be slightly surprising.

To find out the color of cell (x, y), we first need to find the chunk which contains the (x, y) pair. We have a linked list of chunks, so all we need to do is traverse the list (inspect each chunk) to see which one contains the (x, y) pair. This requires a number of steps equal to the number of chunks in the list. Once the correct chunk is found, getting the appropriate cell from the chunk is easy since the chunk is just an array. In other words, finding the cell from the chunk takes $O(1)$ time.

However, how long does it take to find the right chunk from our list of all possible chunks? It really depends on the number of chunks we need to represent all of the cells we care about. If you remember in the array example, we used $N$ to denote the total number of cells currently in the grid, regardless of whether the cell is actually used or not. Since the chunk list does use arrays, some cells are going to be unused, so again we have that $N$ is equal to the total number of cells in the grid. The question is, given $N$, how many chunks do we expect there to be in the list?

This depends on the size of the chunks. In the diagram below, we show two chunk lists. The version on the left has smaller chunks, whereas the version on the right has larger chunks. Note that smaller chunks mean we need more chunks to represent all of the cells. Therefore, the size of each chunk determines how many chunks are necessary.

DIAGRAM

Recall that we stated in Rule 2 that each chunk would be of side length $w$. Using this, we can see that the number of chunks in the chunk list is $\frac{N}{w^2}$. So for example, if each chunk is 32 cells long and our grid is currently using 5120 cells (both used and unused), then the chunk list has only 5 chunks in it. Therefore, to find a coordinate in this chunk list, you only need to look at 5 chunks rather than 5120 cells. That seems pretty good!

Well, it's pretty good for 5120 cells at least. What happens when you have more cells, like a million or even a billion cells? We can use Big-O notation to find how well this structure scales. Since it takes time equal to the number of chunks in the list, it must be that accessing the color of a cell takes $O(\frac{N}{w^2})$ time. However, the $\frac{1}{w^2}$ is really just a constant value. It is always the same, and when analyzing algorithms in Big-O, we ignore anything that does not change when the number of cells increases. Therefore, the algorithm is really in $O(N)$. In other words, the more cells we have in the grid, the longer it takes to access colors for individual cells.

The $O(N)$ time is much worse than $O(1)$, and it's even worse than $O(n)$ (which was the time needed by the linked list). This is because the linked list only needed to keep a record of all the currently used cells and ignored unused cells entirely. The chunk list, however, has a bunch of unused cells.

If we analyze how long it takes to change the color of a cell, we find it still takes $O(N)$ for the same reason. You need to traverse the list of chunks, but once you find the right chunk, getting the desired coordinate is easy.

# A Comparison

Expanding Array | Linked List | Chunk List | |
---|---|---|---|

Read Coordinate | $O(1)$ | $O(n)$ | $O(N)$ |

Write Coordinate | $O(N)$ | $O(n)$ | $O(N)$ |

Space | O(N) | O(n) | O(N) |

## Comments