I have been curious about this for a while now. I fail to understand as to why the table did not print in a serial order.
Of course I might be able to use table.sort
but just curious why it doesn't do by itself.
Sample code.
-- [Declaration Section] local Table = _G.Alphabets -- [Output Section] for x, y in pairs(Table) do print(x, y); end;
I am not going to post the alphabets but the layout is in this order
_G.Alphabets = { [1] = "a" };
Anyone wanna explain me the reason.
27 A
2 b
38 L
3 c
4 d
It may seem like it goes back into order but again, it cuts off and goes somewhere else. And no its not _G's
fault here, I tried without it and still prints the same.
I don't know if this should be in a comment or an answer, since it's sort of not relevant, but seems a little big for a comment.
As already mentioned by mattscy, dictionaries don't have an order imposed on their values, while arrays do. You should use an array for something like this.
But - why is this? Computers are inherently deterministic machines. Why would they not impose an order when an obvious one exists?
The answer is down to the way dictionaries differ from arrays, and to properly understand it, we have to consider the computer's operation at a level closer to the hardware than we're normally used to dealing with.
Your computer's memory (or primary storage, not to be confused with secondary storage, or disk space) is basically divided up into little blocks. Every variable you use is assigned its own little block, and the value of the variable is written here.
However, programming in terms of block addresses is disgusting and a terrible idea for numerous reasons, so what we do instead is we assign names or identifiers to each of these little blocks, which we know as variable names.
When I do bananas_in_my_hand = 5
, what I'm doing is asking for a new block to be allocated to me, writing the value 5 to this block, and then saving the address of this block in a place called the "symbol table".
It's a little more complicated than this, but in short, a symbol table translates the human-readable name "banans_in_my_hand" to a machine readable address that the computer can actually use to go and fetch the value 5.
In short, whenever you use bananas_in_my_hand
again, what the computer is doing is asking its symbol table what address that identifier actually refers to, and then asking the computer's memory what value this address refers to.
Again, memory is actually a little more complicated than this, because we end up adding a bunch more layers between the raw hardware and your Lua script because each one is useful for something. Look into a process called "paging" if you want to learn more.
This is useful information, but how do arrays work?
Before we allocated one block to one variable, and now we allocate lots of different blocks to store all the values in an array. The key thing here though is that all the blocks we allocate are "contiguous", meaning that they're all next to each other, and there are no gaps between any of them.
Now, when we try to access a certain index of an array, we take the block at the start of the array, and we add an offset to it. For example, to access index 3 of bananas_on_the_tree
, we'd do bananas_on_the_tree[3]
, right? What's actually going on here?
What the computer is doing in the background, is going to the symbol table. The symbol table gives the address of the first block in bananas_on_the_tree
. Then we take this address, and add 3 to it, and from there we get the third location in our array. This is why, in most programming languages, arrays start at 0 - the first element of an array is at an address without any offset.
Now, there are sort of two obvious problems to this system
We need to allocate the blocks contiguously, meaning we can't add blocks to the end - there might be a few blocks next to our array in memory, so we'd have to introduce gaps, which we can't do.
We have to use indexes for our arrays
To get around the first problem, what a language will normally do is dynamically resize the array for you. It will "guess" the size of your array when you first create it, and then if you need more space, it requests an entirely new array, copies the old array to the new array, and then writes the new value to the correct position of the new array.
Obviously this introduces problems of its own. Resizing an array is obviously a really expensive process, because we have to deal with every element in the array already. This isn't really noticeable when we're dealing with arrays of size 5 or even 1000, but when we get to arrays of size 100 000, we start to see noticeable impacts in performance. Therefore, deciding when to resize an array is actually a pretty important decision to make.
The drawback to arrays is that we can only have indexes which are numbers. Going from our understanding of how arrays work, it's not like we can take the start address of an array and add "Bananas" to it, so how does things_in_a_shop["bananas"]
get evaluated?
Here what we do is something pretty clever - we apply what's called a "hashing function". Basically what this does is take any arbitrary data we feed it, and convert it through a one-way process to a number.
For example, if we were to write a hashing function for a string, what we could do is assign every letter a number, convert all letters in a string to numbers, and then add these numbers up.
Using hashing functions, we can therefore convert anything we like to a number. Then using that number, we can index it in an underlying array we keep hidden from the outside world. This array is still subject to the normal restrictions of an array (fixed size, contiguous in memory), but we get around these issues with our hashing function.
When we need to write to an index, we hash the index to a number x
. We write to index x
of an underlying array.
When we need to read from an index, we hash the index to a number y
. We read from index y
of an underlying array.
A structure built in this way is known as a "Dictionary" in languages such as Lua and Python, and a "Hash Map" in languages such as Java, C#, C, etc.
However, our idea has three key problems.
What if two elements end up having the same value once hashed? E.g., "cab" and "cba" would lead to the same number. How do we deal with this collision?
What if we end up choosing a hashing function which spreads out the indexes really far?
Similarly, what if the hashing function can generate numbers of any size? How big do we define the underlying array to be in the first place?
This is probably the most difficult part to understand, so don't worry if it's a little weird.
The solution to the first problem is that instead of storing a value at our address, we store another address, which points to what is effectively a big bucket where we store all our values.
When we have two values which hash to the same thing, such as "cab" and "cba", it is known as a "hash collision". We store both these indexes in the same bucket.
For example, we want to create the dictionary {"cab": 5, "cba": 9}
.
Before we do anything else, we create empty buckets 1 through to 10. 10 is pretty arbitrary here, and could actually be any number we like.
Now we insert "cab": 5
- hashing "cab"
gives us 3 + 1 + 2 = 6
, so we go to bucket number 6. We put "cab": 5
in to bucket number 6, that is, we basically store an mini array of length 2. The first value in the array is "cab", the index of our dictionary. The second value is the value of our dictionary, in this case 5.
Now we insert "cba": 6
- hashing "cba"
gives us 3 + 2 + 1 = 6
, so we go to bucket number 6 again. We put the array {"cba", 9} into bucket number 6.
Now say we want to check what the value at index "cba"
is. Hashing "cba"
, we get 6 as before, so we go to bucket number 6. Now we go through every array in the bucket, and check the first value.
First we find {"cab", 5}
, but "cab" ~= "cba"
, so we carry on.
We find {"cba", 9}
, and "cba" == "cba"
, so we know we've found our man. We look up the value at the second position in the array, which is 9.
However, how do we implement these buckets. We can't fix the length of them, because the whole reason we're doing this is that we don't know how many values are potentially going to be in one bucket.
Up until now we've seen two data structures - an array, and a dictionary. Solving this problem requires the introduction of a third - a linked list.
A linked list is made up of nodes. Every node is a value, and the address of the next node. The first node in a linked list is the "head".
To first the first value in a linked list, we just look at the value of the head.
To find the second value in a linked list, we find the head, and look at the next node on from the head, and then look at this one's value.
To find the third value in a linked list, we find the head, and look at the next node on from the next node on from the head, and then look at this one's value.
As you can probably imagine, the reason we don't use these normally is that this is really inefficient. Not only do we have to store the address of the next node every time, wasting memory, we also have to consider every value in the chain. For comparison, we only need to consider the start and and offset to read from an array index.
However, where we're only storing a few values, such as in our dictionary, this is okay. Therefore, we use linked lists for the "bucket" in the hash table.
Now we've solved the problem of hash collisions, how do we solve the other two problems? Sort of in one fell swoop, as it turns out.
Basically, we artificially impose a size constraint on our dictionary. When we hash a value, we take the modulo of that value, and use the result as the index.
For example, for a dictionary of size 5, if we get a hash value of 7, we'd take 7 % 5 = 2
, and store at index 2.
The rest of implementation of our dictionary comes down to deciding how big to make the underlying array in our dictionary, how to hash our values, and when to resize the underlying array if all the buckets are getting too full.
A slight caveat to all of the above is that in Lua, lists are actually dictionaries in disguise. Basically though, the way they're set out "under the hood" allows them to behave closely enough to lists that it doesn't really matter.
That's basically all there is.
When you create a dictionary, the keys have no order. As a result, iterating through a dictionary may not give each value in the order that you wrote it in. To solve this, you should remove the indexes and just make an array. Even though the keys are still the same integers, iterating through will keep it in the right order. Also, you might want to consider using module scripts rather than _G
.
This example should always print in order, as it is an array:
local Alphabet = { "A"; "B"; "C"; } for x,y in pairs(Alphabet) do print(x,y) end