Hash Maps
-
The string should have a unique value. For that, we use a hash.
-
Assume that hash collisions will happen as part of the design.
-
Mapping
u32into 20k indices:-
Modulo operator:
-
.
-
.
-
-
The modulo operator can be used to separate the value into "buckets", where the number of buckets is equal to the value used for the module:
% 3 -> 3 buckets.-
.
-
-
To map 4 billion into 20 thousand:
4bi % 20k:-
.
-
-
Open Addressing -> Technique using One Array
-
Dealing with collisions:
-
.
-
Same
123index.
-
-
If that was already something in that index that doesn't correspond to what was expected, we keep searching forwards in the array until we find the data we're looking for:
-
.
-
-
-
The more items there is in this type of hashmap, it worse its performance gets, to the point that when full, the hashmap speed is basically equal to just linearly searching through an array.
-
We only want to use this design with a hashmap with partial capacity.
-
How much the internal array is full is called "load factor". When this load factor exceeds a certain amount, when should resize the array to avoid clustering and diminishing performance.
-
Resizing the array breaks the old
% length, as the length changes, so all previous elements have to be remapped using the new length. -
Deletion is also a big problem, as that could break the lookup due to collisions, requiring that instead of simply removing an element from the array, the slot should be marked as "deactivated", so the old lookups with collision still return a match; this obviously creates bloat.
Closed Addressing -> Stores a pointer to a different array
-
It's a better solution that Open Addressing.
-
It differs only by how Collision Resolution is done.
-
Instead of looking forwards in the array to find an empty slot, like with Open Addressing, the Closing addressing stores a pointer to a different array to store the collisions.
-
.
-
It basically introduces a linked list in the design.
Sparse Sets
-
Check the ECS section.