Dictionaries
```c
Dict *title;
const char *user;

title = dictCreate();
dictSet(title, "Barack", "President");
user = "Barack";
printf("Welcome %s %s\n", dictGet(title, user), user);
```

As with other abstract data types, the idea is that the user of the dictionary type doesn't need to know how it is implemented. For example, we could implement the dictionary as an array of structs that we search through, but that would be expensive: O(n) time to find a key in the worst case.

If our keys were conveniently named 0, 1, 2, ..., n-1, we could simply use an array, and be able to find a record given a key in constant time. Unfortunately, naming conventions for most objects are not so convenient, and even enumerations like Social Security numbers are likely to span a larger range than we want to allocate. But we would like to get the constant-time performance of an array anyway.

The solution is to feed the keys through some hash function H, which maps them down to array indices. So in a database of people, to find "Smith, Wayland", we would first compute H("Smith, Wayland") = 137 (say), and then look in position 137 in the array. Because we are always using the same function H, we will always be directed to the same position 137.

But what if H("Smith, Wayland") and H("Hephaestos") both equal 137? Now we have a collision, and we have to resolve it by finding some way to either (a) effectively store both records in a single array location, or (b) move one of the records to a new location that we can still find later. Let's consider these two approaches separately.

We can't really store more than one record in an array location, but we can fake it by making each array location be a pointer to a linked list; this scheme is known as chaining. Every time we insert a new element in a particular location, we simply add it to this list. Since the cost of scanning a linked list is linear in its size, the worst-case cost of searching for a particular key will be linear in the number of keys in the table that hash to the same location. Under the assumption that the hash function is a random function (which does not mean that it returns random values every time you call it, but instead means that we picked one of the many possible hash functions uniformly at random), we can analyze the expected cost of a failed search as a function of the load factor α = n/m, where n is the number of elements in the table and m is the number of locations: each of the n elements lands in the location we scan with probability 1/m, so a failed search examines α = n/m elements on average. So as long as the number of elements is proportional to the available space, we get constant-time FIND operations in expectation.

With open addressing, we store only one element per location, and handle collisions by storing the extra elements in other unused locations in the array. To find these other locations, we fix some probe sequence that tells us where to look if the location at H(x) contains an element that is not x. A typical probe sequence (called linear probing) is just H(x), H(x)+1, H(x)+2, ..., wrapping around at the end of the array. The idea is that if we can't put an element in a particular place, we just keep walking up through the array until we find an empty slot. As long as we follow the same probe sequence when looking for an element, we will be able to find the element again.