# HashMaps

This hash map library features two methods for collision resolution: separate
chaining, and open addressing with quadratic probing. All methods for both
classes are implemented iteratively so that their time and space complexity is
straightforward to reason about. Further, no built-in Python methods or data
structures are used: the library was written from the ground up to avoid hidden
surprises, now and in the future.

## Separate Chaining

This implementation uses a dynamic array of singly linked lists to create
chains of key/value pairs. The time complexities below assume that your hash
function runs in O(1).

| Method          | Time Complexity (worst case) | Description                                                       |
|-----------------|------------------------------|-------------------------------------------------------------------|
| `put`           | O(n)                         | Adds (or updates) a key/value pair in the hash map                |
| `empty_buckets` | O(n)                         | Gets the number of empty buckets in the hash table                |
| `table_load`    | O(1)                         | Gets the current hash table load factor                           |
| `clear`         | O(n)                         | Clears the contents of the hash map without changing its capacity |
| `resize_table`  | O(n)                         | Changes the capacity of the hash table                            |
| `get`           | O(n)                         | Gets the value associated with the given key                      |
| `contains_key`  | O(n)                         | Checks if a given key is in the hash map                          |
| `remove`        | O(n)                         | Removes a key/value pair from the hash map                        |
| `get_keys`      | O(n)                         | Gets an array that contains all the keys in the hash map          |

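As a rough illustration of the chaining approach described in this section, the sketch below shows how `put` and `get` might walk a chain. It is not the library's code: the names `ChainNode` and `ChainedMap` are hypothetical, a plain Python list stands in for the library's dynamic array, and Python's built-in `hash` is only a default stand-in for the user-supplied hash function.

```python
class ChainNode:
    """One key/value pair in a singly linked chain (hypothetical name)."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedMap:
    """Illustrative separate chaining map; not the library's implementation."""

    def __init__(self, capacity=8, hash_function=hash):
        # Assumption: a Python list stands in for the library's dynamic array.
        self._buckets = [None] * capacity
        self._capacity = capacity
        self._hash = hash_function
        self._size = 0

    def put(self, key, value):
        index = self._hash(key) % self._capacity
        node = self._buckets[index]
        # Walk the chain iteratively; update in place if the key exists.
        while node is not None:
            if node.key == key:
                node.value = value
                return
            node = node.next
        # Key not found: insert a new node at the head of the chain.
        self._buckets[index] = ChainNode(key, value, self._buckets[index])
        self._size += 1

    def get(self, key):
        node = self._buckets[self._hash(key) % self._capacity]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def table_load(self):
        # Load factor: number of stored pairs divided by number of buckets.
        return self._size / self._capacity
```

If the hash function sends every key to the same bucket, each operation has to walk a single chain of length *n*, which is where the O(n) worst-case entries in the table come from.
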
The separate chaining implementation also includes a standalone function,
`find_mode`, which returns a tuple of two items: an array of the mode values
(the elements with the highest number of occurrences) and their frequency (the
number of times each mode value appears).

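The idea behind a mode search of this kind can be sketched in two passes: count the occurrences, then collect the values with the highest count. The function name `find_mode_sketch` and its input (a sequence of hashable values) are assumptions, and a Python `dict` and `list` stand in for the library's own hash map and dynamic array.

```python
def find_mode_sketch(values):
    """Return (modes, frequency) for a sequence of hashable values."""
    # Assumption: a dict stands in for the separate chaining hash map.
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1

    modes = []
    frequency = 0
    for value, count in counts.items():
        if count > frequency:
            # New highest count: restart the list of modes.
            modes = [value]
            frequency = count
        elif count == frequency:
            # Tie with the current highest count: keep both values.
            modes.append(value)
    return modes, frequency
```

Because both passes touch each element or distinct key once, the sketch runs in O(n) on average when the underlying map operations are O(1).
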
## Open Addressing

This hash map uses a dynamic array to create a series of individual buckets.
Each bucket contains a key/value pair as well as a flag that indicates whether
the pair has been deleted. This flag is commonly known as a *tombstone*. The
open addressing implementation also resizes the table automatically to keep
insertion efficient as the number of elements grows. For the purpose of
calculating time complexity, this implementation also assumes that your hash
function runs in constant time.

| Method          | Time Complexity (worst case) | Description                                                       |
|-----------------|------------------------------|-------------------------------------------------------------------|
| `put`           | O(n)                         | Adds (or updates) a key/value pair in the hash map                |
| `empty_buckets` | O(n)                         | Gets the number of empty buckets in the hash table                |
| `table_load`    | O(1)                         | Gets the current hash table load factor                           |
| `clear`         | O(n)                         | Clears the contents of the hash map without changing its capacity |
| `resize_table`  | O(n)                         | Changes the capacity of the hash table                            |
| `get`           | O(n)                         | Gets the value associated with the given key                      |
| `contains_key`  | O(n)                         | Checks if a given key is in the hash map                          |
| `remove`        | O(n)                         | Removes a key/value pair from the hash map                        |
| `get_keys`      | O(n)                         | Gets an array that contains all the keys in the hash map          |

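To make the roles of quadratic probing and tombstones concrete, here is a minimal sketch under stated assumptions. The names `Slot` and `ProbingMap` are hypothetical, a Python list stands in for the dynamic array, and automatic resizing is left out for brevity, so `put` raises an error where a real table would grow instead.

```python
class Slot:
    """A bucket holding one key/value pair plus a tombstone flag (hypothetical)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.is_tombstone = False


class ProbingMap:
    """Illustrative open addressing map; not the library's implementation."""

    def __init__(self, capacity=8, hash_function=hash):
        self._slots = [None] * capacity
        self._capacity = capacity
        self._hash = hash_function
        self._size = 0

    def _probe(self, key):
        """Yield bucket indices in quadratic order: h, h+1, h+4, h+9, ..."""
        start = self._hash(key) % self._capacity
        for j in range(self._capacity):
            yield (start + j * j) % self._capacity

    def _find(self, key):
        for index in self._probe(key):
            slot = self._slots[index]
            if slot is None:
                return None  # A never-used bucket ends the search.
            if not slot.is_tombstone and slot.key == key:
                return slot
        return None

    def put(self, key, value):
        first_free = None
        for index in self._probe(key):
            slot = self._slots[index]
            if slot is None:
                if first_free is None:
                    first_free = index
                break
            if slot.is_tombstone:
                # Remember the first reusable bucket, but keep looking
                # in case the key already exists further along.
                if first_free is None:
                    first_free = index
            elif slot.key == key:
                slot.value = value
                return
        if first_free is None:
            raise RuntimeError("no free bucket found; a real table would resize")
        self._slots[first_free] = Slot(key, value)
        self._size += 1

    def get(self, key):
        slot = self._find(key)
        return slot.value if slot is not None else None

    def remove(self, key):
        slot = self._find(key)
        if slot is not None:
            # Mark instead of clearing so later probes can pass through.
            slot.is_tombstone = True
            self._size -= 1
```

The tombstone matters because clearing a bucket outright would cut the probe sequence short: a key stored further along the same sequence would become unreachable by `get`.
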
## Notes on Time Complexity

While I have provided theoretical worst-case time complexities in the tables
above, the actual time complexity is highly dependent on a hash map's *load
factor*. In short, we can consider the load factor to be *λ = n/m*, where *n* is
the number of elements and *m* is the number of available buckets. In the case
of open addressing, the average expected time for adding an element is
*1/(1 - λ)*. Thus, as long as automatic resizing keeps λ bounded below 1, we
should expect that the average and amortized time complexity for the `put`
operation will actually be O(1). On the other hand, we can consider the expected
time for separate chaining to be *λ + 1*, where the 1 represents the hashing
operation. Once again, time complexity is dependent on the load factor, and we
should expect the average and amortized time complexity to be O(1) while the
load factor stays bounded. However, it should be noted that the separate
chaining hash table provided here will not automatically resize itself. I have
left it to the user to decide when it is appropriate for their own program to
resize.

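To get a feel for how these estimates behave, the two formulas can be evaluated at a few load factors. The function names below are hypothetical and exist only for this illustration; they compute the estimates quoted above and do not measure the library itself.

```python
def expected_cost_open_addressing(load_factor):
    """Estimate 1 / (1 - λ) for adding an element with open addressing."""
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must be in [0, 1) for open addressing")
    return 1 / (1 - load_factor)


def expected_cost_separate_chaining(load_factor):
    """Estimate λ + 1: one hashing step plus an average chain of length λ."""
    if load_factor < 0:
        raise ValueError("load factor cannot be negative")
    return load_factor + 1


# Compare both estimates as the table fills up.
for lam in (0.25, 0.5, 0.75, 0.9):
    print(
        lam,
        expected_cost_open_addressing(lam),
        expected_cost_separate_chaining(lam),
    )
```

At λ = 0.5 the open addressing estimate is 2 and at λ = 0.75 it is 4, so the cost climbs quickly as the table approaches full. The separate chaining estimate grows only linearly, but without resizing λ can exceed 1 and keep growing with *n*.
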