README: added method details with time complexity analysis

This commit is contained in:
Andrew Scott 2024-09-25 18:14:26 -04:00
parent 155571e227
commit 6ae584ec8c
Signed by: a
GPG key ID: 7CD5A5977E4931C1

# HashMaps
This hash map library features two methods for collision resolution: separate
chaining, and open addressing with quadratic probing. All methods for both
classes are implemented iteratively, so their time and space complexity is
straightforward to reason about. Further, no built-in Python methods or data
structures are used; the library was written from the ground up to avoid
hidden surprises.
## Separate Chaining
This implementation leverages a dynamic array of singly linked lists to create
chains of key/value pairs. The complexities below assume the hash function runs
in O(1) time.

| Method          | Time Complexity (worst case) | Description                                                       |
|-----------------|------------------------------|-------------------------------------------------------------------|
| `put`           | O(n)                         | Adds (or updates) a key/value pair in the hash map                |
| `empty_buckets` | O(n)                         | Gets the number of empty buckets in the hash table                |
| `table_load`    | O(1)                         | Gets the current hash table load factor                           |
| `clear`         | O(n)                         | Clears the contents of the hash map without changing its capacity |
| `resize_table`  | O(n)                         | Changes the capacity of the hash table                            |
| `get`           | O(n)                         | Gets the value associated with the given key                      |
| `contains_key`  | O(n)                         | Checks if a given key is in the hash map                          |
| `remove`        | O(n)                         | Removes a key/value pair from the hash map                        |
| `get_keys`      | O(n)                         | Gets an array containing all the keys in the hash map             |
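As an illustrative sketch of where the O(n) worst case comes from (class and
method names here are hypothetical, not the library's actual classes): both
`put` and `get` must walk the chain at the hashed bucket, and in the worst case
every key lands in the same bucket.

```python
class _Node:
    """Singly linked list node holding one key/value pair."""
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

class ChainingMap:
    """Minimal separate-chaining sketch; the library's real classes differ."""
    def __init__(self, capacity=8):
        self._buckets = [None] * capacity
        self._size = 0

    def _index(self, key):
        # Stand-in for the library's own hash functions.
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self._buckets[i]
        while node:                  # walk the chain: O(n) worst case
            if node.key == key:
                node.value = value   # update existing key
                return
            node = node.next
        # Key not found: prepend a new node to the chain.
        self._buckets[i] = _Node(key, value, self._buckets[i])
        self._size += 1

    def get(self, key):
        node = self._buckets[self._index(key)]
        while node:
            if node.key == key:
                return node.value
            node = node.next
        return None
```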

This module also includes a standalone function, `find_mode`, which returns a
tuple containing an array comprising the mode (the elements with the highest
number of occurrences) and the frequency (the number of times the mode
appears).
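A sketch of what `find_mode` computes; this version uses a plain dict purely to
show the O(n) idea, whereas the library's `find_mode` operates on its own
DynamicArray and hash map classes.

```python
def find_mode_sketch(values):
    """Return (modes, frequency): all values tied for the highest count.
    Illustrative only -- the library's find_mode uses its own structures."""
    counts = {}
    for v in values:                  # one O(1) map update per element -> O(n)
        counts[v] = counts.get(v, 0) + 1
    freq = max(counts.values())
    return [v for v, c in counts.items() if c == freq], freq
```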
## Open Addressing
This hash map uses a dynamic array to create a series of individual buckets.
Each bucket holds a key/value pair along with a flag indicating whether the
value has been deleted; this flag is commonly known as a *tombstone*. The open
addressing implementation also resizes the table automatically so that
inserting new elements stays efficient as the map grows. As with separate
chaining, the stated complexities assume the hash function runs in constant
time.

| Method          | Time Complexity (worst case) | Description                                                       |
|-----------------|------------------------------|-------------------------------------------------------------------|
| `put`           | O(n)                         | Adds (or updates) a key/value pair in the hash map                |
| `empty_buckets` | O(n)                         | Gets the number of empty buckets in the hash table                |
| `table_load`    | O(1)                         | Gets the current hash table load factor                           |
| `clear`         | O(n)                         | Clears the contents of the hash map without changing its capacity |
| `resize_table`  | O(n)                         | Changes the capacity of the hash table                            |
| `get`           | O(n)                         | Gets the value associated with the given key                      |
| `contains_key`  | O(n)                         | Checks if a given key is in the hash map                          |
| `remove`        | O(n)                         | Removes a key/value pair from the hash map                        |
| `get_keys`      | O(n)                         | Gets an array containing all the keys in the hash map             |
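The tombstone mechanics can be sketched as follows. This is illustrative, not
the library's implementation: class names are hypothetical, and the probe
sequence here uses triangular-number offsets (a quadratic-style probe chosen
because it visits every slot when the capacity is a power of two); the
library's exact probe and resize policy may differ.

```python
class ProbingMap:
    """Minimal open-addressing sketch with tombstones; the library's
    HashEntry/DynamicArray-based classes differ in the details."""
    _TOMBSTONE = object()          # marks a slot whose entry was deleted

    def __init__(self, capacity=8):
        self._slots = [None] * capacity   # None, _TOMBSTONE, or (key, value)
        self._size = 0                    # live entries only

    def _probe(self, key):
        # Quadratic-style probe with triangular-number offsets, which
        # visits every slot when the capacity is a power of two.
        start = hash(key) % len(self._slots)
        for j in range(len(self._slots)):
            yield (start + j * (j + 1) // 2) % len(self._slots)

    def put(self, key, value):
        if (self._size + 1) / len(self._slots) > 0.5:
            self._resize(2 * len(self._slots))   # keep probe paths short
        first_free = None
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:
                # Reuse the first tombstone seen on the way, if any.
                self._slots[i if first_free is None else first_free] = (key, value)
                self._size += 1
                return
            if slot is ProbingMap._TOMBSTONE:
                if first_free is None:
                    first_free = i
            elif slot[0] == key:
                self._slots[i] = (key, value)    # update existing key
                return
        if first_free is not None:     # probe path contained only tombstones
            self._slots[first_free] = (key, value)
            self._size += 1

    def get(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:
                return None            # a truly empty slot ends the search
            if slot is not ProbingMap._TOMBSTONE and slot[0] == key:
                return slot[1]
        return None

    def remove(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:
                return
            if slot is not ProbingMap._TOMBSTONE and slot[0] == key:
                self._slots[i] = ProbingMap._TOMBSTONE   # leave a tombstone
                self._size -= 1
                return

    def _resize(self, capacity):
        live = [s for s in self._slots
                if s is not None and s is not ProbingMap._TOMBSTONE]
        self._slots = [None] * capacity
        self._size = 0
        for key, value in live:
            self.put(key, value)
```

Note that `remove` only plants a tombstone rather than emptying the slot:
clearing it outright would break `get` for any key whose probe path passes
through that slot.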
## Notes on Time Complexity
While I have provided theoretical worst-case time complexities in the tables
above, actual performance is highly dependent on a hash map's *load factor*. In
short, we can consider the load factor to be *λ = n/m*, where *n* is the number
of elements and *m* is the number of available slots. In the case of open
addressing, the average expected time for adding an element is *1/(1 − λ)*.
Thus, as long as *λ < 1* (which the automatic resizing guarantees), we should
expect the average and amortized time complexity of `put` to be O(1). On the
other hand, the expected time for separate chaining is *1 + λ*, where the 1
represents the hashing operation. Once again, time complexity is dependent on
the load factor, and we should expect the average and amortized time complexity
to be O(1). However, it should be noted that the separate chaining hash table
provided here will not automatically resize itself; I have left it to the user
to decide when it is appropriate for their own program to resize.
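As a quick numeric illustration of the two formulas above:

```python
# Expected cost per operation as a function of load factor λ:
# roughly 1/(1 - λ) probes for open addressing and 1 + λ node
# visits for separate chaining (plus the O(1) hash itself).
for lam in (0.25, 0.5, 0.75, 0.9):
    print(f"λ = {lam:.2f}: open addressing ≈ {1 / (1 - lam):.2f} probes, "
          f"chaining ≈ {1 + lam:.2f} visits")
```

The open addressing cost grows sharply as λ approaches 1, which is why that
implementation resizes itself well before the table fills.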

Both implementations use the included DynamicArray class for the underlying
hash table; however, hash\_map\_sc.py uses a singly linked list for each bucket
while hash\_map\_oa.py uses a HashEntry object. Additionally, hash\_map\_sc.py
includes a separate function, find\_mode(), that provides a mechanism for
finding the value that occurs most frequently in the hash map, and how many
times it occurs, with O(n) time complexity. Finally, both implementations
include some basic testing when run as a script.