From 6ae584ec8c1d9127611c5131770a890a758dcf0f Mon Sep 17 00:00:00 2001
From: Andrew Scott
Date: Wed, 25 Sep 2024 18:14:26 -0400
Subject: [PATCH] README: added method details with time complexity analysis

---
 README.md | 78 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 69 insertions(+), 9 deletions(-)

diff --git a/README.md b/README.md
index 29ac54d..f60bf92 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,72 @@
 # HashMaps
 
-These two hash map implementations feature open addressing with quadratic probing
-and separate chaining to handle collisions. The hm\_include module provides the
-underlying data structures, and two hash functions.
+This hash map library features two methods for collision resolution: separate
+chaining and open addressing with quadratic probing. All methods in both
+classes are implemented iteratively, so their time and space complexities are
+straightforward to reason about. Further, no built-in Python methods or data
+structures are used; the library is written from the ground up to avoid hidden
+surprises, now and in the future.
+
+## Separate Chaining
+
+This implementation uses a dynamic array of singly linked lists to create
+chains of key/value pairs. The stated time complexities assume your hash
+function runs in O(1).
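The chaining idea can be sketched in plain Python. This is an illustration only, not the library's actual code: it uses built-in lists in place of the library's own dynamic array and linked list classes, and the function name `sc_put` is hypothetical.

```python
# Minimal sketch of separate chaining: each bucket is a list of
# (key, value) pairs, and colliding keys simply share a bucket.

def sc_put(buckets, key, value):
    """Insert or update a key/value pair in a chained table (illustrative)."""
    index = hash(key) % len(buckets)    # assumes an O(1) hash function
    chain = buckets[index]
    for i, (k, _) in enumerate(chain):  # walk the chain: O(chain length)
        if k == key:
            chain[i] = (key, value)     # existing key: update in place
            return
    chain.append((key, value))          # new key: append to the chain

buckets = [[] for _ in range(8)]
sc_put(buckets, "apple", 1)
sc_put(buckets, "apple", 2)             # updates the pair, does not duplicate it
```

The worst case in the table below, O(n), occurs when every key hashes to the same bucket and the entire chain must be walked.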
+
+| Method          | Time Complexity (worst case) | Description                                                       |
+|-----------------|------------------------------|-------------------------------------------------------------------|
+| `put`           | O(n)                         | Adds (or updates) a key/value pair in the hash map                |
+| `empty_buckets` | O(n)                         | Gets the number of empty buckets in the hash table                |
+| `table_load`    | O(1)                         | Gets the current hash table load factor                           |
+| `clear`         | O(n)                         | Clears the contents of the hash map without changing its capacity |
+| `resize_table`  | O(n)                         | Changes the capacity of the hash table                            |
+| `get`           | O(n)                         | Gets the value associated with the given key                      |
+| `contains_key`  | O(n)                         | Checks if a given key is in the hash map                          |
+| `remove`        | O(n)                         | Removes a key/value pair from the hash map                        |
+| `get_keys`      | O(n)                         | Gets an array containing all the keys in the hash map             |
+
+This data structure also includes a standalone function, `find_mode`, which
+returns a tuple containing an array of the mode (the elements with the highest
+number of occurrences) and the frequency (the number of times the mode
+appears).
+
+## Open Addressing
+
+This hash map uses a dynamic array to create a series of individual buckets.
+Each bucket holds a key/value pair along with a flag indicating whether the
+value has been deleted; this flag is commonly known as a *tombstone*. The open
+addressing implementation also resizes the table automatically to keep
+insertion efficient as the map grows. As above, the stated time complexities
+assume your hash function runs in constant time.
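A lookup under quadratic probing with tombstones can be sketched as follows. Again this is illustrative, not the library's API: the names `oa_get` and `TOMBSTONE` are hypothetical, plain tuples stand in for the library's HashEntry objects, and a real implementation must choose the capacity so the probe sequence can reach an empty slot.

```python
# Illustrative quadratic-probing lookup. A slot is None (never used),
# a (key, value) pair, or TOMBSTONE (deleted). Probing must skip
# tombstones but may stop at a truly empty slot.

TOMBSTONE = object()

def oa_get(table, key):
    """Probe slots (initial + j^2) % capacity until the key or an empty slot."""
    capacity = len(table)
    initial = hash(key) % capacity            # assumes an O(1) hash function
    for j in range(capacity):
        slot = table[(initial + j * j) % capacity]
        if slot is None:                      # empty: the key cannot be present
            return None
        if slot is not TOMBSTONE and slot[0] == key:
            return slot[1]                    # tombstones are skipped, not stopped at
    return None

table = [None] * 7
home = hash("k") % 7
table[home] = TOMBSTONE                       # a deleted entry at the key's home slot
table[(home + 1) % 7] = ("k", 42)             # the key itself, one probe later (j=1)
print(oa_get(table, "k"))                     # → 42: the tombstone is probed past
```

If the lookup stopped at the tombstone instead of skipping it, `"k"` would be reported missing even though it is still in the table; this is why deletions leave a marker rather than emptying the slot.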
+
+| Method          | Time Complexity (worst case) | Description                                                       |
+|-----------------|------------------------------|-------------------------------------------------------------------|
+| `put`           | O(n)                         | Adds (or updates) a key/value pair in the hash map                |
+| `empty_buckets` | O(n)                         | Gets the number of empty buckets in the hash table                |
+| `table_load`    | O(1)                         | Gets the current hash table load factor                           |
+| `clear`         | O(n)                         | Clears the contents of the hash map without changing its capacity |
+| `resize_table`  | O(n)                         | Changes the capacity of the hash table                            |
+| `get`           | O(n)                         | Gets the value associated with the given key                      |
+| `contains_key`  | O(n)                         | Checks if a given key is in the hash map                          |
+| `remove`        | O(n)                         | Removes a key/value pair from the hash map                        |
+| `get_keys`      | O(n)                         | Gets an array containing all the keys in the hash map             |
+
+## Notes on Time Complexity
+
+While I have provided theoretical worst-case time complexities in the tables
+above, the practical performance of a hash map depends heavily on its *load
+factor*. In short, the load factor is *λ = n/m*, where *n* is the number of
+elements and *m* is the number of available spaces. For open addressing, the
+average expected time for adding an element is *1/(1-λ)*; thus, as long as *λ*
+stays below 1, the average and amortized time complexity of the `put`
+operation is actually O(1). For separate chaining, the expected cost is
+*λ + 1*, where the 1 represents the hashing operation. Once again, performance
+is dependent on the load factor, and the average and amortized time complexity
+is O(1). However, note that the separate chaining hash table provided here
+will not resize itself automatically; I have left it to the user to decide
+when it is appropriate for their own program to resize.
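The two expected-cost formulas above can be checked with a few lines of arithmetic. The function names here are hypothetical, written only to make the formulas concrete:

```python
# Back-of-the-envelope check of the expected-cost formulas:
# 1/(1 - lam) for open addressing, lam + 1 for separate chaining.

def expected_probes_open_addressing(lam):
    assert 0 <= lam < 1, "open addressing requires a load factor below 1"
    return 1.0 / (1.0 - lam)

def expected_cost_separate_chaining(lam):
    return lam + 1.0   # one hash operation plus the average chain length

# At lam = 0.5, both costs are small constants, consistent with the
# claim that average-case `put` is O(1) when the load factor is bounded.
print(expected_probes_open_addressing(0.5))   # → 2.0
print(expected_cost_separate_chaining(0.5))   # → 1.5
```

Note how the open addressing cost blows up as *λ* approaches 1 (for example, 10 expected probes at *λ = 0.9*), which is why that implementation resizes automatically while the chained version can tolerate a user-chosen resize policy.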
-Both implementations use the included DynamicArray class for the underlying hash table, -however hash\_map\_sc.py uses a singly linked list for each bucket while hash\_map\_oa.py -uses a HashEntry object. Additionally, hash\_map\_sc.py includes a seperate function, -find\_mode(), that provides a mechanism for finding the value that occurs most -frequently in the hash map and how many times it occurs with an O(n) time complexity. -Finally, both implementations include some basic testing when run as a script.