mirror of
https://codeberg.org/andyscott/HashMaps.git
synced 2024-12-29 13:53:11 -05:00
README: added method details with time complexity analysis
This commit is contained in:
parent
155571e227
commit
6ae584ec8c
1 changed files with 69 additions and 9 deletions
78
README.md
# HashMaps

This hash map library features two methods for collision resolution: separate
chaining, and open addressing with quadratic probing. The hm\_include module
provides the underlying data structures and two hash functions. All methods in
both classes are implemented iteratively to keep their time and space
complexity straightforward. Further, no built-in Python methods or data
structures are used: the library is written from the ground up so it carries no
hidden surprises, now or in the future.

## Separate Chaining

This implementation uses a dynamic array of singly linked lists to create
chains of key/value pairs. The time complexities below assume that your hash
function runs in O(1).

| Method          | Time Complexity (worst case) | Description                                                       |
|-----------------|------------------------------|-------------------------------------------------------------------|
| `put`           | O(n)                         | Adds (or updates) a key/value pair in the hash map                |
| `empty_buckets` | O(n)                         | Gets the number of empty buckets in the hash table                |
| `table_load`    | O(1)                         | Gets the current hash table load factor                           |
| `clear`         | O(n)                         | Clears the contents of the hash map without changing its capacity |
| `resize_table`  | O(n)                         | Changes the capacity of the hash table                            |
| `get`           | O(n)                         | Gets the value associated with the given key                      |
| `contains_key`  | O(n)                         | Checks whether a given key is in the hash map                     |
| `remove`        | O(n)                         | Removes a key/value pair from the hash map                        |
| `get_keys`      | O(n)                         | Gets an array containing all the keys in the hash map             |
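The chaining scheme described above can be shown in a short plain-Python
sketch. This is an illustration only, not the library's code: the class and
method names below are hypothetical, and the sketch uses Python's built-in
`hash`, which the library itself deliberately avoids.

```python
# Minimal sketch of separate chaining: a fixed array of singly linked lists.
# Node and ChainedMapSketch are illustrative names, not the library's API.

class Node:
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node

class ChainedMapSketch:
    def __init__(self, capacity=11):
        self.capacity = capacity
        self.buckets = [None] * capacity   # each slot holds the head of a chain

    def put(self, key, value):
        i = hash(key) % self.capacity
        node = self.buckets[i]
        while node is not None:            # walk the chain: O(n) worst case
            if node.key == key:            # key already present: update in place
                node.value = value
                return
            node = node.next
        # Key not found: prepend a new node to this bucket's chain.
        self.buckets[i] = Node(key, value, self.buckets[i])

    def get(self, key):
        node = self.buckets[hash(key) % self.capacity]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None                        # key not in the map
```

A lookup hashes once and then walks a single chain, which is where the
*1 + λ* expected cost discussed later comes from.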

This data structure also includes a standalone function, `find_mode`, which
returns a tuple containing an array comprising the mode (the elements with the
highest number of occurrences) and the frequency (the number of times the mode
appears).
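The idea behind `find_mode` can be illustrated with a short sketch. Note that
this is not the library's implementation: it uses a built-in `dict` for
brevity, which the library itself deliberately avoids, and the function name is
chosen here for illustration.

```python
# Illustrative sketch of mode-finding with a hash map: one pass to tally
# counts, then collect every element tied for the highest count.
def find_mode_sketch(values):
    """Return (modes, frequency) for the given sequence."""
    counts = {}
    for v in values:                       # tally occurrences: O(n)
        counts[v] = counts.get(v, 0) + 1
    highest = max(counts.values())
    modes = [v for v, c in counts.items() if c == highest]
    return modes, highest
```

For example, `find_mode_sketch(["a", "b", "a"])` returns `(["a"], 2)`.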

## Open Addressing

This hash map uses a dynamic array to create a series of individual buckets.
Each bucket contains a key/value pair as well as a flag, commonly known as a
*tombstone*, that indicates whether the value has been deleted. This
implementation also resizes the table automatically to keep insertion efficient
as the map grows. As before, the time complexities below assume that your hash
function runs in constant time.

| Method          | Time Complexity (worst case) | Description                                                       |
|-----------------|------------------------------|-------------------------------------------------------------------|
| `put`           | O(n)                         | Adds (or updates) a key/value pair in the hash map                |
| `empty_buckets` | O(n)                         | Gets the number of empty buckets in the hash table                |
| `table_load`    | O(1)                         | Gets the current hash table load factor                           |
| `clear`         | O(n)                         | Clears the contents of the hash map without changing its capacity |
| `resize_table`  | O(n)                         | Changes the capacity of the hash table                            |
| `get`           | O(n)                         | Gets the value associated with the given key                      |
| `contains_key`  | O(n)                         | Checks whether a given key is in the hash map                     |
| `remove`        | O(n)                         | Removes a key/value pair from the hash map                        |
| `get_keys`      | O(n)                         | Gets an array containing all the keys in the hash map             |
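Quadratic probing with tombstones, as described above, can be sketched as
follows. The names (`HashEntry`, `OpenAddressSketch`) and all details here are
illustrative assumptions rather than the library's API; unlike the real
implementation, the sketch never resizes and simply raises if the table fills.

```python
# Minimal sketch of open addressing with quadratic probing and tombstones.

class HashEntry:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.is_tombstone = False   # marks a deleted slot so probing can continue

class OpenAddressSketch:
    def __init__(self, capacity=11):
        self.capacity = capacity
        self.size = 0
        self.buckets = [None] * capacity

    def _probe(self, key):
        """Yield slot indices initial + j**2 (mod capacity) for j = 0, 1, 2, ..."""
        initial = hash(key) % self.capacity
        for j in range(self.capacity):
            yield (initial + j * j) % self.capacity

    def put(self, key, value):
        first_free = None                  # first reusable slot seen, if any
        for i in self._probe(key):
            slot = self.buckets[i]
            if slot is None:               # empty slot: key is not further along
                if first_free is None:
                    first_free = i
                break
            if slot.is_tombstone:          # reusable, but keep probing for the key
                if first_free is None:
                    first_free = i
            elif slot.key == key:          # key already present: update in place
                slot.value = value
                return
        if first_free is None:
            raise RuntimeError("table full; the real implementation would resize")
        self.buckets[first_free] = HashEntry(key, value)
        self.size += 1

    def get(self, key):
        for i in self._probe(key):
            slot = self.buckets[i]
            if slot is None:
                return None                # empty slot: key was never stored
            if not slot.is_tombstone and slot.key == key:
                return slot.value
        return None

    def remove(self, key):
        for i in self._probe(key):
            slot = self.buckets[i]
            if slot is None:
                return
            if not slot.is_tombstone and slot.key == key:
                slot.is_tombstone = True   # leave a tombstone; never empty the slot
                self.size -= 1
                return
```

Tombstones are what keep `get` correct after a deletion: an emptied slot would
cut the probe sequence short, while a tombstone lets probing continue past it.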

## Notes on Time Complexity

While I have provided theoretical worst-case time complexities in the tables
above, the actual running time is highly dependent on a hash map's *load
factor*, *λ = n/m*, where *n* is the number of elements and *m* is the number
of available spaces. In the case of open addressing, the average expected time
for adding an element is *1/(1 - λ)*. Thus, when *λ < 1* we should expect that
the average and amortized time complexity of the `put` operation will actually
be O(1). On the other hand, the expected time for separate chaining is
*1 + λ*, where the 1 represents the hashing operation. Once again, time
complexity is dependent on the load factor, and we should expect the average
and amortized time complexity to be O(1). However, it should be noted that the
separate chaining hash table provided here will not automatically resize
itself; I have left it to the user to decide when it is appropriate for their
own program to resize.
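As a quick illustration of these formulas, the expected average cost per
operation can be evaluated for a few load factors:

```python
# Expected average cost per operation as a function of the load factor.
def expected_probes_open_addressing(lam):
    return 1.0 / (1.0 - lam)    # 1/(1 - lambda), valid only for lambda < 1

def expected_cost_separate_chaining(lam):
    return 1.0 + lam            # 1 + lambda: one hash plus the average chain walk

for lam in (0.25, 0.5, 0.75, 0.9):
    oa = expected_probes_open_addressing(lam)
    sc = expected_cost_separate_chaining(lam)
    print(f"load factor {lam:.2f}: open addressing ~ {oa:.2f} probes, "
          f"chaining ~ {sc:.2f}")
```

Note how open addressing degrades sharply as the load factor approaches 1
(at 0.9 it already averages 10 probes), which is why that implementation
resizes automatically.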

Both implementations use the included DynamicArray class for the underlying
hash table; however, hash\_map\_sc.py uses a singly linked list for each
bucket, while hash\_map\_oa.py uses a HashEntry object. Additionally,
hash\_map\_sc.py includes a separate function, find\_mode(), that finds the
value occurring most frequently in the hash map, and how many times it occurs,
in O(n) time. Finally, both implementations include some basic testing when run
as a script.