mirror of
https://codeberg.org/andyscott/HashMaps.git
synced 2024-12-29 13:53:11 -05:00
README: added method details with time complexity analysis
This commit is contained in:
parent
155571e227
commit
6ae584ec8c
1 changed files with 69 additions and 9 deletions
78
README.md
# HashMaps

This hash map library features two methods for collision resolution: separate
chaining, and open addressing with quadratic probing. The hm\_include module
provides the underlying data structures and two hash functions. All methods in
both classes are implemented iteratively to keep their time and space
complexity straightforward. Further, no built-in Python methods or data
structures are used: the library is written from the ground up so it carries no
hidden surprises, now or in the future.

## Separate Chaining

This implementation uses a dynamic array of singly linked lists to create
chains of key/value pairs. The time complexities below assume that your hash
function runs in O(1).

| Method          | Time Complexity (worst case) | Description                                                       |
|-----------------|------------------------------|-------------------------------------------------------------------|
| `put`           | O(n)                         | Adds (or updates) a key/value pair in the hash map                |
| `empty_buckets` | O(n)                         | Gets the number of empty buckets in the hash table                |
| `table_load`    | O(1)                         | Gets the current hash table load factor                           |
| `clear`         | O(n)                         | Clears the contents of the hash map without changing its capacity |
| `resize_table`  | O(n)                         | Changes the capacity of the hash table                            |
| `get`           | O(n)                         | Gets the value associated with the given key                      |
| `contains_key`  | O(n)                         | Checks whether a given key is in the hash map                     |
| `remove`        | O(n)                         | Removes a key/value pair from the hash map                        |
| `get_keys`      | O(n)                         | Gets an array containing all the keys in the hash map             |
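The chaining scheme described above can be shown in a short plain-Python
sketch. This is an illustration only, not the library's code: the class and
method names below are hypothetical, and the sketch uses Python's built-in
`hash`, which the library itself deliberately avoids.

```python
# Minimal sketch of separate chaining: a fixed array of singly linked lists.
# Node and ChainedMapSketch are illustrative names, not the library's API.

class Node:
    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node

class ChainedMapSketch:
    def __init__(self, capacity=11):
        self.capacity = capacity
        self.buckets = [None] * capacity   # each slot holds the head of a chain

    def put(self, key, value):
        i = hash(key) % self.capacity
        node = self.buckets[i]
        while node is not None:            # walk the chain: O(n) worst case
            if node.key == key:            # key already present: update in place
                node.value = value
                return
            node = node.next
        # Key not found: prepend a new node to this bucket's chain.
        self.buckets[i] = Node(key, value, self.buckets[i])

    def get(self, key):
        node = self.buckets[hash(key) % self.capacity]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None                        # key not in the map
```

A lookup hashes once and then walks a single chain, which is where the
*1 + λ* expected cost discussed later comes from.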

This data structure also includes a standalone function, `find_mode`, which
returns a tuple containing an array comprising the mode (the elements with the
highest number of occurrences) and the frequency (the number of times the mode
appears).
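The idea behind `find_mode` can be illustrated with a short sketch. Note that
this is not the library's implementation: it uses a built-in `dict` for
brevity, which the library itself deliberately avoids, and the function name is
chosen here for illustration.

```python
# Illustrative sketch of mode-finding with a hash map: one pass to tally
# counts, then collect every element tied for the highest count.
def find_mode_sketch(values):
    """Return (modes, frequency) for the given sequence."""
    counts = {}
    for v in values:                       # tally occurrences: O(n)
        counts[v] = counts.get(v, 0) + 1
    highest = max(counts.values())
    modes = [v for v, c in counts.items() if c == highest]
    return modes, highest
```

For example, `find_mode_sketch(["a", "b", "a"])` returns `(["a"], 2)`.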

## Open Addressing

This hash map uses a dynamic array to create a series of individual buckets.
Each bucket contains a key/value pair as well as a flag, commonly known as a
*tombstone*, that indicates whether the value has been deleted. This
implementation also resizes the table automatically to keep insertion efficient
as the map grows. As before, the time complexities below assume that your hash
function runs in constant time.

| Method          | Time Complexity (worst case) | Description                                                       |
|-----------------|------------------------------|-------------------------------------------------------------------|
| `put`           | O(n)                         | Adds (or updates) a key/value pair in the hash map                |
| `empty_buckets` | O(n)                         | Gets the number of empty buckets in the hash table                |
| `table_load`    | O(1)                         | Gets the current hash table load factor                           |
| `clear`         | O(n)                         | Clears the contents of the hash map without changing its capacity |
| `resize_table`  | O(n)                         | Changes the capacity of the hash table                            |
| `get`           | O(n)                         | Gets the value associated with the given key                      |
| `contains_key`  | O(n)                         | Checks whether a given key is in the hash map                     |
| `remove`        | O(n)                         | Removes a key/value pair from the hash map                        |
| `get_keys`      | O(n)                         | Gets an array containing all the keys in the hash map             |
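Quadratic probing with tombstones, as described above, can be sketched as
follows. The names (`HashEntry`, `OpenAddressSketch`) and all details here are
illustrative assumptions rather than the library's API; unlike the real
implementation, the sketch never resizes and simply raises if the table fills.

```python
# Minimal sketch of open addressing with quadratic probing and tombstones.

class HashEntry:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.is_tombstone = False   # marks a deleted slot so probing can continue

class OpenAddressSketch:
    def __init__(self, capacity=11):
        self.capacity = capacity
        self.size = 0
        self.buckets = [None] * capacity

    def _probe(self, key):
        """Yield slot indices initial + j**2 (mod capacity) for j = 0, 1, 2, ..."""
        initial = hash(key) % self.capacity
        for j in range(self.capacity):
            yield (initial + j * j) % self.capacity

    def put(self, key, value):
        first_free = None                  # first reusable slot seen, if any
        for i in self._probe(key):
            slot = self.buckets[i]
            if slot is None:               # empty slot: key is not further along
                if first_free is None:
                    first_free = i
                break
            if slot.is_tombstone:          # reusable, but keep probing for the key
                if first_free is None:
                    first_free = i
            elif slot.key == key:          # key already present: update in place
                slot.value = value
                return
        if first_free is None:
            raise RuntimeError("table full; the real implementation would resize")
        self.buckets[first_free] = HashEntry(key, value)
        self.size += 1

    def get(self, key):
        for i in self._probe(key):
            slot = self.buckets[i]
            if slot is None:
                return None                # empty slot: key was never stored
            if not slot.is_tombstone and slot.key == key:
                return slot.value
        return None

    def remove(self, key):
        for i in self._probe(key):
            slot = self.buckets[i]
            if slot is None:
                return
            if not slot.is_tombstone and slot.key == key:
                slot.is_tombstone = True   # leave a tombstone; never empty the slot
                self.size -= 1
                return
```

Tombstones are what keep `get` correct after a deletion: an emptied slot would
cut the probe sequence short, while a tombstone lets probing continue past it.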

## Notes on Time Complexity

While I have provided theoretical worst-case time complexities in the tables
above, the actual running time is highly dependent on a hash map's *load
factor*, *λ = n/m*, where *n* is the number of elements and *m* is the number
of available spaces. In the case of open addressing, the average expected time
for adding an element is *1/(1 - λ)*. Thus, when *λ < 1* we should expect that
the average and amortized time complexity of the `put` operation will actually
be O(1). On the other hand, the expected time for separate chaining is
*1 + λ*, where the 1 represents the hashing operation. Once again, time
complexity is dependent on the load factor, and we should expect the average
and amortized time complexity to be O(1). However, it should be noted that the
separate chaining hash table provided here will not automatically resize
itself; I have left it to the user to decide when it is appropriate for their
own program to resize.
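As a quick illustration of these formulas, the expected average cost per
operation can be evaluated for a few load factors:

```python
# Expected average cost per operation as a function of the load factor.
def expected_probes_open_addressing(lam):
    return 1.0 / (1.0 - lam)    # 1/(1 - lambda), valid only for lambda < 1

def expected_cost_separate_chaining(lam):
    return 1.0 + lam            # 1 + lambda: one hash plus the average chain walk

for lam in (0.25, 0.5, 0.75, 0.9):
    oa = expected_probes_open_addressing(lam)
    sc = expected_cost_separate_chaining(lam)
    print(f"load factor {lam:.2f}: open addressing ~ {oa:.2f} probes, "
          f"chaining ~ {sc:.2f}")
```

Note how open addressing degrades sharply as the load factor approaches 1
(at 0.9 it already averages 10 probes), which is why that implementation
resizes automatically.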

Both implementations use the included DynamicArray class for the underlying
hash table; however, hash\_map\_sc.py uses a singly linked list for each
bucket, while hash\_map\_oa.py uses a HashEntry object. Additionally,
hash\_map\_sc.py includes a separate function, find\_mode(), that finds the
value occurring most frequently in the hash map, and how many times it occurs,
in O(n) time. Finally, both implementations include some basic testing when run
as a script.