Hash maps using (Separate) Chaining, and Open Addressing with Quadratic Probing
.gitignore Initial commit 2022-06-01 05:13:31 +02:00
hash_map_oa.py Update docstrings, clean up testing output 2022-06-25 13:49:07 -04:00
hash_map_sc.py Fix testing indentation 2022-06-25 13:50:01 -04:00
hm_include.py Rename file, add citation for Oregon State 2022-06-25 13:48:31 -04:00
LICENSE Add name and year 2022-06-01 05:14:11 +02:00
README.md README: added method details with time complexity analysis 2024-09-25 18:14:26 -04:00

HashMaps

This hash map library features two methods for collision resolution: separate chaining, and open addressing with quadratic probing. All methods for both classes were implemented iteratively so that their time and space complexity is straightforward to reason about. Further, no built-in Python methods or data structures were used: the library was written from the ground up to avoid hidden surprises, now and in the future.

Separate Chaining

This implementation uses a dynamic array of singly linked lists to create chains of key/value pairs. The time complexities below assume that your hash function runs in O(1) time.

| Method | Time Complexity (worst case) | Description |
| --- | --- | --- |
| put | O(n) | Adds (or updates) a key/value pair to the hash map |
| empty_buckets | O(n) | Gets the number of empty buckets in the hash table |
| table_load | O(1) | Gets the current hash table load factor |
| clear | O(n) | Clears the contents of the hash map without changing its capacity |
| resize_table | O(n) | Changes the capacity of the hash table |
| get | O(n) | Gets the value associated with the given key |
| contains_key | O(n) | Checks if a given key is in the hash map |
| remove | O(n) | Removes a key/value pair from the hash map |
| get_keys | O(n) | Gets an array that contains all the keys in the hash map |
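
To make the chaining idea concrete, here is a minimal sketch of the technique. It is not the code from hash_map_sc.py: the names `ChainedMapSketch` and `_Node` are hypothetical, and for brevity the bucket array is a plain Python list rather than the library's own dynamic array.

```python
class _Node:
    """One link in a bucket's singly linked chain (hypothetical helper)."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedMapSketch:
    """Illustrative separate chaining map; not the library's class."""

    def __init__(self, capacity=8, hash_function=hash):
        # Assumption: a built-in list stands in for the dynamic array.
        self._buckets = [None] * capacity  # each slot holds the head of a chain
        self._hash = hash_function
        self._size = 0

    def _index(self, key):
        return self._hash(key) % len(self._buckets)

    def put(self, key, value):
        index = self._index(key)
        node = self._buckets[index]
        while node is not None:  # walk the chain, O(chain length)
            if node.key == key:
                node.value = value  # key already present: update in place
                return
            node = node.next
        # Key not found: link a new node in at the head of the chain.
        self._buckets[index] = _Node(key, value, self._buckets[index])
        self._size += 1

    def get(self, key):
        node = self._buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None

    def remove(self, key):
        index = self._index(key)
        prev, node = None, self._buckets[index]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self._buckets[index] = node.next  # unlink the head
                else:
                    prev.next = node.next  # unlink from the middle or tail
                self._size -= 1
                return
            prev, node = node, node.next

    def table_load(self):
        return self._size / len(self._buckets)

    def empty_buckets(self):
        count = 0
        for head in self._buckets:
            if head is None:
                count += 1
        return count
```

Passing a deliberately bad hash function such as `lambda key: 0` sends every key to the same bucket, so every lookup walks one long chain. That degenerate case is where the O(n) worst-case entries for put, get, and remove come from.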

This implementation also includes a standalone function, find_mode, which returns a tuple containing an array of the mode values (the elements with the highest number of occurrences) and their frequency (the number of times each mode value appears).
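
The shape of that return value can be sketched as follows. This is only an illustration under assumptions: the name `find_mode_sketch` and its single `values` parameter are hypothetical, and a built-in `dict` stands in for the hash map that the real function would use for counting.

```python
def find_mode_sketch(values):
    """Return (modes, frequency) for an iterable of hashable values."""
    # Assumption: a dict replaces the library's own hash map here.
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1

    modes, frequency = [], 0
    for value, count in counts.items():
        if count > frequency:
            modes, frequency = [value], count  # new highest count
        elif count == frequency:
            modes.append(value)  # tie with the current highest count
    return modes, frequency
```

For example, `find_mode_sketch(["a", "b", "a", "c", "b"])` returns `(["a", "b"], 2)`, since "a" and "b" each appear twice. Counting with a hash map keeps the whole pass at O(n) on average.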

Open Addressing

This hash map uses a dynamic array to create a series of individual buckets. Each bucket contains a key/value pair as well as a flag to indicate whether the pair has been deleted; this flag is commonly known as a tombstone. The open addressing implementation also resizes the table automatically to keep insertion efficient as the number of elements grows. As with separate chaining, the time complexities below assume that your hash function runs in constant time.

| Method | Time Complexity (worst case) | Description |
| --- | --- | --- |
| put | O(n) | Adds (or updates) a key/value pair to the hash map |
| empty_buckets | O(n) | Gets the number of empty buckets in the hash table |
| table_load | O(1) | Gets the current hash table load factor |
| clear | O(n) | Clears the contents of the hash map without changing its capacity |
| resize_table | O(n) | Changes the capacity of the hash table |
| get | O(n) | Gets the value associated with the given key |
| contains_key | O(n) | Checks if a given key is in the hash map |
| remove | O(n) | Removes a key/value pair from the hash map |
| get_keys | O(n) | Gets an array that contains all the keys in the hash map |
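
The interplay of quadratic probing, tombstones, and automatic resizing can be sketched as below. This is not the code from hash_map_oa.py. The names (`ProbingMapSketch`, `_Entry`, `_next_prime`) are hypothetical, a built-in list stands in for the dynamic array, and two choices are assumptions of this sketch rather than documented behavior of the library: the capacity is kept prime, and the table grows once more than half of the slots are in use. Together those two choices guarantee that the probe sequence reaches an empty slot.

```python
def _is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def _next_prime(n):
    while not _is_prime(n):
        n += 1
    return n


class _Entry:
    """A bucket: key/value pair plus a tombstone flag (hypothetical helper)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.is_tombstone = False


class ProbingMapSketch:
    """Illustrative open addressing map; not the library's class."""

    def __init__(self, capacity=7, hash_function=hash):
        self._slots = [None] * _next_prime(capacity)
        self._hash = hash_function
        self._size = 0  # live entries
        self._used = 0  # live entries plus tombstones

    def _probe(self, key):
        # Yields home, home + 1, home + 4, home + 9, ... (mod capacity).
        capacity = len(self._slots)
        home = self._hash(key) % capacity
        for j in range(capacity):
            yield (home + j * j) % capacity

    def put(self, key, value):
        # Assumed policy: grow before more than half the slots are in use.
        if (self._used + 1) * 2 > len(self._slots):
            self._resize(2 * len(self._slots))
        first_tombstone = None
        for index in self._probe(key):
            entry = self._slots[index]
            if entry is None:  # key is absent: insert it
                if first_tombstone is None:
                    self._slots[index] = _Entry(key, value)
                    self._used += 1
                else:  # reuse the earliest tombstone on the probe path
                    reused = self._slots[first_tombstone]
                    reused.key, reused.value = key, value
                    reused.is_tombstone = False
                self._size += 1
                return
            if entry.is_tombstone:
                if first_tombstone is None:
                    first_tombstone = index
            elif entry.key == key:
                entry.value = value  # key already present: update in place
                return
        raise RuntimeError("probe sequence exhausted")

    def get(self, key):
        for index in self._probe(key):
            entry = self._slots[index]
            if entry is None:  # a never-used slot ends the search
                return None
            if not entry.is_tombstone and entry.key == key:
                return entry.value
        return None

    def remove(self, key):
        for index in self._probe(key):
            entry = self._slots[index]
            if entry is None:
                return
            if not entry.is_tombstone and entry.key == key:
                entry.is_tombstone = True  # keep the slot so later probes pass through
                self._size -= 1
                return

    def table_load(self):
        return self._size / len(self._slots)

    def _resize(self, new_capacity):
        live = [e for e in self._slots if e is not None and not e.is_tombstone]
        self._slots = [None] * _next_prime(new_capacity)
        self._size = 0
        self._used = 0
        for entry in live:  # rehash live entries; tombstones are dropped
            self.put(entry.key, entry.value)
```

The tombstone matters in `get` and `remove`: if a deleted slot were simply reset to empty, a lookup for a key stored further along the same probe path would stop early and wrongly report the key as missing.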

Notes on Time Complexity

While I have provided theoretical "worst case" time complexities in the tables above, the actual time complexity is highly dependent on a hash map's load factor. In short, we can consider the load factor to be λ = n/m, where n is the number of elements and m is the number of buckets. In the case of open addressing, the average expected time for adding an element is 1/(1 - λ). Thus, as long as λ is kept bounded below 1 (which the automatic resizing is there to ensure), we should expect that the average and amortized time complexity for the put operation will actually be O(1). On the other hand, we can consider the expected time for separate chaining to be λ + 1, where the 1 represents the hashing operation. Once again, time complexity is dependent on the load factor, and as long as λ stays bounded by a constant we should expect that the average and amortized time complexity will be O(1). However, it should be noted that the separate chaining hash table provided here will not automatically resize itself. I have left it to the user to decide when it is appropriate for their own program to resize.
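
To get a feel for these two formulas, the small sketch below simply evaluates them for a few load factors. The function names are hypothetical and are not part of the library.

```python
def expected_probes_open_addressing(load_factor):
    """Evaluate 1 / (1 - λ), the expected insertion cost for open addressing."""
    if not 0 <= load_factor < 1:
        raise ValueError("open addressing requires 0 <= load factor < 1")
    return 1 / (1 - load_factor)


def expected_steps_chaining(load_factor):
    """Evaluate λ + 1: one hashing step plus the average chain length."""
    if load_factor < 0:
        raise ValueError("load factor cannot be negative")
    return load_factor + 1


if __name__ == "__main__":
    for lam in (0.25, 0.5, 0.75, 0.9):
        print(
            f"λ = {lam}: "
            f"open addressing ≈ {expected_probes_open_addressing(lam):.2f}, "
            f"chaining ≈ {expected_steps_chaining(lam):.2f}"
        )
```

At λ = 0.5 open addressing expects 2 probes per insertion, and at λ = 0.75 it expects 4, so the cost climbs steeply as the table fills. Separate chaining degrades only linearly, which is one reason a chained table can tolerate load factors above 1 while an open addressing table cannot.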