a/HashMaps

mirror of https://codeberg.org/andyscott/HashMaps.git synced 2025-04-28 07:37:54 -04:00

Hash maps using (Separate) Chaining, and Open Addressing with Quadratic Probing

data-structures hashmap hashtable python

Andrew Scott 6ae584ec8c README: added method details with time complexity analysis		2024-09-25 18:14:26 -04:00
.gitignore	Initial commit	2022-06-01 05:13:31 +02:00
hash_map_oa.py	Update docstrings, clean up testing output	2022-06-25 13:49:07 -04:00
hash_map_sc.py	Fix testing indentation	2022-06-25 13:50:01 -04:00
hm_include.py	Rename file, add citation for Oregon State	2022-06-25 13:48:31 -04:00
LICENSE	Add name and year	2022-06-01 05:14:11 +02:00
README.md	README: added method details with time complexity analysis	2024-09-25 18:14:26 -04:00

HashMaps

This hash map library features two methods for collision resolution: separate chaining, and open addressing with quadratic probing. All methods for both classes were implemented iteratively so that their time and space complexity is straightforward to reason about. Further, no built-in Python methods or data structures were used; the library was written from the ground up to avoid hidden surprises, current or future.

Separate Chaining

This implementation uses a dynamic array of singly linked lists to create chains of key/value pairs. The time complexities below assume that your hash function runs in O(1) time.

| Method | Time Complexity (worst case) | Description |
| --- | --- | --- |
| `put` | O(n) | Adds (or updates) a key/value pair to the hash map |
| `empty_buckets` | O(n) | Gets the number of empty buckets in the hash table |
| `table_load` | O(1) | Gets the current hash table load factor |
| `clear` | O(n) | Clears the contents of the hash map without changing its capacity |
| `resize_table` | O(n) | Changes the capacity of the hash table |
| `get` | O(n) | Gets the value associated with the given key |
| `contains_key` | O(n) | Checks if a given key is in the hash map |
| `remove` | O(n) | Removes a key/value pair from the hash map |
| `get_keys` | O(n) | Gets an array that contains all the keys in the hash map |
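
To make the chaining idea concrete, here is a minimal sketch rather than the code in `hash_map_sc.py`. The names `_Node` and `ChainedMapSketch` are hypothetical, and a plain Python list stands in for the library's own dynamic array purely to keep the example short. It shows why `put`, `get`, and `remove` are O(n) in the worst case: each one may have to walk an entire chain.

```python
class _Node:
    """One link in a bucket's chain (hypothetical helper for this sketch)."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedMapSketch:
    """Illustrative separate-chaining map; not the repository's class."""

    def __init__(self, capacity=8, hash_function=hash):
        # Assumption: a built-in list replaces the library's dynamic array.
        self._buckets = [None] * capacity
        self._hash = hash_function
        self._size = 0

    def _index(self, key):
        return self._hash(key) % len(self._buckets)

    def put(self, key, value):
        index = self._index(key)
        node = self._buckets[index]
        while node is not None:          # walk the chain: O(chain length)
            if node.key == key:
                node.value = value       # existing key: update in place
                return
            node = node.next
        # New key: insert at the head of the chain.
        self._buckets[index] = _Node(key, value, self._buckets[index])
        self._size += 1

    def get(self, key):
        node = self._buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None                      # key not present

    def remove(self, key):
        index = self._index(key)
        prev, node = None, self._buckets[index]
        while node is not None:
            if node.key == key:
                if prev is None:         # removing the head of the chain
                    self._buckets[index] = node.next
                else:
                    prev.next = node.next
                self._size -= 1
                return
            prev, node = node, node.next

    def table_load(self):
        # Load factor: elements divided by buckets.
        return self._size / len(self._buckets)
```

Passing a degenerate hash function such as `lambda key: 0` forces every key into one chain, which is the worst case the table describes.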

This module also includes a standalone function, `find_mode`, which returns a tuple containing an array of the mode values (the elements with the highest number of occurrences) and their frequency (the number of times each mode appears).
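
As a rough illustration of the mode calculation, the sketch below counts occurrences with a map and then collects every value that ties for the highest count. It is not the repository's `find_mode`: the name `find_mode_sketch`, the plain-list input, and the built-in `dict` and `list` are all assumptions made to keep the example self-contained.

```python
def find_mode_sketch(values):
    """Return (modes, frequency) for a sequence of hashable values."""
    counts = {}                          # assumption: dict stands in for the hash map
    for value in values:
        counts[value] = counts.get(value, 0) + 1

    modes, frequency = [], 0
    for value, count in counts.items():
        if count > frequency:            # new highest count: restart the mode list
            modes, frequency = [value], count
        elif count == frequency:         # tie: keep every value with the top count
            modes.append(value)
    return modes, frequency


modes, frequency = find_mode_sketch(["apple", "pear", "apple", "pear", "fig"])
print(modes, frequency)                  # ['apple', 'pear'] 2
```

With an average-case O(1) map, the whole calculation takes one pass over the input plus one pass over the distinct values.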

Open Addressing

This hash map uses a dynamic array to create a series of individual buckets. Each bucket contains a key/value pair as well as a flag that indicates whether the pair has been deleted. This flag is commonly known as a tombstone. The open addressing implementation also resizes the table automatically to keep insertion of new elements efficient as the size increases. For the purpose of calculating time complexity, this implementation also assumes that your hash function runs in constant time.

| Method | Time Complexity (worst case) | Description |
| --- | --- | --- |
| `put` | O(n) | Adds (or updates) a key/value pair to the hash map |
| `empty_buckets` | O(n) | Gets the number of empty buckets in the hash table |
| `table_load` | O(1) | Gets the current hash table load factor |
| `clear` | O(n) | Clears the contents of the hash map without changing its capacity |
| `resize_table` | O(n) | Changes the capacity of the hash table |
| `get` | O(n) | Gets the value associated with the given key |
| `contains_key` | O(n) | Checks if a given key is in the hash map |
| `remove` | O(n) | Removes a key/value pair from the hash map |
| `get_keys` | O(n) | Gets an array that contains all the keys in the hash map |

Notes on Time Complexity

While I have provided theoretical worst-case time complexities in the tables above, the actual running time is highly dependent on a hash map's load factor. In short, the load factor is λ = n/m, where n is the number of elements and m is the number of available buckets.

In the case of open addressing, the average expected time for adding an element is 1/(1 - λ). Thus, as long as λ is kept below a fixed constant less than 1, which the automatic resizing is meant to ensure, we should expect the average and amortized time complexity of the `put` operation to be O(1).

For separate chaining, on the other hand, the expected time is λ + 1, where the 1 represents the hashing operation. Once again, time complexity is dependent on the load factor, and we should expect the average and amortized time complexity to be O(1) as long as the load factor stays bounded. However, it should be noted that the separate chaining hash table provided here will not automatically resize itself. I have left it to the user to decide when it is appropriate for their own program to resize.
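
To get a feel for these formulas, the short sketch below plugs a few load factors into 1/(1 - λ) and λ + 1. The function names and the `max_load` threshold in `needs_resize` are hypothetical and are not part of this library; they only illustrate how a user of the separate chaining table might decide when to resize.

```python
def load_factor(n, m):
    """Load factor: n elements spread over m buckets."""
    return n / m


def expected_cost_open_addressing(lam):
    """Expected insertion cost 1 / (1 - lam); only defined for 0 <= lam < 1."""
    if not 0 <= lam < 1:
        raise ValueError("load factor must satisfy 0 <= lam < 1")
    return 1 / (1 - lam)


def expected_cost_chaining(lam):
    """Expected cost lam + 1: the average chain length plus one hashing step."""
    return lam + 1


def needs_resize(n, m, max_load=1.0):
    """Hypothetical policy: resize once the load factor exceeds max_load."""
    return load_factor(n, m) > max_load


for lam in (0.25, 0.5, 0.75, 0.9):
    print(f"load {lam:.2f}: "
          f"open addressing ~{expected_cost_open_addressing(lam):.2f}, "
          f"chaining ~{expected_cost_chaining(lam):.2f}")
```

The open addressing cost grows sharply as λ approaches 1 (about 2 at λ = 0.5, 4 at λ = 0.75, and 10 at λ = 0.9), while the chaining cost grows only linearly. That difference is one reason an open addressing table benefits from resizing itself, whereas a chaining table can tolerate a higher load before a resize becomes worthwhile.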