What is a Hash Table?

Paraphrased/rewritten from a couple of good answers from https://stackoverflow.com/questions/730620/how-does-a-hash-table-work:

Description 1

A hash table is basically an array of some key/value pairs. The maximum size of the hash table array aims to be smaller than the number of items being stored in the hash table. A hashing function is used to generate the array index where each piece of data is stored.

It's generally possible for the hashing function to generate the same hash (array index) for multiple pieces of data in the original set. In this case when the lookup occurs and multiple entries are found for a given hash, the items in that "bucket" can be looped through to perform direct comparison to find the matching entry.

Description 2

Take a bunch of items, and an array. For each item, make up an index for it, called a hash. An important thing about the hash is that it 'scatter' a lot as you don't want two similar items to have similar hashes.

Put each item into the array at the position indicated by the hash. More than one item can wind up at a given hash, so the items are stored in an array or similar, frequently called a bucket.

When looking up items in the hash table, go through the same steps, figuring out the hash value of the item, seeing what's in the bucket at that location, and checking whether it's what you're looking for.

When the hashing strategy is working well and the array big enough there will only be a few items at most at any particular index in the array, so lookups should be quick.

One optimization: when the hash table is accessed and an item found, move it to the first position of the bucket for quick retrieval next time.

Tags

 data structures  hash tables  algorithms  definitions