Hash Tables

Preface
Hash Tables are a very important data structure that I think that many novice programmers such as myself don't fully utilize in their programming.

Here I will explain:
  1. What is a Hash Table?
  2. How do they Work?
  3. What are some effective uses for Hash Tables?

What is a Hash Table?
Hash tables can be classified as "associate arrays", which basically it is an array structure that can map a "key" to some "value".  A key is some unique identifying value that you can use to "identify" some specific object or value.  

I think the most given example of this key/value pairing would be the student ID example.  For each student, they can be identified by their unique student ID.  No other student can have their student ID as long as the student is active in the school.  What is associated with that student ID, is a student record, which may contain: age, school level, classes taken, and any other data you would like to record.  So we end up with the Student ID is the "key", and the student record as the "value" in a Hash Table.

How does this array manage these values?  The array is able to store these key/value "entries" by using a "hash function" that maps a key to a specific location in the array.  

How do they work?

Taken from: Wikipedia

This super smooth diagram can be found at the Wikipedia entry for Hash Tables.  A simple explanation of this is that you have keys - in this case: "John Smith", "Lisa Smith", "Sandra Dee" that go through the "hash function" and are stored into the array.  We look at the key "John Smith".  As it is passed through the hash function, it is relocated to "bucket 2" (also known as index 2 for the array).  What is stored in that bucket is his phone number (also known as the value).  So in the example, the key/value pair is: name/phone number.

Question: So does this mean that the array size must be as large as the possible inputs?  
Answer: Nope!  The way that hash tables manage values, multiple keys can be mapped to a specific index in the array, which can be related to the idea of creating a "bucket".  Inside each index of the array, one way to handle what we call "collisions" (or when we have multiple keys mapped to the same index), is that we create a linked list that will append the new element onto the list.  In the event that we feel that there are far too many collisions, there are several methods that enable you to expand the array size in order to create more possible buckets for these values to fall in.  You will have to "re-hash" the hash table in that every entry in the table will be assigned a new bucket to be put in.

There are several other methods that you can use instead of chaining (when you use a linked list to handle collisions), but using Java Collection, as a novice programmer, you can let the java library handle those abstractions for you for now!

What are some good uses for Hash Tables?
Most explicit is when a problem requires you to associate two values with one each other.  Many beginner functions will allow you to use an array index as a way to associate that index with a value, but you begin to explore more problems, you will find that you will need more than the index to identify to a value. Hash tables give you this ability to "associate" a "key" to a "value".  This value does not have to be a primitive type, and is mostly used to store larger data structures such as a person object.

Say you want to create a data structure that given a state, it will return the capitol.  The key will be the name of the state, and the capital will be value in the hash table.

A quick Java implementation would go like this:

I use the HashMap implementation provided by in Java Collections.

//Start Code Segment
HashMap<String, String> hashtable = new HashMap<String, String>();
hashtable.put("California", "Sacramento");
//..
hashtable.put("Colorado", "Denver");
//etc
System.out.println("The capitol of Colorado is: " + hashtable.get("Colorado"));
//Alternatively you can print out the entire map
System.out.println(hashtable);
//Which will output => {California=Sacramento, Colorado=Denver}
//End Code Segment

In this code snippet, you can see that I created a HashMap with the parameters <String, String>.  In java, the first string will be the "key" parameter and the second string will be the "value" parameter.  So we when we call hashtable.put("California", "Sacramento"), we are saying that California is the key and Sacramento is the value.  hashtable.put is the method to insert the key/value pair into the hash table.  hashtable.get is the method that you use to use your key to get the value that you want.

Conclusion
This concludes this post about Hash Tables.  These are advanced structures to implement, but their use is fairly straightforward.  As a novice programmer, you should learn to look for situations that you can use hash tables to solve the problem in simple manner.  As you grow as a programmer, you can look into the underlying implementation of hash tables in order to further increase your understanding of when to use them more effectively, but don't worry about those details now.   Go out and try to use your own hash table!

Comments

Popular posts from this blog

Uncle Bob's Clean Architecture

C4 Model: Describing Software Architecture

Running RabbitMQ with Docker