Wednesday, January 22, 2014

How HashSet is Implemented or works Internally in Java

Not many programmer know that HashSet is internally implemented using HashMap in Java, so if you know How HashMap works internally in Java, more likely you can figure out how HashSet works in Java. But, now a curious Java developer can question that, how come HashSet uses HashMap, because you need a key value pair to use with Map, while in HashSet we only store one object. Good question, isn't it? If you remember some functionality of earlier class, then you know that HashMap allows duplicate values and this property is exploited while implementing HashSet in Java. Since HashSet implements Set interface it needs to guarantee uniqueness and this is achieved by storing elements as keys with same value always. Things gets clear by checking HashSet.java from JDK source code. All you need to look at is, how elements are stored in HashSet and how they are retrieved from HashSet. Since HashSet doesn't provide any direct method for retrieving object e.g. get(Key key) from HashMap or get(int index) from List, only way to get object from HashSet is via Iterator. See here for code example of iterating over HashSet in Java. When you create an object of HashSet in Java, it internally create instance of backup Map with default initial capacity 16 and default load factor 0.75 as shown below :

/**
  * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
  * default initial capacity (16) and load factor (0.75).
  */

public HashSet() {
   map = new HashMap<>();
}

Now let's see the code for add() and iterate() method from java.util.HashSet in Java to understand how HashSet works internally in Java.


How Object is stored in HashSet
As you can see below, a call to add(Object) is delegate to put(Key, Value) internally, where Key is the object you have passed and value is another object,  called PRESENT, which is a constant in java.util.HashSet as shown below :

private transient HashMap<E,Object> map;

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

public boolean add(E e) {
   return map.put(e, PRESENT)==null;
}

Since PRESENT is a constant, for all keys we have same value in backup HashMap called map.


How Object is retrieved from HashSet
Now let's see the code for getting iterator for traversing over HashSet in Java. iterator() method from java.util.HashSet class returns iterator for backup Map returned by map.keySet().iterator() method.

       /**
     * Returns an iterator over the elements in this set.  The elements
     * are returned in no particular order.
     *
     * @return an Iterator over the elements in this set
     * @see ConcurrentModificationException
     */

    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }


How to use HashSet in Java

Using HashSet in Java is very simple, don't think it is Map but think more like Collection i.e. add elements by using add() method, check its return value to see if object already existed in HashSet or not. Similarly use iterator for retrieving element from HashSet in Java. You can also use contains() method to check if any object already exists in HashSet or not. This method use equals() method for comparing object for matching. You can also use remove() method to remove object from HashSet. Since element of HashSet is used as key in backup HashMap, they must implement equals() and hashCode() method. Immutability is not requirement but if its immutable then you can assume that object will not be changed during its stay on set. Following example demonstrate basic usage of HashSet in Java, for more advanced example, you can check this tutorial.    

import java.util.HashSet;
import java.util.Iterator;


/**
 * Java Program to demonstrate How HashSet works internally in Java.
 * @author http://java67.blogspot.com
 */

public class HashSetDemo{
     
   public static void main(String args[]) {

      HashSet<String> supportedCurrencies = new HashSet<String>();              

      // adding object into HashSet, this will be translated to put() calls
      supportedCurrencies.add("USD");
      supportedCurrencies.add("EUR");
      supportedCurrencies.add("JPY");
      supportedCurrencies.add("GBP");
      supportedCurrencies.add("INR");
      supportedCurrencies.add("CAD");

      // retrieving object from HashSet
      Iterator<String> itr = supportedCurrencies.iterator();
      while(itr.hasNext()){
         System.out.println(itr.next());

      }
   }

}

Output
JPY
EUR
INR
USD
CAD
GBP

That's all about How HashSet is implemented in Java and How HashSet works internally. As I said, If you how HashMap internally in Java, you can explain working of HashSet provided,  you know it uses same values for all keys. Remember to override equals() and hashCode() for any object you are going to store in HashSet, since your object is used as key in backup Map, it must override those method. Make your object Immutable or effective immutable if possible.

10 comments:

  1. Little confusion.
    At the starting you said elements are stored as keys with same value always but later you said the value for all keys is constant which is PRESENT.
    Which of these is correct?

    ReplyDelete
    Replies
    1. @Hi, He is right. Object we are adding to the set is internally added as a Key to the HashMap and value for the HahsMap in the internal implementation of HashSet will be same "PRESENT". Hence we are only responsible for our object value that we are adding to the set.
      Thanks
      Kumar Shorav

      Delete
  2. As per my understanding I guess values are stored as null for all keys.

    ReplyDelete
    Replies
    1. That's not true, PRESENT object is stored as values for all keys. Look at the below code from add() method of HashSet :
      public boolean add(E e) {
      return map.put(e, PRESENT)==null;
      }

      You should not have doubt after looking at this code :)

      Delete
  3. why would anyone like to store null values ?.
    And in fact values are never stored ,only the references pointing to those values(ie objects) are stored in any type of collection.

    ReplyDelete
    Replies
    1. That's correct Amitesh, No null values are stored, only PRESENT is stored, and that too is just one object, it's just references which are stored.

      Delete
  4. But why HashSet cant have its own implementation , why does it use HashMap, why do we have to store "PRESENT" constant unnecerrily

    ReplyDelete
    Replies
    1. Well Prateek, this is just one way to implement HashSet. The advantage of this approach is that they don't have to implement HashSet from scratch, they can reuse HashMap code to the heavy lifting.

      Delete
  5. Very helpful article. I am becoming a very BIG fan of your blog. The content you write is extremely simple to understand. Thanks!

    ReplyDelete
    Replies
    1. Thank You Jaya. If you like article, don't forget to share with your friends :-)

      Delete

Java67 Headline Animator