Huffman Coding Java

Last Updated : 27 Jun, 2024

The Huffman coding is a popular algorithm used for lossless data compression. It works by assigning the variable-length codes to the input characters with the shorter codes assigned to the more frequent characters. This results in the prefix-free code meaning no code is a prefix of any other code. The Huffman coding guarantees an optimal prefix-free encoding minimizing the total number of bits required to encode a message.

Data Structure Organization:

The Huffman coding organizes data using the binary tree called a Huffman tree. The tree is constructed using the priority queue or a min-heap to efficiently find the two lowest frequency nodes and merge them into the new node. This process continues until all nodes are merged into a single root node forming the Huffman tree.

Root

/ \

Node1 Node2

/ \ / \

... ... ... ...

Implementation:

The Huffman coding can be implemented using various data structures such as the arrays linked lists or priority queues. One common approach is to the use a priority queue to efficiently select the two lowest frequency nodes for the merging. This implementation typically involves the following steps:

  • Calculate the frequency of each input character.
  • Create a leaf node for each character and add it to the priority queue.
  • While there is more than one node in the priority queue:
  • Remove the two nodes with the lowest frequency.
  • They merge them into a new internal node with the sum of their frequencies.
  • The Add the new node back to priority queue.
  • The remaining node in priority queue becomes the root of the Huffman tree.

Basic Operations:

  • Encoding: The Generate the Huffman codes for the each input character based on the constructed Huffman tree.The Time complexity: O(n log n) where n is the number of the unique characters.
  • Decoding: The Decode a sequence of the Huffman codes back into the original message using the Huffman tree. The Time complexity: O(n) where n is the length of the encoded message.
  • Compression: The Encode a message using the generated Huffman codes. The Space complexity: The Depends on the efficiency of the Huffman tree typically close to the entropy of the message.
  • Decompression: The Decode a compressed message back into the original message. The Space complexity: Depends on the size of the compressed message and efficiency of the Huffman tree.

Program:

Java
// Java Program to Implement
// Huffman Coding
import java.util.PriorityQueue;
import java.util.HashMap;

// Class representing a node in the Huffman Tree
class HuffmanNode {
  	// Character data
    char data;           
  	
  	// Frequency of the character
    int frequency;       
  
  	// Left and right child nodes
    HuffmanNode left, right; 

    // Constructor to initialize the node
    HuffmanNode(char data, int frequency) {
        this.data = data;
        this.frequency = frequency;
        left = right = null;
    }
}

// Driver Class
public class HuffmanCoding {
  	// Main Function
    public static void main(String[] args) {
      
        // The message to be encoded
        String message = "Huffman coding is a lossless data compression algorithm.";

        // Create a frequency map to count the frequency of each character
        HashMap<Character, Integer> frequencyMap = new HashMap<>();
      
        for (char c : message.toCharArray()) {
            frequencyMap.put(c, frequencyMap.getOrDefault(c, 0) + 1);
        }

        // Create a priority queue to build the Huffman Tree
        PriorityQueue<HuffmanNode> priorityQueue =
          new PriorityQueue<>((a, b) -> a.frequency - b.frequency);

        // Create a Huffman node for each character and add it to the priority queue
        for (char c : frequencyMap.keySet()) {
            priorityQueue.add(new HuffmanNode(c, frequencyMap.get(c)));
        }

        // Build the Huffman Tree
        while (priorityQueue.size() > 1) {
            // Remove the two nodes with the lowest frequency
            HuffmanNode left = priorityQueue.poll();
            HuffmanNode right = priorityQueue.poll();

            // Create a new internal node with these two nodes
          	// as children and add it back to the queue
            HuffmanNode newNode =
              new HuffmanNode('$', left.frequency + right.frequency);
          
            newNode.left = left;
            newNode.right = right;
            priorityQueue.add(newNode);
        }

        // The remaining node is the root of the Huffman Tree
        HuffmanNode root = priorityQueue.poll();

        // Print the Huffman codes for each character
        System.out.println("Huffman codes:");
        printCodes(root, new StringBuilder());
    }

    // Method to print the Huffman codes
    public static void printCodes(HuffmanNode root, StringBuilder code) {
        if (root == null) return;

        // If this is a leaf node, print the character and its code
        if (root.data != '$') {
            System.out.println(root.data + ": " + code);
        }
        
        // Traverse the left subtree
        if (root.left != null) {
            printCodes(root.left, code.append('0'));
            code.deleteCharAt(code.length() - 1);
        }
        
        // Traverse the right subtree
        if (root.right != null) {
            printCodes(root.right, code.append('1'));
            code.deleteCharAt(code.length() - 1);
        }
    }
}

Output

Huffman codes:
a: 000
m: 0010
n: 0011
.: 01000
p: 010010
u: 010011
l: 0101
s: 011
: 100
f: 10100
g: 10101
r: 10110
d: 10111
H: 110000
h: 110001
c: 11001
i: 1101
t: 11100
e: 11101
o: 1111

Applications of Huffman Coding

  • Data compression: The Huffman coding is widely used in the file compression algorithms like ZIP, gzip and JPEG.
  • Telecommunication: The Huffman coding is used in the voice and image compression to the reduce bandwidth usage.
  • Error detection and correction: The Huffman codes with the error-detecting capabilities are used in the applications like QR codes and barcodes.

Conclusion

The Huffman coding provides an efficient method for the lossless data compression by the assigning the variable-length codes to the input characters based on their frequencies. Its applications span across the various domains making it a fundamental algorithm in the computer science and telecommunications. By understanding the Huffman coding and its implementation one can appreciate its significance in the optimizing data storage and transmission.

Comment