Trie – Data Structure | Strings | Everything you need to know
logo
Manas Sinha
Developer | Designer
logo
Manas Sinha
Developer | Designer

By Manas | 21 September 2021 | 2 mins read

TRIE - DATA STRUCTURES

Applications | Building Trie from Scratch
Let’s say we have a series of strings. What operations do you think we can do on this list? We could look for a certain string in the list, add strings to the list, and delete strings from the list. While there are many data structures you can use to store this, they all have some pros and cons. Trie is one of those data structures. It is an efficient information reTrieval data structure.
Trie, also known as prefix tree or digital tree, is an advanced tree data structure that makes data retrieval operations more efficient. Tries are usually used to store strings and are visualized as a tree. The advantage of using trie is that it can reduce the complexity of searching a certain string in the list to O(m) where m is the length of the string. If you compare this with an array , array will have O(m*n) complexity where m is the length of the string and n is the size of array. Whereas a BST will also have a time complexity of O(m*log(n)). It does however come with a limitation that is space which we will see later.
So it seems quite clear that if searching is your primary objective, Trie is the way to go. And also its quite easy to implement… But it’s up to you .. or is it? 😏

0. TRIE STRUCTURE

Each node of a trie contains two things:
  • An array of pointers usually 26 pointers pointing to each one of the alphabets but there are no restrictions.
  • A boolean value to find out if the current character is the end of some string.
trie
We can see that, to access a key we have to do a depth-first search and following the link at every node that represents the character in the key until we find the boolean isEnd == True.
We should also note that, unlike binary trees, there is no data member in any node. This is because the nodes in a trie don’t store the key. Instead, each non-empty link in the pointers array represents the next character which comes after the string formed by its parent node.For example, if the 2nd position in the array is not NULL, as seen in the image, this means ‘b’ comes after the string formed by the current node’s parent.
All the children of any internal node in the trie carry the same prefix, thus giving it the name ‘prefix tree’.
But why bother using such a complex data structure? We’ll see…

1. Making a trie

Step 1 : Create a Trie class. It’s like making a linked list. As I said earlier, each node has two things, array of children and an IsEnd boolean.
Step 2 : To initialize it we set all children pointer to NULL meaning there are no children yet.
Step 3 : Now the cool part, to insert a word in this Trie. We already know, any node in a trie doesn’t have any data. So we only need to keep inserting new Trie child node at required positions in parent nodes. Let’s say we want to insert the word “binary”, we start with the root node of Trie. We check if the children[1] //(0=a,1=b and so on) is NULL or not. If it is not NULL very good ‘b’ is already there. But If it is null it means we don’t have a node for ‘b’ yet, so we add it. Then we move to that node. Then we do the same thing for our next character ‘i’. We will repeat these two steps until we reach the last character, where we will set the IsEnd boolean to true marking the end of the word.
Step 4 : Search operation is very similar to insert. Here we just check if the next character that we want is already there in the trie or not. If at some point we find a character is missing it means the word is also missing. If we reach the end of the word and found all the characters we can’t just end and say yes the word is present. We need to check if IsEnd boolean is true or not. Because if it is not, the word doesn’t exist in the trie #word_heist. This is an important point to note. So if you want to delete a word you just need to look up that word and mark isEnd = false.
You can refer to This leetcode problem to solve it yourself.
//Step 1 ==========
class Trie {
    Trie* children[26];
    bool isEnd;
public:
//Step 2 ==========
    Trie() {
        isEnd = false;
        for(int i=0;i<26;++i)
            children[i] = NULL;
    }
    
//Step 3 ==========
    void insert(string word) {
        Trie* curr = this;
        for(char c : word)
        {
            if(!curr->children[c - 'a'])
            {
                curr->children[c - 'a'] = new Trie();
            }
            curr = curr->children[c - 'a'];
        }
        curr->isEnd = true;
    }
//Step 4 ==========
    bool search(string word) {
        Trie* curr = this;
        for(char c : word)
        {
            curr = curr->children[c - 'a'];
            if(!curr)
                return false;
        }
        return curr->isEnd;
    }
};

/**
 * Your Trie object will be instantiated and called as such:
 * Trie* obj = new Trie();
 * obj->insert(word);
 * bool param_2 = obj->search(word);
 */

animation -- big good

3.advantages of using trie

Problem : You are given two lists of strings named dict and keys. Your job is to find how many strings in the list ‘keys’ are present in the list ‘dict’.
There are many possible solutions for this. One of the efficient ones will be to use a hash table EASY!! We can also solve this problem using a Trie. Which one will you use? Trie is a much efficient solution and more efficient if the size of dict list is very long and there is a much larger probability for hash collisions. Apart from this Trie has more advantages.
  • This you already know. No hash collisions.
  • No complex hash functions are needed.
  • Space doesn’t depend on the number of strings in the dict list rather than the length of each word.
  • Lookup can be done in O(k), k being the length of the word. In the worst case the lookup in the hash table be O(N).
  • Trie can be used to order keys in alphabetical order.

4. disadvantages of using trie

You already knew this was coming. There can’t be a solution with all advantages and no disadvantage, its always about the trade off. So lets see the disadvantages of trie:
  • In average case lookup in trie is slow.
  • Tries might end up taking more space sometimes as every character of a key is represented as a node (that has another array and a boolean inside of it), instead of a chunk of memory for every key as in the hash table.
Both of the data structures come neck-to-neck in terms of being efficient so it’s just a matter of the given problem statement. This is more than enough reason to learn this data structure.

5. Applications of Trie

  • In the real world, trie is used to implement the autocomplete feature that you see everywhere.
  • TrTrie is also used in correcting spelling.
  • Sorting list of keys.
  • String matching is another case where tries do a great job.

LEAVE A COMMENT

If you like the post leave a comment and share it.

This Post Has One Comment

Leave a Reply