Implementing A Trie Data structure In Golang(Search engine)
I was working on a small project that I needed to do text search.My initial thought was to use elastic search.However Installing and managing elastic search seemed like an overkill for a project that is less that 1000 lines of code.
What I wanted is just a small embedded library that can search about 5000,000 words.The library had to be:
- Fast enough
- Support Full text search
- Have an Autocomplete feature.
- Cache my result.
- Be memory efficient.
I tried using Bleve search but ended up abandoning it since it used to much memory when indexing my data.Finally what I was left with is tying to make a simple text search library and that when I decided to try creating a Trie data structure in golang.
Below is a simplified version of how I implemented the Trie.
Assumption made.
- We will only store lowercase letters.
- The basic trie will only be able to index and search data.
How does a trie work.
A Trie is a tree data structure in which all children of a node have a common prefix.
This means that the name cat and can would share the same prefix that is ca.
Looking up data in a trie is fast. The worst case is O(m) time (where m is the length of a search string).
Each character in a trie can be represented by a node.And in our case each node should have at-most 26 children that is (a-z) since we store only small letters.
Check the diagram below for an explanation.

We start with the root node that will always be the first node that we will visit.As you can see the root node has 26 children that is (a-z).
Node a which is a child of the root node will also have 26 children that is a to z.Add all nodes we will add to our trie should also have 26 children(slots).
Let take the name cat and see how we would represent it in our trie.

As you can see from the black nodes to index or search the word cat we would have to visit only 4 nodes. That is the root node ….then child c of the root node the a of that child and finally t of the child a.
If we had the name cats we would extend node t to have another 26 children that is a to z and then visit s of that t child.
This would continue until we are done with all the words we have.
Note:Even if we had one billion word in a dictionary it would take us only 4 steps to find the word cat.
Another thing you will notice is that most words share a path.For example car and cat share the node c and node a.Like below

Creating a node
We have established that.
- The first node in a trie is a root node that does not have any letter and can be null.
- Each node including the root node in our case should have 26 children.
Now let go ahead and implement our node.
//Node represent each character
type Node struct {
//this is a single letter stored for example letter a,b,c,d,etc
Char string
//store all children of a node
//that is from a-z
//a slice of Nodes(and each child will also have 26 children)
Children [26]*Node
}
/// NewNode this will be used to initialize a new node with 26 children
///each child should first be initialized to nil
func NewNode(char string) *Node {
node := &Node{Char: char}
for i := 0; i < 26; i++ {
node.Children[i] = nil
}
return node
}Creating the trie
Now that we have the main part of our trie (Nodes let create the trie).
- The trie should have a root node which can hold anything as it will be only be used for traversing the tree.
// Trie is our actual tree that will hold all of our nodes
//the Root node will be nil
type Trie struct {
RootNode *Node
}
// NewTrie Creates a new trie with a root('constructor')
func NewTrie() *Trie {
//we will not use this node so
//it can be anything
root := NewNode("\000")
return &Trie{RootNode: root}
}Indexing Our data
Indexing data means replacing a nil node with an actual character.
For example imagine we are indexing the word cat.
Basically what we would do is start from the root node.Then start transversing the tree trying to find the appropriate node to place the character c .If we get to the end of our tree and we have not finished indexing the whole word we just need to create a new node.
For example our trie at the moment contains only the root node and 26 children which all have the value of nil.
We already know that for each node's children index 0 will map to character a the index 1 will map to character b ,index 2 will map to character c and so on.
That means at index 2 of the root node we should insert c and then create another children of node c and place a at index 0 of that node.Continue doing this until all characters of cat are indexed.Note if a node already exist move to the next character.
/// Insert inserts a word to the trie
func (t *Trie) Insert(word string) error {
///this will keep track of our current node
///when transversing our tree
///it should always start at the top of our tree
///i.e our root
current := t.RootNode
///remove all spaces from the word
///and convert it to lowercase
strippedWord := strings.ToLower(strings.ReplaceAll(word, " ", ""))
for i := 0; i < len(strippedWord); i++ {
//from the ascii table a represent decimal number 97
//from the ascii table b represent decimal number 98
//from the ascii table c represent decimal number 99
/// and so on so basically if we were to say 98-97=1 which means the index of b is 1 and for c is 99-97
///that what is happening below (we are taking the decimal representation of a character and subtracting decimal representation of a)
index := strippedWord[i] - 'a'
///check if current already has a node created at our current node
//if not create the node
if current.Children[index] == nil {
current.Children[index] = NewNode(string(strippedWord[i]))
}
current = current.Children[index]
//since we want to support autocomplete
}
return nil
}NB: we have a line where we are saying index := strippedWord[i] — 'a' to get the index of a character.Basically what we are doing is taking the decimal representation of a character from the ascii table and subtracting decimal representation of a(97). For example c-a would be translated to (99–97)=2 which is index of c.
Searching for a word
The function to search for a word will be similar to that of indexing.
In short we will be transversing the tree in the same way.
// SearchWord will return false if a word we
//are searching for is not in the trie
//and true otherwise
func (t *Trie) SearchWord(word string) bool {
strippedWord := strings.ToLower(strings.ReplaceAll(word, " ", ""))
current := t.RootNode
for i := 0; i < len(strippedWord); i++ {
index := strippedWord[i] - 'a'
//we have encountered null in the path we were transversing meaning this is the last node
///that means this word is not indexed(present) in this trie
if current == nil || current.Children[index] == nil {
return false
}
}
return true
}Conclusion
Looking up data in a trie is fast in the worst case, O(m) time (where m is the length of a search string).We can also improve on this trie to do fuzzy search and autocomplete.
As for caching we can use a simple golang map to store already searched words or most searched words.