Thursday, August 16, 2018

Tree Creation - Mosaic Algorithm

Motivation

When I hear the word 'mosaic' I imagine an old, possibly ancient Roman, house with mosaic floors which depict ancient heroes. In the ancient times slaves spent many years putting the pieces in the correct order. 
The idea of a mosaic algorithm follows a similar pattern. Imagine that you have a lot of binary relations like "node A is the parent of node B", "node C is the child of node D", "nodes B and D are children of the same node". You can get such a set of binary relations from a natural language processing tool. For example the great Standford NLP library can give you a set of relations like 
  • Richard    per:children    Sonya
  • Sonya       per:siblings   Samantha
  • Samantha per:children   Robert
It would be great to build a complete tree from this set of relations. This is what the mosaic algorithm is about.

Applicability

Use this algorithm when you need to create a tree from a set of relations between the nodes. We assume that all the relations are correct even though the algorithm can handle duplicate relations.

Structure

The main idea is to use the set of relations as mosaic pieces. For every relation a fragment of the tree is created or modified. We'll consider 2 cases: a children relation and a sibling relation. Like this:

Subject
Relation
Object
Children
NodeX
children
NodeY
Siblings
NodeA
siblings
NodeB
This converts to the following structures in the code;

For any of these relations any node may already exist among the created fragments. It is also possible that no nodes exist for this relation. The actions necessary are summarized in the table:


Subj doesn’t exist,
Obj doesn’t exist
Subj exists,
Obj doesn’t exist
Subj doesn’t exist,
Obj exists
Subj exists,
Obj exists
Children
Create 2 nodes: one parent, one child
Find the parent in the existing nodes, add a child node
Create the parent node, find the child in the existing nodes and add it as a child to the parent node
Establish the relation if the nodes belong to different fragments (have different roots)
Siblings
Create 2 nodes with a dummy parent
Find the existing sibling, add a dummy parent if necessary, add a new child to the parent
Find the existing sibling, add a dummy parent if necessary, add a new child to the parent
Establish the relation if the nodes belong to different fragments (have different roots)

As every relation is processed either a new fragment is created or an existing fragment is modified. When relation processing is complete we have the full picture.
Only one important part of the algorithm remains - how to find the existing nodes for a relation. To achieve this after every relation is processed the newly created nodes are added to a Map or ArrayList (If the node has a unique key the HashMap is faster than the ArrayList).
An illustration how the tree is evolving as the relations are being processed:

Final Thoughts

I think I should be humble and avoid claiming that a new algorithm has been invented. This algorithm could be a special case of a well-known more general algorithm. But the idea looked interesting and I decided to write a post about it.

No comments:

Post a Comment