Motivation
When I hear the word 'mosaic' I imagine an old, possibly ancient Roman, house with mosaic floors which depict ancient heroes. In the ancient times slaves spent many years putting the pieces in the correct order.
The idea of a mosaic algorithm follows a similar pattern. Imagine that you have a lot of binary relations like "node A is the parent of node B", "node C is the child of node D", "nodes B and D are children of the same node". You can get such a set of binary relations from a natural language processing tool. For example the great Standford NLP library can give you a set of relations like
- Richard per:children Sonya
- Sonya per:siblings Samantha
- Samantha per:children Robert
It would be great to build a complete tree from this set of relations. This is what the mosaic algorithm is about.
Applicability
Use this algorithm when you need to create a tree from a set of relations between the nodes. We assume that all the relations are correct even though the algorithm can handle duplicate relations.
Structure
The main idea is to use the set of relations as mosaic pieces. For every relation a fragment of the tree is created or modified. We'll consider 2 cases: a children relation and a sibling relation. Like this:
This converts to the following structures in the code;
For any of these relations any node may already exist among the created fragments. It is also possible that no nodes exist for this relation. The actions necessary are summarized in the table:
Subject
|
Relation
|
Object
|
|
Children
|
NodeX
|
children
|
NodeY
|
Siblings
|
NodeA
|
siblings
|
NodeB
|
For any of these relations any node may already exist among the created fragments. It is also possible that no nodes exist for this relation. The actions necessary are summarized in the table:
Subj
doesn’t exist,
Obj
doesn’t exist
|
Subj
exists,
Obj
doesn’t exist
|
Subj
doesn’t exist,
Obj
exists
|
Subj
exists,
Obj
exists
|
|
Children
|
Create 2
nodes: one parent, one child
|
Find the
parent in the existing nodes, add a child node
|
Create the
parent node, find the child in the existing nodes and add it as a
child to the parent node
|
Establish the relation if the nodes belong to different fragments (have different roots)
|
Siblings
|
Create 2
nodes with a dummy parent
|
Find the
existing sibling, add a dummy parent if necessary, add a new child
to the parent
|
Find the
existing sibling, add a dummy parent if necessary, add a new child
to the parent
|
Establish the relation if the nodes belong to different fragments (have different roots)
|
As every relation is processed either a new fragment is created or an existing fragment is modified. When relation processing is complete we have the full picture.
Only one important part of the algorithm remains - how to find the existing nodes for a relation. To achieve this after every relation is processed the newly created nodes are added to a Map or ArrayList (If the node has a unique key the HashMap is faster than the ArrayList).
An illustration how the tree is evolving as the relations are being processed:
Only one important part of the algorithm remains - how to find the existing nodes for a relation. To achieve this after every relation is processed the newly created nodes are added to a Map or ArrayList (If the node has a unique key the HashMap is faster than the ArrayList).
An illustration how the tree is evolving as the relations are being processed:
No comments:
Post a Comment