Sunday, February 26, 2006

Hierarchical data structures in relational databases

It's all relative, as are the many possibly ways to express hierarchical structures in relational databases. I spend a few hours this weekend reading about the theory behind this subject and I found many interesting articles but they all seemed to boil down to the following:

Adjacency list
Nested sets (Nested Intervals)
Materialized path

The three above mentioned methods may not be the only ones, but I think most others are derived from one of the above.
Anyways - I was looking for a method for blazingly fast reads and somewhat fast inserts. Especially fast inserts seems to rule out nested sets and fast reads on more than one level seems to rule out adjacency lists, but does it really.. I've come up with an approach albeit it might need some more space (hence data) its basically still an adjacency list but with link data.
For each node I insert the path to the root in a link table (called hierarchy in the below example). I simply go from the node to the root and for each node along the way, I insert a link into the link table. Then I am able to query all nodes for the whole tree or just a sub-tree if I want. I still have a "parent_id" on each node, so a "gimme all nodes directly below this node" is still possibly as with any adjacency list design.. I just have a tad more book-keeping.. But querying is fast and simply and so is inserts and deletes (since its quite fast to find ALL descendants of any given node).. Moves are a hazzle, but who does this a lot?? I surely don't..

There might be an "offical word" for this approach but I call it "Linked adjacency list".

Example:

Node (id, name, parent_id)
--------------
"1";"ROOT";""
"2";"Node 1";"1"
"3";"Node 2";"1"
"4";"Node 3";"1"
"5";"Node 1.1";"2"
"6";"Node 2.1";"3"
"7";"Node 3.1";"4"
"8";"Node 1.1.1";"5"
"9";"Node 1.1.2";"5"
"10";"Node 1.1.1.1";"8"
"11";"Node 3.1.1";"7"
"12";"Node 3.1.1.1";"11"
"13";"Node 3.1.1.1.1";"12"
"14";"Node 3.1.1.1.2";"12"

Hierarchy (node_id, ancestor_id)
--------------
"2";"1"
"3";"1"
"4";"1"
"5";"1"
"5";"2"
"6";"1"
"6";"3"
"7";"1"
"7";"4"
"8";"1"
"8";"2"
"8";"5"
"9";"5"
"9";"2"
"9";"1"
"10";"8"
"10";"5"
"10";"2"
"10";"1"
"11";"7"
"11";"4"
"11";"1"
"12";"11"
"12";"7"
"12";"4"
"12";"1"
"13";"12"
"13";"11"
"13";"7"
"13";"4"
"13";"1"
"14";"12"
"14";"11"
"14";"7"
"14";"4"
"14";"1"

This is a simple query of all data below node with id 1 (which is the root node, hence all data) wich select the name of the node and nodes depth in the tree:

select n.name, count(h2.node_id) as depth
from node n, hierarchy h, hierarchy h2
where h.ancestor_id = 1
and h2.node_id = h.node_id
and n.id = h.node_id
group by n.name

Which gives the following result:

Name;Depth
--------------
Node 3;1
Node 1.1.2;3
Node 2;1
Node 3.1.1;3
Node 2.1;2
Node 1.1.1.1;4
Node 3.1;2
Node 1.1;2
Node 1.1.1;3
Node 3.1.1.1.1;5
Node 1;1
Node 3.1.1.1.2;5
Node 3.1.1.1;4

Quite messy, huh?

0 Comments:

Post a Comment

<< Home