Storing/Retrieving Hierarchies In SQL Server Database
Hierarchies are one of the most important part of many applications. It can be in shape of categories, family tree, organization charts etc. Maintaining hierarchies in data source like XML is not a big deal but when it comes to the database like SQL Server which is consist of flat table in terms of storing data, then it become a challenge.
To overcome this (which I am sure most of us did ), we use recursion. Implementing recursion is very easy and can be applied on both back-end (by using cursors) and front-end. But yet recursive operation can be a great mess if hierarchy grow larger. Lots of repetitive quires and n number of database connection can make your application way slow.
So, the idea of the moment is to use the hierarchical data without using recursion. while searching for the solution I came across “Storing Hierarchical Data in a Database” by Gijs Van Tulder. In this article, the Dutch guy mention two method for the processing of Hierarchal data. One is the same recursion which we have discussed above and the other one is ‘modified preorder tree traversal' algorithm.
Modified preorder tree traversal :
To make you understand better let me quote what he mention in his article.
We'll start by laying out our tree in a horizontal way. Start at the root node (‘Food'), and write a 1 to its left. Follow the tree to ‘Fruit' and write a 2 next to it. In this way, you walk (traverse) along the edges of the tree while writing a number on the left and right side of each node. The last number is written at the right side of the ‘Food' node. In this image, you can see the whole numbered tree, and a few arrows to indicate the numbering order.
We'll call these numbers left and right (e.g. the left value of ‘Food' is 1, the right value is 18). As you can see, these numbers indicate the relationship between each node. Because ‘Red' has the numbers 3 and 6, it is a descendant of the 1-18 ‘Food' node. In the same way, we can say that all nodes with left values greater than 2 and right values less than 11, are descendants of 2-11 ‘Fruit'. The tree structure is now stored in the left and right values. This method of walking around the tree and counting nodes is called the ‘modified preorder tree traversal' algorithm.
Sources : http://www.sitepoint.com/article/hierarchical-data-database/2/
Note : Let me keep myself honest, the above two paragraphs and image is by Gijs Van Tulder so He deserve the credit of this content. I am just sharing the T/SQL implementation of this algorithm.
I again strongly recommend you to read the article completely before going onward.
Schema :
Ok to start, lets create a following table
ID is of course the primary key, where as comments is the simple text. Left and Right are the indexes of the child records where as level will show us at what level this node belongs to. so that we can set the tab index and indentation where as Entity Id is to group the nodes because in this table hundred different nodes set. So in short, Entity Id is to separate the different node set in a single table.
Following is the script of the table
1: CREATE TABLE [dbo].[tblComments](
2: [Id] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
3: [Comments] [nvarchar](2000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
4: [Left] [int] NULL,
5: [right] [int] NULL,
6: [Level] [int] NULL,
7: [EntityId] [int] NULL,
8: CONSTRAINT [PK_tblComments] PRIMARY KEY CLUSTERED
9: (
10: [Id] ASC
11: )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
12: ) ON [PRIMARY]
Add Record:
Now to add the record in this table, lets have a look at the following procedure
1: CREATE PROCEDURE [dbo].[InsertComments]
2: @Comments nvarchar(2000),
3: @EntityId int,
4: @ParentId int = null
5: AS
6: BEGIN
7: Declare @Left int
8: Declare @Right int
9: Declare @Level int
10: declare @RightParent int
11: declare @rightParentOrig int
12: declare @leftParentOrig int
13:
14: set @RightParent = null
15:
16: if @ParentId is null
17: begin
18: set @Left = 1
19: set @Right = 2
20: end
21: else
22: begin
23: select @rightParentOrig = [right],@leftParentOrig =[left] , @Right = [right] + 1,@Left=[right],@Level=[Level]+1 from tblComments where id = @ParentId
24: set @RightParent = @Right + 1
25: end
26:
27: UPDATE tblcomments
28: set
29: [right] = [right] + 2,
30: [left] = [left] + 2
31: where
32: [left] >= @Left and entityId = @EntityId
33:
34: UPDATE tblcomments
35: SET
36: [right] = [right] + 2
37: WHERE [left] < @leftParentOrig and [right] > @rightParentOrig
38:
39: INSERT INTO [tblComments]([Comments],[Left],[right],[Level],[EntityId])
40: VALUES
41: (@Comments,@Left,@Right,@Level,@EntityId)
42:
43: if @RightParent is not null
44: update tblComments set [right] = @RightParent where id = @ParentId
45:
46: END
Ok, now let me define the code line by line
First 15 : Well, these lines are quit easy to understand, we have declared some variables which we will use in next lines.
Line no 16-24 : We are checking if the ParentId is null, then of course the node is the parent it self, that is why setting left to 1 and right to 2. But when the parent is specified, we need to set the right and left value according to the parent (+1 of the parent) and also need to +1 the level.
Line no 27-37 : Consider a situation, where you have a tree completed and you want to add a node in the middle of the tree. These are the lines for that. Here, we are Creating space of the node by increasing the left and right values of the sibling nodes.
Line no 39: This is the simple insert statement, where we are adding our record to the table with calculated right and left.
Line no 43: In this statement, we are updating the right side of the parent node.
Now Add some records using the above procedures, and do the select as following
Select Records:
1: select * from tblComments where EntityId = 1
2: order by [left] ,[right]
3: -- 2nd option
4: select * from tblComments where [left] >= 1 and [right] <= 10 and EntityId = 1
5: order by [left] ,[right]
Both the above query will return the following result set.
Now you can see the first record i.e id=30 is the parent record. So, the left value is 1 and the right is 10 and the level is 1. where as when you see the second record, it is the child of the first record that is why level is 2 and left is +1 greater then the parent and because all the records up to id=35 are it’s child so it has right index +1 greater then the last child.
Now if you want to see the child of id=32 you can do as follows
1: declare @left int,
2: @right int,
3: @EntityId int
4:
5: select @left= [left], @right= [right],@EntityId = EntityId from tblComments where Id = 32
6:
7: select * from tblComments where [left] >= @left and [right] <= @right and EntityId = @EntityId
8: order by [left] ,[right]
In this way, you can exactly see the child nodes of each node.
Delete Records:
Now, lets see how can we remove the record from this table as hierarchy is depend upon the left and right index.
1: CREATE PROCEDURE [dbo].[DeleteComments]
2: @CommentId int
3: AS
4: BEGIN
5: Declare @Left int
6: Declare @Right int
7: declare @EntityId int
8:
9: select @Left = [left], @Right = [right], @EntityId = [EntityId] from tblComments
10: where id = @CommentId
11:
12:
13: delete from tblComments
14: where id = @CommentId
15:
16: update tblComments
17: set [left] = [left] - 2,
18: [right] = [right] - 2
19: where [left] >= @Left and entityId = @EntityId
20:
21:
22: UPDATE tblcomments
23: SET
24: [right] = [right] - 2
25: WHERE [left] < @Left and [right] > @Right
26:
27: END
In Line no 9-10 : we are storing the left, right and entity of the record that will be deleted.
In Line no 13-14 : delete the record from table.
In Line no 16- end: updating the right and left values of the parent and sibling nodes by –2.
Conclusion:
This way, we can easily achieve the task of Hierarchal data without recursion. One more time, with full respect I like to give all the credit to Gijs Van Tulder. I have just made small modification in this to make it work according to me to.