Monday, January 09, 2006

SQL - Parent child relationship trees

--- WORK IN PROGRESS ---

I'm still tyring to finish this writeup on my understanding of oracle trees. Please feel free to post comments/suggestions.

I've been working on a project where the client can manage a list of categories with an unlimited level of subcategories. The result is a tree structure that looks something like this:

category1
--subcategory1
--subcategory2
----subsubcategory1

category2
--subcategory3

Note that the first category1 has 3 tiers (category, subcategory and subsubcategory) while category2 has only 2 tiers.

I need to be able to create a unlimited level tree structure from this data for web site navigation. To get this to work I created a parent child relationship in a DB table as shown below.

--- using oracle style sql

CREATE TABLE categories (
category_id INT NOT NULL PRIMARY KEY,
category_name VARCHAR2(128) NOT NULL,
parent_id INT NULL
);

INSERT INTO categories (category_id, category_name,parent_id)
VALUES(1,'category1',NULL);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(2,'subcategory1',1);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(3,'subcategory2',1);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(4,'subsubcategory1',3);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(5,'category2',NULL);
INSERT INTO categories (category_id, category_name,parent_id)
VALUES(6,'subcategory3',5);
COMMIT;

You end up with rows of data in the table that look like the following matrix.









category_idcategory_nameparent_id
1category1NULL
2subcategory11
3subcategory21
4subsubcategory13
5category2NULL
6subcategory35


The category_id column is the unique primary key for records in this table. The parent_id is a foreign key to the primary key column in the same table. The parent_id links the current row to a parent record in the same table. In short, the parent_id tells us if the current record has a parent (NULL means no parent) and who that parent is. We will use this relationship to "Walk the tree" where given any category_id, we can find all of its parents and all of its children within the tree.

When building a navigation menu, we tend to start at the top and work our way down. To do that, we need to start with all the top level parent categories. We know we are at the "top" of the list when we have a record with no parent (NULL for the parent_id).


--- Select all top level categories
SELECT * FROM categories WHERE parent_id IS NULL;

category_id category_name parent_id
----------- ------------- ---------
1 category1 (null)
6 category2 (null)

2 record(s) selected


If you look at the original tree drawing, this query does return the top level parent records. Now its time to put this simple parent/child relationship to the test and if it can be usefull as it is or if we need to add more data to make it work.


Lets try to create the navigation.


SELECT * FROM categories;

CATEGORY_ID CATEGORY_NAME PARENT_ID
-------------- ---------------- ------------
1 category1 (null)
2 subcategory1 1
3 subcategory2 1
4 subsubcategory1 3
5 category2 (null)
6 subcategory3 5

6 record(s) selected


Everything looks ok, but there is a hidden problem. How do we know these records are in the correct order? In this case, pure dum luck. If we change our example data we can see the problem more clearly:


DROP TABLE categories;

CREATE TABLE categories (
CATEGORY_ID NUMBER(22,0) NOT NULL,
CATEGORY_NAME VARCHAR2(128) NOT NULL,
PARENT_ID NUMBER(22,0) NULL,
SORT_ORDER NUMBER(22,0) NOT NULL
);

INSERT INTO categories(category_id, category_name, parent_id)
VALUES(1, 'category3', NULL);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(2, 'subcategory5', 1);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(3, 'subcategory4', 1);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(4, 'subsubcategory4', 3);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(5, 'category2', NULL);
INSERT INTO categories(category_id, category_name, parent_id)
VALUES(6, 'subcategory3', 5);

COMMIT;


SELECT * FROM categories;

CATEGORY_ID CATEGORY_NAME PARENT_ID
-------------- ---------------- ------------
1 category3 (null)
2 subcategory5 1
3 subcategory4 1
4 subsubcategory4 3
5 category2 (null)
6 subcategory3 5

6 record(s) selected



Woah! Whats going on here? We havent told Oracle how to sort the results, so Oracle is picking the method it thinks is best. A good rule of when working with data sets is to allways specify an ORDER BY clause to guarntee you get the results back in the way you expect.

So, how can we fix the order of this recordset? We cant. We need more data to allow us to tell Oracle how to sort these records. We need a sort order column. So lets revisit our table creation statment with our new colum "sort_order" with values for sorting:



DROP TABLE categories;

CREATE TABLE categories (
category_id INT NOT NULL PRIMARY KEY,
category_name VARCHAR2(128) NOT NULL,
parent_id INT NULL,
sort_order INT NOT NULL
);

INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(1,'category1',NULL,1);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(2,'subcategory1',1,1);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(3,'subcategory2',1,2);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(4,'subsubcategory1',3,1);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(5,'category2',NULL,2);
INSERT INTO categories (category_id, category_name,parent_id,sort_order)
VALUES(6,'subcategory3',5,1);
COMMIT;


I've added a sort order column and I have chosen to order each record within the higharchy level that I know it should be returned within.
This means that sort_orders will only be unique within a tier for a given parent. If I lost you with that last statement, just keep reading, we will address that issue soon.
Here is how our data looks now:


SELECT * FROM categories;

CATEGORY_ID CATEGORY_NAME PARENT_ID SORT_ORDER
-------------- ---------------- ------------ -------------
1 category3 (null) 2
2 subcategory5 1 2
3 subcategory4 1 1
4 subsubcategory4 3 1
5 category2 (null) 1
6 subcategory3 5 1

6 record(s) selected


Here we see the basic sorting problem. We now have all the information that we should need to create and sort our higharchy.
So where do we go from here? Lets see what happens when we sort on the sort_order column:


SELECT * FROM categories ORDER BY sort_order;

CATEGORY_ID CATEGORY_NAME PARENT_ID SORT_ORDER
-------------- ---------------- ------------ -------------
3 subcategory4 1 1
4 subsubcategory4 3 1
5 category2 (null) 1
6 subcategory3 5 1
1 category3 (null) 2
2 subcategory5 1 2

6 record(s) selected



We are not taking into account the higharchy level, so the sorting does not work. (as a side note, we could do absolute sorting where the sort order column is unique, but lets continue to explore this solution first)

What we really need to do is somehow get the results returned already ordered by the higharchy level, then sub-sort by the sort order. SQL does not support returning trees in this way (that I know of) so after searching google I found a suggestion from oracle on how to do this in oracle 9i (the Database I'm currently using)...if you are using 10g, it should be even easier. Check out the article for more info:

http://www.oracle.com/technology/oramag/oracle/05-may/o35asktom.html

Anywho, their version of the solution is a bit hard to understand, so here is my simpler version that solves our problem:


SELECT category_name,
category_id,
level,
sort_order
FROM categories
START WITH parent_id IS NULL
CONNECT BY PRIOR category_id = parent_id
ORDER SIBLINGS BY sort_order;



This is an oracle specific solution. The SYS_CONNECT_BY_PATH function, START WITH and CONNECT BY PRIOR statements are all non-standard oracle features. I have no idea how you would do this in pure SQL...I assume you would need to write some procedual code to do the work for you.

Lets break down how this works:

2 comments:

moneypro said...

Please can tell you:
how can I assign this relation to "Links" table where I will insert links, title, description against a parent or child category.
Thanks

Bryan said...

You should be able to use the category table described above, which gives you the parent-child relationship for categories.

Next, you can create a separate table to store your link records. I'll call that table "links" for now - in that links table you will want to create a foreign key column, call it something like "category_id", which will reference the primary key column of the category table described in my post.

Your links table might look something like this:

--- Create Link Table
CREATE TABLE links ( link_id INT NOT NULL, title VARCHAR2(100), description VARCHAR2(2000), category_id INT NULL );

Where link_id will be the primary key column on the links table, and the category_id will be a foreign key back to the category table's primary key column.

That's just to give you an idea of how I would do it. Don't forget to add primary key and foreign key constraints too.