Re: [HACKERS] Tree type, how best to impliment?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Terry Mackintosh <terry(at)terrym(dot)com>
Cc: PostgreSQL-development <hackers(at)postgreSQL(dot)org>
Subject: Re: [HACKERS] Tree type, how best to impliment?
Date: 1998-11-17 23:03:38
Message-ID: 4058.911343818@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Terry Mackintosh <terry(at)terrym(dot)com> writes:
> On Tue, 17 Nov 1998, Tom Lane wrote:
>> Why is "category, pcatid" unique? This seems to constrain a parent
>> to have only one child per category value --- is that what you want?

> Yes.
> It is very much like a directory structure, one would not want two
> directories of the same name both off the same point in the file system.

Precisely...

>> If so, why not use the category code as the ID suffix, and not have to
>> bother with maintaining a next-ID counter?

> category is human readable, for display, the id is not, and when deciding
> what the next child's name should be, if not for the next-ID one would
> have to go count all the other records that have the same parent.

Why do you need a next ID at all? Think directory structure. Using
your example, this is perfectly valid:

Things
/ \
Big Small
/ \
Cars Boats

The index will work just as well (probably better) with key strings
like "Things/Big/Small" as with key strings like "1.1.2". Moreover,
you don't need a separate index to enforce uniqueness, and you don't
need to update the parent row when adding a child.

You do need invented IDs if the category items are not necessarily
unique, but it seems your problem does not need that, so why complicate
the concept?

> Yes, I just was not sure how well indexes work with text fields?

I use 'em all the time...

> I had also though about one field only (text), where the categories would
> be all chained together delimited with slashes and have a PRIMARY KEY
> index. That would automate by design all of the above problems. But it
> would creat new ones:-) Like for many deep records, the table would be
> much bigger.

If your higher-level nodes have long category names, then the space
savings from using an ID instead of a category name might become
interesting. But if you were willing to use a fixed-size char(255)
(in fact two of 'em!) in every tuple for IDs, I don't think you can
complain about the average space cost of this approach...

Another way to approach this is to give each node a unique serial
number, and use the serial number as the child's back-link:

CREATE TABLE tab (
nodeID int4 serial primary key,
parentID int4 not null, -- nodeID of parent node
category text not null,
unique (parentID, category)
);

(I may not have the "serial" syntax quite right, but you get the idea.)
This approach is very compact, but a row's position in the hierarchy
is represented only indirectly --- you have to chase the parentID links
if you need to build the full pathname given just the nodeID. You can't
lookup a row directly by its "category/subcategory/subsubcategory" path
either; you need a query for each level in order to fetch the parent
IDs. But you needed that in your original design too, since there's no
other way to get the numeric IDs for subcategories.

A further space saving is to use the Postgres OIDs as the row
identifiers, but that makes it difficult to dump and reload the table,
so I don't recommend it.

regards, tom lane

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1998-11-17 23:19:43 Re: [HACKERS] building 6.4 on sunos 4.1.4
Previous Message Terry Mackintosh 1998-11-17 20:34:05 SPI function help needed.