Re: Recursive use

From: "Jay A(dot) Kreibich" <jak(at)uiuc(dot)edu>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Alexander Burbello <burbello3000(at)yahoo(dot)com(dot)br>, Lista Postgres <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Recursive use
Date: 2006-10-10 21:33:46
Message-ID: 20061010213346.GF21825@uiuc.edu
Lists: pgsql-admin

On Tue, Oct 10, 2006 at 10:15:42AM -0500, Jim C. Nasby scratched on the wall:
> On Fri, Oct 06, 2006 at 10:37:26AM -0500, Jay A. Kreibich wrote:
> > These are generally referred to as "Hierarchical Queries" and center
> > around the idea of a self-referencing table (such as an employee
> > table with a "manager" field that is a FK to another row in the same
> > table). This essentially makes a tree-like structure.
> <snip>
> > As pointed out by others, the most general way to deal with this in
> > PostgreSQL is to write PL/PgSQL (or some other language) functions
> > that can generate the specific queries you need. It isn't always
> > pretty, but it can be made to work for a specific set of queries.
>
> There are also other ways to represent this type of information without
> using hierarchical queries. Joe Celko presents two methods in SQL For
> Smarties.

If you're referring to Joe's March 1996 DBMS article
(http://www.dbmsmag.com/9603d06.html), he does demonstrate two models,
but one of them is the self-referencing table model where one column
references another column in the same table. His only suggestion for
dealing with these kinds of tables is self-joins (which I also
mentioned) but points out the obvious limitation that-- unless you go
procedural-- you have to know how many levels you're going to process
before you set up the query.
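
To make the fixed-depth limitation concrete, here is a minimal sketch
of the self-join approach. It is an illustration only: the employee
table, its columns, and the sample rows are hypothetical, and it uses
Python's sqlite3 module as a stand-in for PostgreSQL so that it runs
anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        manager INTEGER REFERENCES employee(id)  -- NULL for the root
    );
    INSERT INTO employee VALUES
        (1, 'Alice', NULL),
        (2, 'Bob',   1),
        (3, 'Carol', 1),
        (4, 'Dave',  2),
        (5, 'Eve',   4);
""")

# One self-join per level: this query is hard-wired to look exactly
# two levels below the starting row.
two_levels_down = conn.execute("""
    SELECT e2.name
      FROM employee AS e0
      JOIN employee AS e1 ON e1.manager = e0.id
      JOIN employee AS e2 ON e2.manager = e1.id
     WHERE e0.name = 'Alice'
     ORDER BY e2.name
""").fetchall()

print(two_levels_down)
```

Each additional level needs another JOIN written into the query, so
Eve, who sits three levels below Alice, is not reached here.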

The other model that is shown (which he calls "nested-set") is
interesting, but has a lot of properties that make me uncomfortable.
(He proposes that each node/row have two sequence counters ("left" and
"right") that represent pre- and post-visit order in a depth-first
traversal; sets can be calculated by differences or betweens of the two
values).
For one, the table requires an extreme amount of maintenance-- something
as simple as inserting a single leaf node may require updating every
row in the whole table. On average, more than half the nodes/rows will
require updating for each record insertion and removal, but it isn't clear
how this update process would work (since the sequences require a
traversal to update, but a proper traversal requires the correct
sequences). There are tricks for the simple cases, but I'm not sure
you could do an update in-place in the general case.
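
A rough sketch of the nested-set layout, and of one commonly described
trick for the simple leaf-insert case, follows. The node table, the
lft/rgt column names, and the helper functions are assumptions made for
illustration, not code from the article, and SQLite (via Python's
sqlite3) again stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER);
    -- A is the root; B and C are its children; D is a child of B.
    INSERT INTO node VALUES ('A', 1, 8), ('B', 2, 5), ('D', 3, 4), ('C', 6, 7);
""")

def subtree(conn, name):
    """Return the named node and its descendants, in pre-visit order."""
    rows = conn.execute("""
        SELECT c.name
          FROM node AS p
          JOIN node AS c ON c.lft BETWEEN p.lft AND p.rgt
         WHERE p.name = ?
         ORDER BY c.lft
    """, (name,)).fetchall()
    return [r[0] for r in rows]

def insert_leaf(conn, parent, name):
    """Add a leaf as the last child of parent; return the rows shifted."""
    (p_rgt,) = conn.execute(
        "SELECT rgt FROM node WHERE name = ?", (parent,)).fetchone()
    # Count the existing rows that need at least one counter changed.
    (touched,) = conn.execute(
        "SELECT COUNT(*) FROM node WHERE rgt >= ? OR lft > ?",
        (p_rgt, p_rgt)).fetchone()
    # Open a gap of two counter values at the parent's old right edge.
    conn.execute("UPDATE node SET rgt = rgt + 2 WHERE rgt >= ?", (p_rgt,))
    conn.execute("UPDATE node SET lft = lft + 2 WHERE lft > ?", (p_rgt,))
    conn.execute("INSERT INTO node VALUES (?, ?, ?)",
                 (name, p_rgt, p_rgt + 1))
    return touched

before = subtree(conn, 'B')
touched = insert_leaf(conn, 'D', 'E')
after = subtree(conn, 'B')
print(before, touched, after)
```

Adding a leaf under D, the leftmost leaf of this four-row tree, shifts
a counter on every existing row (insert_leaf reports 4), which is the
kind of maintenance cost in question.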

The representation he's chosen also introduces an ordering among siblings--
while this is a required attribute of some tree structures, in most
cases (and in the spirit of general SQL sets) the ordering of peer
nodes/rows is undefined and unimportant. This isn't exactly a flaw,
so much as an unexpected side-effect.

In theory, I agree with his assertion that a conceptual "nested sets"
approach is more SQLish (since SQL likes to deal with sets), but I don't
think the implementation he presented actually has anything to do with
sets (in the traditional sense) that are nested. The whole thing depends
on understanding traversal orderings and some of the tricks you can play
with that to indirectly define sets. I guess it all depends on how you
look at it. I personally tend to think more in C++ than SQL anyways.

I also noticed that Joe has a book out titled "Joe Celko's Trees and
Hierarchies in SQL for Smarties". I have not yet had a chance to
review this book (other than the on-line table of contents) but it
looks interesting. While much of it is on graphs and more general
edge/node structures, a fair bit of the book appears to be about this
type of tree structure. He goes into more detail on some of these
issues, such as insertion and deletion times, and tricks to play for
inserting whole sub-trees, and that kind of thing. Maybe the book
would sell the so-called "nested-set" implementation a bit better,
but it still strikes me as a solution for warehouses, not OLTP style
stuff. I might have to find this book and have a closer read.

Thanks for the reference.

> There's also the ltree module in contrib that might be of some use.

Interesting.

-j

--
Jay A. Kreibich | CommTech, Emrg Net Tech Svcs
jak(at)uiuc(dot)edu | Campus IT & Edu Svcs
<http://www.uiuc.edu/~jak> | University of Illinois at U/C
