Performance and doing USENET style threaded messages

From: Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To: pgsql-general(at)postgresql(dot)org
Subject: Performance and doing USENET style threaded messages
Date: 2000-12-23 10:09:11
Message-ID: 3.0.5.32.20001223180911.00960e80@192.228.128.13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi (Merry Christmas and Happy New Millennium everyone!),

I've been wondering what would be the best way to do things like threaded
messages on Postgresql- e.g. where you have the usual parent-child
relationships between entities- each entity has a unique id and a parent id.

So far I'm considering two reasonable options.

<options>
Option 1 - do it the obvious "proper way" - have the application do selects
recursively.
e.g. To select child items
<pseudocode>
traverse ( id, orderspec) {
select id,subdata from tablename where parentid=id order by orderspec
for each id
display(id,subdata)
traverse(id, orderspec)

}
</pseudocode>

Option 2 - do it the kludgy limited way.
Keep the same structure as Option 1 but add a new text column I'll call
geneaology (or gene for short).

For a structure like
1
|_
| |
2 3
|_
| |
4 5

Item 4 will have gene=00000001,00000003,00000004,

Where the ids are all zeropadded _hexadecimal_ and delimited by commas (for
easier human processing if necessary).

So to select all items with parent id=3 (including item id=3) one would use
a query like:

select * from tablename where gene like '00000001,00000003,%' order by
orderspec;

Option 2.1 - use a combination of the two, e.g. use option 1 when the depth
gets crazy.

Option X - any ideas?
</options>

The trouble with option 1 is it seems it will be quite slow when you have
lots of items at significant depths.

The trouble with option 2 is the geneaology column can get quite huge (9 X
depth bytes) if the depth increases (e.g. there is a flame war ;) ). What
is the recommended max indexable text length in Postgresql 7.0? If it won't
index well then option 1 may actually be better.

If option 2 indexes well, I'm thinking of using it and limiting the depth
to say maybe a hundred. I figure in practice it's time for a new thread
once the depth is > 100 - it's mutated enough to warrant a new geneaology ;).

Oracle has a "start with" feature, but even so I'm not sure if the
performance is better that way - it may just be for application level
convenience.

Any comments or suggestions welcome.

Thanks,
Link.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Lincoln Yeoh 2000-12-23 10:14:26 Database I/O and other performance questions.
Previous Message Tom Lane 2000-12-23 05:57:36 Re: allowing users access to a trusted C function