Threaded Records in SQL: Advice Needed

From: "Ingram, Bryan" <BIngram(at)sixtyfootspider(dot)com>
To: pgsql-sql(at)postgersql(dot)org
Subject: Threaded Records in SQL: Advice Needed
Date: 2000-04-10 16:26:42
Message-ID: 9B7D4396307CD311809A00500415EB405D1F99@BKMAIL
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi Everyone,

I'm planning to implement a project using a php/postgres 6.5.3 combination.

Part of what I need to do is develop a model in SQL to keep track of
"threaded records." What I mean by this, is that I need to keep track of
parent-child relationships in exactly the same way that a
threaded-discussion group does.

The design should be able to support approximately 100 users, mostly
readers, and be able to handle approximately 1 insert per minute.

So far, I've found two methods/algorithms for implenting this in SQL, but
there's a problem with each.

The first one is extremely efficient and inexpensive for selecting and
ordering all of the records in a thread (taking only one select to get all
of the records and in order), but requires that ALL records be updated on
insert, which is very expensive. Because I'm expecting about 120000 records
a year, or ca. 450 new records every business day, the prospect of each
insert updating each record in the database is not appealing. At first it
would be ok, but as the record count increased, the inserts would become
extremely slow. This method works by modeling the records as "nested sets"
and was explained by Joe Celko in one of his articles.

The second method is simply adding fields to keep track of an id, parent_id,
and root_id. Its benefit is that it is inexpensive on inserts, requiring
only one insert and no updates, but it is costly to select and order all the
records in a thread. Essentially, the only way I have found to do this is
to implement a recursive function that selects one record at a time, checks
for children, ad infinitum. Not very efficient.

My concern with method 1 is that the great expense of inserting new records
will slow the database down to unacceptable levels. This is based on my past
experience with postgres when updating a database of 1.5 million rows (which
took an hour and a half with syncing turned off.)

Method 2 is problematic because it uses recursion first of all (which I
understand to be heavy in resource usage) and because it makes a select for
each and every record in a thread to be displayed.

What I'd like to find is either a better way that I'm not aware of, or some
kind of middle ground.
Also, what do you make of these methods. Are they as inefficient as I want
to believe?

Thanks!

Bryan Ingram
Database Programmer
S!XTYFOOTSP!DER
972-492-5676
bingram(at)sixtyfootspider(dot)com <mailto:bingram(at)sixtyfootspider(dot)com>
<http://www.sixtyfootspider.com/> www.sixtyfootspider.com


Browse pgsql-sql by date

  From Date Subject
Next Message Patrick Giagnocavo 2000-04-10 18:25:39 Re: Threaded Records in SQL: Advice Needed
Previous Message Tom Lane 2000-04-10 14:03:16 Re: Function