[mig@utdt.edu: Re: Threaded Records in SQL: Advice Needed]

From: mig(at)utdt(dot)edu
To: pgsql-sql(at)postgresql(dot)org
Subject: [mig@utdt.edu: Re: Threaded Records in SQL: Advice Needed]
Date: 2000-04-10 18:45:13
Message-ID: 200004101845.PAA01725@ant.utdt
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

------- Start of forwarded message -------
Date: 10 Apr 2000 15:41:55 -0300
From: <mig(at)utdt(dot)edu>
To: BIngram(at)sixtyfootspider(dot)com
CC: pgsql-sql(at)postgersql(dot)org
In-reply-to: <9B7D4396307CD311809A00500415EB405D1F99(at)BKMAIL>
(BIngram(at)sixtyfootspider(dot)com)
Subject: Re: [SQL] Threaded Records in SQL: Advice Needed

One solution would be to use "structured id_fields" for your
messages. What I mean is that the unique id of the 5th reply to
message (parent_id) is (parent_id).5

Then:
* root messages have ids with a single "field" (e.g., 25 for the 25th
root)
* replies to a root message have ids with two "fields" (e.g., 25.7 for
the seventh reply to root message 25)
* and so on, having for instance the id 25.7.19.2 for the second
reply to the nineteenth reply to the seventh reply to root message
25

Remark that it is easy to compute the correct id at insert time.

The lexicographic ordering of these ids is the correct one for your
purposes; if you have an unique index on the id field, selecting
ordered by this field should be fast.

If you want the subtree of the thread starting at root message 25,
you just
"select ... where id like '25.%' order by id asc"

The only problem I can see is the variable length of the id records,
which (in principle) can grow without limit. You could of course use
more symbols than just digits following this same principle ... Also,
you could implement a structured id using two database fields; for
instance,
root_id integer
thread_id (this method)
If you have indexes on root_id and on thread_id, and also an unique
index on (root_id,thread_id), searches should be fast enough. This
variant gives you an enormous number of possible threads (integer),
with a possible limitation in the number of answers in each thread. It
also improves the speed of selecting a single thread as it avoids the
search using like:
"select ... where root_id=25 order by thread_id asc"

I am no expert in these things! I would like to know if this kind of
solution is good enough for your purposes ... and if not, where it is
lacking - so that I learn from it.

Miguel Sofer

>From: "Ingram, Bryan" <BIngram(at)sixtyfootspider(dot)com>
>Date: Mon, 10 Apr 2000 11:26:42 -0500
>Content-Type: text/plain;
> charset="iso-8859-1"
>X-Mailing-List: pgsql-sql(at)postgresql(dot)org
>Precedence: bulk
>Sender: pgsql-sql-owner(at)hub(dot)org
>
>Hi Everyone,
>
>I'm planning to implement a project using a php/postgres 6.5.3 combination.
>
>Part of what I need to do is develop a model in SQL to keep track of
>"threaded records." What I mean by this, is that I need to keep track of
>parent-child relationships in exactly the same way that a
>threaded-discussion group does.
>
>The design should be able to support approximately 100 users, mostly
>readers, and be able to handle approximately 1 insert per minute.
>
>So far, I've found two methods/algorithms for implenting this in SQL, but
>there's a problem with each.
>
>The first one is extremely efficient and inexpensive for selecting and
>ordering all of the records in a thread (taking only one select to get all
>of the records and in order), but requires that ALL records be updated on
>insert, which is very expensive. Because I'm expecting about 120000 records
>a year, or ca. 450 new records every business day, the prospect of each
>insert updating each record in the database is not appealing. At first it
>would be ok, but as the record count increased, the inserts would become
>extremely slow. This method works by modeling the records as "nested sets"
>and was explained by Joe Celko in one of his articles.
>
>The second method is simply adding fields to keep track of an id, parent_id,
>and root_id. Its benefit is that it is inexpensive on inserts, requiring
>only one insert and no updates, but it is costly to select and order all the
>records in a thread. Essentially, the only way I have found to do this is
>to implement a recursive function that selects one record at a time, checks
>for children, ad infinitum. Not very efficient.
>
>My concern with method 1 is that the great expense of inserting new records
>will slow the database down to unacceptable levels. This is based on my past
>experience with postgres when updating a database of 1.5 million rows (which
>took an hour and a half with syncing turned off.)
>
>Method 2 is problematic because it uses recursion first of all (which I
>understand to be heavy in resource usage) and because it makes a select for
>each and every record in a thread to be displayed.
>
>What I'd like to find is either a better way that I'm not aware of, or some
>kind of middle ground.
>Also, what do you make of these methods. Are they as inefficient as I want
>to believe?
>
>Thanks!
>
>Bryan Ingram
>Database Programmer
>S!XTYFOOTSP!DER
>972-492-5676
>bingram(at)sixtyfootspider(dot)com <mailto:bingram(at)sixtyfootspider(dot)com>
> <http://www.sixtyfootspider.com/> www.sixtyfootspider.com
>
>
>
>
------- End of forwarded message -------

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2000-04-10 19:50:27 Re: How to do this in PostgreSQL?
Previous Message Patrick Giagnocavo 2000-04-10 18:25:39 Re: Threaded Records in SQL: Advice Needed