Re: Making a tree with "millions and millions" of dynamic nodes

From: joe(dot)celko(at)northface(dot)edu (--CELKO--)
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Making a tree with "millions and millions" of dynamic nodes
Date: 2003-12-10 21:19:00
Message-ID: a264e7ea.0312101319.7de8adb7@posting.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>> The depth should not exceed 6 or 7 levels. The initial import will
have about 6 million leaves, and 3 million branches. I would expect
the leaves to grow significantly, in number easily tripling. <<

That is not too bad -- depth would worry me more than breadth

>> 1. path generation speed <<

That is pretty easy in a nested set model:

SELECT T1.node, (T1.rgt-T1.lft) AS depth
FROM Tree AS T1, Tree AS T2
WHERE T2.lft BETWEEN T1.lft AND T1.rgt
AND T2.node = :my_guy
ORDER BY depth;

The greater (T1.rgt-T1.lft) is, the closer the node is to the root

You can also convert depth into a sequential number by using.

SELECT T2.node, COUNT (T1.node)AS depth
FROM Tree AS T1, Tree AS T2
WHERE T2.lft BETWEEN T1.lft AND T1.rgt
GROUP BY T2.part;

>> 2. immediate sibling speed <<

SELECT B.node AS boss, P.node
FROM Tree AS P
LEFT OUTER JOIN
Tree AS B
ON B.lft
= (SELECT MAX(lft)
FROM Tree AS S
WHERE P.lft > S.lft
ND P.lft <S.rgt);

>> 3. children count speed <<

SELECT :my_guy, (rgt - lft +1)/2 AS subtree_size
FROM Tree
WHERE node = :my_guy;

>> I have implemented a first run using Joe Celko's Nested Sets (w/ a
mod to store tree level for speed). The update propigation issue is
the achilles heel of this approach for me. <<

It is not as bad as people first think. The tree structure is in one
table and the nodes are in another, so you can manipulating of two
integers and one key (perhaps another integer). Since new siblings
are added to the right side of the existing line of siblings under the
parent, you average a bit less than half of the nodes being scanned in
practice.

There are some other tricks, like leaving a large gap between (lft)
and (rgt) numbers so you have room to insert new nodes.

>> So it seems Materialized Path is my only option, however I am
concerned about LIKE performance for the right hand side of the tree,
where the path is 8 digits x 6 levels = 48 chars. Should I be
concerned? <<

It does not sound too bad, assuming an ordered index. I wonder if you
could improve performance of the LIKE predicates by reversing the path
string ...

>> ... some reassuring words about real-time performance of a 20
million row tree, i'd love to hear <<

Sorry, the most I have played with is a few hundred thousand rows with
the nested set model.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Joe Conway 2003-12-10 21:39:13 Re: Strange permission problem regarding pg_settings
Previous Message Jeff Eckermann 2003-12-10 21:12:34 Re: PostgreSQL Training