Question about ltree....

From: "Greg Saylor" <gsaylor(at)integrated-support(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Question about ltree....
Date: 2004-10-26 13:31:46
Message-ID: 20041026133146.12893@mail.net-virtual.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

First let me thank Mike for his earlier suggestion that I consider using
ltree - it really is working very nicely!...

I do have a question about performance/design though...

I have a table like this:

CREATE TABLE sometable (
id SERIAL,
category0 LTREE,
category1 LTREE,
category2 LTREE
);

CREATE INDEX sometable_category0 ON sometable USING gist(category0) WHERE
category0 IS NOT NULL;
CREATE INDEX sometable_category1 ON sometable USING gist(category1) WHERE
category1 IS NOT NULL;
CREATE INDEX sometable_category2 ON sometable USING gist(category2) WHERE
category2 IS NOT NULL;

... The idea is that every row in sometable can be assigned to 1, 2, or 3
categories of a users choosing....

I essentially have categories like this:
a
a.a
a.a.a
a.a.a.a
a.a.a.b
a.a.a.c
a.a.b
a.a.b.a
a.a.b.b

... so the categories are essentially 3 levels deep (not counting the top
level) - in my test case I have about 4300 rows of data in this table.

Unfortunately, queries against this table can be a bit inefficient, for
example, if I want to get a count of all of items within a category I
have do something like:

SELECT count(*) AS count FROM
(SELECT category0 AS category FROM sometable WHERE category0 IS NOT NULL
UNION SELECT category1 AS category FROM sometable WHERE category1
IS NOT NULL
UNION SELECT category2 AS category FROM sometable WHERE category2
IS NOT NULL) AS b
WHERE b.category <@ 'a.b';

If I want to retrieve rows which fit into one or more of the categories,
I have to do something like:

SELECT * FROM sometable WHERE category0 <@ 'a.b' OR category1 <@ 'a.b' OR
category2 <@ 'a.c';

.. What is interesting is that this does a sequential scan:

SELECT * FROM sometable WHERE category0 <@ 'a.b';

.. This on the other hand does an index scan, despite the fact that every
row IS NOT NULL and it is about 10 times faster:

SELECT * FROM sometable WHERE category0 <@ 'a.b' AND category0 IS NOT NULL;

.. But, as soon as I add a second OR clause the planner decides to start
doing sequential scans again, despite the fact all category1 columns are NULL:

SELECT * FROM sometable WHERE (category0 <@ 'a.b' AND category0 IS NOT
NULL) OR (category1 <@ 'a.b' AND category1 IS NOT NULL);

Is there anything which jumps out as being something that I could do to
improve the schema design or the performance of the two queries I
identified above - as these get executed very often any performance
improvement would dramatically boost performance...

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tino Wildenhain 2004-10-26 13:38:07 Re: Strange count(*) implementation?
Previous Message Valentin Militaru 2004-10-26 13:27:49 Re: Any plans on allowing user-defined triggers to