slow view

From: "Stuart" <smcg2297(at)frii(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: slow view
Date: 2006-10-12 01:18:55
Message-ID: egk560$p53$1@sea.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I am having a problem with the performance of a
view that will be a critical part of a database system
I am working on, and would appreciate some advice.
Apologies for the length of this posting!

I have a parent table P and to child tables A and B,:

CREATE TABLE p (
id INT NOT NULL PRIMARY KEY);
CREATE TABLE a (
id INT NOT NULL PRIMARY KEY,
p INT REFERENCES p(id));
CREATE INDEX ON a(p);
CREATE TABLE b (
id INT NOT NULL PRIMARY KEY,
p INT REFERENCES p(id));
CREATE INDEX ON b(p);

Each "p" row has between 1 and 5 (or so)
child rows in the "a" table, and between 0 and 4
rows in the "b" table.

Now for most p's the a's and the b's are independent,
and all combinations of a's and b's for that p are ok.
But for a small percentage of p's (<5%) there are some
combinations of a's and b's are distinguished (I will call
them "invalid").
So I created a table to record these "invalid" combinations:

CREATE TABLE x (
a INT NOT NULL REFERENCES a(id),
b INT NOT NULL REFERENCES b(id),
PRIMARY KEY (a,b));
CREATE INDEX ON x(a);
CREATE INDEX ON x(b);

Here is some sample data with a single p:

# Create one parent item...
INSERT INTO p VALUES(1)

# Create 4 a-items for that parent...
INSERT INTO a VALUES(1,1)
INSERT INTO a VALUES(2,1)
INSERT INTO a VALUES(3,1)
INSERT INTO a VALUES(4,1)

# Create 3 b-items for that parent...
INSERT INTO b VALUES(11,1)
INSERT INTO b VALUES(12,1)
INSERT INTO b VALUES(13,1)

So for parent p=1, there are 12 combinations
of a and b items (each of the 4 a items can be
paired with any of the 3 b items).
Now, make some combinations of a items
and b items "invalid"...

# For a=2, make b=13 invalid, i.e only b=11 and b=12 are valid.
INSERT INTO x VALUES(2,13)
# For a=3, only b=11 is valid.
INSERT INTO x VALUES(3,12)
INSERT INTO x VALUES(3,13)
# For a=4, no b's are valid.
INSERT INTO x VALUES(4,11)
INSERT INTO x VALUES(4,12)
INSERT INTO p VALUES(4,13)

Now I need a view that will display, for each p, its
a's and for each a, only the valid b's, that is, the
combinations of a and b that are *not* in table x.
OK, no problem...

(#1)
SELECT p.id AS pid, a.id AS aid, b.id AS bid
FROM p
JOIN a ON a.p=p.id
LEFT JOIN b ON b.p=a.p
LEFT JOIN x ON x.a=a.id AND x.b=b.id
WHERE x.a IS NULL
AND p.id=1;

Here is the result on the data given above:
pid | aid | bid
-----+-----+-----
1 | 1 | 11
1 | 1 | 12
1 | 1 | 13
1 | 2 | 11
1 | 2 | 13
1 | 3 | 11
1 | 3 | 12

Ok, but I want all a's in the output, even when they
have no valid b's. aid=4 is not in the output.
So I did what I thought was the obvious answer, a
left join between a, and the above query...

(#2)
SELECT p.id AS pid, a.id AS aid, sub.bid AS bid
FROM p
JOIN a ON a.p=p.id
LEFT JOIN (
SELECT a.id AS aid, b.id as bid
FROM a
LEFT JOIN b ON b.p=a.p
LEFT JOIN x ON x.a=a.id AND x.b=b.id
WHERE x.a IS NULL
) AS sub ON sub.aid=a.id
WHERE p.id=1;

Results:
pid | aid | bid
-----+-----+-----
1 | 1 | 11
1 | 1 | 12
1 | 1 | 13
1 | 2 | 11
1 | 2 | 13
1 | 3 | 11
1 | 3 | 12
1 | 4 |

Exactly what I want.

The problem is that when there are ~100K parent entries
the above query (#2) takes ~10 seconds to run but the first
query (#1) runs in a few tens of milliseconds.
Is there any way I can get postgresql to better optimize
query #2, or rewrite it to that is is more "postgresql friendly"?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If it helps, here is a python script I used to generate enough
pseudo-data to show the time difference...

#!/usr/bin/python
import psycopg2, random

def main():
cn = psycopg2.connect (database="test",
user="postgres",password="")
c = cn.cursor()
pkp = 1; pka = 1; pkb = 1;
while pkp < 30000:
c.execute ("INSERT INTO p VALUES(%s)", (pkp,))
na = random.randint(1,5)
for a in range(na):
c.execute ("INSERT INTO a VALUES(%s,%s)", (pka+a,pkp))
nb = random.randint(0,4)
for b in range(nb):
c.execute ("INSERT INTO b VALUES(%s,%s)", (pkb+b,pkp))
if na*nb > 1 and random.randint (0,99) < 10:
zlst = [(a,b) for a in range(pka,pka+na)
for b in range(pkb,pkb+nb)]
for z in random.sample (zlst, random.randint(1,na*nb-1)):
c.execute ("INSERT INTO x VALUES(%s,%s)", z)
pkp += 1; pka += na; pkb += nb
cn.commit()

if __name__ == '__main__': main ()

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Stuart McGraw 2006-10-12 04:15:58 Re: slow view
Previous Message Daniel Drotos 2006-10-11 19:31:37 Re: deleting rows in specific order