Re: Query involving views

From: Laurent Martelli <laurent(at)aopsys(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query involving views
Date: 2004-06-06 21:12:07
Message-ID: 87llj0tmqg.fsf@stan.aopsys.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> Laurent Martelli <laurent(at)aopsys(dot)com> writes:
>> Now, if I use the following view to abstract access rights:

>> CREATE VIEW userpictures (
>> PictureID,RollID,FrameID,Description,Filename,
>> Owner,EntryDate,Date,
>> NbClick,NbRates,MaxRate,MinRate,AverageRate,SumRates, UserID) AS
>> SELECT DISTINCT ON (Permissions.PictureID,UserID)
>> Pictures.PictureID,RollID,FrameID,Description,Filename,Owner,
>> EntryDate,Date,NbClick,NbRates,MaxRate,MinRate,AverageRate,SumRates,
>> UserID FROM Permissions JOIN Groupsdef using (GroupID) JOIN
>> pictures using (PictureID);

>> [ performance sucks ]

Tom> Find a way to get rid of the DISTINCT ON. That's essentially
Tom> an optimization fence. Worse, the way you are using it here,
Tom> it doesn't even give well-defined results, since there's no
Tom> ORDER BY constraining which row will be selected out of a set
Tom> of duplicates. (I think it may not matter to you, since you
Tom> don't really care which groupsdef row is selected,

That's true. I do not use columns from groupsdef in the end.

Tom> but in general a view constructed like this is broken.)

Tom> It might work to do the view as

Tom> SELECT ... all that stuff ... FROM pictures p, users u WHERE
Tom> EXISTS (SELECT 1 FROM permissions prm, groupsdef g WHERE
Tom> p.pictureid = prm.pictureid AND prm.groupid = g.groupid AND
Tom> g.userid = u.userid);

Tom> I'm not sure offhand about the performance properties of this
Tom> either, but it would be worth trying.

This one does not yield very good performance. In fact, the best
performances I have is when I use a where clause like this one:

WHERE PictureID IN
(SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID)
WHERE groupsdef.UserID=2)

But it's not as elegant to write as the initial view using "distinct
on". I could create a view like this:

CREATE VIEW userpictures (PictureID,UserID)
AS SELECT pictureid,userid
FROM permissions JOIN groupsdef USING(GroupID)

and then do queries like this:

SELECT * FROM pictures
WHERE PictureID IN (SELECT PictureID FROM userpictures WHERE UserID=2)

but it's stillnot as elegant as

SELECT * FROM userpictures WHERE UserID=2

I think I'll try a function:

CREATE FUNCTION picturesID(int) RETURNS SETOF int AS '
SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID)
WHERE groupsdef.UserID=$1
' LANGUAGE sql;

SELECT * FROM pictures WHERE PictureID IN (select * from picturesID(2));

Here's something funny: using a function seems gives slihtly better results
than inlining the query (I did a dozen of runs and the timings were consistent):

SELECT * FROM pictures WHERE PictureID IN (select * from picturesID(2));
QUERY PLAN
Hash Join (cost=15.50..100.49 rows=200 width=97) (actual time=28.609..46.568 rows=2906 loops=1)
Hash Cond: ("outer".pictureid = "inner".picturesid)
-> Seq Scan on pictures (cost=0.00..68.33 rows=2933 width=97) (actual time=0.018..2.610 rows=2933 loops=1)
-> Hash (cost=15.00..15.00 rows=200 width=4) (actual time=28.467..28.467 rows=0 loops=1)
-> HashAggregate (cost=15.00..15.00 rows=200 width=4) (actual time=23.698..26.201 rows=2906 loops=1)
-> Function Scan on picturesid (cost=0.00..12.50 rows=1000 width=4) (actual time=16.202..19.952 rows=5076 loops=1)
Total runtime: 48.601 ms

SELECT * FROM pictures WHERE PictureID IN (
SELECT PictureID FROM permissions JOIN groupsdef USING(GroupID)
WHERE groupsdef.UserID=2);
QUERY PLAN

Hash Join (cost=394.93..504.24 rows=2632 width=97) (actual time=35.770..53.574 rows=2906 loops=1)
Hash Cond: ("outer".pictureid = "inner".pictureid)
-> Seq Scan on pictures (cost=0.00..68.33 rows=2933 width=97) (actual time=0.014..2.543 rows=2933 loops=1)
-> Hash (cost=388.35..388.35 rows=2632 width=4) (actual time=35.626..35.626 rows=0 loops=1)
-> HashAggregate (cost=388.35..388.35 rows=2632 width=4) (actual time=30.988..33.502 rows=2906 loops=1)
-> Merge Join (cost=5.40..376.72 rows=4652 width=4) (actual time=0.247..26.628 rows=5076 loops=1)
Merge Cond: ("outer".groupid = "inner".groupid)
-> Index Scan using permissions_groupid_key on permissions (cost=0.00..280.77 rows=8305 width=8) (actual time=0.031..11.629 rows=7633 loops=1)
-> Sort (cost=5.40..5.43 rows=12 width=4) (actual time=0.207..1.720 rows=5078 loops=1)
Sort Key: groupsdef.groupid
-> Seq Scan on groupsdef (cost=0.00..5.19 rows=12 width=4) (actual time=0.030..0.182 rows=11 loops=1)
Filter: (userid = 2)
Total runtime: 54.748 ms

Tom> A cruder answer is just to accept that the view may give you
Tom> multiple hits, and put the DISTINCT in the top-level query.

I thought of that. But it has the drawback that if you use an ORDER
BY, you must have the same columns in the DISTINCT.

Tom> I think though that in the long run you're going to need to
Tom> rethink this representation of permissions. It's nice and
Tom> simple but it's not going to scale well. Even your "fast"
Tom> query is going to look like a dog once you get to many
Tom> thousands of permission entries.

Tom> It might work to maintain a derived table (basically a
Tom> materialized view) of the form (userid, groupid, pictureid)
Tom> signifying that a user can access a picture through membership
Tom> in a group. Put a nonunique index on (userid, pictureid) on
Tom> it. This could then drive the EXISTS test efficiently.

I'll probably do that if perf goes down when the database grows
bigger.

Thanks for all the advice.

Best regards,
Laurent

--
Laurent Martelli
laurent(at)aopsys(dot)com Java Aspect Components
http://www.aopsys.com/ http://jac.objectweb.org

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Markus Schaber 2004-06-06 22:08:00 Re: Using a COPY...FROM through JDBC?
Previous Message Tom Lane 2004-06-06 16:22:25 Re: Query involving views