Re: left join + case - how is it processed?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Chris <dmagick(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: left join + case - how is it processed?
Date: 2009-01-19 04:17:00
Message-ID: 603c8f070901182017h788b5964y94ef9955e9db60b4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, Jan 18, 2009 at 10:30 PM, Chris <dmagick(at)gmail(dot)com> wrote:
> Hi all,
>
> I have a view that looks like this:
>
> SELECT
> CASE
> WHEN r.assetid IS NULL THEN p.assetid
> ELSE r.assetid
> END AS assetid,
> CASE
> WHEN r.userid IS NULL THEN p.userid
> ELSE r.userid
> END AS userid, p.permission, p."granted", p.cascades
> FROM sq_ast_perm p
> LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
> r.assetid::text = p.assetid::text;
>
>
> It was pointed out to me that the first CASE is useless (since r.assetid
> will always be the same as p.assetid because of the left join condition) so
> I'm looking at that to see if it'll make much of a difference and it does.
>
> I won't post the whole lot but the first line is the most interesting.
>
> # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';
>
>
> Merge Left Join (cost=9529.34..13823.76 rows=75721 width=102) (actual
> time=284.371..341.536 rows=1 loops=1)
>
> (The row count is right - it's the total # of rows from sq_ast_perm).

The row count is VERY WRONG. Apparently the actual number of rows is
1 and the estimate is 75721: that's bad.

> When I change the view to be:
>
> SELECT p.assetid,
> CASE
> WHEN r.userid IS NULL THEN p.userid
> ELSE r.userid
> END AS userid, p.permission, p."granted", p.cascades
> FROM sq_ast_perm p
> LEFT JOIN sq_vw_ast_role r ON r.roleid::text = p.userid::text AND
> r.assetid::text = p.assetid::text;
>
>
> The Merge left join only returns 3 rows:
>
> # explain analyze SELECT * from sq_vw_ast_perm where assetid='30748';
>
> Merge Left Join (cost=9507.18..9508.23 rows=3 width=70) (actual
> time=11.544..11.549 rows=1 loops=1)
>
> I thought the where condition would cut down on the rows returned, then the
> case statement would take effect to do the null check. It seems to be doing
> it in reverse ??

The ESTIMATE is 3 rows - the actual rows are 1, just as before.
Notice this is a much more accurate estimate: that's good.

The reason why the CASE is affecting your query planning is because
you are using a query that compares assetid to a constant:

SELECT * from sq_vw_ast_perm where assetid='30748';

When PostgreSQL evaluates this statement, assetid gets expanded either
into a case statement (with your first view definition) or into
sq_ast_perm.assetid (with your second view definition). The latter
definition allows PostgreSQL to make use of the column statistics
(which are pretty accurate) whereas the former is probably leading to
a SWAG, because PostgreSQL isn't very good at estimating the
selectivity of CASE. The bad selectivity estimate, in turn, is
leading to a poor plan choice...

...Robert

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Evan Carroll 2009-01-19 04:18:00 Re: left join + case - how is it processed?
Previous Message Chris 2009-01-19 03:30:47 left join + case - how is it processed?