Re: poor execution plan because column dependence

From: Václav Ovsík <vaclav(dot)ovsik(at)i(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: poor execution plan because column dependence
Date: 2011-04-14 08:11:52
Message-ID: 20110414081152.GB11328@bobek.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Apr 13, 2011 at 12:24:06PM -0400, Tom Lane wrote:
> > Interesting the original index tickets5 is still used for
> > int4eq(main.effectiveid, main.id), no need to build a different.
>
> Well, no, it won't be. This hack is entirely dependent on the fact that
> the optimizer mostly works with operator expressions, and is blind to
> the fact that the underlying functions are really the same thing.
> (Which is something I'd like to see fixed someday, but in the meantime
> it gives you an escape hatch.) If you use the int4eq() construct in a
> context where you'd like to see it transformed into an index qual, it
> won't be. For this particular case that doesn't matter because there's
> no use in using an index for that clause anyway. But you'll need to be
> very careful that your changes in the query generator don't result in
> using int4eq() in any contexts other than the "main.EffectiveId=main.id"
> check.

Sorry I'm not certain understand your paragraph completely...

I perfectly understand the fact that change from
A = B into int4eq(A, B)
stopped bad estimate and execution plan is corrected, but that can
change someday in the future.

I'm not certain about your sentence touching int4eq() and index. The
execution plan as show in my previous mail contains information about
using index tickets5:

...
-> Index Scan using tickets5 on tickets main (cost=0.00..4.38 rows=1 width=162) (actual time=0.006..0.006 rows=0 loops=15593)
Index Cond: (main.id = transactions_1.objectid)
Filter: (((main.status)::text <> 'deleted'::text) AND (main.lastupdated > '2008-12-31 23:00:00'::timestamp without time zone) AND (main.created > '2005-12-31 23:00:00'::timestamp without time zone) AND int4eq(main.effectiveid, main.id) AND (main.queue = 15) AND ((main.type)::text = 'ticket'::text) AND ((main.status)::text = 'resolved'::text))
...

Filter condition contains int4eq(main.effectiveid, main.id) and tickets5
is: "tickets5" btree (id, effectiveid)

That means tickets5 index was used for int4eq(main.effectiveid, main.id).
Is it right? Or am I something missing?

Well the index will not be used generally probably, because of
selectivity of int4eq() you mention (33%). The planner thinks it is
better to use seq scan then. I tried this now.

I did hack for this particular case only:

diff --git a/local/lib/DBIx/SearchBuilder.pm b/local/lib/DBIx/SearchBuilder.pm
index f3ee1e1..9e3a6a6 100644
--- a/local/lib/DBIx/SearchBuilder.pm
+++ b/local/lib/DBIx/SearchBuilder.pm
@@ -1040,7 +1040,9 @@ sub _CompileGenericRestrictions {
$result .= ' '. $entry . ' ';
}
else {
- $result .= join ' ', @{$entry}{qw(field op value)};
+ my $term = join ' ', @{$entry}{qw(field op value)};
+ $term =~ s/^(main|Tickets_\d+)\.(EffectiveId) = (\1)\.(id)$/int4eq($1.$2, $3.$4)/i;
+ $result .= $term;
}
}
$result .= ')';

It works as expected.
Thanks
Best Regards
--
Zito

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message tv 2011-04-14 08:23:26 Re: Performance
Previous Message Claudio Freire 2011-04-14 06:49:56 Re: Performance