Re: Query performance with disabled hashjoin and mergejoin

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ivan Voras <ivoras(at)freebsd(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query performance with disabled hashjoin and mergejoin
Date: 2011-03-07 20:40:48
Message-ID: AANLkTinansJ5OyCbfpceDF6WxJ4A=U_+h3B_ak2X7KHz@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Feb 22, 2011 at 9:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Feb 4, 2011 at 7:08 AM, Ivan Voras <ivoras(at)freebsd(dot)org> wrote:
>>                                 ->  BitmapAnd  (cost=1282.94..1282.94
>> rows=1430 width=0) (actual time=5.508..5.508 rows=0 loops=1)
>>                                       ->  Bitmap Index Scan on
>> news_index_layout_id_state  (cost=0.00..150.14 rows=2587 width=0) (actual
>> time=0.909..0.909 rows=3464 loops=1)
>>                                             Index Cond: ((layout_id = 8980)
>> AND (state = 2))
>>                                       ->  BitmapOr (cost=1132.20..1132.20
>> rows=20127 width=0) (actual time=4.136..4.136 rows=0 loops=1)
>>                                             ->  Bitmap Index Scan on
>> news_visible_from  (cost=0.00..1122.09 rows=19976 width=0) (actual
>> time=3.367..3.367 rows=19932 loops=1)
>>                                                   Index Cond: (visible_from
>> IS NULL)
>>                                             ->  Bitmap Index Scan on
>> news_visible_to  (cost=0.00..9.40 rows=151 width=0) (actual
>> time=0.766..0.766 rows=43 loops=1)
>>                                                   Index Cond: (1296806570 <=
>> visible_to)
>
> I think this part of the query is the problem.  Since the planner
> doesn't support cross-column statistics, it can't spot the correlation
> between these different search conditions, resulting in a badly broken
> selectivity estimate.
>
> Sometimes you can work around this by adding a single column, computed
> with a trigger, that contains enough information to test the whole
> WHERE-clause condition using a single indexable test against the
> column value.  Or sometimes you can get around it by partitioning the
> data into multiple tables, say with the visible_from IS NULL rows in a
> different table from the rest.

Why should you need cross column statistics for this case? You should
be able to multiple selectivity from left to right as long as you are
doing equality comparisons, yes?

Right now the planner is treating
select * from foo where (a,b,c) between (1,1,1) and (9,9,9) the same
(using selectivity on a) as
select * from foo where (a,b,c) between (1,1,5) and (1,1,7)

but they are not the same. since in the second query terms a,b are
equal, shouldn't you able to multiply the selectivity through?

merlin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Andreas Forø Tollefsen 2011-03-07 21:49:48 Re: Performance issues
Previous Message Merlin Moncure 2011-03-07 20:27:24 Re: Performance trouble finding records through related records