Re: Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster

From: Kim Hansen <kim(at)rthansen(dot)dk>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster
Date: 2012-04-10 09:55:46
Message-ID: CAEGYRW5ZGZ_WipDBXV6K-61FwfOuPmY5_07tApnU=-BcFc0Ujg@mail.gmail.com
Lists: pgsql-performance

On Tue, Apr 10, 2012 at 04:59, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Fri, Apr 6, 2012 at 3:09 PM, Kim Hansen <kim(at)rthansen(dot)dk> wrote:
>
>> I have run the queries a few times in order to warm up the caches;
>> the queries stabilise at 20ms and 180ms.
>
> My first curiosity is not why the estimate is too good for Bitmap
> Index Scan, but rather why the actual execution is too poor.  As far
> as I can see the only explanation for the poor execution is that the
> bitmap scan has gone lossy, so that every tuple in every touched block
> needs to be rechecked against the where clause.  If that is the case,
> it suggests that your work_mem is quite small.
>
> In 9.2, explain analyze will report the number of tuples filtered out
> by rechecking, but that isn't reported in your version.
>
> It looks like the planner makes no attempt to predict when a bitmap
> scan will go lossy and then penalize it for the extra rechecks it will
> do.  Since it doesn't know it will be carrying out those extra checks,
> you can't just increase the tuple or operator costs factors.

You are right: when I increase work_mem from 1MB to 2MB, the time for
the slow query drops from 180ms to 30ms. I have now configured the
server with work_mem = 10MB.
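For anyone following along, a minimal way to try this per session,
before touching postgresql.conf, is something like the following (the
table and column names are just placeholders, not my actual schema):

  -- Raise work_mem for this session only, so the bitmap does not go lossy.
  SET work_mem = '10MB';

  -- Re-run the query under EXPLAIN ANALYZE; on 9.2 a lossy bitmap would
  -- also show up as "Rows Removed by Index Recheck" on the Bitmap Heap
  -- Scan node, as Jeff mentioned above.
  EXPLAIN ANALYZE
  SELECT * FROM some_table WHERE some_column = 42;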

> So that may explain why the bitmap is not getting penalized for its
> extra CPU time.  But that doesn't explain why the estimated cost is
> substantially lower than the index scan.  That is probably because the
> bitmap scan expects it is doing more sequential IO and less random IO.
> You could cancel that advantage by setting random_page_cost to about
> the same as seq_page_cost (which since you indicated most data will be
> cached, would be an appropriate thing to do regardless of this
> specific issue).

I have set seq_page_cost and random_page_cost to 0.1 to indicate that
the data is cached; the system now selects the faster index scan.
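These can also be tried per session first before making them permanent;
a minimal sketch (the query is again a placeholder):

  -- Tell the planner that random and sequential page fetches cost about
  -- the same, which is reasonable when most of the data is cached.
  SET seq_page_cost = 0.1;
  SET random_page_cost = 0.1;

  EXPLAIN ANALYZE
  SELECT * FROM some_table WHERE some_column = 42;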

Thanks for your help,
--
Kim Rydhof Thor Hansen
Vadgårdsvej 3, 2. tv.
2860 Søborg
Phone: +45 3091 2437
