Re: Bitmap scan is undercosted?

From: pryzby(at)telsasoft(dot)com (Justin Pryzby)
To: Vitaliy Garnashevich <vgarnashevich(at)gmail(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Bitmap scan is undercosted?
Date: 2017-12-02 02:06:03
Message-ID: 20171202020603.GO18413@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Fri, Dec 01, 2017 at 05:11:04PM -0600, Justin Pryzby wrote:
> I tried to reproduce this issue and couldn't, under PG95 and 10.1:

I'm embarassed to say that I mis-read your message, despite you're amply clear
subject. You're getting a bitmap scan but you'd prefer to get an index scan.
I anticipated the opposite problem (which is what I've had issues with myself).

> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
> > On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
> > > We recently had an issue in production, where a bitmap scan was chosen
> > > instead of an index scan. Despite being 30x slower, the bitmap scan had
> > > about the same cost as the index scan.
>
> Note:
> postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
> correlation | 0.00710112
>
> ..so this is different from the issue corrected by the patch I created while
> testing.

Actually, that the table is "not correlated" on "num" column is maybe the
primary reason why PG avoids using an index scan. It (more or less correctly)
deduces that it's going to have to "read" a large fraction of the pages (even
if only to process a small fraction of the rows), which is costly, except it's
all cached.. In your case, that overly-penalizes the index scan.

This is cost_index() and cost_bitmap_heap_scan() in costsize.c. Since the
index is uncorrelated, it's returning something close to max_IO_cost. It looks
like effective_cache_size only affects index_pages_fetched().

I'm going to try to dig some more into it. Maybe there's evidence to
re-evaluate one of these:

cost_index()
| run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
or
cost_bitmap_heap_scan()
| cost_per_page = spc_random_page_cost -
| (spc_random_page_cost - spc_seq_page_cost)
| * sqrt(pages_fetched / T);

Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-12-02 02:46:47 Re: [HACKERS] Parallel Hash take II
Previous Message Andres Freund 2017-12-02 00:55:21 Re: [HACKERS] Parallel Hash take II

Browse pgsql-performance by date

  From Date Subject
Next Message Roman Konoval 2017-12-02 02:34:43 Re: Bad plan for ltree predicate <@
Previous Message Vitaliy Garnashevich 2017-12-01 23:54:09 Re: Bitmap scan is undercosted?