Re: index-only count(*) for indexes supporting bitmap scans

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: Alexey Chernyshov <a(dot)chernyshov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: index-only count(*) for indexes supporting bitmap scans
Date: 2017-11-01 22:06:28
Message-ID: 5974.1509573988@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> writes:
> On 04.09.2017 15:17, Alexey Chernyshov wrote:
>> make check-world fails on contrib/postgres_fdw because of Subquery Scan on ... Probably, query plan was changed.

> Thanks for testing! This is the same problem as the one in
> 'select_distinct' I mentioned before. I changed the test, the updated
> patch is attached.

I've pushed the executor part of this, but mostly not the planner part,
because I didn't think the latter was anywhere near ready for prime
time: the regression test changes it was causing were entirely bogus.

You had basically two categories of plan changes showing up in the
regression tests. One was insertion of Subquery Scan nodes where
they hadn't been before; that was because the use_physical_tlist
change broke the optimization that removed no-op Subquery Scans.
I fixed that by narrowing the use_physical_tlist change to apply
only to BitmapHeapPath nodes, which is the only case where there
would be any benefit anyway. The remaining plan diffs after making
that change all amounted to replacing regular index-only scan plans
with bitmap scans, which seems to me to be silly on its face: if we
can use an IOS then it is unlikely that a bitmap scan will be better.
Those changes indicate that the cost adjustment you'd inserted in
cost_bitmap_heap_scan was way too optimistic, which is hardly
surprising. I think a realistic adjustment would have to account
for all or most of these factors:

* Whether the index AM is ever going to return recheck = false.
The planner has no way to know that at present, but since there are
lots of cases where it simply won't ever happen, I think assuming
that it always will is not acceptable. Perhaps we could extend the
AM API so that we could find out whether recheck would happen always,
never, or sometimes. (Doing better than "sometimes" might be too hard,
but I think most opclasses are going to be "always" or "never" anyway.)

* Whether the bitmap will become lossy, causing us to have to make
rechecks anyway. We could probably estimate that pretty well based
on comparing the initial number-of-pages estimate to work_mem.

* Whether the plan will need to fetch heap tuples to make filter-qual
checks. In principle the planner ought to know that. In practice,
right now it doesn't bother to figure out whether the qual will be
empty until createplan.c time, because that's rather a significant
amount of work and it's undesirable to expend it for paths we might
not end up using. We might be able to approximate it better than
we do now, though. It's a live problem for regular indexscan costing
as well as bitmap scans, IIRC.

* What fraction of the table is actually all-visible. You'd effectively
hard-wired that at 50%, but it's easy to do better --- the regular
IOS code does

if (indexonly)
pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));

and it would be appropriate to do the same here if we conclude that
the other gating conditions apply.

Without a patch that deals more realistically with these concerns,
I think we're better off not changing the cost estimate at all.

As far as the executor side goes, I made several cosmetic changes
and one not so cosmetic one: I changed the decision about whether
to prefetch so that it looks at whether the potential prefetch
page is all-visible. This still relies on the same assumption you
had that the recheck flag will stay the same from one page to the
next, but at least it's not assuming that the all-visible state
will stay the same.

I'm going to mark the CF entry closed, but if you feel like working
on the cost estimate business, feel free to submit another patch
for that.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gilles Darold 2017-11-01 22:13:47 Re: proposal: schema variables
Previous Message Peter Eisentraut 2017-11-01 21:05:58 Re: Custom compression methods