Re: [PERFORM] Slow query: bitmap scan troubles

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Slow query: bitmap scan troubles
Date: 2013-01-06 16:29:17
Message-ID: CAMkU=1zYe4eujgrvBMsZLS8gjZg2hn+eWr_r8D17J-1FanzKog@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Saturday, January 5, 2013, Tom Lane wrote:

> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> > [moved to hackers]
> > On Wednesday, December 5, 2012, Tom Lane wrote:
> >> Hm. To tell you the truth, in October I'd completely forgotten about
> >> the January patch, and was thinking that the 1/10000 cost had a lot
> >> of history behind it. But if we never shipped it before 9.2 then of
> >> course that idea is false. Perhaps we should backpatch the log curve
> >> into 9.2 --- that would reduce the amount of differential between what
> >> 9.2 does and what previous branches do for large indexes.
>
> > I think we should backpatch it for 9.2.3. I've seen another email which is
> > probably due to the same issue (nested loop vs hash join). And some
> > monitoring of a database I am responsible for suggests it might be heading
> > in that direction as well as the size grows.
>
> I received an off-list report of a case where not only did the 1/10000
> factor cause a nestloop-vs-hashjoin decision to be made wrongly, but
> even adding the ln() computation as in commit bf01e34b556 didn't fix it.
> I believe the index in question was on the order of 20000 pages, so
> it's not too hard to see why this might be the case:
>
> * historical fudge factor 4 * 20000/100000 = 0.8
> * 9.2 fudge factor 4 * 20000/10000 = 8.0
> * with ln() correction 4 * ln(1 + 20000/10000) = 4.39 or so
>
> At this point I'm about ready to not only revert the 100000-to-10000
> change, but keep the ln() adjustment, ie make the calculation be
> random_page_cost * ln(1 + index_pages/100000). This would give
> essentially the pre-9.2 behavior for indexes up to some tens of
> thousands of pages, and keep the fudge factor from getting out of
> control even for very very large indexes.
>

Yeah, I agree that even the log function grows too rapidly, especially at
the early stages. I didn't know whether a change that alters that asymptote
would be welcome in a backpatch, though.
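
For concreteness, here is a toy calculation (mine, not planner code) of the
variants above at 20000 pages, with random_page_cost at its default of 4.0.
It also shows that your proposed ln(1 + index_pages/100000) form lands back
near the historical value at that size:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double random_page_cost = 4.0;     /* default setting */
        double index_pages = 20000.0;      /* the index size from your example */

        /* pre-9.2: linear, divisor 100000 */
        double historical = random_page_cost * index_pages / 100000.0;
        /* 9.2 as shipped: linear, divisor 10000 */
        double v92        = random_page_cost * index_pages / 10000.0;
        /* commit bf01e34b556: ln() curve on the 10000 divisor */
        double v92_ln     = random_page_cost * log(1.0 + index_pages / 10000.0);
        /* your proposal: ln() curve on the old 100000 divisor */
        double proposed   = random_page_cost * log(1.0 + index_pages / 100000.0);

        printf("historical %.3f\n", historical);  /* 0.800 */
        printf("9.2        %.3f\n", v92);         /* 8.000 */
        printf("9.2+ln()   %.3f\n", v92_ln);      /* ~4.394 */
        printf("proposed   %.3f\n", proposed);    /* ~0.729 */
        return 0;
    }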

>
> > But I am wondering if it should be present at all in 9.3. When it was
> > introduced, the argument seemed to be that smaller indexes might be easier
> > to keep in cache.
>
> No. The argument is that if we don't have some such correction, the
> planner is liable to believe that different-sized indexes have *exactly
> the same cost*, if a given query would fetch the same number of index
> entries.

But it seems like they very likely *do* have exactly the same cost, unless
you want to take either the CPU cost of descending the index into account,
or take cacheability into account. If they do have the same cost, why
shouldn't the estimate reflect that? Using cpu_index_tuple_cost * lg(#
index tuples) would break the tie, but by such a small amount that it would
easily get swamped by the stochastic nature of ANALYZE for nodes expected
to return more than one row.

> This is quite easy to demonstrate when experimenting with
> partial indexes, in particular - without the fudge factor the planner
> sees no advantage of a partial index over a full index from which the
> query would fetch the same number of entries. We do want the planner
> to pick the partial index if it's usable, and a fudge factor is about
> the least unprincipled way to make it do so.
>

I noticed a long time ago that ordinary index scans seemed to be preferred
over bitmap index scans with the same cost estimate, as best as I could
determine because they are tested first and the tie goes to the first one
(and there is something about needing to be better by 1% to count as
better--although that part might only apply when the start-up cost and the
total cost disagree over which one is best). If I've reconstructed that
correctly, could something similar be done for partial indexes, where they
are just considered first? I guess the problem there is that an index scan
on a partial index is not a separate node type from an index scan on a full
index, unlike index vs bitmap.
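
For what it's worth, my mental model of that tie-breaking is roughly the
following shape (a sketch from memory, not the actual add_path code; the 1%
figure is my recollection, not something I've re-checked):

    #include <stdbool.h>

    /* Sketch only: a new path is treated as cheaper than an incumbent only
     * if it wins by more than a small fuzz factor; anything closer is a tie
     * and the path that was considered first survives. */
    #define FUZZ_FACTOR 1.01            /* assumed ~1% tolerance */

    static bool
    clearly_cheaper(double new_cost, double old_cost)
    {
        return new_cost * FUZZ_FACTOR < old_cost;
    }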

>
> > The argument for increasing the penalty by a factor of 10 was that the
> > smaller one could be "swamped by noise such as page-boundary-roundoff
> > behavior".
>
> Yeah, I wrote that, but in hindsight it seems like a mistaken idea.
> The noise problem is that because we round off page count and row count
> estimates to integers at various places, it's fairly easy for small
> changes in statistics to move a plan's estimated cost by significantly
> more than this fudge factor will. However, the case that the fudge
> factor is meant to fix is indexes that are otherwise identical for
> the query's purposes --- and any roundoff effects will be the same.
> (The fudge factor itself is *not* rounded off anywhere, it flows
> directly to the bottom-line cost for the indexscan.)
>

OK, and this agrees with my experience. It seemed like it was the
stochastic nature of ANALYZE, not round-off problems, that caused the plans
to go back and forth.

>
> > One thing which depends on the index size which, as far as I can tell, is
> > not currently being counted is the cost of comparing the tuples all the way
> > down the index. This would be proportional to log2(indextuples) *
> > cpu_index_tuple_cost, or maybe log2(indextuples) *
> > (cpu_index_tuple_cost + cpu_operator_cost), or something like that.
>
> Yeah, I know. I've experimented repeatedly over the years with trying
> to account explicitly for index descent costs. But every time, anything
> that looks even remotely principled turns out to produce an overly large
> correction that results in bad plan choices. I don't know exactly why
> this is, but it's true.
>

log2(indextuples) * cpu_index_tuple_cost should produce pretty darn small
corrections, at least if cost parameters are at the defaults. Do you
remember if that was one of the ones you tried?
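
To put a number on "pretty darn small" (my own toy arithmetic, using the
default cpu_index_tuple_cost of 0.005):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double cpu_index_tuple_cost = 0.005;   /* default setting */
        double ntuples[] = { 1e6, 1e8, 1e9 };

        for (int i = 0; i < 3; i++)
        {
            /* the proposed descent correction */
            double correction = log2(ntuples[i]) * cpu_index_tuple_cost;
            printf("%.0e tuples -> %.3f\n", ntuples[i], correction);
        }
        /* prints ~0.100, ~0.133, ~0.149 */
        return 0;
    }

Even a billion-tuple index only picks up about 0.15, a small fraction of a
single random_page_cost unit.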

>
> One other point is that I think it is better for any such correction
> to depend on the index's total page count, not total tuple count,
> because otherwise two indexes that are identical except for bloat
> effects will appear to have identical costs.

This isn't so. A bloated index will be estimated to visit more pages than
an otherwise identical non-bloated index, and so have a higher cost.

jeff=# create table bar as select * from generate_series(1,1000000);
jeff=# create index foo1 on bar (generate_series);
jeff=# create index foo2 on bar (generate_series);
jeff=# delete from bar where generate_series % 100 != 0;
jeff=# reindex index foo1;
jeff=# analyze;
jeff=# explain select count(*) from bar where generate_series between 6 and 60;
                                QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=8.27..8.28 rows=1 width=0)
   ->  Index Scan using foo1 on bar  (cost=0.00..8.27 rows=1 width=0)
         Index Cond: ((generate_series >= 6) AND (generate_series <= 60))
(3 rows)

jeff=# begin; drop index foo1; explain select count(*) from bar where generate_series between 6 and 600; rollback;
                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=14.47..14.48 rows=1 width=0)
   ->  Index Scan using foo2 on bar  (cost=0.00..14.46 rows=5 width=0)
         Index Cond: ((generate_series >= 6) AND (generate_series <= 600))
(3 rows)

This is due to this in genericcostestimate:

    if (index->pages > 1 && index->tuples > 1)
        numIndexPages = ceil(numIndexTuples * index->pages / index->tuples);

If the index is bloated (or just has wider index tuples), index->pages will
go up but index->tuples will not.

If it is just a partial index, however, then both will go down together and
it will not be counted as a benefit from being smaller.
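
To make that concrete with invented numbers (a toy rerun of the arithmetic
above, not real planner output), fetching 50 index tuples in each case: a
10x-bloated index is charged extra pages, while a 10x-smaller partial index
is charged exactly the same as the full index.

    #include <math.h>
    #include <stdio.h>

    /* Same shape as the genericcostestimate line quoted above, with
     * made-up page and tuple counts. */
    static double
    est_pages(double num_index_tuples, double pages, double tuples)
    {
        return ceil(num_index_tuples * pages / tuples);
    }

    int main(void)
    {
        printf("full, non-bloated:  %.0f\n", est_pages(50, 2745, 1000000));   /* 1 */
        printf("full, 10x bloated:  %.0f\n", est_pages(50, 27450, 1000000));  /* 2 */
        printf("partial, 1/10 size: %.0f\n", est_pages(50, 275, 100000));     /* 1 */
        return 0;
    }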

For the bloated index, this correction might even be too harsh. If the
index is bloated by having lots of mostly-empty pages, then this seems
fair. If it is bloated by having lots of entirely empty pages that are not
even linked into the tree, then those empty pages will never be visited and
so the index shouldn't be penalized for them.

Worse, this over-punishment of bloat is more likely to penalize partial
indexes. Since they are vacuumed on the table's schedule, not their own
schedule, they likely get vacuumed less often relative to the amount of
turnover they experience and so have higher steady-state bloat. (I'm
assuming the partial index is on the particularly hot rows, which I would
expect is how partial indexes would generally be used.)

This extra bloat was one of the reasons the partial index was avoided in
"Why does the query planner use two full indexes, when a dedicated partial
index exists?"

> So from that standpoint,
> the ln() form of the fudge factor seems quite reasonable as a crude form
> of index descent cost estimate. The fact that we're needing to dial
> it down so much reinforces my feeling that descent costs are close to
> negligible in practice.
>

If they are negligible, why do we really care whether the planner uses a
partial index vs a full index? It seems like the only reason we would care
is cacheability. Unfortunately we don't have any infrastructure to model
that directly.

Cheers,

Jeff
