Re: strange plan with bitmap heap scan and multiple partial indexes

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: strange plan with bitmap heap scan and multiple partial indexes
Date: 2015-07-11 21:20:23
Message-ID: 55A18897.5090008@2ndquadrant.com
Lists: pgsql-hackers

On 07/11/2015 06:32 PM, Tom Lane wrote:
...
>
> Presumably, this is happening because the numbers of rows actually
> satisfying the index predicates are so small that it's a matter of
> luck whether any of them are included in ANALYZE's sample.
>
> Given this bad data for the index sizes, it's not totally surprising
> that choose_bitmap_and() does something wacko. I'm not sure whether
> we should try to make it smarter, or write this off as "garbage in,
> garbage out".

I think we should make it smarter, if possible - while this example is
somewhat artificial, partial indexes are often used exactly like this,
i.e. to index only a very small subset of the data. A good example may be
an index on "active invoices", i.e. invoices that have not yet been sorted
out. There may be a lot of invoices in the table, but only a very small
fraction of them will be active (and thus in the index).

So I don't think this is an artificial problem, and we should not write it
off as "garbage in".

> Another idea is to not trust any individual ANALYZE's estimate of
> the index rowcount so completely. (I'd thought that the
> moving-average logic would get applied to that, but it doesn't seem
> to be kicking in for some reason.)
>
> We could probably make this smarter if we were willing to apply the
> predicate-proof machinery in more situations; in this example, once
> we know that idx001 is applicable, we really should disregard idx002
> and idx003 altogether because their predicates are implied by
> idx001's. I've always been hesitant to do that because the cost of
> checking seemed likely to greatly outweigh the benefits. But since
> Tomas is nosing around in this territory already, maybe he'd like to
> investigate that further.

I think there are two possible approaches in general - we may improve
the statistics somehow, or we may start doing the predicate proofing.

I doubt approaching this at the statistics level alone is sufficient,
because even with the statistics target set to 10k (i.e. the most detailed
one), the sample is still of a fixed size. So there will always be a
combination of a sufficiently large data set and a sufficiently selective
partial index that causes trouble with the sampling.
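
Just to illustrate the arithmetic (a standalone sketch, not PostgreSQL
code; it assumes the usual ANALYZE sample size of 300 * statistics_target
rows, and the table and index sizes are made up):

#include <stdio.h>

/*
 * Back-of-the-envelope sketch (not PostgreSQL code): how many rows
 * belonging to a very selective partial index can we expect to see in
 * an ANALYZE sample? The table and index sizes are hypothetical.
 */
int
main(void)
{
    double  table_rows = 1e9;       /* hypothetical large table */
    double  index_rows = 100;       /* rows satisfying the partial predicate */
    double  stats_target = 10000;   /* maximum statistics target */
    double  sample_rows = 300 * stats_target;   /* usual ANALYZE sample size */

    /* expected number of sampled rows that fall into the partial index */
    double  expected = sample_rows * (index_rows / table_rows);

    printf("sample size: %.0f rows\n", sample_rows);
    printf("expected index rows in the sample: %.2f\n", expected);

    /*
     * With these numbers the expectation is ~0.3, so whether the sample
     * contains any qualifying row at all is mostly a matter of luck.
     */
    return 0;
}

And with the default statistics target (100) the sample is just 30,000
rows, which makes it even worse.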

Moreover, I can't really think of a way to fix this at the statistics
level. Maybe there's a clever trick guarding against this particular
issue, but my personal experience is that whenever I used such a smart
hack, it eventually caused strange issues elsewhere.

So I think predicate proofing is the better approach, although of course
the planning cost may be an issue. But maybe we can make it cheaper with
some clever tricks? For example, given two predicates A and B, if A => B
then selectivity(A) <= selectivity(B). So whenever selectivity(A) >
selectivity(B) the implication is impossible, and we can skip the
expensive proof entirely. We should have the selectivities at hand
anyway, no?
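
A rough sketch of what I mean (hypothetical code - the struct and function
names are made up for illustration and do not reflect the actual
choose_bitmap_and() or predicate-proof interfaces):

#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical sketch, not PostgreSQL code: use the cheap selectivity
 * comparison as a necessary condition before running the expensive
 * predicate proof when discarding redundant partial indexes.
 */
typedef struct IndexCandidate
{
    const char *name;
    double      selectivity;    /* estimated fraction of rows in the index */
    bool        discarded;
} IndexCandidate;

/* stand-in for the expensive predicate-proof machinery */
static bool
pred_implies(const IndexCandidate *a, const IndexCandidate *b)
{
    (void) a;
    (void) b;
    return true;                /* pretend the proof always succeeds */
}

static void
prune_implied_indexes(IndexCandidate *cand, int ncand)
{
    for (int i = 0; i < ncand; i++)
    {
        if (cand[i].discarded)
            continue;

        for (int j = 0; j < ncand; j++)
        {
            if (i == j || cand[j].discarded)
                continue;

            /*
             * cand[i] => cand[j] is only possible when
             * selectivity(cand[i]) <= selectivity(cand[j]), so a simple
             * comparison lets us skip the expensive proof.
             */
            if (cand[i].selectivity > cand[j].selectivity)
                continue;

            /* expensive check: does i's predicate imply j's? */
            if (pred_implies(&cand[i], &cand[j]))
                cand[j].discarded = true;   /* j is redundant, i is tighter */
        }
    }
}

int
main(void)
{
    IndexCandidate  cand[3] = {
        {"idx001", 0.0001, false},
        {"idx002", 0.0100, false},
        {"idx003", 0.0500, false},
    };

    prune_implied_indexes(cand, 3);

    for (int i = 0; i < 3; i++)
        printf("%s: %s\n", cand[i].name,
               cand[i].discarded ? "discarded" : "kept");
    return 0;
}

Of course the selectivity comparison can only rule proofs out, not replace
them, so in the worst case (all selectivities roughly equal) it saves
nothing - but it costs almost nothing either.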

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
