From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: strange plan with bitmap heap scan and multiple partial indexes |
Date: | 2015-07-11 21:20:23 |
Message-ID: | 55A18897.5090008@2ndquadrant.com |
Lists: | pgsql-hackers |
On 07/11/2015 06:32 PM, Tom Lane wrote:
...
>
> Presumably, this is happening because the numbers of rows actually
> satisfying the index predicates are so small that it's a matter of
> luck whether any of them are included in ANALYZE's sample.
>
> Given this bad data for the index sizes, it's not totally surprising
> that choose_bitmap_and() does something wacko. I'm not sure whether
> we should try to make it smarter, or write this off as "garbage in,
> garbage out".
I think we should make it smarter, if possible - while this example is
somewhat artificial, partial indexes are often used exactly like this,
i.e. to index only a very small subset of the data. A good example is an
index on "active invoices", i.e. invoices that have not yet been sorted
out. There may be a lot of invoices in the table, but only a very small
fraction of them will be active (and thus in the index).
So I don't think this is an artificial problem, and we should not write
it off as "garbage in".
> Another idea is to not trust any individual ANALYZE's estimate of
> the index rowcount so completely. (I'd thought that the
> moving-average logic would get applied to that, but it doesn't seem
> to be kicking in for some reason.)
>
> We could probably make this smarter if we were willing to apply the
> predicate-proof machinery in more situations; in this example, once
> we know that idx001 is applicable, we really should disregard idx002
> and idx003 altogether because their predicates are implied by
> idx001's. I've always been hesitant to do that because the cost of
> checking seemed likely to greatly outweigh the benefits. But since
> Tomas is nosing around in this territory already, maybe he'd like to
> investigate that further.
I think there are two possible approaches in general - we may improve
the statistics somehow, or we may start doing the predicate proofing.
I doubt approaching this at the statistics level alone is sufficient,
because even with statistics target 10k (i.e. the most detailed one),
the sample is still of fixed size. So there will always exist a
combination of a sufficiently large data set and a selective partial
index that causes trouble with the sampling.
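To make the fixed-sample argument concrete, here is a back-of-the-envelope
sketch (my own illustration, not tied to the actual ANALYZE internals; all
row counts below are made up): approximating the sampling as independent
draws, the chance that a sample of n rows misses every one of the k rows
matching the partial index predicate in an N-row table is roughly
(1 - k/N)^n, so for any fixed n there is an N and k that make a complete
miss likely.

```python
# Approximate probability that a fixed-size random sample contains none
# of the k rows satisfying a partial index predicate, out of N rows total.
# Uses the with-replacement approximation (1 - k/N)**n.
# All numbers below are made up for illustration.

def p_miss(N, k, n):
    """Approximate probability that an n-row sample misses all k matching rows."""
    return (1 - k / N) ** n

# 10M-row table, 500 matching rows, 30000 sampled rows:
# the sample still catches some matching rows most of the time
print(p_miss(10_000_000, 500, 30_000))

# make the table 100x bigger with the same 500 matching rows:
# the same-size sample now misses all of them almost every time
print(p_miss(1_000_000_000, 500, 30_000))
```

Growing the sample (raising the statistics target) only shifts the break-even
point; it cannot eliminate it.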
Moreover, I can't really think of a way to fix this at the statistics
level. Maybe there's a clever trick guarding against this particular
issue, but my personal experience is that whenever I used such a smart
hack, it eventually caused strange issues elsewhere.
So I think predicate proofing is a better approach, though the planning
cost may of course be an issue. But maybe we can make it cheaper with
some clever tricks? For example, given two predicates A and B, if
A => B, then selectivity(A) <= selectivity(B). Could we use this to skip
some of the expensive stuff? We should have the selectivities anyway, no?
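To illustrate the pruning idea, here is a rough sketch (purely my own
illustration of the heuristic, not planner code; the predicate
representation and the prove_implies callback are hypothetical stand-ins
for the real predicate-proof machinery): since A => B forces
selectivity(A) <= selectivity(B), an implication can only run from a more
selective predicate to a less selective one, so ordering the indexes by
selectivity means we never have to test implication in the other direction.

```python
# Sketch: use selectivity ordering to cut down implication checks between
# partial-index predicates. prove_implies(p, q) is a hypothetical stand-in
# for the (expensive) predicate-proof machinery, returning True if p => q.

def prune_redundant(indexes, prove_implies):
    """indexes: list of (name, selectivity, predicate) tuples.
    Drops any index whose predicate is implied by the predicate of a
    more selective (or equally selective) surviving index."""
    # Most selective first: only a predicate with selectivity <= another's
    # can possibly imply it, so implication is never tested "upward".
    ordered = sorted(indexes, key=lambda ix: ix[1])
    kept = []
    for name, sel, pred in ordered:
        # Only already-kept (more selective) predicates can imply this one.
        if not any(prove_implies(kpred, pred) for _, _, kpred in kept):
            kept.append((name, sel, pred))
    return kept

# Toy predicates as (column, upper_bound), meaning "column < upper_bound";
# then (a, 10) implies (a, 100), since a < 10 forces a < 100.
def toy_implies(p, q):
    return p[0] == q[0] and p[1] <= q[1]

idx = [("idx003", 0.10, ("a", 1000)),
       ("idx001", 0.01, ("a", 10)),
       ("idx002", 0.05, ("a", 100))]
print([name for name, _, _ in prune_redundant(idx, toy_implies)])
# idx001's predicate implies the other two, so only idx001 survives
```

The ordering does not avoid the proof calls entirely, but it halves the
candidate pairs and lets the cheapest (most selective) indexes knock out
the rest early.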
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services