Still more sanity checking in gincostestimate()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Andy Colson <andy(at)squeakycode(dot)net>
Subject: Still more sanity checking in gincostestimate()
Date: 2015-12-31 01:21:30
Message-ID: 18501.1451524890@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I looked into the problem reported in
http://www.postgresql.org/message-id/flat/56830EA5(dot)7080907(at)squeakycode(dot)net
which briefly is that gincostestimate can produce ridicuously large index
scan cost estimates for partial-match queries.

It appears that there are two basic problems contributing to this:

1. gincostestimate will scale up the statistics found in the index
metapage, no matter what, as long as they're not zero (but it doesn't
insist on nEntries being > 0). If the stats date back to the index being
empty or nearly so, we can come out with some page counts that are pretty
silly, and we can also come out with numEntries = 0, which we then clamp
to 1, but this is still orders of magnitude off of reality.

2. The partial-match logic in gincost_pattern is pretty bogus and can
produce partialEntries values that are not very realistic.

The direct cause of the bad cost estimate is that some of the cost factors
get multiplied by partialEntries / numEntries, which can be quite a lot
larger than one, a scaling that was certainly never intended.

The attached proposed patch deals with this first by not applying the
scaling logic if it would be scaling up the index size more than 4X;
if it would be, we fall back to the rule-of-thumb estimates I
introduced recently in commit 7fb008c5ee59b040. Secondly, we clamp
the partialEntries / numEntries ratio to not more than 1.

This is obviously all a bit ad-hoc, but it seems less likely to produce
insane estimates than what's there now. On the other hand, those
rule-of-thumb estimates are too new to have any field experience behind
them, so maybe doubling down on our dependence on them isn't bright.

Comments?

regards, tom lane

Attachment Content-Type Size
more-gin-costing-defenses.patch text/x-diff 6.2 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2015-12-31 01:26:58 Re: Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
Previous Message Peter Geoghegan 2015-12-31 01:21:05 Re: Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)