Re: Performance Anomaly with "col in (A, B)" vs. "col = A OR col = B" ver. 9.0.3

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Craig James <craig_james(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance Anomaly with "col in (A, B)" vs. "col = A OR col = B" ver. 9.0.3
Date: 2011-09-27 00:39:23
Message-ID: 7045.1317083963@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> writes:
> This has me wondering about putting together a maintenance/analysis tool
> that generates and captures stats from several ANALYZE runs and compares
> them to see if they're reasonably consistent. It then re-runs with
> higher targets as a one-off, again to see if the stats agree, before
> restoring the targets to defaults. The tool could crunch comparisons of
> the resulting stats and warn about tables or columns where the default
> stats targets aren't sufficient.

It would certainly be useful to have such a tool, but I suspect it's
easier said than done. The problem is to know whether the queries on
that table are particularly sensitive to having better stats. I think
we've largely solved issues having to do with the quality of the
histogram (eg, what fraction of the table has values falling into some
range), and the remaining trouble spots have to do with predicting the
frequency of specific values that are too infrequent to have made it
into the most-common-values list. Enlarging the MCV list helps not so
much by getting these long-tail values into the MCV list --- they
probably still aren't there --- as by allowing us to tighten the upper
bound on what the frequency of an unrepresented value must be. So what
you have to ask is how many queries care about that. Craig James'
example query in this thread is sort of a worst case, because the values
it's searching for are in fact not in the table at all.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message anthony.shipman 2011-09-27 02:45:00 Re: overzealous sorting?
Previous Message Craig Ringer 2011-09-27 00:09:37 Re: Performance Anomaly with "col in (A,B)" vs. "col = A OR col = B" ver. 9.0.3