Re: Index only scan sometimes switches to sequential scan for small amount of rows

From: Feike Steenbergen <feikesteenbergen(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Index only scan sometimes switches to sequential scan for small amount of rows
Date: 2015-03-25 20:00:44
Message-ID: CAK_s-G27W_WuB-OsEcTiacwhk8i764pQFHMjOKSBnyz+N27T3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 25 March 2015 at 19:07, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> Also, I doubt that that is the problem in the first place. If you collect a
> sample of 30,000 (which the default target size of 100 does), and the
> frequency of the second most common is really 0.00307333 at the time you
> sampled it, you would expect to find it 92 times in the sample. The chances
> against actually finding 1 instead of around 92 due to sampling error are
> astronomical.

It can be that the distribution of values is very volatile; we hope
the increased stats target (from the default=100 to 1000 for this
column) and frequent autovacuum and autoanalyze helps in keeping the
estimates correct.

It seems that it did find some other records (<> 'PRINTED), as is
demonstrated in the stats where there was only one value in the MCV
list: the frequency was 0.996567 and the fraction of nulls was 0,
therefore leaving 0.03+ for other values. But because none of them
were in the MCV and MCF list, they were all treated as equals. They
are certainly not equal.

I not know why some values were found (they are mentioned in the
histogram_bounds), but are not part of the MCV list, as you say, the
likeliness of only 1 item being found is very small.

Does anyone know the criteria for a value to be included in the MCV list?

> The problem seems to be rapidly changing stats, not too small of a target
> size (unless your original target size was way below the current default
> value, forgive me if you already reported that, I didn't see it anywhere).
> Maybe it would work better if you built the partial index where status =
> 'NOT_YET_PRINTED', instead of !='PRINTED'.

Thanks, we did create a partial index on 'NOT_YET_PRINTED' today to
help aiding these kind of queries.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2015-03-25 21:45:47 Re: Index only scan sometimes switches to sequential scan for small amount of rows
Previous Message Jeff Janes 2015-03-25 18:07:12 Re: Index only scan sometimes switches to sequential scan for small amount of rows