Re: Why is indexonlyscan so darned slow?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-22 16:33:32
Message-ID: CAMkU=1wL75zCo40EHyV9h6SX-bUE4yajPsiRGUHRuTr6n-Fnwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 21, 2012 at 2:29 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Mon, May 21, 2012 at 4:17 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>> For vaguely real life, take your example of pgbench -i -s200 -F 50,
>> and I have 2Gig RAM, which seems to be the same as you do.
>>
>> With select only work load (pgbench -S -M prepared -T 30), I get
>>
>> tps = 193
>>
>> But now enable index-only scans:
>>
>> psql -c "create index on pgbench_accounts(aid, abalance);"
>>
>> and it goes up to.
>>
>> tps = 10137
>
> Right -- the main driver here is that your index fits neatly in ram
> and the heap does not -- so you're effectively measuring the
> difference between a buffered and non-buffered access.  That's highly
> contrived as you noted and unlikely to come up all *that* often in the
> real world.

I don't think this one is highly contrived. With the index being 10
fold smaller than the table, there is plenty of window for one to fit
in RAM and the other one not to in a variety of real world situations.
(Although I only get 10 fold window because of -F 50. I don't why
Josh had that big of a different in his original, unless he also used
a nondefault fill factor setting. But of course many real world
tables will be wider than pgbench_accounts is.)

The highly contrived example useful for dissecting the implementation
would be to do:

set enable_seqscan=off;
set enable_indexonlyscan=off;
select count(*) from pgbench_accounts where aid is not null;

The key that I keep forgetting is the "where aid is not null'.
Without that it uses the full table scan, even with enable_seqscan
off, rather than doing an ordinary index scan.

> Generally though the real world wins (although the gains will be
> generally less spectacular) are heavily i/o bound queries where the
> indexed subset of data you want is nicely packed and the (non
> clustered) heap records are all over the place.  By skipping the semi
> random heap lookups you can see enormous speedups.  I figure 50-90%
> improvement would be the norm there, but this is against queries that
> are taking forever, being i/o bound.
>
> See here: http://www.devheads.net/database/postgresql/performance/index-all-necessary-columns-postgres-vs-mssql.htm
> for a 'in the wild' gripe about about not having index scans.

But without scripts to recreate the data with the right selectivities
and correlations, and to generate a long stream of appropriate query
parameterizations so that they don't become cached, that is just a
gripe and not an example.

I tried to reproduce the problem as stated, and couldn't make IOS be
useful because I couldn't make it be slow even without them.
Presumably I'm doing something wrong, but how could I tell what? Have
we heard back on whether IOS was tried and proved useful to the
originator of that thread?

If you want an example where neither index nor table fit in memory,
then just bump up the scale to 2000---and get a machine with only 2G
of memory if you don't already have one :)

With the extra index in place:

with enable_indexonlyscan=off
tps = 155.1
with enable_indexonlyscan=on
tps = 453.7

It seems like that should have only doubled, I'm not sure why it did
more than double. Maybe the index became better cached when the table
stopped competing with it for buffers.

If you leave the select-only world and go back to doing updates, then
that extra index is doing to hurt you somewhat, but if the dominant
bottleneck is rpm of your WAL drive, it might not be noticeable.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-05-22 16:39:16 Re: Readme of Buffer Management seems to have wrong sentence
Previous Message David E. Wheeler 2012-05-22 16:28:40 Re: Exclusion Constraints on Arrays?