Re: Why is indexonlyscan so darned slow?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-20 19:24:50
Message-ID: 7600.1337541890@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Thu, May 17, 2012 at 11:35 AM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
>> That's in-RAM speed ... I ran the query twice to make sure the index was cached, and it didn't get any better. And I meant 5X per byte rather than 5X per tuple.

> Ah, OK that makes more sense. I played around with this, specifically
> count(*), quite a bit when IOS first came out, and I attributed a
> large part of the time to the code that forms a tuple out of raw
> bytes, and the code that advances the aggregate. The first one is
> probably more a per-tuple cost than per byte, and the second
> definitely is per tuple cost.
> I can't find my detailed notes from this work, so this is just from memory.

I did a quick investigation of this example with oprofile, and found
that there's not going to be any easy large improvement available.
It's basically all per-tuple CPU costs, breaking down like this:

samples % symbol name
15513 13.4664 IndexOnlyNext
10886 9.4498 index_getnext_tid
7526 6.5331 visibilitymap_test
7116 6.1772 ExecClearTuple
7054 6.1234 _bt_checkkeys
6804 5.9064 _bt_next
6344 5.5070 ExecProject
6033 5.2371 advance_aggregates
5619 4.8777 ExecScan
5331 4.6277 advance_transition_function
5202 4.5157 btgettuple
4928 4.2779 _bt_saveitem
4653 4.0391 ExecProcNode
4473 3.8829 int8inc
3404 2.9549 MemoryContextReset
3125 2.7127 _bt_readpage
2768 2.4028 FunctionCall2Coll
2278 1.9775 ExecAgg
1502 1.3038 ExecStoreVirtualTuple
1198 1.0399 BufferGetBlockNumber
1105 0.9592 ExecIndexOnlyScan
946 0.8212 hash_search_with_hash_value

A fair chunk of what's being attributed to IndexOnlyNext is actually the
inlined code for StoreIndexTuple, and that in turn is mostly the inlined
code for index_getattr. We might possibly save a bit here if we noticed
that the query doesn't actually need us to fetch the indexed columns,
but it looks like that would only buy a couple percent overall --- and
testing for that would add useless cycles to every case where we *do*
need the indexed value. So I'm afraid that it might amount to optimizing
SELECT COUNT(*) at the expense of everything else, which I'm not for.

Another possibility is to try to reduce the costs of index_getnext_tid
and FunctionCall2Coll, which are basically just trampolines to reach
btgettuple. It's not immediately obvious how to make that much better
though.

Anyway, on my machine it seems that the per-tuple CPU costs for SELECT
COUNT(*) with an index-only scan are something like 10% higher than the
per-tuple costs with a heap scan. We might get that down to roughly par
with some hacking, but it's never going to be vastly better. The
argument in favor of index-only scans is mainly about reducing I/O costs
anyway.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2012-05-20 19:30:52 Re: Remove readline notice from psql --version?
Previous Message Alvaro Herrera 2012-05-20 18:12:07 Re: read() returns ERANGE in Mac OS X