Quick Links

Re: index-only scans

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: index-only scans
Date:	2011-10-08 01:23:52
Message-ID:	CA+TgmoaN6pE3fmEna6SKLHjcV5TXzisk26P88edctQJXzA-fQQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Oct 7, 2011 at 8:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 1. The way that nodeIndexscan.c builds up the faux heap tuple is
>> perhaps susceptible to improvement. I thought about building a
>> virtual tuple, but then what do I do with an OID column, if I have
>> one? Or maybe this should be done some other way altogether.
>
> I switched it to use a virtual tuple for now, and just not attempt to
> use index-only scans if a system column is required. We're likely to
> want to rethink this anyway, because as currently constituted the code
> can't do anything with an expression index, and avoiding recalculation
> of an expensive function could be a nice win. But the approach of
> just building a faux heap tuple fundamentally doesn't work for that.

Figuring out how to fix that problem likely requires more knowledge of
the executor than I have got.

>> 2. Suppose we scan one tuple on a not-all-visible page followed by 99
>> tuples on all-visible pages. The code as written will hold the pin on
>> the first heap page for the entire scan. As soon as we hit the end of
>> the scan or another tuple where we have to actually visit the page,
>> the old pin will be released, but until then we hold onto it.
>
> I did not do anything about this issue --- ISTM it needs performance
> testing.

I'm actually less worried about any performance problem than I am
about the possibility of holding up VACUUM. That can happen the old
way, too, but now the pin could stay on the same page for quite a
while even when the scan is advancing.

I think we maybe ought to think seriously about solving the problem at
the other end, though - either make VACUUM skip pages that it can't
get a cleanup lock on without blocking (except in anti-wraparound
mode) or have it just do the amount of work that can be done with an
exclusive lock (i.e. prune but not defragment, which would work even
in anti-wraparound mode). That would solve the problems of (1)
undetected VACUUM deadlock vs. a buffer pin, (2) indefinite VACUUM
stall due to a suspended query, and (3) this issue.

>> 3. The code in create_index_path() builds up a bitmapset of heap
>> attributes that get used for any purpose anywhere in the query, and
>> hangs it on the RelOptInfo so it doesn't need to be rebuilt for every
>> index under consideration. However, if it were somehow possible to
>> have the rel involved without using any attributes at all, we'd
>> rebuild the cache over and over, since it would never become non-NULL.
>
> I dealt with this by the expedient of getting rid of the caching ;-).
> It's not clear to me that it was worth the trouble, and in any case
> it's fundamentally wrong to suppose that every index faces the same
> set of attributes it must supply. It should not need to supply columns
> that are only needed in indexquals or index predicate conditions.
> I'm not sure how to deal with those refinements cheaply enough, but
> the cache isn't helping.

Oh, hmm.

>> 4. There are a couple of cases that use index-only scans even though
>> the EXPLAIN output sort of makes it look like they shouldn't. For
>> example, in the above queries, an index-only scan is chosen even
>> though the query does "SELECT *" from the table being scanned. Even
>> though the EXPLAIN (VERBOSE) output makes it look otherwise, it seems
>> that the target list of an EXISTS query is in fact discarded, e.g.:
>
> The reason it looks that way is that we're choosing to use a "physical
> result tuple" to avoid an ExecProject step at runtime. There's nothing
> wrong with the logic, it's just that EXPLAIN shows something that might
> mislead people.

I wonder if we oughta do something about that.

I was also thinking we should probably make EXPLAIN ANALYZE display
the number of heap fetches, so that you can see how index-only your
index-only scan actually was.

>> 5. We haven't made any planner changes at all, not even for cost
>> estimation. It is not clear to me what the right way to do cost
>> estimation here is.
>
> Yeah, me either. For the moment I put in a hard-wired estimate that
> only 90% of the heap pages would actually get fetched. This is
> conservative and only meant to ensure that the planner picks an
> index-only-capable plan over an indexscan with a non-covering index,
> all else being equal. We'll need to do performance testing before
> we can refine that.

Yep.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: index-only scans at 2011-10-08 00:14:53 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2011-10-08 01:59:40	Re: Why does WAL_DEBUG macro need to be defined by default?
Previous Message	Robert Haas	2011-10-08 01:10:53	Re: Why does WAL_DEBUG macro need to be defined by default?