From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: index-only scans |
Date: | 2011-10-08 01:23:52 |
Message-ID: | CA+TgmoaN6pE3fmEna6SKLHjcV5TXzisk26P88edctQJXzA-fQQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Oct 7, 2011 at 8:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 1. The way that nodeIndexscan.c builds up the faux heap tuple is
>> perhaps susceptible to improvement. I thought about building a
>> virtual tuple, but then what do I do with an OID column, if I have
>> one? Or maybe this should be done some other way altogether.
>
> I switched it to use a virtual tuple for now, and just not attempt to
> use index-only scans if a system column is required. We're likely to
> want to rethink this anyway, because as currently constituted the code
> can't do anything with an expression index, and avoiding recalculation
> of an expensive function could be a nice win. But the approach of
> just building a faux heap tuple fundamentally doesn't work for that.
Figuring out how to fix that problem likely requires more knowledge of
the executor than I have got.
>> 2. Suppose we scan one tuple on a not-all-visible page followed by 99
>> tuples on all-visible pages. The code as written will hold the pin on
>> the first heap page for the entire scan. As soon as we hit the end of
>> the scan or another tuple where we have to actually visit the page,
>> the old pin will be released, but until then we hold onto it.
>
> I did not do anything about this issue --- ISTM it needs performance
> testing.
I'm actually less worried about any performance problem than I am
about the possibility of holding up VACUUM. That can happen the old
way, too, but now the pin could stay on the same page for quite a
while even when the scan is advancing.
I think we maybe ought to think seriously about solving the problem at
the other end, though - either make VACUUM skip pages that it can't
get a cleanup lock on without blocking (except in anti-wraparound
mode) or have it just do the amount of work that can be done with an
exclusive lock (i.e. prune but not defragment, which would work even
in anti-wraparound mode). That would solve the problems of (1)
undetected VACUUM deadlock vs. a buffer pin, (2) indefinite VACUUM
stall due to a suspended query, and (3) this issue.
>> 3. The code in create_index_path() builds up a bitmapset of heap
>> attributes that get used for any purpose anywhere in the query, and
>> hangs it on the RelOptInfo so it doesn't need to be rebuilt for every
>> index under consideration. However, if it were somehow possible to
>> have the rel involved without using any attributes at all, we'd
>> rebuild the cache over and over, since it would never become non-NULL.
>
> I dealt with this by the expedient of getting rid of the caching ;-).
> It's not clear to me that it was worth the trouble, and in any case
> it's fundamentally wrong to suppose that every index faces the same
> set of attributes it must supply. It should not need to supply columns
> that are only needed in indexquals or index predicate conditions.
> I'm not sure how to deal with those refinements cheaply enough, but
> the cache isn't helping.
Oh, hmm.
>> 4. There are a couple of cases that use index-only scans even though
>> the EXPLAIN output sort of makes it look like they shouldn't. For
>> example, in the above queries, an index-only scan is chosen even
>> though the query does "SELECT *" from the table being scanned. Even
>> though the EXPLAIN (VERBOSE) output makes it look otherwise, it seems
>> that the target list of an EXISTS query is in fact discarded, e.g.:
>
> The reason it looks that way is that we're choosing to use a "physical
> result tuple" to avoid an ExecProject step at runtime. There's nothing
> wrong with the logic, it's just that EXPLAIN shows something that might
> mislead people.
I wonder if we oughta do something about that.
I was also thinking we should probably make EXPLAIN ANALYZE display
the number of heap fetches, so that you can see how index-only your
index-only scan actually was.
>> 5. We haven't made any planner changes at all, not even for cost
>> estimation. It is not clear to me what the right way to do cost
>> estimation here is.
>
> Yeah, me either. For the moment I put in a hard-wired estimate that
> only 90% of the heap pages would actually get fetched. This is
> conservative and only meant to ensure that the planner picks an
> index-only-capable plan over an indexscan with a non-covering index,
> all else being equal. We'll need to do performance testing before
> we can refine that.
Yep.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2011-10-08 01:59:40 | Re: Why does WAL_DEBUG macro need to be defined by default? |
Previous Message | Robert Haas | 2011-10-08 01:10:53 | Re: Why does WAL_DEBUG macro need to be defined by default? |