Re: [PERFORM] encouraging index-only scans

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] encouraging index-only scans
Date: 2013-09-08 21:05:00
Message-ID: CAMkU=1ycer3D3QrT+11s_nOig8e_fPeuv9M=9FyfGJcjEY4eNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Thu, Sep 5, 2013 at 7:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Thu, Sep 5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
>> On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> > Actually, I now realize it is more complex than that, and worse. There
>> > are several questions to study to understand when pg_class.relallvisible
>> > is updated (which is used to determine if index-only scans are a good
>> > optimization choice), and when VM all-visible bits are set so heap pages
>> > can be skipped during index-only scans:
>> >
>> > 1) When are VM bits set:
>> > vacuum (non-full)
>> > analyze (only some random pages)
>>
>> Analyze doesn't set visibility-map bits. It only updates statistics
>> about how many are set.
>
> Sorry, yes you are correct.
>
>> > The calculus we should use to determine when we need to run vacuum has
>> > changed with index-only scans, and I am not sure we ever fully addressed
>> > this.
>>
>> Yeah, we didn't. I think the hard part is figuring out what behavior
>> would be best. Counting inserts as well as updates and deletes would
>> be a simple approach, but I don't have much confidence in it. My
>> experience is that having vacuum or analyze kick in during a bulk-load
>> operation is a disaster. We'd kinda like to come up with a way to
>> make vacuum run after the bulk load is complete, maybe, but how would
>> we identify that time, and there are probably cases where that's not
>> right either.
>
> I am unsure how we have gone a year with index-only scans and I am just
> now learning that it only works well with update/delete workloads or by
> running vacuum manually. I only found this out going back over January
> emails. Did other people know this? Was it not considered a serious
> problem?

I thought it was well known, but maybe I was overly optimistic. I've
considered IOS to be mostly useful for data mining work on read-mostly
tables, which you would probably vacuum manually after a bulk load.

For transactional tables, I think that trying to keep the vm set-bit
density high enough would be a losing battle. If we redefined the
nature of the vm so that doing a HOT update would not clear the
visibility bit, perhaps that would change the outcome of this battle.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2013-09-08 23:19:31 Re: [rfc] overhauling pgstat.stat
Previous Message Jeff Janes 2013-09-08 21:04:19 Re: [rfc] overhauling pgstat.stat

Browse pgsql-performance by date

  From Date Subject
Next Message Amit Kapila 2013-09-09 03:49:03 Re: [PERFORM] encouraging index-only scans
Previous Message Andres Freund 2013-09-07 22:47:35 Re: [PERFORM] encouraging index-only scans