Re: [PERFORM] encouraging index-only scans

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] encouraging index-only scans
Date: 2013-09-06 02:00:43
Message-ID: 20130906020043.GH27195@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Thu, Sep 5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
> On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Actually, I now realize it is more complex than that, and worse. There
> > are several questions to study to understand when pg_class.relallvisible
> > is updated (which is used to determine if index-only scans are a good
> > optimization choice), and when VM all-visible bits are set so heap pages
> > can be skipped during index-only scans:
> >
> > 1) When are VM bits set:
> > vacuum (non-full)
> > analyze (only some random pages)
>
> Analyze doesn't set visibility-map bits. It only updates statistics
> about how many are set.

Sorry, yes you are correct.

> > The calculus we should use to determine when we need to run vacuum has
> > changed with index-only scans, and I am not sure we ever fully addressed
> > this.
>
> Yeah, we didn't. I think the hard part is figuring out what behavior
> would be best. Counting inserts as well as updates and deletes would
> be a simple approach, but I don't have much confidence in it. My
> experience is that having vacuum or analyze kick in during a bulk-load
> operation is a disaster. We'd kinda like to come up with a way to
> make vacuum run after the bulk load is complete, maybe, but how would
> we identify that time, and there are probably cases where that's not
> right either.

I am unsure how we have gone a year with index-only scans and I am just
now learning that it only works well with update/delete workloads or by
running vacuum manually. I only found this out going back over January
emails. Did other people know this? Was it not considered a serious
problem?

Well, our logic has been that vacuum is only for removing expired rows.
I think we either need to improve that, or somehow make sequential scans
update the VM map, and then find a way to trigger update of
relallvisible even without inserts.

Ideas
-----

I think we need to detect tables that do not have VM bits set and try to
determine if they should be vacuumed. If a table has most of its VM
bits set, there in need to vacuum it for VM bit setting.

Autovacuum knows how many pages are in the table via its file size, and
it can scan the VM map to see how many pages are _not_ marked
all-visible. If the VM map has many pages that are _not_ marked as
all-visible, and change count since last vacuum is low, those pages
might now be all-visible and vacuum might find them. One problem is
that a long-running transaction is not going to update relallvisible
until commit, so you might be vacuuming a table that is being modified,
e.g. bulk loads. Do we have any way of detecting if a backend is
modifying a table?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2013-09-06 03:47:04 Re: Is it necessary to rewrite table while increasing the scale of datatype numeric?
Previous Message Gavin Flower 2013-09-06 01:29:32 Re: [PERFORM] encouraging index-only scans

Browse pgsql-performance by date

  From Date Subject
Next Message Suzuki Hironobu 2013-09-06 02:17:16 Re: Question About WAL filename and its time stamp
Previous Message Gavin Flower 2013-09-06 01:29:32 Re: [PERFORM] encouraging index-only scans