Re: [PERFORM] encouraging index-only scans

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] encouraging index-only scans
Date: 2013-09-06 00:14:37
Message-ID: 20130906001437.GA29264@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Wed, Sep 4, 2013 at 04:56:55PM -0400, Bruce Momjian wrote:
> > "Add a column pg_class.relallvisible to remember the number of pages
> > that were all-visible according to the visibility map as of the last
> > VACUUM
> > (or ANALYZE, or some other operations that update pg_class.relpages).
> > Use relallvisible/relpages, instead of an arbitrary constant, to
> > estimate how many heap page fetches can be avoided during an
> > index-only scan."
> >
> > Have I missed some nuance?
>
> I am looking back at this issue now and I think you are correct. The
> commit you mention (Oct 7 2011) says ANALYZE updates the visibility map,
> and the code matches that:
>
> if (!inh)
> vac_update_relstats(onerel,
> RelationGetNumberOfBlocks(onerel),
> totalrows,
> --> visibilitymap_count(onerel),
> hasindex,
> InvalidTransactionId);
>
> so if an index scan was not being used after an ANALYZE, it isn't a bad
> allvisibile estimate but something else. This code was in PG 9.2.

Actually, I now realize it is more complex than that, and worse. There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:

1) When are VM bits set:
vacuum (non-full)
analyze (only some random pages)
2) When are massive rows added but VM bits not set:
copy
3) When are VM bits cleared:
insert/update/delete
vacuum (non-full)
4) When are VM map files cleared:
vacuum full
cluster
5) When is pg_class.relallvisible updated via a VM map file scan:
vacuum (non-full)
analyze

Vacuums run by autovacuum are driven by n_dead_tuples, which is only
update and delete. Therefore, any scenario where vacuum (non-full) is
never run will not have significant VM bits set. The only bits that
will be set will be by pages visited randomly by analyze.

The following table activities will not set proper VM bits:

vacuum full
cluster
copy
insert-only

If updates and deletes happen, there will eventually be sufficient
reason for autovacuum to vacuum the table and set proper VM bits, and
pg_class.relallvisible.

The calculus we should use to determine when we need to run vacuum has
changed with index-only scans, and I am not sure we ever fully addressed
this.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-09-06 01:10:06 Re: [PERFORM] encouraging index-only scans
Previous Message Bruce Momjian 2013-09-05 22:36:08 Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2013-09-06 01:10:06 Re: [PERFORM] encouraging index-only scans
Previous Message Tom Lane 2013-09-05 23:50:18 Re: View with and without ::text casting performs differently.