Re: [PERFORM] encouraging index-only scans

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] encouraging index-only scans
Date: 2014-02-11 19:13:00
Message-ID: CAMkU=1xK7yL9SNH2Xprnws2C-fMtgBd07eAQcpDD8Zhj9LJtUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Tue, Feb 11, 2014 at 9:12 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Tue, Feb 11, 2014 at 11:28:36AM -0500, Robert Haas wrote:
> > A sequential scan will set hint bits and will prune the page, but
> > pruning the page doesn't ever mark it all-visible; that logic is
> > entirely in vacuum. If that could be made cheap enough to be
> > negligible, it might well be worth doing in heap_page_prune(). I
> > think there might be a way to do that, but it's a bit tricky because
> > the pruning logic iterates over the page in a somewhat complex way,
> > not just a straightforward scan of all the item pointers the way the
> > existing logic doesn't. It would be pretty cool if we could just use
> > a bit out of the heap-prune xlog record to indicate whether the
> > all-visible bit should be set; then we'd gain the benefit of marking
> > things all-visible much more often without needing vacuum.
> >
> > That doesn't help insert-only tables much, though, because those won't
> > require pruning. We set hint bits (which dirties the page) but
> > currently don't write WAL. We'd have to change that to set the
> > all-visible bit when scanning such a table, and that would be
> > expensive. :-(
>
> Yes, that pretty much sums it up. We introduced index-only scans in 9.2
> (2012) but they still seem to be not usable for insert-only workloads
> two years later. Based on current progress, it doesn't look like this
> will be corrected until 9.5 (2015). I am kind of confused why this has
> not generated more urgency.
>

For insert and select only, they are usable (if your queries are of the
type that could benefit from them), you just have to do some manual
intervention. The list of features that sometimes require a DBA to do
something to make maximum use of them under some circumstance would be a
long one. It would be nice if it were better, but I don't see why this
feature is particularly urgent compared to all the other things that could
be improved. In particular I think the Freezing without IO is much more
important. Freezing is rather unimportant until suddenly it is is the most
important thing in the universe. If we could stop worrying about that, I
think it would free up other aspects of vacuum scheduling to have more
meddling/optimization done to it.

>
> I guess my question is what approach do we want to take to fixing this?
> If we are doing pruning, aren't we emitting WAL? You are right that for
> an insert-only workload, we aren't going to prune, but if pruning WAL
> overhead is acceptable for a sequential scan, isn't index-only
> page-all-visible WAL overhead acceptable?
>

We often don't find that pruning particularly acceptable in seq scans, and
there is a patch pending to conditionally turn it off for them.

>
> Do we want to track the number of inserts in statistics and trigger an
> auto-vacuum after a specified number of inserts?

We track relpages and relallvisible, which seems like a more direct
measure. Once analyze is done (which is already triggered by inserts) and
sets those, it could fire a vacuum based on the ratio of those values, or
the autovac process could just look at the ratio after naptime. So just
introduce autovacuum_vacuum_visible_factor. A problem there is that it
would be a lot of work to aggressively keep the ratio high, and pointless
if the types of queries done on that table don't benefit from IOS anyway,
or if pages are dirtied so rapidly that no amount of vacuuming will keep
the ratio high. Would we try to automatically tell which tables were
which, or rely on the DBA setting per-table
autovacuum_vacuum_visible_factor for tables that differ from the database
norm?

> The problem there is
> that we really don't need to do any index cleanup, which is what vacuum
> typically does --- we just want to scan the table and set the
> all-visible bits, so that approach seems non-optimal.
>

In the case of no updates or deletes (or aborted inserts?), there would be
nothing to clean up in the indexes and that step would be skipped (already
in the current code). And if the indexes do need cleaning up, we certainly
can't set the page all visible without doing that clean up.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-02-11 19:42:00 Re: PostgreSQL Failback without rebuild
Previous Message Bruce Momjian 2014-02-11 19:04:35 Re: [PERFORM] encouraging index-only scans

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2014-02-11 19:51:36 Re: [PERFORM] encouraging index-only scans
Previous Message Bruce Momjian 2014-02-11 19:04:35 Re: [PERFORM] encouraging index-only scans