Re: GUC for cleanup indexes threshold.

From: Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Subject: Re: GUC for cleanup indexes threshold.
Date: 2017-09-22 13:22:16
Message-ID: c4a47caef28024ab7528946bed264058@postgrespro.ru
Lists: pgsql-hackers

On 2017-09-22 11:21, Masahiko Sawada wrote:
> On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada
>> <sawada(dot)mshk(at)gmail(dot)com> wrote in
>> <CAD21AoD6zgb1W6ps1aXj0CcAB_chDYiiTNtEdpMhkefGg13-GQ(at)mail(dot)gmail(dot)com>
>>> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> > I was just looking at the thread since it has been left alone for a
>>> > long time in the CF app.
>>> >
>>> > At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan <pg(at)bowt(dot)ie> wrote in <CAH2-WzkhJhAXD+6DdBp7D8WYLfJ3D0m=AZbGsiw=USUjTmuv-g(at)mail(dot)gmail(dot)com>
>>> >> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>>> >> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> >> >> [ lots of valuable discussion ]
>>> >> >
>>> >> > I think this patch clearly is still in the design stage, and has
>>> >> > received plenty of feedback this CF. I'll therefore move this to the next
>>> >> > commitfest.
>>> >>
>>> >> Does anyone have ideas on a way forward here? I don't, but then I
>>> >> haven't thought about it in detail in several months.
>>> >
>>> > Is the additional storage in the metapage to store the current status
>>> > of vacuum still unacceptable, even if it can avoid a useless
>>> > full-page scan on indexes, especially for stable tables?
>>> >
>>> > Or, how about an additional 1 bit in pg_stat_*_index to indicate
>>> > that the index *doesn't* require the vacuum cleanup stage? (The
>>> > default value causes cleanup.)
>>>
>>> Did you mean that "the next cycle" is the lazy_cleanup_index() function
>>> called by lazy_scan_heap()?
>>
>> Both finally call btvacuumscan under a certain condition, but
>> what I meant by "the next cycle" is the lazy_cleanup_index call
>> in the next round of vacuum, since the abstraction layer (index AM)
>> isn't aware of the details of btree.
>>
>>> > index_bulk_delete (or ambulkdelete) returns the flag in
>>> > IndexBulkDeleteResult, then lazy_scan_heap stores the flag in
>>> > stats, and in the next cycle it is looked up to decide the
>>> > necessity of index cleanup.
>>> >
>>>
>>> Could you elaborate on this? For example, in a btree index, the index
>>> cleanup skips the index scan if index_bulk_delete has been
>>> called during vacuuming, because stats != NULL. So I think we don't
>>> need such a flag.
>>
>> The flag works so that two successive index full scans don't
>> happen in one vacuum round. If any rows were actually deleted, the
>> following btvacuumcleanup does nothing.
>>
>> I think what you wanted to solve here was the problem that
>> index_vacuum_cleanup runs a full scan even if it ends up with no
>> actual work, as with manual or anti-wraparound vacuums. (I'm
>> getting a bit confused on this..) It is caused by using the
>> pointer "stats" as the flag that instructs it to do that. If the
>> stats-as-a-flag worked as expected, the GUC wouldn't seem to be
>> required.
>
> Hmm, my proposal is that if a table hasn't changed much since the
> previous vacuum, we skip cleaning up the index.
>
> If the table has at least one garbage tuple, we do lazy_vacuum_index and
> the IndexBulkDeleteResult is stored, which causes btvacuumcleanup to be
> skipped. On the other hand, if the table doesn't have any
> garbage but some new tuples were inserted since the previous vacuum, we
> don't do lazy_vacuum_index but do lazy_cleanup_index. In this
> case, we always do lazy_cleanup_index (i.e., the full scan)
> even if only one tuple was inserted. That's why I proposed a new GUC
> parameter which allows us to skip lazy_cleanup_index in that case.
>
>>
>> In addition to that, as Simon and Peter pointed out,
>> index_bulk_delete can leave not-fully-removed pages (so-called
>> half-dead pages, and pages that are recyclable but not registered
>> in the FSM, AFAICS) in some cases, mainly because of the
>> RecentGlobalXmin interlock. In this case, just inhibiting the cleanup
>> scan by a threshold lets such dangling pages persist in the index. (I
>> couldn't produce that many dangling pages, though..)
>>
>> The first patch in the mail (*1) does that. It seems to have some
>> bugs, though..
>>
>>
>> Since the dangling pages persist until autovacuum decides to scan
>> the owning table again, we should run a vacuum round (or
>> index_vacuum_cleanup itself) even with no dead rows if we want
>> to clean up such pages within a certain period. The second patch
>> does that.
>>
>
> IIUC half-dead pages are not relevant to this proposal. The proposal
> has two problems:
> * By skipping index cleanup we could leave recyclable pages that are
> not marked as recyclable.
> * We stash an XID when a btree page is deleted, which is used to
> determine when it's finally safe to recycle the page.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center

Here is a small patch that skips scanning a btree index if no pending
deleted pages exist.

It detects this situation by comparing pages_deleted with pages_free.
If they are equal, then there are no half-deleted pages, and it is
safe to skip the next lazy scan.
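
A minimal sketch of that check, assuming the pages_deleted and pages_free
counters that btvacuumscan already accumulates in IndexBulkDeleteResult
(the helper name is made up for illustration, it is not from the patch):

#include "postgres.h"
#include "access/genam.h"       /* IndexBulkDeleteResult */

/*
 * Illustrative sketch only: decide whether the next cleanup-only scan can
 * be skipped.  If every deleted page has also been counted as free
 * (recyclable), no half-deleted pages remain for a cleanup scan to find.
 */
static bool
cleanup_scan_needed(const IndexBulkDeleteResult *stats)
{
    return stats->pages_deleted != stats->pages_free;
}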

The flag is stored in btpo_flags. It is unset using regular WAL before
the scan. If no half-deleted pages are found, it is set without WAL
(like a hint bit). (It is safe to miss setting the flag, but it is not
safe to miss unsetting it.)
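
Roughly, the two sides of that could look like the sketch below.
BTP_CLEANUP_NOT_NEEDED is a hypothetical flag bit, and buffer locking and
WAL emission are omitted; this only illustrates the hint-bit-style
asymmetry, it is not the patch itself:

#include "postgres.h"
#include "access/nbtree.h"
#include "storage/bufmgr.h"

/* Hypothetical bit in btpo_flags; the real patch may name/place it differently. */
#define BTP_CLEANUP_NOT_NEEDED  (1 << 8)

/*
 * Clearing the flag must be durable (missing it could make us skip a scan
 * that is actually needed), so the page is dirtied normally and, in the
 * real patch, covered by a WAL record before the bulk-delete scan starts.
 */
static void
bt_clear_skip_cleanup(Buffer metabuf)
{
    Page         metapg = BufferGetPage(metabuf);
    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(metapg);

    opaque->btpo_flags &= ~BTP_CLEANUP_NOT_NEEDED;
    MarkBufferDirty(metabuf);
    /* ... WAL record emitted here in the real patch ... */
}

/*
 * Setting the flag is only an optimization hint: losing it merely costs one
 * extra cleanup scan, so it can be written like a hint bit, without WAL.
 */
static void
bt_set_skip_cleanup(Buffer metabuf)
{
    Page         metapg = BufferGetPage(metabuf);
    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(metapg);

    opaque->btpo_flags |= BTP_CLEANUP_NOT_NEEDED;
    MarkBufferDirtyHint(metabuf, true);
}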

This patch works correctly:
- if rows are only inserted and never deleted, the index always skips
scanning (starting with the second scan).
- if some rows are updated/deleted, then some scans are not skipped. But
once all half-deleted pages are marked as deleted, lazy scans start to
be skipped.

Open question: what to do with index statistics? For simplicity this
patch skips updating stats (it just returns NULL from btvacuumcleanup).
Probably, it should detect the proportion of table changes, and not
skip scans if the table grows too much.
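
One possible shape of such a guard, purely as an illustration (the growth
threshold and the way the tuple counts are obtained are assumptions, not
part of this patch):

/*
 * Illustrative only: allow skipping the cleanup scan unless the table has
 * grown by more than growth_threshold (e.g. 0.1 for 10%) since the stats
 * were last refreshed by a full cleanup scan.
 */
static bool
table_grew_too_much(double tuples_at_last_cleanup, double tuples_now,
                    double growth_threshold)
{
    if (tuples_at_last_cleanup <= 0)
        return true;            /* no baseline; do the scan and refresh stats */

    return (tuples_now - tuples_at_last_cleanup) / tuples_at_last_cleanup
            > growth_threshold;
}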

--
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

Attachment: 0001-lazy-btree-scan.patch (text/x-diff, 4.1 KB)
