Re: GUC for cleanup indexes threshold.

From: Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: GUC for cleanup indexes threshold.
Date: 2017-09-22 14:15:08
Message-ID: c7d736b5ea4cda67a644a0247f1a3951@postgrespro.ru
Lists: pgsql-hackers

On 2017-09-22 16:22, Sokolov Yura wrote:
> On 2017-09-22 11:21, Masahiko Sawada wrote:
>> On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada
>>> <sawada(dot)mshk(at)gmail(dot)com> wrote in
>>> <CAD21AoD6zgb1W6ps1aXj0CcAB_chDYiiTNtEdpMhkefGg13-GQ(at)mail(dot)gmail(dot)com>
>>>> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
>>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> > I was just looking the thread since it is found left alone for a
>>>> > long time in the CF app.
>>>> >
>>>> > At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan <pg(at)bowt(dot)ie> wrote in <CAH2-WzkhJhAXD+6DdBp7D8WYLfJ3D0m=AZbGsiw=USUjTmuv-g(at)mail(dot)gmail(dot)com>
>>>> >> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>>> >> > Hi,
>>>> >> >
>>>> >> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>>>> >> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> >> >> [ lots of valuable discussion ]
>>>> >> >
>>>> >> > I think this patch clearly still is in the design stage, and has
>>>> >> > received plenty feedback this CF. I'll therefore move this to the next
>>>> >> > commitfest.
>>>> >>
>>>> >> Does anyone have ideas on a way forward here? I don't, but then I
>>>> >> haven't thought about it in detail in several months.
>>>> >
>>>> > Is the additional storage in the metapage to store the current
>>>> > status of vacuum still unacceptable, even if it can avoid useless
>>>> > full-page scans on indexes, especially for stable tables?
>>>> >
>>>> > Or, how about an additional bit in pg_stat_*_index to indicate
>>>> > that the index *doesn't* require the vacuum cleanup stage?
>>>> > (The default value causes cleanup.)
>>>>
>>>> You meant that "the next cycle" is the lazy_cleanup_index() function
>>>> called by lazy_scan_heap()?
>>>
>>> Both finally call btvacuumscan under a certain condition, but
>>> what I meant by "the next cycle" is the lazy_cleanup_index call
>>> in the next round of vacuum, since the abstraction layer (the
>>> index AM) isn't conscious of the details of btree.
>>>
>>>> > index_bulk_delete (or ambulkdelete) returns the flag in
>>>> > IndexBulkDeleteResult, then lazy_scan_heap stores the flag in
>>>> > stats, and in the next cycle it is looked up to decide whether
>>>> > index cleanup is necessary.
>>>> >
>>>>
>>>> Could you elaborate on this? For example, in a btree index, the
>>>> index cleanup skips the scan if index_bulk_delete has been called
>>>> during vacuuming, because stats != NULL. So I think we don't need
>>>> such a flag.
>>>
>>> The flag works so that two successive index full scans don't
>>> happen in a single vacuum round. If any rows are fully deleted,
>>> the following btvacuumcleanup does nothing.
>>>
>>> I think what you wanted to solve here was the problem that
>>> index_vacuum_cleanup runs a full scan even if it ends with no
>>> actual work, during manual or anti-wraparound vacuums. (I'm
>>> getting a bit confused on this...) It is caused by using the
>>> pointer "stats" as the flag instructing it to do that. If
>>> stats-as-a-flag worked as expected, the GUC doesn't seem to be
>>> required.
>>
>> Hmm, my proposal is that if a table hasn't changed much since the
>> previous vacuum, we skip cleaning up the index.
>>
>> If the table has at least one garbage tuple, we do lazy_vacuum_index
>> and an IndexBulkDeleteResult is stored, which causes btvacuumcleanup
>> to be skipped. On the other hand, if the table doesn't have any
>> garbage but some new tuples were inserted since the previous vacuum,
>> we don't do lazy_vacuum_index but do lazy_cleanup_index. In this
>> case, we always do lazy_cleanup_index (i.e., the full scan) even if
>> only one tuple was inserted. That's why I proposed a new GUC
>> parameter which allows us to skip lazy_cleanup_index in that case.
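The decision described above can be sketched in C. This is only an illustration of the idea, not the patch: the function name, the GUC name `cleanup_index_threshold`, and the bookkeeping of tuples inserted since the last vacuum are all assumed here for clarity.

```c
#include <stdbool.h>

/*
 * Sketch of the proposed skip test: cleanup is skipped when the
 * fraction of tuples inserted since the previous vacuum stays below
 * a GUC threshold. Names are illustrative, not from the patch.
 */
static double cleanup_index_threshold = 0.01;   /* assumed GUC, 1% */

static bool
should_skip_index_cleanup(double reltuples,
                          double inserted_since_vacuum,
                          bool had_dead_tuples)
{
    /*
     * If bulk delete ran, stats != NULL and btvacuumcleanup is
     * skipped anyway, so there is nothing more to decide.
     */
    if (had_dead_tuples)
        return true;

    if (reltuples <= 0)
        return false;

    /* Insert-only workload: skip unless the table grew enough. */
    return inserted_since_vacuum / reltuples < cleanup_index_threshold;
}
```

With this rule, a single inserted tuple in a large table no longer forces a full index scan, which is exactly the case the GUC targets.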
>>
>>>
>>> In addition to that, as Simon and Peter pointed out,
>>> index_bulk_delete can leave not-fully-removed pages (so-called
>>> half-dead pages, and pages that are recyclable but not registered
>>> in the FSM, AFAICS) in some cases, mainly due to the
>>> RecentGlobalXmin interlock. In that case, just inhibiting the
>>> cleanup scan by a threshold lets such dangling pages persist in
>>> the index. (I couldn't produce that many dangling pages, though...)
>>>
>>> The first patch in the mail (*1) does that. It seems to have some
>>> bugs, though...
>>>
>>>
>>> Since the dangling pages persist until autovacuum decides to scan
>>> the table they belong to again, we should run a vacuum round (or
>>> index_vacuum_cleanup itself) even with no dead rows if we want to
>>> clean up such pages within a certain period. The second patch
>>> does that.
>>>
>>
>> IIUC half-dead pages are not relevant to this proposal. The proposal
>> has two problems:
>> * By skipping index cleanup we could leave recyclable pages that are
>> not marked as recyclable.
>> * We stash an XID when a btree page is deleted, which is used to
>> determine when it's finally safe to recycle the page.
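The second point — the stashed XID — can be sketched as a simple comparison. This is a simplified illustration of the recycle-safety rule, not PostgreSQL's actual code: real XID comparison handles 32-bit wraparound, which is omitted here, and the type and function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the recycle-safety rule: a deleted btree page stashes
 * the next XID at deletion time, and the page may be recycled only
 * once no running transaction could still see it, i.e. once the
 * stashed XID precedes RecentGlobalXmin. Simplified: no wraparound
 * handling, illustrative names only.
 */
typedef uint32_t SketchXid;

static bool
page_recyclable(SketchXid stashed_xact, SketchXid recent_global_xmin)
{
    /* Plain comparison; real code uses wraparound-aware XID logic. */
    return stashed_xact < recent_global_xmin;
}
```

This interlock is why a bulk delete can leave pages that are deleted but not yet recyclable: until RecentGlobalXmin advances past the stashed XID, a later cleanup scan is the only thing that picks them up.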
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> NTT Open Source Software Center
>
> Here is a small patch that skips scanning a btree index if no pending
> deleted pages exist.
>
> It detects this situation by comparing pages_deleted with pages_free.
> If they are equal, then there are no half-deleted pages, and it is
> safe to skip the next lazy scan.
>
> The flag is stored in btpo_flags. It is unset with regular WAL before
> a scan. If no half-deleted pages are found, it is set without WAL
> (like a hint bit). (It is safe to miss setting the flag, but it is
> not safe to miss unsetting it.)
>
> This patch works correctly:
> - if rows are only inserted and never deleted, the index always skips
> scanning (starting with the second scan).
> - if some rows are updated/deleted, then some scans are not skipped.
> But once all half-deleted pages are marked as deleted, lazy scans
> start to be skipped again.
>
> Open question: what to do with index statistics? For simplicity this
> patch skips updating stats (it just returns NULL from btvacuumcleanup).
> Probably it should detect the proportion of table changes, and not
> skip scans if the table grows too much.

Excuse me, I didn't mean to overwrite the "last attachment" on the
commitfest page.

--
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company
