Re: GUC for cleanup indexes threshold.

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Subject: Re: GUC for cleanup indexes threshold.
Date: 2017-09-22 12:47:58
Message-ID: CAGTBQpYOc-8wBdSr-fbGuOix7ayLz=HRPpXD-H=q8+GxrdjNAQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 22, 2017 at 4:46 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I apologize in advance of possible silliness.
>
> At Thu, 21 Sep 2017 13:54:01 -0300, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote in <CAGTBQpYvgdqxVaiyui=BKrzw7ZZfTQi9KECUL4-Lkc2ThqX8QQ(at)mail(dot)gmail(dot)com>
>> On Tue, Sep 19, 2017 at 8:55 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> > On Tue, Sep 19, 2017 at 4:47 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> >> Maybe this is looking at the problem from the wrong direction.
>> >>
>> >> Why can't the page be added to the FSM immediately and the check be
>> >> done at runtime when looking for a reusable page?
>> >>
>> >> Index FSMs currently store only 0 or 255, couldn't they store 128 for
>> >> half-recyclable pages and make the caller re-check reusability before
>> >> using it?
>> >
>> > No, because it's impossible for them to know whether or not the page
>> > that their index scan just landed on recycled just a second ago, or
>> > was like this since before their xact began/snapshot was acquired.
>> >
>> > For your reference, this RecentGlobalXmin interlock stuff is what
>> > Lanin & Shasha call "The Drain Technique" within "2.5 Freeing Empty
>> > Nodes". Seems pretty hard to do it any other way.
>>
>> I don't see the difference between a vacuum run and distributed
>> maintenance at _bt_getbuf time. In fact, the code seems to be in
>> place already.
>
> The pages prohibited from being registered as "free" by RecentGlobalXmin
> cannot be grabbed by _bt_getbuf, since such a page is linked from nowhere
> and the FSM doesn't offer it as "free".

Yes, but suppose vacuum did add them to the FSM in the first round,
but with a special marker that differentiates them from immediately
recyclable ones.

>> _bt_page_recyclable seems to prevent old transactions from treating
>> those pages as recyclable already, and the description of the
>> technique in 2.5 doesn't seem to preclude doing the drain while doing
>> other operations. In fact, Lehman even considers the possibility of
>> multiple concurrent garbage collectors.
>
> _bt_page_recyclable prevents a vacuum scan from discarding pages
> that might be looked at by any active transaction, and the "drain"
> itself is a technique to prevent freeing still-active pages so a
> scan using the "drain" technique is freely executed
> simultaneously with other transactions. The paper might allow
> concurrent GCs (or vacuums) but our nbtree is saying that no
> concurrent vacuum is assumed. Er... here it is.
>
> nbtpages.c:1589: _bt_unlink_halfdead_page
> | * right. This search could fail if either the sibling or the target page
> | * was deleted by someone else meanwhile; if so, give up. (Right now,
> | * that should never happen, since page deletion is only done in VACUUM
> | * and there shouldn't be multiple VACUUMs concurrently on the same
> | * table.)

Ok, yes, but we're not talking about halfdead pages, but deleted pages
that haven't been recycled yet.

>> It's only a matter of making the page visible in the FSM in a way that
>> can be efficiently skipped if we want to go directly to a page that
>> actually CAN be recycled to avoid looping forever looking for a
>> recyclable page in _bt_getbuf. In fact, that's pretty much Lehman's
>
> Mmm. What _bt_getbuf does is recheck the page given from FSM as a
> "free page". If FSM gives no more page, it just tries to extend
> the index relation. Or am I reading you wrongly?

On non-index FSMs, you can request a page that has at least N free bytes.

Index FSMs always mark pages as either fully empty or fully full, with no
in-betweens. But suppose we used that capability of the data structure to mark
"maybe recyclable" pages with 50% free space, and "surely recyclable" pages
with 100% free space.

Then _bt_getbuf could request a 50% free page a few times, check whether each
one is recyclable (i.e. check _bt_page_recyclable), and essentially do a
microvacuum on that page. If it cannot find a recyclable page that way, it
would try again with the 100% recyclable ones.

The code is almost there; the only thing missing is the distinction
between "maybe recyclable" and "surely recyclable" pages in the index FSM.
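
To make the idea concrete, here is a rough toy model in Python. It is not
PostgreSQL code. ToyIndex, get_free_page and get_new_page are hypothetical
names, the FSM is a plain dict, and the recyclability test is reduced to
"deleting xid is older than the oldest xmin". Re-recording rejected pages as
"maybe recyclable" after the search is also an assumption of the sketch, not
something the current code does.

```python
from dataclasses import dataclass, field

# Toy FSM categories: in use, "maybe recyclable" (50%), "surely recyclable" (100%).
FULL, MAYBE, SURE = 0, 128, 255


@dataclass
class ToyIndex:
    fsm: dict = field(default_factory=dict)          # block number -> category
    deleted_xid: dict = field(default_factory=dict)  # block number -> deleting xid
    nblocks: int = 0

    def get_free_page(self, min_cat):
        """Return (block, category) for a page with category >= min_cat and
        mark it used in the FSM; return None if there is no such page."""
        for blk in sorted(self.fsm):
            cat = self.fsm[blk]
            if cat >= min_cat:
                self.fsm[blk] = FULL
                return blk, cat
        return None


def get_new_page(idx, oldest_xmin, max_tries=3):
    """Hypothetical stand-in for the page-allocation path of _bt_getbuf."""
    rejected = []
    chosen = None
    # Pass 1: ask for "at least 50% free" a few times and recheck each page.
    for _ in range(max_tries):
        got = idx.get_free_page(MAYBE)
        if got is None:
            break
        blk, cat = got
        # Simplified stand-in for the _bt_page_recyclable check.
        if cat == SURE or idx.deleted_xid[blk] < oldest_xmin:
            chosen = blk
            break
        rejected.append(blk)
    # Pass 2: give up on "maybe" pages and take only "surely recyclable" ones.
    if chosen is None:
        got = idx.get_free_page(SURE)
        if got is not None:
            chosen = got[0]
    # Assumption: put rejected pages back so a later caller can retry them.
    for blk in rejected:
        idx.fsm[blk] = MAYBE
    # Last resort: extend the relation by one block.
    if chosen is None:
        chosen = idx.nblocks
        idx.nblocks += 1
    return chosen
```

The retry limit is what keeps the search from looping forever over pages that
are not yet recyclable; after max_tries misses it falls back to the pages
vacuum already marked as safe, and then to extending the relation.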

Take this with a grain of salt, I'm not an expert on that code. But it
seems feasible to me.
