Re: GUC for cleanup indexes threshold.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
Subject: Re: GUC for cleanup indexes threshold.
Date: 2017-02-24 22:10:49
Message-ID: CAH2-Wzm8nLp8cbah1Pu=TvZ2OnSZ36wZZpWNGB_vVXb=vBAqBg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 24, 2017 at 9:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think this thread is pretty short on evidence that would let us make
> a smart decision about what to do here. I see three possibilities.
> The first is that this patch is a good idea whether we do something
> about the issue of half-dead pages or not. The second is that this
> patch is a good idea if we do something about the issue of half-dead
> pages but a bad idea if we don't. The third is that this patch is a
> bad idea whether or not we do anything about the issue of half-dead
> pages.

Half-dead pages are not really relevant to this discussion, AFAICT. I
think that both you and Simon mean "recyclable" pages. There are
several levels of indirection involved here, to keep the locking very
granular, so it gets tricky to talk about.

B-Tree page deletion is like a page split in reverse. It has a
symmetry with page splits, which have two phases (atomic operations).
There are also two phases for deletion, the first of which leaves the
target page without a downlink in its parent, and marks it half dead.
By the end of the first phase, there are still sibling pointers to the
target page, so an index scan can land on a half-dead page at any point
before the second phase of deletion begins. So, the sibling link isn't
stale as such, but the page is still morally dead. (Second phase is
where we remove even the sibling links, and declare it fully dead.)
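
To make the two phases concrete, here is a toy model in C. It is only a
sketch: the type and function names (ToyPage, toy_mark_half_dead,
toy_unlink_page, toy_reachable_via_sibling) are made up for illustration
and are not the nbtree data structures or routines, and it ignores
locking and WAL entirely. It shows only which links exist after each
phase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: illustrative names, not PostgreSQL's nbtree structures. */
typedef enum { PAGE_LIVE, PAGE_HALF_DEAD, PAGE_DELETED } ToyPageState;

typedef struct ToyPage {
    ToyPageState    state;
    bool            has_downlink;   /* does the parent still point here? */
    struct ToyPage *left;           /* left sibling, or NULL */
    struct ToyPage *right;          /* right sibling, or NULL */
} ToyPage;

/* Phase 1: drop the parent's downlink and mark the page half dead.
 * The siblings' pointers to this page are left untouched. */
static void toy_mark_half_dead(ToyPage *page)
{
    assert(page->state == PAGE_LIVE);
    page->has_downlink = false;
    page->state = PAGE_HALF_DEAD;
}

/* Phase 2: make the siblings point past the page and mark it fully dead. */
static void toy_unlink_page(ToyPage *page)
{
    assert(page->state == PAGE_HALF_DEAD);
    if (page->left != NULL)
        page->left->right = page->right;
    if (page->right != NULL)
        page->right->left = page->left;
    page->state = PAGE_DELETED;
}

/* Can a scan stepping right from 'from' still land on 'target'? */
static bool toy_reachable_via_sibling(const ToyPage *from, const ToyPage *target)
{
    for (const ToyPage *p = from->right; p != NULL; p = p->right)
        if (p == target)
            return true;
    return false;
}
```

With three pages chained a, b, c, calling toy_mark_half_dead on b leaves
b reachable from a through the sibling link, and only toy_unlink_page
makes a's right link skip straight to c.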

Even though there are two phases of deletion, the second still occurs
immediately after the first within VACUUM. The need to have two phases
is hard to explain, so I won't try, but it suffices to say that VACUUM
does not actually ever leave a page half dead unless there is a hard
crash.

Recall that VACUUMing of a B-Tree is performed sequentially, so blocks
can be recycled without needing to be found via a downlink or sibling
link by VACUUM. What is at issue here, then, is VACUUM's degree of
"eagerness" around putting *fully* dead B-Tree pages in the FSM for
recycling. The interlock with RecentGlobalXmin is what makes it
generally impossible for VACUUM to fully delete pages *and* mark them
as recyclable (put them in the FSM) all at once.
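
A rough sketch of that interlock, again as a toy model in C rather than
the real code. The assumptions here are mine: each fully dead page is
taken to carry the XID that was current when it was deleted, a page
counts as recyclable only once that XID is older than the global xmin
horizon, and XID wraparound is ignored (plain integer comparison). All
names (ToyBlock, toy_page_recyclable, toy_vacuum_scan) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;

/* Toy model only: one entry per index block, visited in physical order. */
typedef struct {
    bool   deleted;      /* page is fully dead (second phase done) */
    ToyXid deleted_xid;  /* assumed: XID stamped on the page at deletion */
    bool   in_fsm;       /* page has been handed to the free space map */
} ToyBlock;

/* A fully dead page is safe to recycle only once no running transaction
 * could still be interested in it. Wraparound is deliberately ignored. */
static bool toy_page_recyclable(const ToyBlock *blk, ToyXid global_xmin)
{
    return blk->deleted && blk->deleted_xid < global_xmin;
}

/* Sequential pass over all blocks: no downlink or sibling link is needed
 * to find a recyclable page. Returns how many pages went into the FSM. */
static size_t toy_vacuum_scan(ToyBlock *blocks, size_t nblocks,
                              ToyXid global_xmin)
{
    size_t recycled = 0;

    for (size_t i = 0; i < nblocks; i++)
    {
        if (!blocks[i].in_fsm && toy_page_recyclable(&blocks[i], global_xmin))
        {
            blocks[i].in_fsm = true;
            recycled++;
        }
    }
    return recycled;
}
```

In this model, a page deleted at XID 100 is skipped by a scan whose
horizon is still 100, and is only put in the FSM by a later scan, once
the horizon has moved past 100. That is the sense in which deleting and
recycling cannot happen all at once.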

Maybe you get this already, since, as I said, the terminology is
tricky in this area, but I can't tell.

--
Peter Geoghegan
