Re: GiST VACUUM

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Костя Кузнецов <chapaev28(at)ya(dot)ru>
Subject: Re: GiST VACUUM
Date: 2018-07-19 12:28:05
Message-ID: 266d7c7e-7431-b6dc-e4af-7ec9f08b5e52@iki.fi
Lists: pgsql-hackers

On 19/07/18 14:42, Andrey Borodin wrote:
>
> 19.07.2018, 15:20, "Heikki Linnakangas" <hlinnaka(at)iki(dot)fi>:
>>
>> On 19/07/18 13:52, Andrey Borodin wrote:
>>
>> Hi!
>>
>> On 19 July 2018, at 1:12, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
>> wrote:
>>
>> Yeah, please, I think this is the way to go.
>>
>>
>> Here's v11 divided into proposed steps.
>>
>>
>> Thanks, one quick question:
>>
>> /* We should not unlock buffer if we are going to jump left */
>> if (needScan)
>> {
>>     GistBDItem *ptr = (GistBDItem *) palloc(sizeof(GistBDItem));
>>
>>     ptr->buffer = buffer;
>>     ptr->next = bufferStack;
>>     bufferStack = ptr;
>> }
>> else
>>     UnlockReleaseBuffer(buffer);
>>
>>
>> Why? I don't see any need to keep the page locked, when we "jump left".
>>
> Because the page can split to the left again if we release the lock.

Hmm. So, while we are scanning the right sibling, which was moved to a
lower-numbered block because of a concurrent split, the original page is
split again? That's OK: we've already scanned all the tuples on the
original page before we recurse to deal with the right sibling. (The
corresponding B-tree code also releases the lock on the original page
when recursing.)
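
To make the locking argument concrete, here is a toy model in C. It is only a sketch under stated assumptions, not code from the attached patch: ToyPage, ToyIndex, toy_index_init() and toy_vacuum_page() are hypothetical names, a boolean flag stands in for "this page was split after the vacuum started", and a counter stands in for buffer locks. The model processes a page, notes the right sibling if it sits at a lower block number, releases the lock, and only then moves to the sibling.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_INVALID_BLOCK 0xFFFFFFFFu
#define TOY_NPAGES 8

/* Hypothetical stand-in for a leaf page; not a PostgreSQL structure. */
typedef struct ToyPage {
    bool     split_since_start; /* page was split after the vacuum began */
    unsigned rightlink;         /* block number of the right sibling */
    int      visits;            /* how many times vacuum processed the page */
} ToyPage;

typedef struct ToyIndex {
    ToyPage pages[TOY_NPAGES];
    int     locks_held;         /* locks currently held by the vacuum */
    int     max_locks_held;     /* high-water mark of locks_held */
} ToyIndex;

static void toy_index_init(ToyIndex *idx)
{
    for (unsigned i = 0; i < TOY_NPAGES; i++) {
        idx->pages[i].split_since_start = false;
        idx->pages[i].rightlink = TOY_INVALID_BLOCK;
        idx->pages[i].visits = 0;
    }
    idx->locks_held = 0;
    idx->max_locks_held = 0;
}

/*
 * Process orig_blkno. If the page was split after the vacuum started and
 * its right sibling lives at a lower block number (which the physical scan
 * has already passed), go there next. The lock on the current page is
 * dropped before moving on, because its tuples were already processed.
 */
static void toy_vacuum_page(ToyIndex *idx, unsigned orig_blkno)
{
    unsigned blkno = orig_blkno;

    while (blkno != TOY_INVALID_BLOCK) {
        ToyPage *page;
        unsigned recurse_to = TOY_INVALID_BLOCK;

        assert(blkno < TOY_NPAGES);
        page = &idx->pages[blkno];

        idx->locks_held++;                          /* "lock" the page */
        if (idx->locks_held > idx->max_locks_held)
            idx->max_locks_held = idx->locks_held;

        page->visits++;             /* stands in for removing dead tuples */

        if (page->split_since_start &&
            page->rightlink != TOY_INVALID_BLOCK &&
            page->rightlink < orig_blkno)
            recurse_to = page->rightlink;

        idx->locks_held--;          /* "unlock" before jumping left */
        blkno = recurse_to;
    }
}
```

With page 5 marked as split and pointing at block 2, calling toy_vacuum_page(&idx, 5) visits both pages while never holding more than one lock at a time.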

I did some refactoring, to bring this closer to the B-tree code, for the
sake of consistency. See attached patch. This also eliminates the 2nd
pass by gistvacuumcleanup() when the scan was already done in the
bulkdelete phase.

There was one crucial thing missing: in the outer loop, we must ensure
that we scan all pages, even those that were added after the vacuum
started. There's a comment explaining that in btvacuumscan(). This
version fixes that.
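
The shape of that outer loop can be sketched with a toy model in C. This is an illustration under stated assumptions, not the patch itself: ToyRel, toy_visit_page() and toy_scan_all() are hypothetical names, and the grow_at/grow_by fields simulate a concurrent insert extending the relation while the scan is in progress. The point is that the relation length is re-read after each pass, and the scan only stops once it has caught up with the current length.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PAGES 64

/* Hypothetical stand-in for an index relation; not a PostgreSQL structure. */
typedef struct ToyRel {
    unsigned npages;                  /* current length, may grow mid-scan */
    unsigned grow_at;                 /* visiting this block triggers growth */
    unsigned grow_by;                 /* number of pages added at that point */
    bool     visited[TOY_MAX_PAGES];
} ToyRel;

/* Visit one page; simulate a concurrent extension of the relation. */
static void toy_visit_page(ToyRel *rel, unsigned blkno)
{
    assert(blkno < TOY_MAX_PAGES);
    rel->visited[blkno] = true;

    if (blkno == rel->grow_at && rel->grow_by > 0) {
        assert(rel->npages + rel->grow_by <= TOY_MAX_PAGES);
        rel->npages += rel->grow_by;
        rel->grow_by = 0;             /* grow only once */
    }
}

/* Scan in physical order, rechecking the length until nothing new appears. */
static unsigned toy_scan_all(ToyRel *rel, unsigned start_blkno)
{
    unsigned blkno = start_blkno;
    unsigned scanned = 0;

    for (;;) {
        unsigned num_pages = rel->npages;   /* re-read the current length */

        if (blkno >= num_pages)
            break;                          /* caught up with the relation */

        for (; blkno < num_pages; blkno++) {
            toy_visit_page(rel, blkno);
            scanned++;
        }
    }
    return scanned;
}
```

Starting with four pages and growing by two while block 3 is visited, toy_scan_all() ends up visiting six pages; a single pass bounded by the initial length would have missed the last two.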

I haven't done any testing on this. Do you have any test scripts you
could share? I think we need some repeatable tests for the concurrent
split cases. Even if it involves gdb or some other hacks that we can't
include in the regression test suite, we need something now, while we're
hacking on this.

One subtle point that I think is OK, but that gave me pause and probably
deserves a comment somewhere: a concurrent root split can turn a leaf page
into one internal (root) page, and two new leaf pages. The new root page
is placed in the same block as the old page, while both new leaf pages
go to freshly allocated blocks. If that happens while vacuum is running,
might we miss the new leaf pages? As the code stands, we don't do the
"follow-right" dance on internal pages, so we would not recurse into the
new leaf pages. At first, I thought that's a problem, but I think we can
get away with it. The only scenario where a root split happens on a leaf
page is when the index has exactly one page, a single leaf page. Any
subsequent root splits will split an internal page rather than a leaf
page, and we're not bothered by those. In the case that a root split
happens on a single-page index, we're OK, because we will always scan
that page either before, or after the split. If we scan the single page
before the split, we see all the leaf tuples on that page. If we scan
the single page after the split, it means that we start the scan after
the split, and we will see both leaf pages as we continue the scan.
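
The two orderings can be checked with a small toy model in C. This is a sketch under stated assumptions only: ToyTree, toy_root_split() and toy_vacuum() are hypothetical names, the index starts as a single leaf page at block 0, and the split is forced to happen either before block 0 is read or right after it has been processed. In both cases no dead tuples are left behind.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BLOCKS 3

/* Hypothetical stand-ins; not PostgreSQL structures. */
typedef struct ToyBlock {
    bool is_leaf;
    int  ndead;             /* dead tuples still on the page (leaves only) */
} ToyBlock;

typedef struct ToyTree {
    ToyBlock blocks[TOY_MAX_BLOCKS];
    unsigned nblocks;
} ToyTree;

static void toy_init_single_leaf(ToyTree *t, int ndead)
{
    t->nblocks = 1;
    t->blocks[0] = (ToyBlock){ true, ndead };
    t->blocks[1] = (ToyBlock){ false, 0 };
    t->blocks[2] = (ToyBlock){ false, 0 };
}

/* Root split of a one-page index: block 0 becomes the internal root and
 * its tuples move to two freshly allocated leaf blocks. */
static void toy_root_split(ToyTree *t)
{
    int ndead = t->blocks[0].ndead;

    assert(t->nblocks == 1 && t->blocks[0].is_leaf);
    t->blocks[1] = (ToyBlock){ true, ndead / 2 };
    t->blocks[2] = (ToyBlock){ true, ndead - ndead / 2 };
    t->blocks[0] = (ToyBlock){ false, 0 };
    t->nblocks = 3;
}

/* Physical-order scan that rechecks the length; returns tuples removed.
 * split_first selects whether the split precedes or follows the visit
 * to block 0. */
static int toy_vacuum(ToyTree *t, bool split_first)
{
    int      removed = 0;
    unsigned blkno = 0;

    if (split_first)
        toy_root_split(t);

    for (;;) {
        unsigned nblocks = t->nblocks;

        if (blkno >= nblocks)
            break;
        for (; blkno < nblocks; blkno++) {
            ToyBlock *b = &t->blocks[blkno];

            if (b->is_leaf) {           /* internal pages are skipped */
                removed += b->ndead;
                b->ndead = 0;
            }
            if (blkno == 0 && !split_first)
                toy_root_split(t);      /* split right after the visit */
        }
    }
    return removed;
}

/* Count dead tuples left on any leaf page. */
static int toy_dead_remaining(const ToyTree *t)
{
    int left = 0;

    for (unsigned i = 0; i < t->nblocks; i++)
        if (t->blocks[i].is_leaf)
            left += t->blocks[i].ndead;
    return left;
}
```

The model only covers the single-page case discussed here; later root splits act on an internal page and are outside its scope.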

- Heikki

Attachment Content-Type Size
0001-Physical-GiST-scan-in-VACUUM-v12.patch text/x-patch 15.5 KB
