Re: GiST VACUUM

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Костя Кузнецов <chapaev28(at)ya(dot)ru>
Subject: Re: GiST VACUUM
Date: 2019-03-22 08:00:05
Message-ID: 5742CFFD-3A65-4980-9091-D74BE7C6D9E3@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> 22 марта 2019 г., в 1:04, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> написал(а):
> ...
> When I started testing this, I quickly noticed that empty pages were not being deleted nearly as much as I expected. I tracked it to this check in gistdeletepage:
>
>> + if (GistFollowRight(leafPage)
>> + || GistPageGetNSN(parentPage) > GistPageGetNSN(leafPage))
>> + {
>> + /* Don't mess with a concurrent page split. */
>> + return false;
>> + }
>
> That NSN test was bogus. It prevented the leaf page from being reused, if the parent page was *ever* split after the leaf page was created. I don't see any reason to check the NSN here.
That's true. This check had sense only when parent page was not locked (and seems like comparison should be opposite). When both pages are locked, this test as no sense at all. Check was incorrectly "fixed" by me when transitioning from single-scan delete to two-scan delete during summer 2018. Just wandering how hard is it to simply delete a page...

>> Though, I'm not sure it is important for GIN. Scariest thing that can
>> happen: it will return same tid twice. But it is doing bitmap scan,
>> you cannot return same bit twice...
>
> Hmm. Could it return a completely unrelated tuple?
No, I do not think so, it will do comparisons on posting tree tuples.

> We don't always recheck the original index quals in a bitmap index scan, IIRC. Also, a search might get confused if it's descending down a posting tree, and lands on a different kind of a page, altogether.
Yes, search will return an error, user will reissue query and everything will be almost fine.

> PS. for Gist, we could almost use the LSN / NSN mechanism to detect the case that a deleted page is reused: Add a new field to the GiST page header, to store a new "deleteNSN" field. When a page is deleted, the deleted page's deleteNSN is set to the LSN of the deletion record. When the page is reused, the deleteNSN field is kept unchanged. When you follow a downlink during search, if you see that the page's deleteNSN > parent's LSN, you know that it was concurrently deleted and recycled, and should be ignored. That would allow reusing deleted pages immediately. Unfortunately that would require adding a new field to the gist page header/footer, which requires upgrade work :-(. Maybe one day, we'll bite the bullet. Something to keep in mind, if we have to change the page format anyway, for some reason.
Yeah, the same day we will get rid of invalid tuples. I can make a patch for v13. Actually, I have a lot of patches that I want in GiST in v13. Or v14.

Best regards, Andrey Borodin.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2019-03-22 08:05:05 Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
Previous Message Amit Langote 2019-03-22 07:39:30 Re: speeding up planning with partitions