|From:||Andrey Borodin <x4mmm(at)yandex-team(dot)ru>|
|To:||Heikki Linnakangas <hlinnaka(at)iki(dot)fi>|
|Cc:||Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Костя Кузнецов <chapaev28(at)ya(dot)ru>|
|Subject:||Re: GiST VACUUM|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
> 14 июля 2018 г., в 0:28, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> написал(а):
> On 13/07/18 21:28, Andrey Borodin wrote:
>>> 13 июля 2018 г., в 18:25, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
>>> Looking at the second patch, to scan the GiST index in physical
>>> order, that seems totally unsafe, if there are any concurrent page
>>> splits. In the logical scan, pushStackIfSplited() deals with that,
>>> by comparing the page's NSN with the parent's LSN. But I don't see
>>> anything like that in the physical scan code.
>> Leaf page can be pointed by internal page and rightlink
>> simultaneously. The purpose of NSN is to visit this page exactly once
>> by following only on of two links in a scan. This is achieved
>> naturally if we read everything from the beginning to the end. (That
>> is how I understand, I can be wrong)
> The scenario where this fails goes like this:
> 1. Vacuum scans physical pages 1-10
> 2. A concurrent insertion splits page 15. The new left half stays on page 15, but the new right half goes to page 5
> 3. Vacuum scans pages 11-20
> Now, if there were any dead tuples on the right half of the split, moved to page 5, the vacuum would miss them.
> The way this is handled in B-tree is that when a page is split, the page is stamped with current "vacuum cycle id". When the vacuum scan encounters a page with the current cycle id, whose right-link points to a lower-numbered page, it immediately follows the right link, and re-scans it. I.e. in the above example, if it was a B-tree, in step 3 when vacuum scans page 15, it would see that it was concurrently split. It would immediately vacuum page 5 again, before continuing the scan in physical order.
> We'll need to do something similar in GiST.
OK, I will do this.
This is tradeoff between complex concurrency feature and possibility of few dead tuples left after VACUUM. I want to understand: is it something dangerous in this dead tuples?
There is one more serious race condition: result of first scan is just a hint where to look for downlinks to empty pages. If internal page splits between scan and cleanup, offsets of downlinks will be changed, cleanup will lock pages, see non-empty pages and will not delete them (though there are not dead tuples, just not deleted empty leafs).
Best regards, Andrey Borodin.
|Next Message||John Naylor||2018-07-14 08:47:38||Re: missing toast table for pg_policy|
|Previous Message||Michael Paquier||2018-07-14 06:37:56||Re: Fix some error handling for read() and errno|