|From:||Andrey Borodin <x4mmm(at)yandex-team(dot)ru>|
|To:||Heikki Linnakangas <hlinnaka(at)iki(dot)fi>|
|Cc:||Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Костя Кузнецов <chapaev28(at)ya(dot)ru>|
|Subject:||Re: GiST VACUUM|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
> 13 июля 2018 г., в 18:10, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> написал(а):
>>>> But the situation in gistdoinsert(), where you encounter a deleted leaf page, could happen during normal operation, if vacuum runs concurrently with an insert. Insertion locks only one page at a time, as it descends the tree, so after it has released the lock on the parent, but before it has locked the child, vacuum might have deleted the page. In the latest patch, you're checking for that just before swapping the shared lock for an exclusive one, but I think that's wrong; you need to check for that after swapping the lock, because otherwise vacuum might delete the page while you're not holding the lock.
>>> Looks like a valid concern, I'll move that code again.
> Ok, the comment now says:
>> + /*
>> + * Leaf pages can be left deleted but still referenced in case of
>> + * crash during VACUUM's gistbulkdelete()
>> + */
> But that's not accurate, right? You should never see deleted pages after a crash, because the parent is updated in the same WAL record as the child page, right?
Fixed the comment.
> I'm still a bit scared about using pd_prune_xid to store the XID that prevents recycling the page too early. Can we use some field in GISTPageOpaqueData for that, similar to how the B-tree stores it in BTPageOpaqueData?
There is no room in opaque data, but, technically all page is just a tombstone until reused, so we can pick arbitrary place. PFA v7 where xid after delete is stored in opaque data, but we can use other places, like line pointer array or opaque-1
> 13 июля 2018 г., в 18:25, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> написал(а):
> Looking at the second patch, to scan the GiST index in physical order, that seems totally unsafe, if there are any concurrent page splits. In the logical scan, pushStackIfSplited() deals with that, by comparing the page's NSN with the parent's LSN. But I don't see anything like that in the physical scan code.
Leaf page can be pointed by internal page and rightlink simultaneously. The purpose of NSN is to visit this page exactly once by following only on of two links in a scan. This is achieved naturally if we read everything from the beginning to the end. (That is how I understand, I can be wrong)
> I think we can do something similar in the physical scan: remember the current LSN position at the beginning of the vacuum, and compare with that. The B-tree code uses the "cycle ID" for similar purposes.
> Do we still need the separate gistvacuumcleanup() pass, if we scan the index in physical order in the bulkdelete pass already?
We do not need to gather stats there, but we are doing RecordFreeIndexPage() and IndexFreeSpaceMapVacuum(). Is it correct to move this to first scan?
Best regards, Andrey Borodin.
|Next Message||Heikki Linnakangas||2018-07-13 19:22:42||Re: pgsql: Fix parallel index and index-only scans to fall back to serial.|
|Previous Message||Don Seiler||2018-07-13 18:20:15||Re: [PATCH] Include application_name in "connection authorized" log message|