GiST VACUUM

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Костя Кузнецов <chapaev28(at)ya(dot)ru>
Subject: GiST VACUUM
Date: 2018-03-07 06:50:55
Message-ID: 1B9FAC6F-FA19-4A24-8C1B-F4F574844892@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, hackers!

Here's new version of GiST VACUUM patch set aimed at CF 2018-09.
Original thread can be found at [0]

==Purpose==
Currently GiST never reuse pages even if they are completely empty. This can lead to a bloat, if a lot of index tuples are deleted from key space, which do not receive newly inserted tuples.
First patch fixes that issue: empty pages are collected and reused for new page splits.

Second patch improves scanning algorithm to read GiST blocks in physical order. This can dramatically increase speed of scanning, especially on HDD.
This scan is using relatively big chunk of memory to build map of whole GiST graph. If there is not enough maintenance memory, patch had the fallback to old GiST VACUUM (from first patch).

==How logical scan works==
GiST VACUUM scans graph in DFS search to find removed tuples on leaf pages. It remembers internal pages that are referencing completely empty leaf pages.
Then in additional step, these pages are rescanned to delete references and mark leaf pages as free.

==How physical scan works==
Scan builds array of GistPSItem (Physical scan item). GistPSItem contains List of offsets pointing to potentially empty leaf pages and the information necessary to collect that list in one sequential file read.
Array of GistPSItems has one item for each GiST block.
When we encounter leaf page, if it is empty - we mark it empty and jump to parent (if known) to update it's list.
When we encounter internal page, we check GistPSItem of every child block to check if it is empty leaf and to mark parent ptr there.

==Limitations==
At least one reference on each internal pages is left undeleted to preserve balancing of the tree.
Pages that has FOLLOW-RIGHT flag also are not deleted, even if empty.

Thank you for your attention, any thoughts are welcome.

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/1147341441925550%40web17j(dot)yandex(dot)ru#1147341441925550(at)web17j(dot)yandex(dot)ru

Attachment Content-Type Size
0001-Delete-pages-during-GiST-VACUUM-v3.patch application/octet-stream 13.9 KB
0002-Physical-GiST-scan-during-VACUUM-v3.patch application/octet-stream 15.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2018-03-07 06:56:01 Re: 2018-03 CFM
Previous Message Michael Paquier 2018-03-07 06:49:02 Re: Rewrite of pg_dump TAP tests