Questions/Observations related to Gist vacuum

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Questions/Observations related to Gist vacuum
Date: 2019-10-15 06:37:52
Message-ID: CAA4eK1LGr+MN0xHZpJ2dfS8QNQ1a_aROKowZB+MPNep8FVtwAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While reviewing a parallel vacuum patch [1], we noticed a few things
about $SUBJECT implemented in commit -
7df159a620b760e289f1795b13542ed1b3e13b87.

1. A new memory context GistBulkDeleteResult->page_set_context has
been introduced, but it doesn't seem to be used.
2. Right now, in gistbulkdelete we make a note of empty leaf pages and
internals pages and then in the second pass during gistvacuumcleanup,
we delete all the empty leaf pages. I was thinking why unlike nbtree,
we have delayed the deletion of empty pages till gistvacuumcleanup. I
don't see any problem if we do this during gistbulkdelete itself
similar to nbtree, also I think there is some advantage in marking the
pages as deleted as early as possible. Basically, if the vacuum
operation is canceled or errored out between gistbulkdelete and
gistvacuumcleanup, then I think the deleted pages could be marked as
recyclable very early in next vacuum operation. The other advantage
of doing this during gistbulkdelete is we can avoid sharing
information between gistbulkdelete and gistvacuumcleanup which is
quite helpful for a parallel vacuum as the information is not trivial
(it is internally stored as in-memory Btree). OTOH, there might be
some advantage for delaying the deletion of pages especially in the
case of multiple scans during a single VACUUM command. We can
probably delete all empty leaf pages in one go which could in some
cases lead to fewer internal page reads. However, I am not sure if it
is really advantageous to postpone the deletion as there seem to be
some downsides to it as well. I don't see it documented why unlike
nbtree we consider delaying deletion of empty pages till
gistvacuumcleanup, but I might be missing something.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAA4eK1JEQ2y3uNucNopDjK8pse6xSe5%3D_oknoWfRQvAF%3DVqsBA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-10-15 06:55:13 Re: [HACKERS] Block level parallel vacuum
Previous Message Michael Paquier 2019-10-15 06:35:47 Re: v12.0: segfault in reindex CONCURRENTLY