Re: Questions/Observations related to Gist vacuum

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Questions/Observations related to Gist vacuum
Date: 2019-10-15 13:43:25
Message-ID: a8511df2-a906-f9d2-dd9f-780fb6ad32c6@iki.fi
Lists: pgsql-hackers

On 15/10/2019 09:37, Amit Kapila wrote:
> While reviewing a parallel vacuum patch [1], we noticed a few things
> about $SUBJECT implemented in commit -
> 7df159a620b760e289f1795b13542ed1b3e13b87.
>
> 1. A new memory context GistBulkDeleteResult->page_set_context has
> been introduced, but it doesn't seem to be used.

Oops. internal_page_set and empty_leaf_set were supposed to be allocated
in that memory context. As things stand, in a multi-pass vacuum we leak
them until the end of the vacuum.
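
To make the leak concrete, here is a minimal toy model in plain C. It is
not PostgreSQL's MemoryContext API: ToyContext, toy_alloc, toy_reset and
toy_run_passes are hypothetical names invented for this sketch, and the
assumption that each pass builds two fresh sets is a simplification. It
only illustrates why allocating the per-pass sets in a dedicated context
that is reset between passes keeps the number of live allocations flat.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a memory context: chunks that are freed together. */
typedef struct ToyChunk
{
    struct ToyChunk *next;
} ToyChunk;

typedef struct ToyContext
{
    ToyChunk   *chunks;         /* singly linked list of live chunks */
    size_t      live_chunks;    /* how many chunks are currently live */
} ToyContext;

/* Allocate 'size' bytes and remember the chunk in the context. */
void *
toy_alloc(ToyContext *cxt, size_t size)
{
    ToyChunk   *chunk = malloc(sizeof(ToyChunk) + size);

    assert(chunk != NULL);
    chunk->next = cxt->chunks;
    cxt->chunks = chunk;
    cxt->live_chunks++;
    return chunk + 1;           /* usable space follows the header */
}

/* Free everything that was allocated in the context. */
void
toy_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL)
    {
        ToyChunk   *next = cxt->chunks->next;

        free(cxt->chunks);
        cxt->chunks = next;
    }
    cxt->live_chunks = 0;
}

/*
 * Simulate 'npasses' bulk-delete passes, each building two sets.
 * Returns the peak number of live chunks seen during the run.
 */
size_t
toy_run_passes(int npasses, int use_page_set_context)
{
    ToyContext  vacuum_cxt = {NULL, 0};     /* lives until end of vacuum */
    ToyContext  page_set_cxt = {NULL, 0};   /* reset at start of each pass */
    size_t      peak = 0;

    for (int pass = 0; pass < npasses; pass++)
    {
        ToyContext *target = use_page_set_context ? &page_set_cxt : &vacuum_cxt;
        size_t      live;

        if (use_page_set_context)
            toy_reset(&page_set_cxt);       /* drop the previous pass's sets */

        (void) toy_alloc(target, 64);       /* stands in for internal_page_set */
        (void) toy_alloc(target, 64);       /* stands in for empty_leaf_set */

        live = vacuum_cxt.live_chunks + page_set_cxt.live_chunks;
        if (live > peak)
            peak = live;
    }

    toy_reset(&vacuum_cxt);
    toy_reset(&page_set_cxt);
    return peak;
}
```

In this model, three passes without the dedicated context peak at six
live chunks, while resetting the dedicated context at the start of each
pass keeps the peak at two.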

> 2. Right now, in gistbulkdelete we make a note of empty leaf pages and
> internal pages and then in the second pass during gistvacuumcleanup,
> we delete all the empty leaf pages. I was thinking why unlike nbtree,
> we have delayed the deletion of empty pages till gistvacuumcleanup. I
> don't see any problem if we do this during gistbulkdelete itself
> similar to nbtree, also I think there is some advantage in marking the
> pages as deleted as early as possible. Basically, if the vacuum
> operation is canceled or errored out between gistbulkdelete and
> gistvacuumcleanup, then I think the deleted pages could be marked as
> recyclable very early in next vacuum operation. The other advantage
> of doing this during gistbulkdelete is we can avoid sharing
> information between gistbulkdelete and gistvacuumcleanup which is
> quite helpful for a parallel vacuum as the information is not trivial
> (it is internally stored as in-memory Btree). OTOH, there might be
> some advantage for delaying the deletion of pages especially in the
> case of multiple scans during a single VACUUM command. We can
> probably delete all empty leaf pages in one go which could in some
> cases lead to fewer internal page reads. However, I am not sure if it
> is really advantageous to postpone the deletion as there seem to be
> some downsides to it as well. I don't see it documented why unlike
> nbtree we consider delaying deletion of empty pages till
> gistvacuumcleanup, but I might be missing something.

Hmm. The thinking is/was that removing the empty pages is somewhat
expensive, because it has to scan all the internal nodes to find the
downlinks to the to-be-deleted pages. Furthermore, it needs to scan all
the internal pages (or at least until it has found all the downlinks),
regardless of how many empty pages are being deleted. So it makes sense
to do it only once, for all the empty pages. You're right though, that
there would be advantages, too, in doing it after each pass. All things
considered, I'm not sure which is better.
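
The cost argument can be sketched with a toy model in plain C. Nothing
here is taken from gistvacuum.c: the flat array of internal pages, the
fanout of 4, and all toy_* names are assumptions made up for
illustration. The scan stops as soon as every wanted downlink has been
found, and the model counts how many internal pages were read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FANOUT  4       /* downlinks per internal page (hypothetical) */
#define TOY_NPAGES  8       /* internal pages in the toy index */
#define TOY_NLEAVES (TOY_NPAGES * TOY_FANOUT)

typedef struct ToyInternalPage
{
    int         downlinks[TOY_FANOUT];  /* leaf block numbers */
} ToyInternalPage;

/* Internal page i points to leaves i*FANOUT .. i*FANOUT + FANOUT - 1. */
void
toy_build_index(ToyInternalPage *pages)
{
    for (int i = 0; i < TOY_NPAGES; i++)
        for (int j = 0; j < TOY_FANOUT; j++)
            pages[i].downlinks[j] = i * TOY_FANOUT + j;
}

/*
 * Scan internal pages in order until the downlinks of all 'nempty' empty
 * leaves have been found. Returns the number of internal pages read.
 */
size_t
toy_scan_for_downlinks(const ToyInternalPage *pages, size_t npages,
                       const bool *is_empty_leaf, size_t nempty)
{
    size_t      found = 0;
    size_t      pages_read = 0;

    for (size_t i = 0; i < npages && found < nempty; i++)
    {
        pages_read++;
        for (int j = 0; j < TOY_FANOUT; j++)
            if (is_empty_leaf[pages[i].downlinks[j]])
                found++;
    }
    assert(found == nempty);
    return pages_read;
}

/* Two passes, each followed by its own scan for that pass's empty leaf. */
size_t
toy_pages_read_per_pass(void)
{
    ToyInternalPage pages[TOY_NPAGES];
    bool        pass1[TOY_NLEAVES] = {false};
    bool        pass2[TOY_NLEAVES] = {false};

    toy_build_index(pages);
    pass1[5] = true;            /* leaf found empty in the first pass */
    pass2[30] = true;           /* leaf found empty in the second pass */
    return toy_scan_for_downlinks(pages, TOY_NPAGES, pass1, 1)
        + toy_scan_for_downlinks(pages, TOY_NPAGES, pass2, 1);
}

/* One scan at the end, covering the empty leaves from both passes. */
size_t
toy_pages_read_batched(void)
{
    ToyInternalPage pages[TOY_NPAGES];
    bool        all_empty[TOY_NLEAVES] = {false};

    toy_build_index(pages);
    all_empty[5] = true;
    all_empty[30] = true;
    return toy_scan_for_downlinks(pages, TOY_NPAGES, all_empty, 2);
}
```

With these made-up numbers, scanning after each pass reads 2 + 8 = 10
internal pages, while a single scan at the end reads 8. The gap depends
entirely on where the downlinks happen to sit, so this shows the shape
of the trade-off rather than its size.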

- Heikki
