Re: Questions/Observations related to Gist vacuum

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Questions/Observations related to Gist vacuum
Date: 2019-10-16 10:57:03
Message-ID: CAA4eK1JHR7riKw_mbc7O7ad5pS12WhnLJAiwPER51KFvxGjvgw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 15, 2019 at 7:13 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 15/10/2019 09:37, Amit Kapila wrote:
> > 2. Right now, in gistbulkdelete we make a note of empty leaf pages and
> > internal pages and then in the second pass during gistvacuumcleanup,
> > we delete all the empty leaf pages. I was thinking why unlike nbtree,
> > we have delayed the deletion of empty pages till gistvacuumcleanup. I
> > don't see any problem if we do this during gistbulkdelete itself
> > similar to nbtree, also I think there is some advantage in marking the
> > pages as deleted as early as possible. Basically, if the vacuum
> > operation is canceled or errored out between gistbulkdelete and
> > gistvacuumcleanup, then I think the deleted pages could be marked as
> > recyclable very early in next vacuum operation. The other advantage
> > of doing this during gistbulkdelete is we can avoid sharing
> > information between gistbulkdelete and gistvacuumcleanup which is
> > quite helpful for a parallel vacuum as the information is not trivial
> > (it is internally stored as in-memory Btree). OTOH, there might be
> > some advantage for delaying the deletion of pages especially in the
> > case of multiple scans during a single VACUUM command. We can
> > probably delete all empty leaf pages in one go which could in some
> > cases lead to fewer internal page reads. However, I am not sure if it
> > is really advantageous to postpone the deletion as there seem to be
> > some downsides to it as well. I don't see it documented why unlike
> > nbtree we consider delaying deletion of empty pages till
> > gistvacuumcleanup, but I might be missing something.
>
> Hmm. The thinking is/was that removing the empty pages is somewhat
> expensive, because it has to scan all the internal nodes to find the
> downlinks to the to-be-deleted pages. Furthermore, it needs to scan all
> the internal pages (or at least until it has found all the downlinks),
> regardless of how many empty pages are being deleted. So it makes sense
> to do it only once, for all the empty pages. You're right though, that
> there would be advantages, too, in doing it after each pass.
>

I was thinking more about this, and it seems there could be more
cases where delaying the marking of pages as deleted can further delay
the recycling of those pages. It is quite possible that immediately
after the bulk delete the value of nextFullXid (used as deleteXid) is
X, whereas during vacuum cleanup it can be X + N, and the chances of N
being large are higher when there are multiple passes of the vacuum
scan. Now, if we had set the value of deleteXid to X, there would be a
better chance for the next vacuum to recycle those pages. I am not
sure, but in the future we might come up with something (say, if we
can recompute RecentGlobalXmin again) that lets us recycle the pages
deleted in the first index scan during the next scan of the index
within a single vacuum operation.

This is just to emphasize that doing the delete marking of pages in
the same pass has advantages; that said, I understand that there are
advantages in delaying it as well.
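
To make the deleteXid point concrete, here is a rough sketch of the
recyclability test (illustration only; the real logic lives in
gistPageRecyclable(), and the function and variable names below are
just for this example):

    /*
     * A deleted page can be recycled once its deleteXid is older than
     * the oldest xmin any running transaction still cares about.  The
     * smaller the deleteXid we stamp at deletion time, the sooner this
     * becomes true.
     */
    static bool
    page_is_recyclable(FullTransactionId deleteXid,
                       FullTransactionId oldestRelevantFullXid)
    {
        return FullTransactionIdPrecedes(deleteXid, oldestRelevantFullXid);
    }

So stamping the pages with X (at bulkdelete time) rather than X + N
(at cleanup time) can only make them recyclable earlier, never later.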

> All things
> considered, I'm not sure which is better.
>

Yeah, this is a tough call to make, but if we can allow the leaf
pages to be deleted in bulkdelete conditionally for parallel vacuum
workers, then that would be better.

I think we have the below options w.r.t. GiST indexes for parallel vacuum:
a. don't allow GiST indexes to participate in parallel vacuum
b. allow deletion of leaf pages in bulkdelete for parallel workers
c. always allow deletion of leaf pages in bulkdelete
d. invent some mechanism to share all the GiST stats via shared memory

(a) is not a very good option, but it is a safe one, as we can extend
it in the future, and we might decide to go with it especially if we
can't decide among the other options. (b) would serve the need but
would add some additional checks in gistbulkdelete and would look more
like a hack. (c) would be best, but I think it will be difficult to be
sure that it is a good decision for all types of cases. (d) could
require a lot of effort, and I am not sure it is worth adding that
complexity to the proposed patch.
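
For instance, option (b) might boil down to something like the sketch
below inside the GiST bulkdelete path (purely illustrative;
IsParallelWorker() is the existing macro, but the helper names and the
exact condition are hypothetical and would depend on the parallel
vacuum patch):

    /*
     * Sketch of option (b): if this bulkdelete runs inside a parallel
     * worker, we cannot hand the in-memory list of empty pages over to
     * gistvacuumcleanup(), so delete them right away; otherwise keep
     * the current behaviour and defer deletion to the cleanup phase.
     */
    if (IsParallelWorker())
        delete_empty_leaf_pages(info, stats);   /* hypothetical helper */
    else
        remember_empty_leaf_pages(stats);       /* hypothetical helper */

It is this kind of extra branch (and the plumbing behind it) that
makes (b) look more like a hack to me.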

Do you have any thoughts on this?

Just to give you an idea of the current parallel vacuum patch: the
master backend scans the heap and forms the dead tuple array in DSM,
and then we launch one worker for each index, based on the
availability of workers, and share the dead tuple array with each
worker. Each worker performs bulkdelete for its index. In the end, we
perform cleanup of all the indexes, either via a worker or the master
backend, based on some conditions.
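
In outline (illustrative only; index_bulk_delete() and
index_vacuum_cleanup() are the existing index AM entry points, the
rest is a sketch, not the patch's actual code):

    /*
     * Leader: scan the heap, collecting dead tuple TIDs into an array
     * allocated in DSM, then launch up to one worker per index.
     *
     * Each worker, for its assigned index (ambulkdelete):
     */
    stats = index_bulk_delete(&ivinfo, stats,
                              lazy_tid_reaped, (void *) dead_tuples);

    /*
     * Finally, amvacuumcleanup for every index, performed by a worker
     * or by the leader depending on the conditions mentioned above:
     */
    stats = index_vacuum_cleanup(&ivinfo, stats);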

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
