Re: [HACKERS] Block level parallel vacuum

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Mahendra Singh <mahi6run(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-10-14 09:37:00
Message-ID: CAA4eK1JEQ2y3uNucNopDjK8pse6xSe5=_oknoWfRQvAF=VqsBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 12, 2019 at 4:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Sat, Oct 12, 2019 at 11:29 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
>
> I see a much bigger problem with the way this patch collects the index
> stats in shared memory. IIUC, it allocates the shared memory (DSM)
> for all the index stats, in the same way, considering its size as
> IndexBulkDeleteResult. For the first time, it gets the stats from
> local memory as returned by ambulkdelete/amvacuumcleanup call and then
> copies it in shared memory space. There onwards, it always updates
> the stats in shared memory by pointing each index stats to that
> memory. In this scheme, you overlooked the point that an index AM
> could choose to return a larger structure of which
> IndexBulkDeleteResult is just the first field. This generally
> provides a way for ambulkdelete to communicate additional private data
> to amvacuumcleanup. We use this idea in the gist index, see how
> gistbulkdelete and gistvacuumcleanup works. The current design won't
> work for such cases.
>

Today, I looked at gistbulkdelete and gistvacuumcleanup closely and I
have a few observations about those which might help us to solve this
problem for gist indexes:
1. Are we using memory context GistBulkDeleteResult->page_set_context?
It seems to me it is not being used.
2. Each time we perform gistbulkdelete, we always seem to reset the
GistBulkDeleteResult stats, see gistvacuumscan. So, how will it
accumulate it for the cleanup phase when the vacuum needs to call
gistbulkdelete multiple times because the available space for
dead-tuple is filled. It seems to me like we only use the stats from
the very last call to gistbulkdelete.
3. Do we really need to give the responsibility of deleting empty
pages (gistvacuum_delete_empty_pages) to gistvacuumcleanup. Can't we
do it in gistbulkdelte? I see one advantage of postponing it till the
cleanup phase which is if somehow we can accumulate stats over
multiple calls of gistbulkdelete, but I am not sure if it is feasible.
At least, the way current code works, it seems that there is no
advantage to postpone deleting empty pages till the cleanup phase.

If we avoid postponing deleting empty pages till the cleanup phase,
then we don't have the problem for gist indexes.

This is not directly related to this patch, so we can discuss these
observations in a separate thread as well, but before that, I wanted
to check your opinion to see if this makes sense to you as this will
help us in moving this patch forward.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2019-10-14 09:39:02 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message nil socket 2019-10-14 09:36:04 Re: Minimal logical decoding on standbys