Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-10-30 08:59:54
Message-ID: CAD21AoAVgYpBsXoK6Pov3XywhOvvPo2nM2K0e0-MLOoqJYDuRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 28, 2019 at 3:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sun, Oct 27, 2019 at 12:52 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Fri, Oct 25, 2019 at 9:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > >
> > I haven't yet read the new set of the patch. But, I have noticed one
> > thing. That we are getting the size of the statistics using the AM
> > routine. But, we are copying those statistics from local memory to
> > the shared memory directly using the memcpy. Wouldn't it be a good
> > idea to have an AM specific routine to get it copied from the local
> > memory to the shared memory? I am not sure it is worth it or not but
> > my thought behind this point is that it will give AM to have local
> > stats in any form ( like they can store a pointer in that ) but they
> > can serialize that while copying to shared stats. And, later when
> > shared stats are passed back to the Am then it can deserialize in its
> > local form and use it.
> >
>
> You have a point, but after changing the gist index, we don't have any
> current usage for indexes that need something like that. So, on one
> side there is some value in having an API to copy the stats, but on
> the other side without having clear usage of an API, it might not be
> good to expose a new API for the same. I think we can expose such an
> API in the future if there is a need for the same. Do you or anyone
> know of any external IndexAM that has such a need?
>
> Few minor comments while glancing through the latest patchset.
>
> 1. I think you can merge 0001*, 0002*, 0003* patch into one patch as
> all three expose new variable/function from IndexAmRoutine.

Fixed.

>
> 2.
> +prepare_index_statistics(LVShared *lvshared, Relation *Irel, int nindexes)
> +{
> + char *p = (char *) GetSharedIndStats(lvshared);
> + int vac_work_mem = IsAutoVacuumWorkerProcess() &&
> + autovacuum_work_mem != -1 ?
> + autovacuum_work_mem : maintenance_work_mem;
>
> I think this function won't be called from AutoVacuumWorkerProcess at
> least not as of now, so isn't it a better idea to have an Assert for
> it?

Fixed.

>
> 3.
> +void
> +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
>
> This function is for performing a parallel operation on the index, so
> why to start with heap?

Because parallel vacuum supports only indexes that are created on heaps.

> It is better to name it as
> index_parallel_vacuum_main or simply parallel_vacuum_main.

I'm concerned that both names index_parallel_vacuum_main and
parallel_vacuum_main seem to be generic in spite of these codes are
heap-specific code.

>
> 4.
> /* useindex = true means two-pass strategy; false means one-pass */
> @@ -128,17 +280,12 @@ typedef struct LVRelStats
> BlockNumber pages_removed;
> double tuples_deleted;
> BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
> - /* List of TIDs of tuples we intend to delete */
> - /* NB: this list is ordered by TID address */
> - int num_dead_tuples; /* current # of entries */
> - int max_dead_tuples; /* # slots allocated in array */
> - ItemPointer dead_tuples; /* array of ItemPointerData */
> + LVDeadTuples *dead_tuples;
> int num_index_scans;
> TransactionId latestRemovedXid;
> bool lock_waiter_detected;
> } LVRelStats;
>
> -
> /* A few variables that don't seem worth passing around as parameters */
> static int elevel = -1;
>
> It seems like a spurious line removal.

Fixed.

These above comments are incorporated in the latest patch set(v32) [1].

[1] https://www.postgresql.org/message-id/CAD21AoAqT17QwKJ_sWOqRxNvg66wMw1oZZzf9Rt-E-zD%2BXOh_Q%40mail.gmail.com

Regards,

--
Masahiko Sawada

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-10-30 10:03:41 Re: v12.0: ERROR: could not find pathkey item to sort
Previous Message Kyotaro Horiguchi 2019-10-30 08:43:04 Re: Problem with synchronous replication