Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-03-04 01:27:32
Message-ID: CAD21AoApAnUCk9oX48t6k+7fn3w=wiWMpcDkkKBXR2khCGkwVg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 2, 2019 at 3:54 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > I wonder if we really want this behavior. Should a setting that
> > > controls the degree of parallelism when scanning the table also affect
> > > VACUUM? I tend to think that we probably don't ever want VACUUM of a
> > > table to be parallel by default, but rather something that the user
> > > must explicitly request. Happy to hear other opinions. If we do want
> > > this behavior, I think this should be written differently, something
> > > like this: The PARALLEL N option to VACUUM takes precedence over this
> > > option.
> >
> > For example, I can imagine a use case where a batch job does parallel
> > vacuum to some tables in a maintenance window. The batch operation
> > will need to compute and specify the degree of parallelism every time
> > according to, for instance, the number of indexes, which would be
> > troublesome. But if we can set the degree of parallelism for each
> > table, it can just do 'VACUUM (PARALLEL)'.
>
> True, but the setting in question would also affect the behavior of
> sequential scans and index scans. TBH, I'm not sure that the
> parallel_workers reloption is really a great design as it is: is
> hard-coding the number of workers really what people want? Do they
> really want the same degree of parallelism for sequential scans and
> index scans? Why should they want the same degree of parallelism also
> for VACUUM? Maybe they do, and maybe somebody can explain why they do,
> but as of now, it's not obvious to me why that should be true.

I think that there are users who want to specify the degree of
parallelism. I think that hard-coding the number of workers would be a
good design for something like VACUUM, which is a simple operation on a
single object; since there are no joins or aggregations, it would be
relatively easy to compute. That's why the patch introduces the
PARALLEL N option as well. I think that a reloption for parallel
vacuum would be just a way to save the degree of parallelism. And I
agree that users don't want to use the same degree of parallelism for
VACUUM as for scans, so maybe it'd be better to add a new reloption
like parallel_vacuum_workers. On the other hand, that can be a separate
patch; I can remove the reloption part from this patch and will
propose it when there are requests.
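
To make the precedence being discussed concrete, here is a minimal
sketch. It is not code from the patch: the function name, the use of -1
for "not specified", and the cap at the number of indexes are all
assumptions made only for illustration. It shows an explicit PARALLEL N
taking precedence over a saved per-table setting, with no parallelism
unless one of them is given.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * requested: value given with PARALLEL N, or -1 if the option was omitted
 * reloption: value saved for the table (e.g. a parallel_vacuum_workers
 *            style reloption), or -1 if unset
 * nindexes:  number of indexes on the table
 *
 * Assumption: the degree is capped at the number of indexes.
 */
int
demo_vacuum_parallel_degree(int requested, int reloption, int nindexes)
{
    int degree;

    assert(nindexes >= 0);

    if (requested >= 0)
        degree = requested;     /* explicit PARALLEL N wins */
    else if (reloption >= 0)
        degree = reloption;     /* fall back to the saved setting */
    else
        degree = 0;             /* not parallel unless requested */

    if (degree > nindexes)
        degree = nindexes;      /* assumed cap, see above */

    return degree;
}
```

With this shape, a batch job would not need to compute the degree
itself; it would rely on the saved value and override it only when
needed.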

>
> > Since the parallel vacuum uses memory in the same manner as the single
> > process vacuum, memory usage is not made worse. I'd agree that that
> > patch is smarter and this patch can be built on top of it, but I'm
> > concerned that there are two proposals on that thread and the
> > discussion has not been active for 8 months. I wonder if it would be
> > worth thinking about improving the memory allocation based on that
> > patch after the parallel vacuum gets committed.
>
> Well, I think we can't just say "oh, this patch is going to use twice
> as much memory as before," which is what it looks like it's doing
> right now. If you think it's not doing that, can you explain further?

In the current design, the leader process allocates the whole DSM at
once when starting and records dead tuples' TIDs in the DSM. This is
the same behaviour as before, except that it records the dead tuple
TIDs in the shared memory segment. Once index vacuuming has finished,
the leader process re-initializes the DSM for the next round. So
parallel vacuum uses the same amount of memory as before during
execution.
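
As a rough illustration of that allocate-once, reset-and-reuse pattern,
here is a minimal sketch. It is not the patch's code: the type and
function names are hypothetical, and plain malloc stands in for the DSM
segment. The point is only that the area is sized once and the same
memory is reused after each index vacuuming pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tuple identifier (block and offset). */
typedef struct
{
    uint32_t block;
    uint16_t offset;
} DemoTid;

/* Hypothetical dead-tuple area; the design above keeps it in the DSM. */
typedef struct
{
    size_t   max_tids;  /* capacity, fixed when the area is allocated */
    size_t   num_tids;  /* number of TIDs currently recorded */
    DemoTid *tids;
} DemoDeadTuples;

/* Allocate the whole area once, up front. Returns false on failure. */
bool
demo_init(DemoDeadTuples *dt, size_t max_tids)
{
    assert(dt != NULL);
    dt->tids = malloc(max_tids * sizeof(DemoTid));
    if (dt->tids == NULL)
        return false;
    dt->max_tids = max_tids;
    dt->num_tids = 0;
    return true;
}

/* Record one dead tuple. Returns false when the area is full. */
bool
demo_record(DemoDeadTuples *dt, uint32_t block, uint16_t offset)
{
    if (dt->num_tids >= dt->max_tids)
        return false;
    dt->tids[dt->num_tids].block = block;
    dt->tids[dt->num_tids].offset = offset;
    dt->num_tids++;
    return true;
}

/* After an index vacuuming pass: keep the memory, forget the contents. */
void
demo_reset(DemoDeadTuples *dt)
{
    dt->num_tids = 0;
}

/* Release the area at the end of the vacuum. */
void
demo_free(DemoDeadTuples *dt)
{
    free(dt->tids);
    dt->tids = NULL;
    dt->max_tids = 0;
    dt->num_tids = 0;
}
```

When demo_record reports that the area is full, the caller would run
index vacuuming, call demo_reset, and continue scanning, so the
footprint never exceeds the initial allocation.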

>
> > Agreed. I'll separate patches and propose it.
>
> Cool. Probably best to keep that on this thread.

Understood.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
