Re: parallel vacuum comments

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel vacuum comments
Date: 2021-11-04 05:24:46
Message-ID: CAD21AoD4ZZPi59UH=sPa51b4-RDSqBxWoVxjvbQJjkh7_=9utA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 3, 2021 at 1:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > >
> >
> > > Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
> > > assert-enabled builds), we should invent PARALLEL_VACUUM_STATS -- a
> > > dedicated shmem area for the array of LVSharedIndStats (no more
> > > storing LVSharedIndStats entries at the end of the LVShared space in
> > > an ad-hoc, type unsafe way). There should be one array element for
> > > each and every index -- even those indexes where parallel index
> > > vacuuming is unsafe or not worthwhile (unsure if avoiding parallel
> > > processing for "not worthwhile" indexes actually makes sense, BTW). We
> > > can then get rid of the bitmap/IndStatsIsNull() stuff entirely. We'd
> > > also add new per-index state fields to LVSharedIndStats itself. We
> > > could directly record the status of each index (e.g., parallel unsafe,
> > > amvacuumcleanup processing done, ambulkdelete processing done)
> > > explicitly. All code could safely subscript the LVSharedIndStats array
> > > directly, using idx style integers. That seems far more robust and
> > > consistent.
> >
> > Sounds good.
> >
> > During the development, I wrote the patch while considering using
> > fewer shared memory but it seems that it brought complexity (and
> > therefore the bug). It would not be harmful even if we allocate index
> > statistics on DSM for unsafe indexes and “not worthwhile" indexes in
> > practice.
> >
>
> If we want to allocate index stats for all indexes in DSM then why not
> consider it on the lines of buf/wal_usage means tack those via
> LVParallelState? And probably replace bitmap with an array of bools
> that indicates which indexes can be skipped by the parallel worker.
>

I've attached a draft patch. The patch incorporated all comments from
Andres except for the last comment that moves parallel related code to
another file. I'd like to discuss how we split vacuumlazy.c.

Regarding tests, I’d like to add tests to check if a vacuum with
multiple index scans (i.g., due to small maintenance_work_mem) works
fine. But a problem is that we need at least about 200,000 garbage
tuples to perform index scan twice even with the minimum
maintenance_work_mem. Which takes a time to load tuples.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
parallel_vacuum_refactor.patch application/octet-stream 36.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2021-11-04 05:35:50 Re: Added schema level support for publication.
Previous Message Peter Smith 2021-11-04 05:19:07 Re: row filtering for logical replication