RE: parallel vacuum comments

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: parallel vacuum comments
Date: 2021-11-05 02:15:23
Message-ID: OS0PR01MB57161B5970B657B31379A1CC948E9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thur, Nov 4, 2021 1:25 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Nov 3, 2021 at 1:08 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Nov 2, 2021 at 11:17 AM Masahiko Sawada<sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Nov 2, 2021 at 5:57 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > > >
> > >
> > > > Rather than inventing PARALLEL_VACUUM_KEY_INDVAC_CHECK (just for
> > > > assert-enabled builds), we should invent PARALLEL_VACUUM_STATS --
> > > > a dedicated shmem area for the array of LVSharedIndStats (no more
> > > > storing LVSharedIndStats entries at the end of the LVShared space
> > > > in an ad-hoc, type unsafe way). There should be one array element
> > > > for each and every index -- even those indexes where parallel
> > > > index vacuuming is unsafe or not worthwhile (unsure if avoiding
> > > > parallel processing for "not worthwhile" indexes actually makes
> > > > sense, BTW). We can then get rid of the bitmap/IndStatsIsNull()
> > > > stuff entirely. We'd also add new per-index state fields to
> > > > LVSharedIndStats itself. We could directly record the status of
> > > > each index (e.g., parallel unsafe, amvacuumcleanup processing
> > > > done, ambulkdelete processing done) explicitly. All code could
> > > > safely subscript the LVSharedIndStats array directly, using idx
> > > > style integers. That seems far more robust and consistent.
> > >
> > > Sounds good.
> > >
> > > During the development, I wrote the patch while considering using
> > > fewer shared memory but it seems that it brought complexity (and
> > > therefore the bug). It would not be harmful even if we allocate
> > > index statistics on DSM for unsafe indexes and “not worthwhile"
> > > indexes in practice.
> > >
> >
> > If we want to allocate index stats for all indexes in DSM then why not
> > consider it on the lines of buf/wal_usage means tack those via
> > LVParallelState? And probably replace bitmap with an array of bools
> > that indicates which indexes can be skipped by the parallel worker.
> >
>
> I've attached a draft patch. The patch incorporated all comments from Andres
> except for the last comment that moves parallel related code to another file.
> I'd like to discuss how we split vacuumlazy.c.

Hi,

I was recently reading the parallel vacuum code, and I think the patch can
bring a certain improvement.

Here are a few minor comments about it.

1)

+ * Reset all index status back to invalid (while checking that we have
+ * processed all indexes).
+ */
+ for (int i = 0; i < vacrel->nindexes; i++)
+ {
+ LVSharedIndStats *stats = &(lps->lvsharedindstats[i]);
+
+ Assert(stats->status == INDVAC_STATUS_COMPLETED);
+ stats->status = INDVAC_STATUS_INITIAL;
+ }

Do you think it might be clearer to report an error here ?

2)

+prepare_parallel_index_processing(LVRelState *vacrel, bool vacuum)

For the second paramater 'vacuum'. Would it be clearer if we pass a
LVIndVacStatus type instead of the boolean value ?

Best regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-11-05 02:40:00 Re: inefficient loop in StandbyReleaseLockList()
Previous Message Tomas Vondra 2021-11-05 01:10:55 Re: [sqlsmith] Failed assertion in brin_minmax_multi_distance_float4 on REL_14_STABLE