Re: Parallel heap vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Parallel heap vacuum
Date: 2025-06-30 13:40:31
Message-ID: CAD21AoC6YdAWw7gnu5fesYiYx_4X7qxBC73j9tMCbHyybawdrg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 14, 2025 at 5:05 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Apr 28, 2025 at 11:07 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sat, Apr 5, 2025 at 1:17 PM Melanie Plageman
> > <melanieplageman(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Apr 4, 2025 at 5:35 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > > I haven't looked closely at this version but I did notice that you do
> > > > > not document that parallel vacuum disables eager scanning. Imagine you
> > > > > are a user who has set the eager freeze related table storage option
> > > > > (vacuum_max_eager_freeze_failure_rate) and you schedule a regular
> > > > > parallel vacuum. Now that table storage option does nothing.
> > > >
> > > > Good point. That restriction should be mentioned in the documentation.
> > > > I'll update the patch.
> > >
> > > Yea, I mean, to be honest, when I initially replied to the thread
> > > saying I thought temporarily disabling eager scanning for parallel
> > > heap vacuuming was viable, I hadn't looked at the patch yet and
> > > thought that there was a separate way to enable the new parallel heap
> > > vacuum (separate from the parallel option for the existing parallel
> > > index vacuuming). I don't like that this disables functionality that
> > > worked when I pushed the eager scanning feature.
> >
> > Thank you for sharing your opinion. I think this is one of the main
> > points that we need to deal with to complete this patch.
> >
> > After considering various approaches to integrating parallel heap
> > vacuum and eager freeze scanning, one viable solution would be to
> > implement a dedicated parallel scan mechanism for parallel lazy scan,
> > rather than relying on the table_block_parallelscan_XXX() facility.
> > This approach would involve dividing the table into chunks of 4,096
> > blocks, the same as eager freeze scanning, where each parallel worker
> > would perform eager freeze scanning while maintaining its own local
> > failure count and a shared success count. This straightforward
> > approach offers an additional advantage: since the chunk size remains
> > constant, we can implement the SKIP_PAGES_THRESHOLD optimization
> > consistently throughout the table, including its final sections.
> >
> > However, this approach does present certain challenges. First, we
> > would need to maintain a separate implementation of lazy vacuum's
> > parallel scan alongside the table_block_parallelscan_XXX() facility,
> > potentially increasing maintenance overhead. Additionally, the fixed
> > chunk size across the entire table might prove less efficient when
> > processing blocks near the table's end compared to the dynamic
> > approach used by table_block_parallelscan_nextpage().
> >
>
> I've attached the updated patches for parallel heap vacuum. This
> version includes several updates:
>
> We can now use the eager scanning mechanism even during parallel heap
> vacuum. The table is divided into fixed-size chunks (1024 blocks each),
> each of which is assigned to a parallel vacuum worker. Because an eager
> scanning region and a chunk differ in size, the eager scanning failure
> count is divided evenly among the chunks.
>
> The 0005 patch adds a new parallel heap vacuum test to improve
> coverage. Specifically, it uses injection points to test the case where
> the leader launches fewer parallel workers during multiple index
> scans, leaving the leader to complete the unfinished scans at the end
> of the lazy scan heap.
>

I've rebased the patches to the current HEAD.
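
For readers following the chunking scheme described in the quoted text, here is a minimal sketch of how fixed-size chunks and a per-chunk failure cap could be derived. It is not taken from the patches: the names REGION_BLOCKS, CHUNK_BLOCKS, ChunkRange, chunk_count(), chunk_range() and chunk_failure_cap() are hypothetical, and the truncating division is only one possible way to split a region's cap "evenly" across its chunks.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes quoted in the thread; the macro names are hypothetical. */
#define REGION_BLOCKS 4096u   /* eager scanning region size */
#define CHUNK_BLOCKS  1024u   /* chunk handed to one parallel worker */

/* A contiguous block range assigned to a worker (hypothetical type). */
typedef struct ChunkRange
{
    uint32_t start;    /* first block of the chunk */
    uint32_t nblocks;  /* blocks in the chunk; the last one may be short */
} ChunkRange;

/* Number of chunks needed to cover a table of nblocks blocks. */
static uint32_t
chunk_count(uint32_t nblocks)
{
    return nblocks / CHUNK_BLOCKS + (nblocks % CHUNK_BLOCKS != 0);
}

/* Block range covered by the chunk with index chunk_idx. */
static ChunkRange
chunk_range(uint32_t nblocks, uint32_t chunk_idx)
{
    ChunkRange r;
    uint32_t   remaining;

    assert(chunk_idx < chunk_count(nblocks));
    r.start = chunk_idx * CHUNK_BLOCKS;
    remaining = nblocks - r.start;
    r.nblocks = remaining < CHUNK_BLOCKS ? remaining : CHUNK_BLOCKS;
    return r;
}

/*
 * Scale a per-region failure cap down to a single chunk. Assumption: the
 * cap is split in proportion to size, rounding down.
 */
static uint32_t
chunk_failure_cap(uint32_t region_failure_cap)
{
    return (uint32_t) (((uint64_t) region_failure_cap * CHUNK_BLOCKS)
                       / REGION_BLOCKS);
}
```

With these sizes a region spans four chunks, so each chunk gets a quarter of the region's cap. Rounding down means a region cap below 4 yields a per-chunk cap of 0, which a real implementation would have to handle in some way.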

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment                                                       Content-Type              Size
v18-0005-Add-more-parallel-vacuum-tests.patch                    application/octet-stream  7.9 KB
v18-0004-Support-parallelism-for-collecting-dead-items-du.patch  application/octet-stream  64.0 KB
v18-0002-vacuumparallel.c-Support-parallel-vacuuming-for-.patch  application/octet-stream  27.1 KB
v18-0001-Introduces-table-AM-APIs-for-parallel-table-vacu.patch  application/octet-stream  6.3 KB
v18-0003-Move-lazy-heap-scan-related-variables-to-new-str.patch  application/octet-stream  27.8 KB
