Re: vacuumlazy: Modernize count_nondeletable_pages

From: Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: vacuumlazy: Modernize count_nondeletable_pages
Date: 2025-07-11 09:37:56
Message-ID: CAOzEurSHS=X9ObQ+0CdHEuAMCx4bdaZxs8AYE4T6ojQ1qukqLQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On Fri, Jun 27, 2025 at 8:34 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
>
> Hi,
>
> Recently I had to debug an issue with VACUUM's truncate stage taking
> an AE lock for a long time, which caused an unpleasant outage due to
> blocked queries on a replica. That, and remembering a thread about
> this over at [0] got me looking at the code that truncates the
> relation.
>
> After looking at the code, I noticed that it has hand-rolled
> prefetching of pages, and in a rather peculiar way. Now that we have
> the read_stream.c API it is much more efficient to make use that
> system, if only to combine IOs and be able to use async IO handling.
>
> While making that change, I also noticed that the current coding does
> not guarantee progress when the relation's AE-lock is heavily
> contended and the process takes long enough to get started:
> When count_nondeletable_pages exits due to lock contention, then
> blockno%32==0. In that case the next iteration will start with that
> same blockno, and this may cause the scan to make no progress at all
> if the time from INSTR_TIME_SET_CURRENT(starttime) to the first
> INSTR_TIME_SET_CURRENT(currenttime) is >20ms; while unlikely this does
> seem possible on a system with high load.

With the previous logic (blockno % 32 == 0), I believe a process
waiting for a lock had to wait for an average of 16 pages to be
processed. With this change, it will now have to wait for 32 pages,
correct? I am not sure if this change is reasonable.

Additionally, if blockno % 32 == 0 in the next loop and there are
still processes waiting for a lock, it seems appropriate to prioritize
those waiting processes and skip the VACUUM TRUNCATE.

> Attached is a patch which improves the status quo for 1, and fixes
> item 2 by increasing the minimum number of pages processed before
> releasing the lock to 31.

Thank you for the patch!

+typedef struct CountNonDeletablePagesScan
+{
+ BlockNumber current_block;
+ BlockNumber target_block;
+} CountNonDeletablePagesScan;

I think it's better to declare structs at the top of the file. What do
you think?

I'm still not sure how to use read_stream, so I don't have any
comments on read_stream at this time.

--
Best regards,
Shinya Kato
NTT OSS Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2025-07-11 09:51:30 Re: analyze-in-stages post upgrade questions
Previous Message wenhui qiu 2025-07-11 09:33:52 Re: Small optimization with expanding dynamic hash table