Re: autovacuum launcher crash: assert in pgstat_count_io_op (IOOP_EXTEND on pg_database's VM)

From: Ewan Young <kdbase(dot)hack(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: autovacuum launcher crash: assert in pgstat_count_io_op (IOOP_EXTEND on pg_database's VM)
Date: 2026-06-02 14:02:54
Message-ID: CAON2xHPxvBO2xwPjEpc0s+np3nWypyOuTThb1iYPfVnR1W7vFA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Melanie,

That all makes sense — your plan sounds right to me. Happy to go with
relaxing pgstat_tracks_io_op() for 19 and leaving the
table_beginscan_catalog() flags work for 20, and to drop my patch.

Reproducer attached.

Thanks so much for the quick and thoughtful response — really appreciate it!

Ewan

On Mon, Jun 1, 2026 at 11:05 PM Melanie Plageman <melanieplageman(at)gmail(dot)com>
wrote:

> On Sun, May 31, 2026 at 12:37 AM Ewan Young <kdbase(dot)hack(at)gmail(dot)com> wrote:
> >
> > I was stress-testing master (commit e2b35735b00, assertions enabled)
> with a
> > workload that does a lot of DDL/DML, including creating and dropping
> > databases in a tight loop, and the autovacuum launcher kept crashing on
> me --
> > every 15-40 minutes or so once it was under load:
> >
> > TRAP: failed Assert("pgstat_tracks_io_op(MyBackendType, io_object,
> > io_context, io_op)"), File: "pgstat_io.c", Line: 74
> > LOG: autovacuum launcher process (PID ...) was terminated by signal 6:
> > Aborted
> >
> > The postmaster recovers fine, but it just starts another launcher that
> hits
> > the exact same assert, so it never really gets out of the loop.
>
> Ouch :( Thanks for the report!
>
> > What surprised me is that the launcher's catalog scan isn't even flagged
> > read-only (table_beginscan_catalog doesn't set SO_HINT_REL_READ_ONLY),
> > so it never actually intends to set the VM -- it just pins/extends it
> anyway.
>
> Yea, so SO_HINT_REL_READ_ONLY is only meant as a hint. I don't
> guarantee that all queries that aren't modifying the relation will
> pass it. It's only a performance optimization. We did touch on
> excluding scans of catalog tables briefly in the thread (albeit deep
> in the thread) [1].
>
> That being said, we still pin the VM (and potentially extend it) even
> if we don't set it. If it does already exist, pinning it lets us check
> for corruption. If we extend it and won't set it, it isn't totally
> wasted work because then we don't have to extend it later. Though it's
> true that if the VM needs to be extended, that part of the VM can't be
> corrupted, I wanted to avoid special casing the presence of the VM
> page.
>
> When I wrote pgstat_tracks_io_op(), I thought through which IO
> operations we should expect for each backend type. The idea was that
> if that were ever to change, we could change pgstat_tracks_io_op().
> There is no reason why the autovacuum launcher inherently shouldn't
> extend the VM. Logically, if it is just reading catalog tables, it
> won't need to extend the actual data fork of the relation. However,
> now that reading tables may cause extending the VM, we can modify
> pgstat_tracks_io_op() like this:
>
> - if ((bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
> - bktype == B_CHECKPOINTER) && io_op == IOOP_EXTEND)
> + if ((bktype == B_BG_WRITER || bktype == B_CHECKPOINTER) &&
> + io_op == IOOP_EXTEND)
> return false;
>
> We should probably add a flags argument to table_beginscan_catalog()
> in 20. That along with modifying pgstat_tracks_io_op() is probably the
> right solution IMO. I think we shouldn't do that in 19 since it is
> expanding the feature footprint.
>
> So for fixing 19, we could just alter pgstat_tracks_io_op() which
> would avoid tripping the assert.
>
> The next question is whether or not we should also do the patch you
> proposed, since it is true that if the VM is not extended enough to
> have the relevant VM bits, we can't possibly be fixing corruption. And
> if we are not doing that and won't set the VM, we are prematurely
> doing work.
>
> My hesitation about this is that the logic is a bit confusing. It
> relies on us knowing that visibiltymap_pin() will extend the relation
> and visibilitymap_get_status() won't. Both will read and pin the VM if
> it exists. If in heap_page_prune_and_freeze() we do anything with the
> VM page, it must be pinned. So, you have to know that if it needed to
> be extended, we won't have done that if rel_read_only was passed and
> so vmbuffer will be invalid and visiblitymap_get_status() will have
> returned 0. I don't think it's wrong, it just feels a bit off in a way
> I can't quite put my finger on. Perhaps it is just that I liked the
> invariant that the VM page would always be passed to
> heap_page_prune_and_freeze(). I'll have to think about it a bit more.
>
> On Mon, Jun 1, 2026 at 4:41 AM Ewan Young <kdbase(dot)hack(at)gmail(dot)com> wrote:
> >
> > Two commits combine to cause it: 4f7ecca84dd added the unconditional
> > (extending) visibilitymap_pin() in the on-access prune path, and
> 378a216187a
> > made INSERT set pd_prune_xid, so on-access pruning now fires on
> insert-mostly
> > catalogs like pg_database. That's also why it was hard to reduce: any
> > regular backend or autovacuum worker that scans pg_database recreates the
> > fork harmlessly (they may extend), so the launcher only crashes in the
> brief
> > window before that happens. I can now reproduce it deterministically;
> happy
> > to share the script.
>
> Please do share your reproducer. It's always nice to have those.
>
> - Melanie
>
> [1]
> https://www.postgresql.org/message-id/9468F957-C0ED-4D72-8C89-61162CAA5591%40yandex-team.ru
>

Attachment Content-Type Size
av_launcher_vm_extend_repro.sh text/x-sh 4.1 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2026-06-02 14:08:12 Re: problems with toast.* reloptions
Previous Message Chao Li 2026-06-02 13:42:23 SERVICEFILE shows wrong file after servicefile fallback