| From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
|---|---|
| To: | Ewan Young <kdbase(dot)hack(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: autovacuum launcher crash: assert in pgstat_count_io_op (IOOP_EXTEND on pg_database's VM) |
| Date: | 2026-06-01 15:05:28 |
| Message-ID: | CAAKRu_birmRc+rZWTewTf-dhCJc9nCzE92cbhoAzpSExHAy9Rw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Sun, May 31, 2026 at 12:37 AM Ewan Young <kdbase(dot)hack(at)gmail(dot)com> wrote:
>
> I was stress-testing master (commit e2b35735b00, assertions enabled) with a
> workload that does a lot of DDL/DML, including creating and dropping
> databases in a tight loop, and the autovacuum launcher kept crashing on me --
> every 15-40 minutes or so once it was under load:
>
> TRAP: failed Assert("pgstat_tracks_io_op(MyBackendType, io_object,
> io_context, io_op)"), File: "pgstat_io.c", Line: 74
> LOG: autovacuum launcher process (PID ...) was terminated by signal 6:
> Aborted
>
> The postmaster recovers fine, but it just starts another launcher that hits
> the exact same assert, so it never really gets out of the loop.
Ouch :( Thanks for the report!
> What surprised me is that the launcher's catalog scan isn't even flagged
> read-only (table_beginscan_catalog doesn't set SO_HINT_REL_READ_ONLY),
> so it never actually intends to set the VM -- it just pins/extends it anyway.
Yea, so SO_HINT_REL_READ_ONLY is only meant as a hint. I don't
guarantee that all queries that aren't modifying the relation will
pass it. It's only a performance optimization. We did touch on
excluding scans of catalog tables briefly in the thread (albeit deep
in the thread) [1].
That being said, we still pin the VM (and potentially extend it) even
if we don't set it. If it does already exist, pinning it lets us check
for corruption. If we extend it and won't set it, it isn't totally
wasted work because then we don't have to extend it later. Though it's
true that if the VM needs to be extended, that part of the VM can't be
corrupted, I wanted to avoid special casing the presence of the VM
page.
When I wrote pgstat_tracks_io_op(), I thought through which IO
operations we should expect for each backend type. The idea was that
if that were ever to change, we could change pgstat_tracks_io_op().
There is no reason why the autovacuum launcher inherently shouldn't
extend the VM. Logically, if it is just reading catalog tables, it
won't need to extend the actual data fork of the relation. However,
now that reading tables may cause extending the VM, we can modify
pgstat_tracks_io_op() like this:
- if ((bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
- bktype == B_CHECKPOINTER) && io_op == IOOP_EXTEND)
+ if ((bktype == B_BG_WRITER || bktype == B_CHECKPOINTER) &&
+ io_op == IOOP_EXTEND)
return false;
We should probably add a flags argument to table_beginscan_catalog()
in 20. That along with modifying pgstat_tracks_io_op() is probably the
right solution IMO. I think we shouldn't do that in 19 since it is
expanding the feature footprint.
So for fixing 19, we could just alter pgstat_tracks_io_op() which
would avoid tripping the assert.
The next question is whether or not we should also do the patch you
proposed, since it is true that if the VM is not extended enough to
have the relevant VM bits, we can't possibly be fixing corruption. And
if we are not doing that and won't set the VM, we are prematurely
doing work.
My hesitation about this is that the logic is a bit confusing. It
relies on us knowing that visibiltymap_pin() will extend the relation
and visibilitymap_get_status() won't. Both will read and pin the VM if
it exists. If in heap_page_prune_and_freeze() we do anything with the
VM page, it must be pinned. So, you have to know that if it needed to
be extended, we won't have done that if rel_read_only was passed and
so vmbuffer will be invalid and visiblitymap_get_status() will have
returned 0. I don't think it's wrong, it just feels a bit off in a way
I can't quite put my finger on. Perhaps it is just that I liked the
invariant that the VM page would always be passed to
heap_page_prune_and_freeze(). I'll have to think about it a bit more.
On Mon, Jun 1, 2026 at 4:41 AM Ewan Young <kdbase(dot)hack(at)gmail(dot)com> wrote:
>
> Two commits combine to cause it: 4f7ecca84dd added the unconditional
> (extending) visibilitymap_pin() in the on-access prune path, and 378a216187a
> made INSERT set pd_prune_xid, so on-access pruning now fires on insert-mostly
> catalogs like pg_database. That's also why it was hard to reduce: any
> regular backend or autovacuum worker that scans pg_database recreates the
> fork harmlessly (they may extend), so the launcher only crashes in the brief
> window before that happens. I can now reproduce it deterministically; happy
> to share the script.
Please do share your reproducer. It's always nice to have those.
- Melanie
[1] https://www.postgresql.org/message-id/9468F957-C0ED-4D72-8C89-61162CAA5591%40yandex-team.ru
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Srinivas Kumar | 2026-06-01 14:57:35 | Re: DBeaver Experiencing timeouts while connecting to New Linux PostgreSql server |