Re: Adding REPACK [concurrently]

From: Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-03-18 19:12:19
Message-ID: CAFC+b6qk3-DQTi43QMqvVLP+sudPV4vsLQm5iHfcCeObrNaVyA@mail.gmail.com
Lists: pgsql-hackers

Hello,

While running concurrency tests on the V41 patches, I found this crash
caused by an assertion failure:

TRAP: failed Assert("RelationGetRelid(relation) == ((RepackDecodingState *)
ctx->output_writer_private)->relid"), File: "pgoutput_repack.c", Line: 97,
PID: 397007
postgres: REPACK decoding worker for relation "stress_victim"
(ExceptionalCondition+0x98)[0xaaaad9361698]
/home/srinath/Desktop/pgbuild/lib/postgresql/pgoutput_repack.so(+0xfe8)[0xffff90e00fe8]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x679e14)[0xaaaad9049e14]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x689cd0)[0xaaaad9059cd0]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x68a65c)[0xaaaad905a65c]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x68b2f0)[0xaaaad905b2f0]
postgres: REPACK decoding worker for relation "stress_victim"
(ReorderBufferCommit+0x74)[0xaaaad905b374]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x671ec4)[0xaaaad9041ec4]
postgres: REPACK decoding worker for relation "stress_victim"
(xact_decode+0x1a0)[0xaaaad9040edc]
postgres: REPACK decoding worker for relation "stress_victim"
(LogicalDecodingProcessRecord+0xd4)[0xaaaad9040a80]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x33f558)[0xaaaad8d0f558]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x341ccc)[0xaaaad8d11ccc]
postgres: REPACK decoding worker for relation "stress_victim"
(RepackWorkerMain+0x1ac)[0xaaaad8d11bd4]
postgres: REPACK decoding worker for relation "stress_victim"
(BackgroundWorkerMain+0x2b0)[0xaaaad900d21c]
postgres: REPACK decoding worker for relation "stress_victim"
(postmaster_child_launch+0x1f0)[0xaaaad9012070]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x64b974)[0xaaaad901b974]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x64bc64)[0xaaaad901bc64]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x64a3e4)[0xaaaad901a3e4]
postgres: REPACK decoding worker for relation "stress_victim"
(+0x647648)[0xaaaad9017648]
postgres: REPACK decoding worker for relation "stress_victim"
(PostmasterMain+0x160c)[0xaaaad9016d98]
postgres: REPACK decoding worker for relation "stress_victim"
(main+0x3dc)[0xaaaad8ea7a38]
/lib/aarch64-linux-gnu/libc.so.6(+0x284c4)[0xffff9c5c84c4]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffff9c5c8598]
postgres: REPACK decoding worker for relation "stress_victim"
(_start+0x30)[0xaaaad8abc970]
2026-03-16 19:40:21.622 IST [393820] LOG: background worker "REPACK
decoding worker" (PID 397007) was terminated by signal 6: Aborted
2026-03-16 19:40:21.622 IST [393820] LOG: terminating any other active
server processes
2026-03-16 19:40:21.632 IST [397036] FATAL: the database system is in
recovery mode

This crash happens when REPACK (concurrently) runs on a table while a heavy
pgbench workload is concurrently executing multi-table (setup.sql)
transactions (dual_chaos.sql).
It triggers after a few back-to-back REPACK (concurrently) runs.

I think I found the cause of this crash. Some changes slipped past the
change_useless_for_repack filter, so changes unrelated to the relation
currently being processed by REPACK (concurrently) were decoded and added
to the reorder buffer queue.

The reason is that repacked_rel_locator.relNumber is set to InvalidOid by
default. It is set to the target relation during setup_logical_decoding,
but only after DecodingContextFindStartpoint has run. Inside
DecodingContextFindStartpoint, changes are not filtered even if they are
unrelated to the target relation, because in the call path
rm_decode->change_useless_for_repack->am_decoding_for_repack
repacked_rel_locator.relNumber is still InvalidOid, which makes the code
skip the filtering. Those changes are therefore added to the reorder buffer
queue, and when the reorder buffer is processed, plugin_change is called
for them and the assert fails. I have attached a diff patch to solve this.
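
To illustrate the ordering problem described above, here is a minimal
standalone sketch. It is not the patch code: the Oid typedef, INVALID_OID,
and every *_model function are simplified, hypothetical stand-ins that only
mirror the behaviour as described in this mail (the filter is bypassed
while the target relNumber is still unset).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins, NOT the real PostgreSQL definitions. */
typedef uint32_t Oid;
#define INVALID_OID ((Oid) 0)

/* Hypothetical model of repacked_rel_locator.relNumber (unset by default). */
static Oid repacked_relnumber = INVALID_OID;

/* Models the assignment done in setup_logical_decoding. */
void
set_target_model(Oid relnumber)
{
    repacked_relnumber = relnumber;
}

/* Models am_decoding_for_repack: true only once the target is known. */
bool
am_decoding_for_repack_model(void)
{
    return repacked_relnumber != INVALID_OID;
}

/* Models change_useless_for_repack: true means "skip this change". */
bool
change_useless_for_repack_model(Oid change_relnumber)
{
    /* A decoded change always refers to some valid relation. */
    assert(change_relnumber != INVALID_OID);

    /* Target not set yet: the filter is bypassed, nothing is skipped. */
    if (!am_decoding_for_repack_model())
        return false;

    return change_relnumber != repacked_relnumber;
}

/* Counts how many changes would be queued in the reorder buffer. */
int
count_queued_model(const Oid *changes, int nchanges)
{
    int queued = 0;

    for (int i = 0; i < nchanges; i++)
    {
        if (!change_useless_for_repack_model(changes[i]))
            queued++;
    }
    return queued;
}
```

In this model, calling count_queued_model before set_target_model queues
every change, including ones for other relations, which corresponds to what
happens during DecodingContextFindStartpoint. Once the target is set, only
changes for that relation are queued.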

Thoughts?

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
fix_diff.norobots application/octet-stream 1.6 KB
dual_chaos.sql application/octet-stream 680 bytes
setup.sql application/octet-stream 466 bytes
