| From: | Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> |
|---|---|
| To: | Antonin Houska <ah(at)cybertec(dot)at> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net> |
| Subject: | Re: Adding REPACK [concurrently] |
| Date: | 2026-03-18 19:12:19 |
| Message-ID: | CAFC+b6qk3-DQTi43QMqvVLP+sudPV4vsLQm5iHfcCeObrNaVyA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hello,
While I was doing concurrency testing on the V41 patches, I found this crash, caused by an assertion failure:
TRAP: failed Assert("RelationGetRelid(relation) == ((RepackDecodingState *) ctx->output_writer_private)->relid"), File: "pgoutput_repack.c", Line: 97, PID: 397007
postgres: REPACK decoding worker for relation "stress_victim" (ExceptionalCondition+0x98)[0xaaaad9361698]
/home/srinath/Desktop/pgbuild/lib/postgresql/pgoutput_repack.so(+0xfe8)[0xffff90e00fe8]
postgres: REPACK decoding worker for relation "stress_victim" (+0x679e14)[0xaaaad9049e14]
postgres: REPACK decoding worker for relation "stress_victim" (+0x689cd0)[0xaaaad9059cd0]
postgres: REPACK decoding worker for relation "stress_victim" (+0x68a65c)[0xaaaad905a65c]
postgres: REPACK decoding worker for relation "stress_victim" (+0x68b2f0)[0xaaaad905b2f0]
postgres: REPACK decoding worker for relation "stress_victim" (ReorderBufferCommit+0x74)[0xaaaad905b374]
postgres: REPACK decoding worker for relation "stress_victim" (+0x671ec4)[0xaaaad9041ec4]
postgres: REPACK decoding worker for relation "stress_victim" (xact_decode+0x1a0)[0xaaaad9040edc]
postgres: REPACK decoding worker for relation "stress_victim" (LogicalDecodingProcessRecord+0xd4)[0xaaaad9040a80]
postgres: REPACK decoding worker for relation "stress_victim" (+0x33f558)[0xaaaad8d0f558]
postgres: REPACK decoding worker for relation "stress_victim" (+0x341ccc)[0xaaaad8d11ccc]
postgres: REPACK decoding worker for relation "stress_victim" (RepackWorkerMain+0x1ac)[0xaaaad8d11bd4]
postgres: REPACK decoding worker for relation "stress_victim" (BackgroundWorkerMain+0x2b0)[0xaaaad900d21c]
postgres: REPACK decoding worker for relation "stress_victim" (postmaster_child_launch+0x1f0)[0xaaaad9012070]
postgres: REPACK decoding worker for relation "stress_victim" (+0x64b974)[0xaaaad901b974]
postgres: REPACK decoding worker for relation "stress_victim" (+0x64bc64)[0xaaaad901bc64]
postgres: REPACK decoding worker for relation "stress_victim" (+0x64a3e4)[0xaaaad901a3e4]
postgres: REPACK decoding worker for relation "stress_victim" (+0x647648)[0xaaaad9017648]
postgres: REPACK decoding worker for relation "stress_victim" (PostmasterMain+0x160c)[0xaaaad9016d98]
postgres: REPACK decoding worker for relation "stress_victim" (main+0x3dc)[0xaaaad8ea7a38]
/lib/aarch64-linux-gnu/libc.so.6(+0x284c4)[0xffff9c5c84c4]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98)[0xffff9c5c8598]
postgres: REPACK decoding worker for relation "stress_victim" (_start+0x30)[0xaaaad8abc970]
2026-03-16 19:40:21.622 IST [393820] LOG: background worker "REPACK decoding worker" (PID 397007) was terminated by signal 6: Aborted
2026-03-16 19:40:21.622 IST [393820] LOG: terminating any other active server processes
2026-03-16 19:40:21.632 IST [397036] FATAL: the database system is in recovery mode
The crash occurs when REPACK (concurrently) is run on a table while a heavy pgbench workload concurrently executes multi-table transactions (dual_chaos.sql) against the tables created by setup.sql.
It triggers after a few back-to-back REPACK (concurrently) runs.
I think I found the cause of this crash. Some changes slip past the change_useless_for_repack filter, so changes that are unrelated to the relation being processed by REPACK (concurrently) get decoded and added to the reorder buffer queue.

The reason is that repacked_rel_locator.relNumber defaults to InvalidOid. It is set to the target relation during setup_logical_decoding, but only after DecodingContextFindStartpoint has run. While DecodingContextFindStartpoint is running, changes are not filtered even if they are unrelated to the target relation: on the path rm_decode -> change_useless_for_repack -> am_decoding_for_repack, repacked_rel_locator.relNumber is still InvalidOid, so filtering is skipped. The unrelated change is therefore added to the reorder buffer queue, and when the reorder buffer is later processed, plugin_change is called on it and the assertion fails.

I have attached a diff patch to solve this.
Thoughts?
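To make the ordering problem concrete, here is a minimal, simplified model in C of the filter as I understand it. Only the names repacked_rel_locator, am_decoding_for_repack and change_useless_for_repack come from the patch; their bodies here are assumptions based on the behavior described above, not the actual patch code. The Sketch* types, set_repack_target() and the relation numbers are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the REPACK change filter.
 * Sketch* types and set_repack_target() are NOT PostgreSQL code.
 */
typedef uint32_t SketchOid;
#define SKETCH_INVALID_OID ((SketchOid) 0)

typedef struct SketchRelLocator
{
    SketchOid relNumber;
} SketchRelLocator;

/* Models repacked_rel_locator: relNumber stays invalid until set up. */
static SketchRelLocator repacked_rel_locator = {SKETCH_INVALID_OID};

/* Assumed behavior: we only "decode for REPACK" once a target is known. */
static bool
am_decoding_for_repack(void)
{
    return repacked_rel_locator.relNumber != SKETCH_INVALID_OID;
}

/* Returns true if the change belongs to another relation and can be skipped. */
static bool
change_useless_for_repack(SketchRelLocator change)
{
    if (!am_decoding_for_repack())
        return false;           /* no target yet, so nothing is filtered */
    return change.relNumber != repacked_rel_locator.relNumber;
}

/* Hypothetical stand-in for the step in setup_logical_decoding that sets the target. */
static void
set_repack_target(SketchOid relNumber)
{
    assert(relNumber != SKETCH_INVALID_OID);
    repacked_rel_locator.relNumber = relNumber;
}
```

In this model, any change examined before set_repack_target() runs (the window in which DecodingContextFindStartpoint currently runs) is kept, whatever relation it belongs to; after set_repack_target(16384), a change on relation 16390 is reported as useless. Setting the target before searching for the start point closes that window.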
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
| Attachment | Content-Type | Size |
|---|---|---|
| fix_diff.norobots | application/octet-stream | 1.6 KB |
| dual_chaos.sql | application/octet-stream | 680 bytes |
| setup.sql | application/octet-stream | 466 bytes |