Re: Error while processing invalidation message during ATTACH PARTITION leaves invalid relcache entry

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Rahila Syed <rahilasyed90(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Error while processing invalidation message during ATTACH PARTITION leaves invalid relcache entry
Date: 2026-06-21 18:00:00
Message-ID: 217846e2-d03c-4231-9959-3479cd65c4ae@gmail.com
Lists: pgsql-hackers

Hello Rahila,

16.06.2026 07:42, Rahila Syed wrote:
> Hi Alexander,
>
> Thank you for the report. This is an interesting case of incomplete or
> incorrect error handling.
>
> Regarding the code path in LocalExecuteInvalidationMessage:
>
> (This can seem dubious, but I guess there could be other (perhaps more
> sophisticated) ways to trigger an error somewhere inside
> LocalExecuteInvalidationMessage() -> RelationCacheInvalidateEntry() ->
> RelationFlushRelation() -> RelationRebuildRelation() ->
> RelationBuildDesc() -> RelationBuildTupleDesc() -> systable_getnext()...)
>
> I wonder if we should prevent adding CHECK_FOR_INTERRUPTS (CFI) calls
> in this path. A quick search did not reveal any existing CFI calls
> here. In your example, the CFI is triggered by the elog(LOG, "") added
> to the code as part of your testing.

Thank you for the reply!

I've found a way to reproduce this without any code modifications:
for i in {1..100}; do
  echo "ITERATION $i"

  # In the background, repeatedly try to cancel any running ALTER TABLE.
  (for n in {1..10}; do
    psql -qAt -c "SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE query LIKE 'ALTER TABLE%'";
  done;) &

  # Create a wide partitioned table with several indexes, then attach a
  # partition and touch it afterwards.
  cat << EOF | psql >>psql.log
CREATE TABLE pt (a int, $(seq -s, -f 'c%g int' 100)) PARTITION BY LIST (a);
CREATE INDEX ON pt(a);
CREATE INDEX ON pt(a);
CREATE INDEX ON pt(a);
CREATE INDEX ON pt(a);
CREATE INDEX ON pt(a);
CREATE TABLE tp1 (LIKE pt);
INSERT INTO tp1 (a) VALUES (1);

ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
DELETE FROM tp1;
EOF
  wait

  # Stop at the first iteration where the server is no longer reachable.
  psql -v ON_ERROR_STOP=1 -c "DROP TABLE pt, tp1" || break;
done

It fails for me as below:
ITERATION 73
t
ERROR:  canceling statement due to user request
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
psql: error: connection to server on socket "/tmp/.s.PGSQL.15432" failed: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

2026-06-21 08:35:59.128 EEST [1134495:8] psql ERROR:  canceling statement due to user request
2026-06-21 08:35:59.128 EEST [1134495:9] psql BACKTRACE:
ProcessInterrupts at postgres.c:3548:4
heap_multi_insert at heapam.c:2367:6
CatalogTuplesMultiInsertWithInfo at indexing.c:287:11
recordMultipleDependencies at pg_depend.c:159:22
recordDependencyOn at pg_depend.c:56:1
StoreCatalogInheritance1 at tablecmds.c:3650:2
CreateInheritance at tablecmds.c:17688:2
attachPartitionTable at tablecmds.c:20516:2
ATExecAttachPartition at tablecmds.c:20777:24
ATExecCmd at tablecmds.c:5727:15
ATRewriteCatalogs at tablecmds.c:5401:4
ATController at tablecmds.c:4954:2
AlterTable at tablecmds.c:4602:1
ProcessUtilitySlow at utility.c:1327:7
standard_ProcessUtility at utility.c:1072:4
ProcessUtility at utility.c:528:3
PortalRunUtility at pquery.c:1149:2
PortalRunMulti at pquery.c:1307:5
PortalRun at pquery.c:788:5
exec_simple_query at postgres.c:1297:11
PostgresMain at postgres.c:4869:27
BackendInitialize at backend_startup.c:142:1
postmaster_child_launch at launch_backend.c:269:3
BackendStartup at postmaster.c:3627:8
ServerLoop at postmaster.c:1731:10
PostmasterMain at postmaster.c:1415:11
main at main.c:236:2
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x75a5c9c2a28b]
    postgres: law regression [local] ALTER TABLE(_start+0x25) [0x653adc53c595]
2026-06-21 08:35:59.128 EEST [1134495:10] psql STATEMENT:  ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);

2026-06-21 19:49:42.527 EEST [1783003:16] psql LOG:  statement: DELETE FROM tp1;
TRAP: failed Assert("list != NIL"), File: "../../../../src/include/nodes/pg_list.h", Line: 322, PID: 1783003
ExceptionalCondition at assert.c:51:13
list_last_cell at pg_list.h:323:14
RelationBuildPublicationDesc at relcache.c:5847:23
CheckCmdReplicaIdentity at execReplication.c:1068:5
CheckValidResultRel at execMain.c:1094:7
ExecInitModifyTable at nodeModifyTable.c:5299:16
ExecInitNode at execProcnode.c:177:27
InitPlan at execMain.c:1002:14
standard_ExecutorStart at execMain.c:274:2
ExecutorStart at execMain.c:140:1
ProcessQuery at pquery.c:162:2
PortalRunMulti at pquery.c:1269:5
PortalRun at pquery.c:788:5
exec_simple_query at postgres.c:1297:11
PostgresMain at postgres.c:4860:27
BackendInitialize at backend_startup.c:142:1
postmaster_child_launch at launch_backend.c:269:3
BackendStartup at postmaster.c:3627:8
ServerLoop at postmaster.c:1731:10
PostmasterMain at postmaster.c:1415:11
main at main.c:236:2
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7f6cd522a28b]
postgres: law regression [local] DELETE(_start+0x25)[0x5a801c1ff595]
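
When the loop stops on a lost connection, it can help to confirm from the
server log that the crash is this assertion and that a cancel was delivered
beforehand. A minimal sketch, assuming the server log is a plain file whose
path is passed as the first argument; the function name summarize_server_log
is hypothetical and not part of the reproduction above:

```shell
# Hypothetical helper: count the cancel errors and assertion failures
# recorded in the server log file given as $1.
summarize_server_log() {
    local logfile=$1
    local cancels traps
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    cancels=$(grep -c 'ERROR: *canceling statement due to user request' "$logfile" || true)
    traps=$(grep -c '^TRAP: failed Assert' "$logfile" || true)
    echo "cancels=$cancels traps=$traps"
}
```

Calling it as summarize_server_log /path/to/server.log after the loop breaks
prints a single line such as "cancels=N traps=M"; a non-zero trap count
together with at least one cancel is consistent with the failure shown above,
though it does not by itself prove that the cancel hit the ATTACH PARTITION.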

> To prevent incomplete cache invalidation during an abort, we probably
> need to avoid processing interrupts and ensure the process does not
> error out. Otherwise, as you demonstrated, we risk leaving the
> relcache in an inconsistent state where a stale entry remains even
> after a transaction is rolled back.

Yes, though unless we can guarantee that no other error can occur down that
path, just preventing CHECK_FOR_INTERRUPTS there probably won't be sufficient.

Best regards,
Alexander
