Error while processing invalidation message during ATTACH PARTITION leaves invalid relcache entry

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Error while processing invalidation message during ATTACH PARTITION leaves invalid relcache entry
Date: 2026-06-07 10:00:00
Message-ID: 21363eb7-606c-468d-88f4-c14162ddafc8@gmail.com
Lists: pgsql-hackers

Hello hackers,

I've come across an interesting anomaly related to error processing. The
following script:
for i in {1..100}; do
echo "ITERATION $i"

(for n in {1..10}; do
psql -qAt -c "SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE query LIKE 'ALTER TABLE%'";
done;) &

cat << EOF | psql >>psql.log
CREATE TABLE pt (a int) PARTITION BY LIST (a);
CREATE TABLE tp1 (LIKE pt);
INSERT INTO tp1 VALUES (1);

ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
DELETE FROM tp1;
EOF
wait

psql -v ON_ERROR_STOP=1 -c "DROP TABLE pt, tp1" || break;
done

With the following patch applied:
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -850,6 +850,7 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
         {
             int         i;

+pg_usleep(1000); elog(LOG, "!!!LocalExecuteInvalidationMessage| msg->rc.relId: %d",  msg->rc.relId);
             if (msg->rc.relId == InvalidOid)
                 RelationCacheInvalidate(false);
             else

(This may look contrived, but I guess there could be other, perhaps more
sophisticated, ways to trigger an error somewhere inside
LocalExecuteInvalidationMessage() -> RelationCacheInvalidateEntry() ->
RelationFlushRelation() -> RelationRebuildRelation() ->
RelationBuildDesc() -> RelationBuildTupleDesc() -> systable_getnext()...)

--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5838,6 +5838,7 @@ RelationBuildPublicationDesc(Relation relation, PublicationDesc *pubdesc)
     schemaid = RelationGetNamespace(relation);
     puboids = list_concat_unique_oid(puboids, GetSchemaPublications(schemaid));

+elog(LOG, "!!!RelationBuildPublicationDesc| relid: %d, relation->rd_rel->relispartition: %d", relid, relation->rd_rel->relispartition);
     if (relation->rd_rel->relispartition)
     {
         Oid         last_ancestor_relid;

triggers a server crash for me as below:
ITERATION 9
t
ERROR:  canceling statement due to user request
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

server.log contains (with backtrace_functions = 'ProcessInterrupts'):
2026-06-07 12:13:34.271 EEST [3362235] LOG: !!!LocalExecuteInvalidationMessage| msg->rc.relId: 16436
2026-06-07 12:13:34.271 EEST [3362235] STATEMENT:  ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
2026-06-07 12:13:34.272 EEST [3362235] LOG: !!!LocalExecuteInvalidationMessage| msg->rc.relId: 16433
2026-06-07 12:13:34.272 EEST [3362235] STATEMENT:  ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
2026-06-07 12:13:34.273 EEST [3362235] ERROR:  canceling statement due to user request
2026-06-07 12:13:34.273 EEST [3362235] BACKTRACE:
ProcessInterrupts at postgres.c:3539:4
errfinish at elog.c:630:1
LocalExecuteInvalidationMessage at inval.c:854:15
ProcessInvalidationMessages at inval.c:578:2
CommandEndInvalidationMessages at inval.c:1421:6
AtCCI_LocalCache at xact.c:1634:1
CommandCounterIncrement at xact.c:1171:1
StorePartitionBound at heap.c:4146:3
attachPartitionTable at tablecmds.c:20544:2
ATExecAttachPartition at tablecmds.c:20809:24
ATExecCmd at tablecmds.c:5727:15
ATRewriteCatalogs at tablecmds.c:5401:4
ATController at tablecmds.c:4954:2
AlterTable at tablecmds.c:4602:1
ProcessUtilitySlow at utility.c:1327:7
...
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x791d2462a28b]
    postgres: law regression [local] ALTER TABLE(_start+0x25) [0x55960ca1e135]
2026-06-07 12:13:34.273 EEST [3362235] STATEMENT:  ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
2026-06-07 12:13:34.273 EEST [3362235] LOG: !!!RelationBuildPublicationDesc| relid: 16436, relation->rd_rel->relispartition: 1
2026-06-07 12:13:34.273 EEST [3362235] STATEMENT:  DELETE FROM tp1;
TRAP: failed Assert("list != NIL"), File: "../../../../src/include/nodes/pg_list.h", Line: 322, PID: 3362235
ExceptionalCondition at assert.c:51:13
list_last_cell at pg_list.h:323:14
RelationBuildPublicationDesc at relcache.c:5848:23
CheckCmdReplicaIdentity at execReplication.c:1068:5
CheckValidResultRel at execMain.c:1094:7
ExecInitModifyTable at nodeModifyTable.c:5302:16
ExecInitNode at execProcnode.c:177:27
InitPlan at execMain.c:1002:14
standard_ExecutorStart at execMain.c:274:2
...
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x791d2462a28b]
postgres: law regression [local] DELETE(_start+0x25)[0x55960ca1e135]
2026-06-07 12:13:34.437 EEST [3361937] LOG:  client backend (PID 3362235) was terminated by signal 6: Aborted
2026-06-07 12:13:34.437 EEST [3361937] DETAIL:  Failed process was running: DELETE FROM tp1;

That is, even though the ATTACH PARTITION transaction was rolled back, the
local relcache still contains a stale entry for tp1, which has
relispartition set.
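
One way to picture how such a stale entry could survive the rollback is the
toy model below. It is only a sketch under assumptions that the report above
does not confirm: that a cache entry rebuilt during command-end invalidation
processing sees the transaction's uncommitted catalog change, and that abort
re-invalidates only the messages of commands that completed earlier. None of
the names (committed_flag, command_end, stale_after_abort, and so on) are
PostgreSQL APIs; they are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NREL   2                /* toy relations: 0 plays tp1, 1 plays pt */
#define MAXMSG 8

static bool committed_flag[NREL];   /* committed "relispartition" */
static bool pending_flag[NREL];     /* value as seen inside the transaction */
static bool cached_flag[NREL];      /* toy relcache entry */
static bool in_xact;

/* Hypothetical queues: current command's messages and earlier commands'. */
static int current_msgs[MAXMSG], n_current;
static int prior_msgs[MAXMSG], n_prior;

static void queue_inval(int rel)
{
    assert(rel >= 0 && rel < NREL && n_current < MAXMSG);
    current_msgs[n_current++] = rel;
}

/* Rebuild a cache entry from whatever state is currently visible. */
static void rebuild_entry(int rel)
{
    cached_flag[rel] = in_xact ? pending_flag[rel] : committed_flag[rel];
}

/*
 * Apply the current command's messages. fail_after simulates an error
 * (such as a query cancel) raised before message number fail_after is
 * applied; pass -1 for no error. Returns false if the error fired, in
 * which case the messages are never moved to the prior queue.
 */
static bool command_end(int fail_after)
{
    for (int i = 0; i < n_current; i++)
    {
        if (i == fail_after)
            return false;
        rebuild_entry(current_msgs[i]);
    }
    for (int i = 0; i < n_current; i++)
    {
        assert(n_prior < MAXMSG);
        prior_msgs[n_prior++] = current_msgs[i];
    }
    n_current = 0;
    return true;
}

/* Abort: drop uncommitted changes, replay only the prior-command messages. */
static void abort_xact(void)
{
    in_xact = false;
    memcpy(pending_flag, committed_flag, sizeof(pending_flag));
    for (int i = 0; i < n_prior; i++)
        rebuild_entry(prior_msgs[i]);
    n_prior = n_current = 0;
}

/* Returns true if relation 0's cache entry disagrees with committed state. */
static bool stale_after_abort(int fail_after)
{
    memset(committed_flag, 0, sizeof(committed_flag));
    memset(pending_flag, 0, sizeof(pending_flag));
    memset(cached_flag, 0, sizeof(cached_flag));
    n_current = n_prior = 0;

    in_xact = true;
    pending_flag[0] = true;     /* the "attach" marks relation 0 */
    queue_inval(0);
    queue_inval(1);
    (void) command_end(fail_after);
    abort_xact();
    return cached_flag[0] != committed_flag[0];
}
```

In this model, stale_after_abort(1) corresponds to the cancel arriving after
the first message has been applied: relation 0 was already rebuilt with the
flag set, and nothing rebuilds it again on abort. With fail_after set to -1
or 0 the entry ends up consistent, which would match the crash needing a
narrow timing window.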

Best regards,
Alexander
