| From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Error while processing invalidation message during ATTACH PARTITION leaves invalid relcache entry |
| Date: | 2026-06-07 10:00:00 |
| Message-ID: | 21363eb7-606c-468d-88f4-c14162ddafc8@gmail.com |
| Lists: | pgsql-hackers |
Hello hackers,
I've come across an interesting anomaly related to error processing. The
following script:
for i in {1..100}; do
echo "ITERATION $i"
(for n in {1..10}; do
psql -qAt -c "SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE query LIKE 'ALTER TABLE%'";
done;) &
cat << EOF | psql >>psql.log
CREATE TABLE pt (a int) PARTITION BY LIST (a);
CREATE TABLE tp1 (LIKE pt);
INSERT INTO tp1 VALUES (1);
ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
DELETE FROM tp1;
EOF
wait
psql -v ON_ERROR_STOP=1 -c "DROP TABLE pt, tp1" || break;
done
With the following patch applied:
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -850,6 +850,7 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
{
int i;
+pg_usleep(1000); elog(LOG, "!!!LocalExecuteInvalidationMessage| msg->rc.relId: %d", msg->rc.relId);
if (msg->rc.relId == InvalidOid)
RelationCacheInvalidate(false);
else
(This may look artificial, but I guess there could be other (perhaps more
sophisticated) ways to trigger an error somewhere inside
LocalExecuteInvalidationMessage() -> RelationCacheInvalidateEntry() ->
RelationFlushRelation() -> RelationRebuildRelation() ->
RelationBuildDesc() -> RelationBuildTupleDesc() -> systable_getnext()...)
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -5838,6 +5838,7 @@ RelationBuildPublicationDesc(Relation relation, PublicationDesc *pubdesc)
schemaid = RelationGetNamespace(relation);
puboids = list_concat_unique_oid(puboids, GetSchemaPublications(schemaid));
+elog(LOG, "!!!RelationBuildPublicationDesc| relid: %d, relation->rd_rel->relispartition: %d", relid, relation->rd_rel->relispartition);
if (relation->rd_rel->relispartition)
{
Oid last_ancestor_relid;
triggers a server crash for me, as shown below:
ITERATION 9
t
ERROR: canceling statement due to user request
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
server.log contains (with backtrace_functions = 'ProcessInterrupts'):
2026-06-07 12:13:34.271 EEST [3362235] LOG: !!!LocalExecuteInvalidationMessage| msg->rc.relId: 16436
2026-06-07 12:13:34.271 EEST [3362235] STATEMENT: ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
2026-06-07 12:13:34.272 EEST [3362235] LOG: !!!LocalExecuteInvalidationMessage| msg->rc.relId: 16433
2026-06-07 12:13:34.272 EEST [3362235] STATEMENT: ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
2026-06-07 12:13:34.273 EEST [3362235] ERROR: canceling statement due to user request
2026-06-07 12:13:34.273 EEST [3362235] BACKTRACE:
ProcessInterrupts at postgres.c:3539:4
errfinish at elog.c:630:1
LocalExecuteInvalidationMessage at inval.c:854:15
ProcessInvalidationMessages at inval.c:578:2
CommandEndInvalidationMessages at inval.c:1421:6
AtCCI_LocalCache at xact.c:1634:1
CommandCounterIncrement at xact.c:1171:1
StorePartitionBound at heap.c:4146:3
attachPartitionTable at tablecmds.c:20544:2
ATExecAttachPartition at tablecmds.c:20809:24
ATExecCmd at tablecmds.c:5727:15
ATRewriteCatalogs at tablecmds.c:5401:4
ATController at tablecmds.c:4954:2
AlterTable at tablecmds.c:4602:1
ProcessUtilitySlow at utility.c:1327:7
...
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x791d2462a28b]
postgres: law regression [local] ALTER TABLE(_start+0x25) [0x55960ca1e135]
2026-06-07 12:13:34.273 EEST [3362235] STATEMENT: ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
2026-06-07 12:13:34.273 EEST [3362235] LOG: !!!RelationBuildPublicationDesc| relid: 16436, relation->rd_rel->relispartition: 1
2026-06-07 12:13:34.273 EEST [3362235] STATEMENT: DELETE FROM tp1;
TRAP: failed Assert("list != NIL"), File: "../../../../src/include/nodes/pg_list.h", Line: 322, PID: 3362235
ExceptionalCondition at assert.c:51:13
list_last_cell at pg_list.h:323:14
RelationBuildPublicationDesc at relcache.c:5848:23
CheckCmdReplicaIdentity at execReplication.c:1068:5
CheckValidResultRel at execMain.c:1094:7
ExecInitModifyTable at nodeModifyTable.c:5302:16
ExecInitNode at execProcnode.c:177:27
InitPlan at execMain.c:1002:14
standard_ExecutorStart at execMain.c:274:2
...
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x791d2462a28b]
postgres: law regression [local] DELETE(_start+0x25)[0x55960ca1e135]
2026-06-07 12:13:34.437 EEST [3361937] LOG: client backend (PID 3362235) was terminated by signal 6: Aborted
2026-06-07 12:13:34.437 EEST [3361937] DETAIL: Failed process was running: DELETE FROM tp1;
That is, even though the ATTACH PARTITION transaction was rolled back, the
backend's local relcache still contains a stale entry for tp1 with
relispartition set.
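To check that the catalog itself was rolled back, so that only the relcache entry is stale, one could read pg_class from a separate session. This is a minimal sketch: the helper name catalog_relispartition is hypothetical and not part of the script above, and it assumes a simple, unquoted table name that is unique across schemas.

```shell
# Hypothetical helper: print the relispartition flag stored in pg_class
# for the given table, as seen by a fresh backend.
# Assumption: $1 is a plain identifier (no quoting or schema handling here).
catalog_relispartition() {
    psql -qAt -c "SELECT relispartition FROM pg_class WHERE relname = '$1'"
}
```

Called as `catalog_relispartition tp1` after `wait` in the loop above (once the server accepts connections again), it should report f if the ATTACH was indeed rolled back. It reads the catalog through a new backend, so it says nothing about the relcache of the backend that crashed.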
Best regards,
Alexander