| From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
|---|---|
| To: | Rahila Syed <rahilasyed90(at)gmail(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Error while processing invalidation message during ATTACH PARTITION leaves invalid relcache entry |
| Date: | 2026-06-21 18:00:00 |
| Message-ID: | 217846e2-d03c-4231-9959-3479cd65c4ae@gmail.com |
| Lists: | pgsql-hackers |
Hello Rahila,
16.06.2026 07:42, Rahila Syed wrote:
> Hi Alexander,
>
> Thank you for the report. This is an interesting case of incomplete or
> incorrect error handling.
>
> Regarding the code path in LocalExecuteInvalidationMessage:
>
> (This can seem dubious, but I guess there could be other (perhaps more
> sophisticated) ways to trigger an error somewhere inside
> LocalExecuteInvalidationMessage() -> RelationCacheInvalidateEntry() ->
> RelationFlushRelation() -> RelationRebuildRelation() ->
> RelationBuildDesc() -> RelationBuildTupleDesc() -> systable_getnext()...)
>
> I wonder if we should prevent adding CHECK_FOR_INTERRUPTS (CFI) calls
> in this path. A quick search did not reveal any existing CFI calls
> here. In your example, the CFI is triggered by the elog(LOG, "") added
> to the code as part of your testing.
Thank you for the reply!
I've found a way to reproduce this without any code modifications:
```bash
for i in {1..100}; do
  echo "ITERATION $i"
  (for n in {1..10}; do
    psql -qAt -c "SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE query LIKE 'ALTER TABLE%'";
  done;) &
  cat << EOF | psql >>psql.log
CREATE TABLE pt (a int, $(seq -s, -f 'c%g int' 100)) PARTITION BY LIST (a);
CREATE INDEX ON pt(a);
CREATE INDEX ON pt(a);
CREATE INDEX ON pt(a);
CREATE INDEX ON pt(a);
CREATE INDEX ON pt(a);
CREATE TABLE tp1 (LIKE pt);
INSERT INTO tp1 (a) VALUES (1);
ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
DELETE FROM tp1;
EOF
  wait
  psql -v ON_ERROR_STOP=1 -c "DROP TABLE pt, tp1" || break;
done
```
It fails for me as below:
```
ITERATION 73
t
ERROR: canceling statement due to user request
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
psql: error: connection to server on socket "/tmp/.s.PGSQL.15432" failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2026-06-21 08:35:59.128 EEST [1134495:8] psql ERROR: canceling statement due to user request
2026-06-21 08:35:59.128 EEST [1134495:9] psql BACKTRACE:
ProcessInterrupts at postgres.c:3548:4
heap_multi_insert at heapam.c:2367:6
CatalogTuplesMultiInsertWithInfo at indexing.c:287:11
recordMultipleDependencies at pg_depend.c:159:22
recordDependencyOn at pg_depend.c:56:1
StoreCatalogInheritance1 at tablecmds.c:3650:2
CreateInheritance at tablecmds.c:17688:2
attachPartitionTable at tablecmds.c:20516:2
ATExecAttachPartition at tablecmds.c:20777:24
ATExecCmd at tablecmds.c:5727:15
ATRewriteCatalogs at tablecmds.c:5401:4
ATController at tablecmds.c:4954:2
AlterTable at tablecmds.c:4602:1
ProcessUtilitySlow at utility.c:1327:7
standard_ProcessUtility at utility.c:1072:4
ProcessUtility at utility.c:528:3
PortalRunUtility at pquery.c:1149:2
PortalRunMulti at pquery.c:1307:5
PortalRun at pquery.c:788:5
exec_simple_query at postgres.c:1297:11
PostgresMain at postgres.c:4869:27
BackendInitialize at backend_startup.c:142:1
postmaster_child_launch at launch_backend.c:269:3
BackendStartup at postmaster.c:3627:8
ServerLoop at postmaster.c:1731:10
PostmasterMain at postmaster.c:1415:11
main at main.c:236:2
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b) [0x75a5c9c2a28b]
postgres: law regression [local] ALTER TABLE(_start+0x25) [0x653adc53c595]
2026-06-21 08:35:59.128 EEST [1134495:10] psql STATEMENT: ALTER TABLE pt ATTACH PARTITION tp1 FOR VALUES IN (1);
2026-06-21 19:49:42.527 EEST [1783003:16] psql LOG: statement: DELETE FROM tp1;
TRAP: failed Assert("list != NIL"), File: "../../../../src/include/nodes/pg_list.h", Line: 322, PID: 1783003
ExceptionalCondition at assert.c:51:13
list_last_cell at pg_list.h:323:14
RelationBuildPublicationDesc at relcache.c:5847:23
CheckCmdReplicaIdentity at execReplication.c:1068:5
CheckValidResultRel at execMain.c:1094:7
ExecInitModifyTable at nodeModifyTable.c:5299:16
ExecInitNode at execProcnode.c:177:27
InitPlan at execMain.c:1002:14
standard_ExecutorStart at execMain.c:274:2
ExecutorStart at execMain.c:140:1
ProcessQuery at pquery.c:162:2
PortalRunMulti at pquery.c:1269:5
PortalRun at pquery.c:788:5
exec_simple_query at postgres.c:1297:11
PostgresMain at postgres.c:4860:27
BackendInitialize at backend_startup.c:142:1
postmaster_child_launch at launch_backend.c:269:3
BackendStartup at postmaster.c:3627:8
ServerLoop at postmaster.c:1731:10
PostmasterMain at postmaster.c:1415:11
main at main.c:236:2
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7f6cd522a28b]
postgres: law regression [local] DELETE(_start+0x25)[0x5a801c1ff595]
```
> To prevent incomplete cache invalidation during an abort, we probably
> need to avoid processing interrupts and ensure the process does not
> error out. Otherwise, as you demonstrated, we risk leaving the
> relcache in an inconsistent state where a stale entry remains even
> after a transaction is rolled back.
Yes, but unless there is a guarantee that no other error can occur along that path,
merely preventing CHECK_FOR_INTERRUPTS() there probably won't be sufficient.
Best regards,
Alexander