| From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
|---|---|
| To: | PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
| Subject: | Two bugs around ALTER TYPE |
| Date: | 2026-04-14 11:53:09 |
| Message-ID: | 39977E94-40DD-4176-A191-CE3D120968B2@yandex-team.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
Hi,
Users reported a bug involving ALTER TYPE/TABLE and set-returning PL/pgSQL
functions, so I took a stab at fixing them.
Bug 1: ALTER TYPE/TABLE ADD COLUMN does not invalidate cached SETOF plans
When a PL/pgSQL function executes RETURN QUERY against a SETOF function
returning a named composite type, the SPI plan for that inner query is
cached. If ALTER TYPE ... ADD ATTRIBUTE (or ALTER TABLE ... ADD COLUMN
when the table's rowtype is the return type) is executed afterward, the
cached plan is not marked stale and the next call raises:
ERROR: structure of query does not match function result type
DETAIL: Number of returned columns (3) does not match expected column
count (2).
Reproduction (single session):
CREATE TYPE planinv_ct AS (a int, b int);
CREATE TABLE planinv_tbl (a int, b int);
INSERT INTO planinv_tbl VALUES (1, 2);
CREATE FUNCTION planinv_srf() RETURNS SETOF planinv_ct
LANGUAGE sql STABLE SECURITY DEFINER AS $$
SELECT * FROM planinv_tbl $$;
CREATE FUNCTION planinv_caller() RETURNS SETOF planinv_ct
LANGUAGE plpgsql AS $$
BEGIN
RETURN QUERY SELECT r.* FROM planinv_srf() r;
END; $$;
SELECT * FROM planinv_caller(); -- warms up plan cache
ALTER TYPE planinv_ct ADD ATTRIBUTE c int;
ALTER TABLE planinv_tbl ADD COLUMN c int DEFAULT 99;
SELECT * FROM planinv_caller(); -- ERROR without fix, (1,2,99) with fix
Root cause:
When a plan is finalised, setrefs.c records OIDs of referenced relations
in glob->relationOids so that relcache invalidations can mark it stale.
For RTE_FUNCTION nodes only the function OID is tracked. If the
function's declared return type is a named composite, ALTER TYPE (or
ALTER TABLE for table rowtypes) updates pg_class for that composite
without touching the function OID, so the cached plan never sees the
invalidation.
Note: SECURITY DEFINER is required in the reproducer to prevent the
planner from inlining planinv_srf() into the caller; the bug affects any
non-inlined SETOF function returning a named composite type.
Fix:
Add a helper add_function_rte_relation_deps() in setrefs.c that, for
each RTE_FUNCTION node, resolves the typrelid of the function's return
type via typeOrDomainTypeRelid() and appends it to glob->relationOids.
Call it from add_rte_to_flat_rtable(), flatten_rtes_walker(), and
extract_query_dependencies_walker(). ALTER TYPE/TABLE then triggers
relcache invalidation and seems to force a re-plan.
Bug 2: Concurrent ALTER TYPE/TABLE mid-execution causes mismatch in PL/pgSQL
I was looking for workarounds and proposed the user to add something
that would invalidate plan when they alter a type:
CREATE OR REPLACE FUNCTION planinv_srf()
Users replied that they already do it, and it does not help.
Together we found out that a race window exists inside a single PL/pgSQL call.
The function enters with a 2-column tuple descriptor;
a concurrent transaction commits ALTER TYPE ADD ATTRIBUTE plus CREATE OR
REPLACE FUNCTION while RETURN QUERY is still running. The inner SRF
is replanned with 3 columns, but the tstoreReceiver was already set up
for 2, giving the same error mid-execution.
The race requires precise timing and is reproduced reliably only with
injection points (TAP test included in the patch series). In brief:
Session A: SELECT * FROM planinv_caller(); -- suspended just before
-- RETURN QUERY executes
Session B: BEGIN;
ALTER TYPE planinv_ct ADD ATTRIBUTE c int;
CREATE OR REPLACE FUNCTION planinv_srf() ... 3 columns ...
COMMIT;
Session A resumes: ERROR
The same race exists with ALTER TABLE when the return type is a table
rowtype that is not itself scanned by the function body. When the
function does scan the same table it returns, ALTER TABLE is naturally
blocked by the AccessShareLock already held by the scan; but for
standalone composite types there is no such automatic protection.
Root cause:
plpgsql_estate_setup() captures the return type descriptor with no lock
preventing ALTER TYPE from modifying the composite type's pg_class row
between estate setup and the RETURN QUERY re-plan. Within a single SQL
statement the row shape should be stable, analogous to snapshot
semantics for catalog changes.
Fix:
In plpgsql_estate_setup(), when the function is a SETOF returning a
named composite type (retisset && retistuple), resolve typrelid via
typeOrDomainTypeRelid() and call LockRelationOid(typrelid,
AccessShareLock). The lock is released at transaction end.
This mirrors how table scans acquire AccessShareLock to pin the schema
for statement duration. ALTER TYPE/TABLE requires AccessExclusiveLock
on the same relid, so it blocks behind the in-progress call.
Thanks to Victor Popov for reporting and adapting single-session test case
for the race condition.
Regards, Andrey Borodin.
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Add-regression-test-for-SETOF-composite-invalidation.patch | application/octet-stream | 5.1 KB |
| 0002-Track-RTE_FUNCTION-composite-rowtype-dependencies.patch | application/octet-stream | 4.1 KB |
| 0003-Add-TAP-test-for-plpgsql-RETURN-QUERY-DDL-race.patch | application/octet-stream | 5.0 KB |
| 0004-Lock-SETOF-composite-return-type-during-plpgsql-exec.patch | application/octet-stream | 3.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Henson Choi | 2026-04-14 10:47:07 | Re: BUG #19354: JOHAB rejects valid byte sequences |