[BUG] Excessive memory usage with update on STORED generated columns.

From: "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: [BUG] Excessive memory usage with update on STORED generated columns.
Date: 2026-03-30 14:25:29
Message-ID: ddc34dbd-1efc-4710-824c-e101e7eb63e6@postgrespro.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

My colleagues found that a queries like that:

\timing

DROP TABLE IF EXISTS t;

CREATE TABLE t (
id int,
a int,
b int,
g text GENERATED ALWAYS AS (
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0') ||
lpad((CASE WHEN a <> 0 THEN a ELSE b END)::text, 10, '0')
) STORED
);

INSERT INTO t
SELECT 1, 100, 0
FROM generate_series(1, 1000000);

UPDATE t SET id = 2; -- < problem query

lead to excessive memory consumption up to 10Gb in this example and
query execution time up to ~1,5min.

Bisect shows that the problem appeared after commit 83ea6c540
(Virtual generated columns).

Before this commit the update query took only ~8s and the memory
consumption did not exceed 150Mb for this backend.
MemoryContextStats reports only a small amount of memory usage, while
malloc_stats() confirms large allocations outside PostgreSQL memory
contexts.

With help of massif tool i found repeated allocations originating from:

ExecInitGenerated
→ build_column_default
→ stringToNode

This indicates that generated expressions are reparsed multiple times,
once per row to be updated instead of being reused.

There is a problem call stack during UPDATE t SET id = 2;
execution: see attached bt.txt, please.

Before the above-mentioned commit, ExecInitGenerated() was effectively
invoked once per ResultRelInfo, so this behavior was not observable.

I would like to propose a fix that add a caching of the the parsed
expression trees (Node *) in ResultRelInfo, so that build_column_default()
and stringToNode() are executed at most once per attribute per query.

With this fix, the query execution time
and memory consumption return to normal:

postgres=# UPDATE t SET id = 2;
UPDATE 1000000
Time: 11522,621 ms (00:11,523)

A patch for this approach for current master is attached here.

Would be glad for any feedback.

Best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
bt.txt text/plain 13.4 KB
v1-0001-Avoid-repeated-parsing-of-generated-column-exprs.patch text/x-patch 3.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2026-03-30 14:30:35 Re: Thread-safe getopt()
Previous Message Daniil Davydov 2026-03-30 14:24:01 Re: Get rid of redundant StringInfo accumulation