Re: generic plans and "initial" pruning

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Tender Wang <tndrwang(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: generic plans and "initial" pruning
Date: 2025-05-20 03:06:21
Message-ID: 605328.1747710381@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> Pushed after some tweaks to comments and the test case.

My attention was drawn to commit 525392d57 after observing that
Valgrind complained about a memory leak in some code that commit added
to BuildCachedPlan(). I tried to make sense of said code so I could
remove the leak, and eventually arrived at the attached patch, which
is part of a series of leak-fixing things hence the high sequence
number.

Unfortunately, the bad things I speculated about in the added comments
seem to be reality. The second attached file is a test case that
triggers

TRAP: failed Assert("list_length(plan_list) == list_length(plan->stmt_list)"), File: "plancache.c", Line: 1259, PID: 602087

because it adds a DO ALSO rule that causes the rewriter to generate
more PlannedStmts than it did before.

This is quite awful, because it does more than simply break the klugy
(and undocumented) business about keeping the top-level List in a
different context. What it means is that any outside code that is
busy iterating that List is very fundamentally broken: it's not clear
what List index it ought to resume at, except that "the one it was at"
is demonstrably incorrect.

I also don't really believe the (also undocumented) assumption that
such outside code is in between executions of PlannedStmts of the
List and hence can tolerate those being ripped out and replaced.
I have not attempted to build an example, because the one I have
seems sufficiently damning. But I bet that a recursive function
could be constructed in such a way that an outer execution is
still in progress when an inner call triggers UpdateCachedPlan.

Another small problem (much more easily fixable than the above,
probably) is that summarily setting "plan->is_valid = true"
at the end is not okay. We could already have received an
invalidation that should result in marking the plan stale.
(Holding locks on the tables involved is not sufficient to
prevent that, as there are other sources of inval events.)

It's possible that this code can be fixed, but I fear it's
going to involve some really fundamental redesign, which
probably shouldn't be happening after beta1. I think there
is no alternative but to revert for v18.

regards, tom lane

Attachment Content-Type Size
v2-0010-Partially-fix-some-extremely-broken-code-from-52.patch text/x-diff 3.7 KB
break_cached_plan.sql text/plain 778 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2025-05-20 03:08:39 Re: Conflict detection for update_deleted in logical replication
Previous Message Nisha Moond 2025-05-20 03:05:26 Re: Logical Replication of sequences