Re: Damage control for planner's get_actual_variable_endpoint() runaway

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Subject: Re: Damage control for planner's get_actual_variable_endpoint() runaway
Date: 2022-11-21 14:48:27
Message-ID: CA+TgmoYNPP4NgwhuPeL1Xb0QCaNYQ4B_z4DH5sEXVXVEob_xhA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 21, 2022 at 7:22 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> On 2022-Nov-21, Jakub Wartak wrote:
> > b) I was wondering about creating a new wait class "Planner" with the
> > event "ReadingMinMaxIndex" (or similar). The obvious drawback is the
> > naming/categorization as wait events are ... well.. as the name "WAIT"
> > implies, while those btree lookups could easily be ON-CPU activity.
>
> I think we should definitely do this, regardless of any other fixes we
> add, so that this condition can be identified more easily. I wonder if
> we can backpatch it safely.

I don't think this is safe at all. Wait events can only bracket
individual operations, like the reads of the individual index blocks,
not report on which phase of a larger operation is in progress. If we
try to make them do the latter, we will have a hot mess on our hands.
It might not be a bad capability to have, but it's a different system.
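
To make that concrete: the wait-event machinery is built to wrap one
blocking call at a time, roughly like this (an illustrative sketch
loosely modeled on the low-level read path in fd.c; in the real code
the event, such as WAIT_EVENT_DATA_FILE_READ, is passed in by the
caller rather than hard-coded):

    /* bracket exactly one blocking operation, then clear the event */
    pgstat_report_wait_start(WAIT_EVENT_DATA_FILE_READ);
    returnCode = pg_pread(vfdP->fd, buffer, amount, offset);
    pgstat_report_wait_end();

A Planner/ReadingMinMaxIndex event would have to stay set across an
arbitrarily long series of such calls, each of which wants to report
its own event, and that is phase-style reporting rather than
operation-style reporting.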

But that doesn't mean we can't do anything about this problem, and I
think we should. It's completely crazy
that after this many years of people getting hosed by this, we haven't
taken more than half measures to fix the problem. I think commit
3ca930fc39ccf987c1c22fd04a1e7463b5dd0dfd was the last time we poked at
this, and before that there was commit
fccebe421d0c410e6378fb281419442c84759213, but neither of those
prevented us from scanning an unbounded number of index tuples before
finding one that we're willing to use, as the commit messages pretty
much admit.

What we need is a solution that avoids reading an unbounded number of
tuples under any circumstances. I previously suggested using
SnapshotAny here, but Tom didn't like that. I'm not sure if there are
safety issues there or if Tom was just concerned about the results
being misleading. Either way, maybe there's some variant on that theme
that could work. For instance, could we teach the index scan to stop
if the first 100 tuples that it finds are all invisible? Or to read
at most 1 page, or at most 10 pages, or something? If we don't find a
match, we could either try to use a dead tuple, or we could just
return false, which, I think, would end up using the value from
pg_statistic rather than any updated value. That is of course not a
great outcome, but it is WAY WAY BETTER than spending OVER AN HOUR
looking for a more suitable tuple, as Jakub describes having seen on a
production system.
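
To make the shape of that idea concrete, here is a very rough sketch
of a page-capped version of the scan loop in
get_actual_variable_endpoint() (purely illustrative, not a patch; the
variable names and the cap of 100 heap pages are just examples):

    BlockNumber last_heap_block = InvalidBlockNumber;
    int         n_visited_heap_pages = 0;

    while ((tid = index_getnext_tid(index_scan, indexscandir)) != NULL)
    {
        BlockNumber block = ItemPointerGetBlockNumber(tid);

        if (!VM_ALL_VISIBLE(heapRel, block, &vmbuffer))
        {
            /* page not all-visible, so we must check the heap */
            if (!index_fetch_heap(index_scan, tableslot))
            {
                /* nothing visible here; charge the heap page, keep going */
                if (block != last_heap_block)
                {
                    last_heap_block = block;
                    if (++n_visited_heap_pages > 100)
                        break;  /* give up, fall back to pg_statistic */
                }
                continue;
            }
        }

        /* found an endpoint tuple we're willing to use */
        have_data = true;
        break;
    }

The details are negotiable; the point is just that the loop gets a hard
exit condition that does not depend on how much dead space the index
happens to contain.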

I really can't understand why this is even mildly controversial. What
exactly to do here may be debatable, but the idea that it's OK to
spend an unbounded amount of resources during any phase of planning is
clearly wrong. We can say that at the time we wrote the code we didn't
know this was ever actually going to happen, and that is fair and true.
But there have been multiple reports of this over the years and we
know *for sure* that spending totally unreasonable amounts of time
inside this function is a real-world problem that actually brings down
production systems. Unless and until we find a way of putting a tight
bound on the amount of effort that can be expended here, that's going
to keep happening.

--
Robert Haas
EDB: http://www.enterprisedb.com
