Re: Add sub-transaction overflow status in pg_stat_activity

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add sub-transaction overflow status in pg_stat_activity
Date: 2022-11-14 15:09:57
Message-ID: CA+TgmoZT0sB99dK2LWrS3txmiEDnOHNdAEi+eSYdUkK4qqqCow@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 21, 2022 at 7:45 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > It feels to me like far too much effort is being invested in fundamentally
> > the wrong direction here. If the subxact overflow business is causing
> > real-world performance problems, let's find a way to fix that, not put
> > effort into monitoring tools that do little to actually alleviate anyone's
> > pain.
>
> There seems to be some agreement on this (I certainly do agree). Thus it seems
> we should mark the CF entry as rejected?

I don't think I agree with this outcome, for two reasons.

First, we're just talking about an extra couple of columns in
pg_stat_activity here, which does not seem like a heavy price to pay.
I'm not even sure we need two columns; I think we could get down to
one pretty easily. Rough idea: number of cached subtransaction XIDs if
not overflowed, else NULL. Or if that's likely to create 0/NULL
confusion, then maybe just a Boolean, overflowed or not.
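
To be concrete, a rough sketch of how a DBA might use it (the column
name is a placeholder for illustration, not necessarily what the patch
ends up calling it):

-- hypothetical column: number of cached subxact XIDs, or NULL once
-- the backend has overflowed
SELECT pid, backend_xid, backend_xmin, subxact_count
FROM pg_stat_activity
WHERE subxact_count IS NULL AND backend_type = 'client backend';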

Second, the problem seems pretty fundamental to me. Shared memory is
fixed size, so we cannot use it to store an unbounded number of
subtransaction IDs. We could perhaps rejigger things to be more
memory-efficient in some way, but no matter how many subtransaction
XIDs you can keep in shared memory, the user can always consume that
number plus one -- unless you allow for ~2^31 in shared memory, which
seems unrealistic. To me, that means that overflowed snapshots are not
going away. We could make them less painful by rewriting the SLRU
stuff to be more efficient, and I bet that's possible, but I think
it's probably hard, or someone would have gotten it done by now. This
has been sucking for a long time and I see no evidence that progress
is imminent. Even if it happens, it is unlikely that it will be a full
solution. If it were possible to make SLRU lookups fast enough not to
matter, we wouldn't need to have hint bits, but in reality we do have
them and attempts to get rid of them have not gone well up until now,
and in my opinion probably never will.
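
As a side note, the aggregate cost of those lookups is already visible
in pg_stat_slru (v13 and up), but that view is cluster-wide and cannot
tell you which backend is responsible; something like the following
only confirms that subtrans reads are happening at all:

-- heavy blks_read here is a hint that overflowed snapshots are forcing
-- pg_subtrans lookups, but it doesn't say which backend is at fault
-- (the SLRU's display name differs slightly across major versions)
SELECT name, blks_hit, blks_read
FROM pg_stat_slru
WHERE name = 'Subtrans';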

The way that I view this problem is that it is relatively rare but
hard for some users to troubleshoot. I think I've seen it come up
multiple times, and judging from the earlier responses on this thread,
several other people here have, too. In my experience, the problem is
inevitably that someone has a DML statement inside a plpgsql EXCEPTION
block inside a plpgsql loop. Concurrently with that, they are running
a lot of queries that look at recently modified data, so that the
overflowed snapshots trigger SLRU lookups often enough to matter. How
is a user supposed to identify which backend is causing the problem,
as things stand today? I have generally given people the advice to go
find the DML inside of a plpgsql EXCEPTION block inside of a loop, but
some users have trouble doing that. The DBA who is observing the
performance problem is not necessarily the developer who wrote all of
the PL code, and the PL code may be large and badly formatted and
there could be a bunch of EXCEPTION blocks and it might not be clear
which one is the problem. The exception block could be calling another
function or procedure that does the actual DML rather than doing it
directly, and the loop surrounding it might not be in the same
function or procedure but in some other one that calls it, or it could
be called repeatedly from the SQL level.
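
To make that concrete, here is a minimal, contrived reproduction of the
pattern; once the loop has burned through the per-backend cache
(PGPROC_MAX_CACHED_SUBXIDS, 64 in current sources), the backend's
subxid cache is overflowed for the rest of the transaction:

BEGIN;
CREATE TEMP TABLE overflow_demo (i int);
DO $$
BEGIN
  FOR i IN 1..100 LOOP
    BEGIN
      -- the DML inside the EXCEPTION block assigns a subtransaction XID
      INSERT INTO overflow_demo VALUES (i);
    EXCEPTION WHEN others THEN
      NULL;  -- swallow errors, as such code typically does
    END;
  END LOOP;
END $$;
-- while this transaction stays open, snapshots taken by other sessions
-- are suboverflowed and may have to consult pg_subtrans
COMMIT;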

I think I fundamentally disagree with the idea that we should refuse
to expose instrumentation data because some day the internals might
change. If we accepted that argument categorically, we wouldn't have
things like backend_xmin or backend_xid in pg_stat_activity, or wait
events either, but we do have those things and users find them useful.
They suck in the sense that you need to know quite a bit about how the
internals work in order to use them to find problems, but people who
want to support production PostgreSQL instances have to learn about
how those internals work one way or the other because they
demonstrably matter. It is absolutely stellar when we can say "hey, we
don't need to have a way for users to see what's going on here
internally because they don't ever need to care," but once it is
established that they do need to care, we should let them see directly
the data they need to care about rather than forcing them to
troubleshoot the problem in some more roundabout way like auditing all
of the code and guessing which part is the problem, or writing custom
dtrace scripts to run on their production instances.

If and when it happens that a field like backend_xmin or the new ones
proposed here are no longer relevant, we can just remove them from the
monitoring views. Yeah, that's a backward compatibility break, and
there's some pain associated with that. But we have demonstrated that
we are perfectly willing to incur the pain associated with adding new
columns when there is new and valuable information to display, and
that is equally a compatibility break, in the sense that it has about
the same chance of making pg_upgrade fail.

In short, I think this is a good idea, and if somebody thinks that we
should solve the underlying problem instead, I'd like to hear what
people think a realistic solution might be. Because to me, it looks
like we're refusing to commit a patch that probably took an hour to
write because with 10 years of engineering effort we could *maybe* fix
the root cause.

--
Robert Haas
EDB: http://www.enterprisedb.com
