Re: Add sub-transaction overflow status in pg_stat_activity

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add sub-transaction overflow status in pg_stat_activity
Date: 2022-11-23 20:25:39
Message-ID: CA+Tgmoa1NpWkLzAsiyyZUnicpvd9Eq-FLYHZqvgUjei2tiAycw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 23, 2022 at 2:01 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I originally thought having this value in pg_stat_activity was overkill,
> but seeing the other internal/warning columns in that view, I think it
> makes sense. Oddly, is our 64 snapshot performance limit even
> documented anywhere? I know it is in Simon's patch I am working on.

If it is, I'm not aware of it. We often don't document things that are
as internal as that.

One thing that I'd really like to see better documented is exactly
what it is that causes a problem. But first we'd have to understand it
ourselves. It's not as simple as "if you have more than 64 subxacts in
any top-level xact, kiss performance good-bye!" because for there to
be a problem, at least one backend (and probably many) have to take
snapshots that include that see that overflowed subxact cache and thus
get marked suboverflowed. Then after that, those snapshots have to be
used often enough that the additional visibility-checking cost becomes
a problem. But it's also not good enough to just use those snapshots
against any old tuples, because tuples that are older than the
snapshot's xmin aren't going to cause additional lookups, nor are
tuples newer than the snapshot's xmax.

So it feels a bit complicated to me to think through the workload
where this really hurts. What I'm imagining is that you need a
relatively long-running transaction that overflows its subxact
limitation but then doesn't commit, so that lots of other backends get
overflowed snapshots, and also so that the xmin and xmax of the
snapshots being taken get further apart. Or maybe you can have a
series short-running transactions that each overflow their subxact
cache briefly, but they overlap, so that there's usually at least 1
around in that state, but in that case I think you need a separate
long-running transaction to push xmin and xmax further apart. Either
way, the backends that get the overflowed snapshots then need to go
look at some table data that's been recently modified, so that there
are xmin and xmax values newer than the snapshot's xmin.

Intuitively, I feel like this should be pretty rare, and largely
avoidable if you just don't use long-running transactions, which is a
good thing to avoid for other reasons anyway. But there may be more to
it than I'm realizing, because I've seen customers hit this issue
multiple times. I wonder whether there's some subtlety to the
triggering conditions that I'm not fully understanding.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-11-23 20:29:43 Re: Document parameter count limit
Previous Message Daniel Verite 2022-11-23 20:18:57 Re: Tests for psql \g and \o