Re: visibility of open cursors in pg_stat_activity

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: visibility of open cursors in pg_stat_activity
Date: 2023-10-26 17:41:36
Message-ID: 20231026174136.4et3ktuegmtlgxfs@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-10-26 11:47:32 -0400, Robert Haas wrote:
> I've seen situations a few times now where somebody has sessions that
> are "idle in transaction" for a long time but they feel like it should
> be harmless because the transaction has no XID. However, the fact that
> the transaction is idle doesn't mean it isn't running a query, because
> there could be a cursor from which some but not all results were
> fetched. That query is suspended, but still holds a snapshot and thus
> still holds back xmin. You can see this from pg_stat_activity because
> backend_xmin will be set, but I've found that this is easily missed
> and sometimes confusing even when noticed. People don't necessarily
> understand how it's possible to have a snapshot if the session is
> idle. And even if somebody has great understanding of system
> internals, pg_stat_activity doesn't distinguish between a session that
> holds a snapshot because (a) the transaction was started with
> repeatable read or serializable and it has already executed at least
> one command that acquired a snapshot or alternatively (b) the
> transaction has opened some cursors which it has not closed. (Is there
> a (c)? As far as I know, it has to be one of those two things.)

Does it really matter on that level for the user whether a snapshot exists
because of repeatable read or because of a cursor? If users don't understand
backend_xmin - likely largely true - then the consequences of holding a
snapshot open because of repeatable read (or even just catalog snapshots!) is
as severe as an open cursor.

> So I think it would be useful to improve the pg_stat_activity output
> in some way. For instance, the output could say "idle in transaction
> (with open cursors)" or something like that.

Given snapshots held for other reasons, I think we should expose them
similarly, if we do something for cursors. Otherwise people might start to
worry only about idle-txn-with-cursors and not the equally harmful
idle-txn-with-snapshot.

Maybe something roughly like
idle in transaction [with {snapshot|cursor|locks}]
?

> Or we could add a whole new column that specifically gives a count of how
> many cursors the session has open, or how many active cursors, or something
> like that. I'm not exactly clear on the terminology here.

Portals are very weirdly underdocumented and surprisingly complicated :/

> It seems like the thing we internally called a portal is basically a cursor,
> except there's also an unnamed portal that gets used when you run a query
> without using a cursor.

I think you can also basically use an unnamed portal as a cursor with the
extended protocol. The only thing is that there can only be one of them.

The interesting distinction likely is whether we have cursors that are not
active.

> But I think it would be nice to do something, because the current
> situation seems like it's more confusing than it needs to be.

I think it'd be nice to make idle-in-txn a bit more informative. Not sure
though how much that helps most users, it's still quite granular information.

I still would like a view that shows what's holding back the horizon on a
system wide basis. Something like a view with the following columns and one
row for each database

datname
horizon
horizon_cause = {xid, snapshot, prepared_xact, replication_connection, ...}
xid_horizon
xid_horizon_pid
snapshot_horizon
snapshot_horizon_pid
prepared_xact_horizon
prepared_xact_horizon_id
replication_connection_horizon
replication_connection_horizon_pid
physical_slot_horizon
physical_slot_horizon_pid
physical_slot_horizon_name
logical_slot_horizon
logical_slot_horizon_pid
logical_slot_horizon_name

Perhaps with one additional row with a NULL datname showing the system wide
horizons (one database could have the oldest xid_horizon and another the
oldest logical_slot_horizon, so it's not a simple order by).

I recently mused in some other thread that I really would like to have an
approximate xid->timestamp mapping, so that we could assign an age to these in
a unit that makes sense to humans. Particularly snapshots / xmin can be very
confusing in that regard because a relatively recent transaction can hold back
the overall horizon further than the time the transaction started, if some old
transaction was still running at the time.

Perhaps we could add at least timestamps to these in some other
way. E.g. recording a timestamp whenever a transaction is prepared, a slot is
released... Likely recording one whenever a snapshot is acquired would be too
expensive tho - but we could use state_change as an approximation?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2023-10-26 18:42:27 Re: Does UCS_BASIC have the right CTYPE?
Previous Message Bharath Rupireddy 2023-10-26 17:25:00 Re: Introduce a new view for checkpointer related stats