identifying the backend that owns a temporary schema

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: identifying the backend that owns a temporary schema
Date: 2022-08-15 20:58:11
Message-ID: 20220815205811.GA250990@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

As Greg Stark noted elsewhere [0], it is presently very difficult to
identify the PID of the session using a temporary schema, which is
particularly unfortunate when a temporary table is putting a cluster in
danger of transaction ID wraparound. I noted [1] that the following query
can be used to identify the PID for a given backend ID:

SELECT bid, pg_stat_get_backend_pid(bid) AS pid FROM pg_stat_get_backend_idset() bid;

But on closer inspection, this is just plain wrong. The backend IDs
returned by pg_stat_get_backend_idset() might initially bear some
resemblance to the backend IDs stored in PGPROC, so my suggested query
might work some of the time, but the pg_stat_get_backend_* backend IDs
typically diverge from the PGPROC backend IDs as sessions connect and
disconnect.

I think it would be nice to have a reliable way to discover the PID for a
given temporary schema via SQL. The other thread [2] introduces a helpful
log message that indicates the PID for temporary tables that are in danger
of causing transaction ID wraparound, and I intend for this proposal to be
complementary to that work.

At first, I thought about adding a new function for retrieving the PGPROC
backend IDs, but I am worried that having two sets of backend IDs would be
even more confusing than having one set that can't reliably be used for
temporary schemas. Instead, I tried adjusting the pg_stat_get_backend_*()
suite of functions to use the PGPROC backend IDs. This ended up being
simpler than anticipated. I added a backend_id field to the
LocalPgBackendStatus struct (which is populated within
pgstat_read_current_status()), and I changed pgstat_fetch_stat_beentry() to
bsearch() for the entry with the given PGPROC backend ID.

This does result in a small behavior change. Currently,
pg_stat_get_backend_idset() simply returns a range of numbers (1 to the
number of active backends). With the attached patch, this function will
still return a set of numbers, but there might be gaps between the IDs, and
the maximum backend ID will usually be greater than the number of active
backends. I suppose this might break some existing uses, but I'm not sure
how much we should worry about that. IMO uniting the backend IDs is a net
improvement.

Thoughts?

[0] https://postgr.es/m/CAM-w4HPCOuJDs4fdkgNdA8FFMeYMULPCAxjPpsOgvCO24KOAVg%40mail.gmail.com
[1] https://postgr.es/m/DDF0D1BC-261D-45C2-961C-5CBDBB41EE71%40amazon.com
[2] https://commitfest.postgresql.org/39/3358/

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v1-0001-Adjust-pg_stat_get_backend_-to-use-backends-PGPRO.patch text/x-diff 7.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-08-15 21:23:04 Re: [PATCH] Optimize json_lex_string by batching character copying
Previous Message Thomas Munro 2022-08-15 20:26:45 Re: Cleaning up historical portability baggage