Re: A micro-optimisation for ProcSendSignal()

From: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, ashwinstar(at)gmail(dot)com
Subject: Re: A micro-optimisation for ProcSendSignal()
Date: 2021-07-24 05:26:17
Message-ID: CAE-ML+__zDoUfwh9ntAhwMF=rJkPoK3iF=RY5Du05FSmHidBEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

HI Thomas,

On Tue, Jul 20, 2021 at 10:40 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> > Slightly tangential: we should add a comment to PGPROC.pgprocno, for more
> > immediate understandability:
> >
> > + int pgprocno; /* index of this PGPROC in ProcGlobal->allProcs */
>
> I wonder why we need this member anyway, when you can compute it from
> the address... #define GetPGProcNumber(p) ((p) - ProcGlobal->allProcs)
> or something like that? Kinda wonder why we don't use
> GetPGProcByNumber() in more places instead of open-coding access to
> ProcGlobal->allProcs, too...

I tried this out. See attached v4 of your patch with these changes.

> > Also, why don't we take the opportunity to get rid of SERIALIZABLEXACT->pid? We
> > took a stab. Attached is v2 of your patch with these changes.
>
> SERIALIZABLEXACT objects can live longer than the backends that
> created them. They hang around to sabotage other transactions' plans,
> depending on what else they overlapped with before they committed.
> With that change, the pg_locks view might show the pid of some
> unrelated session that moves into the same PGPROC.

I see.

>
> It's only an "informational" pid, and pids are imperfect information
> anyway because (1) they are themselves recycled, and (2) they won't be
> interesting in a hypothetical multi-threaded future. One solution
> would be to hide the pids from the view after the backend disconnects
> (somehow -- add a generation number?), but they're also still kinda
> useful, despite the weaknesses. I wonder what the ideal way would be
> to refer to sessions, anyway, including those that are no longer
> active; perhaps we could invent a new "session ID" concept.

Updating the pg_locks view:

Yes, the pids may be valuable for future debugging/audit purposes. Also,
systems where pid_max is high enough to not see wraparound, will have
pids that are not recycled. I would lean towards showing the pid even
after the backend has exited.

Perhaps we could overload the stored pid to be negated (i.e. a backend
with pid 20000 will become -20000) to indicate that the pid belongs to
a backend that has exited. Additionally, we could introduce a boolean
field in pg_locks "backendisalive", so that the end user doesn't have
to reason about negative pids.

Session ID:

Interesting, Greenplum uses the concept of session ID pervasively. Being
a distributed database, the session ID helps to tie individual backends
on each PG instance to the same session. The session ID of course comes
with its share of bookkeeping:

* These IDs are incrementally dished out with a counter
(with pg_atomic_add_fetch_u32), in increments of 1, on the Greenplum
coordinator PG instance in InitProcess.

* The counter is a part of ProcGlobal and itself is initialized to 0 in
InitProcGlobal, which means that session IDs are reset on cluster
restart.

* The sessionID tied to each proceess is maintained in PGPROC.

* The sessionID finds its way into PgBackendStatus, which is further
reported with pg_stat_get_activity.

A session ID seems a bit heavy just to indicate whether a backend has
exited.

Regards,
Soumyadeep

Attachment Content-Type Size
v4-0001-Optimize-ProcSendSignal.patch text/x-patch 19.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-07-24 05:39:53 proposal: plpgsql: special comments that will be part of AST
Previous Message Robert Haas 2021-07-24 01:02:29 Re: Followup Timestamp to timestamp with TZ conversion