Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes Reply-To:
Date: 2022-05-08 23:30:06
Message-ID: CA+hUKGKNffOrUqWNzMEf=TGauGVLqz=QB5Kz9axmTw0BgV-a+Q@mail.gmail.com
Lists: pgsql-hackers

On Sat, May 7, 2022 at 4:52 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> I think we'll probably also want to invent a way
> to report which backend is holding up progress, since without that
> it's practically impossible for an end user to understand why their
> command is hanging.

Simple idea: how about logging the PID of processes that block
progress for too long? In the attached, I arbitrarily picked 5
seconds as the wait time between LOG messages. Also, DEBUG1 messages
let you see the processing speed on, e.g., build farm animals. Thoughts?

To test this, kill -STOP a random backend, and then try an ALTER
DATABASE SET TABLESPACE in another backend. Example output:

DEBUG: waiting for all backends to process ProcSignalBarrier generation 1
LOG: still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT: alter database mydb set tablespace ts1;
LOG: still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT: alter database mydb set tablespace ts1;
LOG: still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT: alter database mydb set tablespace ts1;
LOG: still waiting for pid 1651417 to accept ProcSignalBarrier
STATEMENT: alter database mydb set tablespace ts1;

... then kill -CONT:

DEBUG: finished waiting for all backends to process ProcSignalBarrier generation 1
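
For readers who want to see the shape of the waiting loop without reading the patch, here is a minimal simulation in Python. It is not the patch's C code: the `Backend` class, the `absorb` method and `wait_for_barrier` are hypothetical names invented for this sketch, and the only things taken from the message above are the log texts and the idea of a fixed interval between LOG messages.

```python
import threading


class Backend:
    """Hypothetical stand-in for one backend's barrier state (illustration only)."""

    def __init__(self, pid):
        self.pid = pid
        self.generation = 0            # last barrier generation this backend processed
        self.cv = threading.Condition()

    def absorb(self, generation):
        """Record that this backend has processed the given generation."""
        with self.cv:
            self.generation = generation
            self.cv.notify_all()


def wait_for_barrier(backends, generation, interval=5.0, log=print):
    """Wait for every backend to reach `generation`, logging any laggard's PID."""
    log("DEBUG: waiting for all backends to process "
        f"ProcSignalBarrier generation {generation}")
    for b in backends:
        with b.cv:
            # wait_for() returns False if the timeout expires before the
            # predicate becomes true, which is when we report the PID.
            while not b.cv.wait_for(lambda: b.generation >= generation,
                                    timeout=interval):
                log(f"LOG: still waiting for pid {b.pid} to accept ProcSignalBarrier")
    log("DEBUG: finished waiting for all backends to process "
        f"ProcSignalBarrier generation {generation}")


if __name__ == "__main__":
    stuck = Backend(1651417)           # plays the role of the kill -STOP'd backend
    healthy = Backend(1651000)
    healthy.absorb(1)
    # Simulate kill -CONT after a short delay.
    threading.Timer(0.25, stuck.absorb, args=(1,)).start()
    wait_for_barrier([healthy, stuck], 1, interval=0.1)
```

The demo uses a 0.1 second interval only so that it finishes quickly; the default mirrors the 5 seconds mentioned above.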

Another thought is that it might be nice to be able to test with a
dummy ProcSignalBarrier (PSB) that doesn't actually do anything. You
could use it to see how fast your system processes it, while doing
various other things,
and to find/debug problems in other code that fails to handle
interrupts correctly. Here's an attempt at that. I guess it could go
into a src/test/modules/something instead of core, but on the other
hand the PSB itself has to be in core anyway, so maybe not. Thoughts?
No documentation yet, just seeing if people think this is worth
having... better names/ideas welcome.

To test this, just SELECT pg_test_procsignal_barrier().
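
To illustrate why a do-nothing barrier is still a useful probe, here is a small Python simulation, again not PostgreSQL code. Everything in it (`NoopBarrier`, `Worker`, `emit_and_time`, the `work_unit` parameter) is a hypothetical name made up for the sketch. It assumes each worker only notices the barrier between units of work, so a worker that goes a long time without checking shows up as a long acknowledgement latency.

```python
import threading
import time


class NoopBarrier:
    """Hypothetical shared barrier state: just a generation counter."""

    def __init__(self):
        self.generation = 0


class Worker(threading.Thread):
    """Simulated backend that checks for a pending barrier between work units."""

    def __init__(self, label, barrier, work_unit):
        super().__init__(daemon=True)
        self.label = label
        self.barrier = barrier
        self.work_unit = work_unit     # seconds of "work" between checks
        self.seen = 0                  # last generation acknowledged
        self.stopping = threading.Event()

    def run(self):
        while not self.stopping.is_set():
            time.sleep(self.work_unit)             # no barrier checks during this time
            self.seen = self.barrier.generation    # no-op barrier: only acknowledge


def emit_and_time(barrier, workers, poll=0.001):
    """Bump the generation and return seconds until each worker acknowledged it."""
    barrier.generation += 1
    target = barrier.generation
    start = time.monotonic()
    latency = {}
    while len(latency) < len(workers):
        for w in workers:
            if w.label not in latency and w.seen >= target:
                latency[w.label] = time.monotonic() - start
        time.sleep(poll)
    return latency


if __name__ == "__main__":
    barrier = NoopBarrier()
    workers = [Worker("fast", barrier, 0.005), Worker("slow", barrier, 0.2)]
    for w in workers:
        w.start()
    for label, secs in emit_and_time(barrier, workers).items():
        print(f"{label}: acknowledged after {secs:.3f}s")
    for w in workers:
        w.stopping.set()
```

In this model the "slow" worker stands in for code that fails to handle interrupts promptly; the barrier does no work itself, yet its completion time exposes that worker.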

Attachment Content-Type Size
0001-Add-logging-for-ProcSignalBarrier-mechanism.patch text/x-patch 2.1 KB
0002-Add-pg_test_procsignal_barrier.patch text/x-patch 2.6 KB
