Re: Generalize ereport_startup_progress infrastructure

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Generalize ereport_startup_progress infrastructure
Date: 2022-09-29 11:57:54
Message-ID: CALj2ACVBej9d55dvaaC74MMrxxDNv-orirEArDZnXLSgpQPWDA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 17, 2022 at 8:44 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Well, I don't agree that either of the proposed new uses of this
> infrastructure are the right way to solve the problems in question, so
> worrying about how to name the GUCs when we have a bunch of uses of
> this infrastructure seems to me to be premature.

Agreed.

> The proposed use in
> the postmaster doesn't look very safe, so you either need to give up
> on that or figure out a way to make it safe.

Is registering a SIGALRM handler in the postmaster a bad idea? Is
setting MyLatch only conditionally [1] a concern?

I agree that the handle_sig_alarm() code may not look good for the
postmaster, since it holds interrupts and does a bunch of other
things. But is that a serious issue?
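For reference, the pattern being asked about (a SIGALRM handler that records the timeout and wakes the process latch only when the process has one) can be sketched in standalone C. This is a simplified illustration under stated assumptions: `Latch`, `set_latch()`, `alarm_fired` and `install_alarm_handler()` are hypothetical stand-ins, not PostgreSQL's latch or timeout APIs, and the real SetLatch() does much more than set a flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a process latch. */
typedef struct Latch
{
    volatile sig_atomic_t is_set;
} Latch;

/* NULL in a process that has no latch (the postmaster, per the patch in [1]). */
static Latch *MyLatch = NULL;

/* Set by the handler so the main loop can notice the timeout later. */
static volatile sig_atomic_t alarm_fired = 0;

static void
set_latch(Latch *latch)
{
    assert(latch != NULL);      /* callers must not pass a missing latch */
    latch->is_set = 1;
}

/* SIGALRM handler: record the timeout, then wake the latch only if one exists. */
static void
handle_sig_alarm(int signo)
{
    (void) signo;
    alarm_fired = 1;
    if (MyLatch != NULL)        /* the conditional wakeup being discussed */
        set_latch(MyLatch);
}

/* Install the handler; returns false if sigaction() fails. */
static bool
install_alarm_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = handle_sig_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGALRM, &sa, NULL) == 0;
}
```

Raising SIGALRM with MyLatch left NULL only records the timeout; once a latch is assigned, the same handler also wakes it.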

> The proposed use in the
> checkpointer looks like it needs more design work, because it's not
> clear whether or how it should interact with log_checkpoints. While I
> agree that changing log_checkpoints into an integer value doesn't
> necessarily make sense, having some kind of new checkpoint logging
> that is completely unrelated to existing checkpoint logging doesn't
> necessarily make sense to me either.

Hm. Yes, we cannot forget about log_checkpoints while considering
adding more logs and controls with other GUCs. We could say that one
needs to enable both log_checkpoints and the progress report GUC, but
that's not great from a usability perspective.

> I do have some sympathy with the idea that if people care about
> operations that unexpectedly run for a long time, they probably care
> about all of them, and probably don't care about changing the timeout
> or even the enable switch for each one individually.

I've seen such cases myself, and many people have asked me why the
server seems unresponsive while it processes files: for instance,
temp files in the postmaster after a restart, or snapshot files,
mapping files, or BufferSync() during a checkpoint. This sort of
progress reporting would've helped in those cases.
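The kind of progress reporting meant here boils down to: while looping over work items, log a line whenever a configured interval has elapsed since the previous report. The existing startup-progress code arms a SIGALRM-based timeout for this; the sketch below instead polls CLOCK_MONOTONIC on each item, which needs no signal handler at all. `ProgressReporter`, `progress_begin()` and `progress_maybe_report()` are hypothetical names, and the message format is only loosely modeled on the startup-progress messages.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical interval-based progress reporter; not a PostgreSQL API. */
typedef struct ProgressReporter
{
    const char *what;           /* operation label, e.g. "removing temp files" */
    int64_t     interval_ms;    /* report interval; <= 0 disables reporting */
    int64_t     start_ms;       /* when the operation began */
    int64_t     last_report_ms; /* when the previous report was emitted */
} ProgressReporter;

/* Milliseconds from a monotonic clock (unaffected by wall-clock changes). */
static int64_t
now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static void
progress_begin(ProgressReporter *pr, const char *what, int64_t interval_ms)
{
    assert(what != NULL);
    pr->what = what;
    pr->interval_ms = interval_ms;
    pr->start_ms = now_ms();
    pr->last_report_ms = pr->start_ms;
}

/* Call once per item; logs and returns true if the interval has elapsed. */
static bool
progress_maybe_report(ProgressReporter *pr, const char *current_item)
{
    int64_t now;

    if (pr->interval_ms <= 0)
        return false;
    now = now_ms();
    if (now - pr->last_report_ms < pr->interval_ms)
        return false;
    pr->last_report_ms = now;
    fprintf(stderr, "%s, elapsed time: %.2f s, current item: %s\n",
            pr->what, (now - pr->start_ms) / 1000.0, current_item);
    return true;
}
```

The trade-off is one clock_gettime() call per item in exchange for not touching signal handling, which is the part that raised safety concerns for the postmaster.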

Thinking of another approach, for reporting file processing alone: a
GUC log_file_processing_traffic = {none, medium, high} or {0, 1, 2,
..., limit} that users can set to emit a file-processing log message
after a certain number of files. It doesn't require a timeout
mechanism, so any process can use it. But it is specific to files.

Similar to the above, but more generic and not specific to file
processing: a GUC log_processing_traffic = {none, medium, high} or
{0, 1, 2, ..., limit}.
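A count-based reporter for the integer form of these GUCs could look like the sketch below: 0 disables reporting and N > 0 emits one line every N processed items, with no timer or signal involved. `TrafficCounter`, `traffic_begin()` and `traffic_count()` are hypothetical names; the {none, medium, high} form would presumably map to preset N values, which are not specified here and so are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical count-based reporter for a GUC such as the proposed
 * log_processing_traffic (integer form): 0 = off, N > 0 = log every N items.
 * Nothing here depends on timeouts, so any process could use it.
 */
typedef struct TrafficCounter
{
    const char *what;       /* operation label */
    int         every_n;    /* GUC value; 0 disables reporting */
    uint64_t    processed;  /* items processed so far */
} TrafficCounter;

static void
traffic_begin(TrafficCounter *tc, const char *what, int every_n)
{
    assert(every_n >= 0);
    tc->what = what;
    tc->every_n = every_n;
    tc->processed = 0;
}

/* Count one item; logs and returns true on every every_n-th item. */
static bool
traffic_count(TrafficCounter *tc, const char *item)
{
    tc->processed++;
    if (tc->every_n == 0 || tc->processed % (uint64_t) tc->every_n != 0)
        return false;
    fprintf(stderr, "%s: processed %llu items, current: %s\n",
            tc->what, (unsigned long long) tc->processed, item);
    return true;
}
```

Unlike an interval-based report, the frequency here depends on processing speed: with N = 1000, a run of slow items can go a long time without any log line, while fast items produce many.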

Thoughts?

[1]
/*
* SIGALRM is always cause for waking anything waiting on the process
* latch.
+ *
+ * Postmaster has no latch associated with it.
*/
- SetLatch(MyLatch);
+ if (MyLatch)
+ SetLatch(MyLatch);

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
