Re: when the startup process doesn't (logging startup delays)

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: when the startup process doesn't (logging startup delays)
Date: 2021-07-23 10:39:47
Message-ID: CAMm1aWb2OzbvOBxB-M+UUEXt0h2nmtNybo5x30b4vyCOLZB_dA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I still don't see the need for two functions LogStartupProgress and
> LogStartupProgressComplete. Most of the code is duplicate. I think we
> can just do it with a single function something like [1]:

Initially I had written a common function for these 2. You can see
that in the earlier version of the patch. Later separated it since it
was too much for one function. If others also agree to make it common,
I can make that change.

> Why isn't there a
> LogStartupProgressComplete(STARTUP_PROCESS_OP_REDO)? Is it because of
> the below existing log message?
> ereport(LOG,
> (errmsg("redo done at %X/%X system usage: %s",
> LSN_FORMAT_ARGS(ReadRecPtr),
> pg_rusage_show(&ru0))));

Yes. Adding another log message makes it redundant.

> I think it should be, "," after occurred instead of "."
> + * elapsed or not. TRUE if timeout occurred, FALSE otherwise.
> instead of
> + * elapsed or not. TRUE if timeout occurred. FALSE otherwise.

Fixed.

> I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path);
> when action == datadir_fsync_fname.

I agree and fixed it.

> ResetUnloggedRelations() is calling
> ResetUnloggedRelationsInTablespaceDir("base", op);
> before calling InitStartupProgress().

Fixed.

> This shows StartupXLOG() calling ResetUnloggedRelations() twice.
> Should they both be shown ? If so, maybe they should be distinguished as here:
>
> elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d",
> (op & UNLOGGED_RELATION_CLEANUP) != 0,
> (op & UNLOGGED_RELATION_INIT) != 0);

Fixed. Added separate codes to distinguish.

Please find the patch attached.

On Wed, Jul 21, 2021 at 6:43 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> I think walkdir() should only call LogStartupProgress(FSYNC_IN_PROGRESS, path);
> when action == datadir_fsync_fname.
>
> ResetUnloggedRelations() is calling
> ResetUnloggedRelationsInTablespaceDir("base", op);
> before calling InitStartupProgress().
>
> This shows StartupXLOG() calling ResetUnloggedRelations() twice.
> Should they both be shown ? If so, maybe they should be distinguished as here:
>
> elog(DEBUG1, "resetting unlogged relations: cleanup %d init %d",
> (op & UNLOGGED_RELATION_CLEANUP) != 0,
> (op & UNLOGGED_RELATION_INIT) != 0);
>
> On Wed, Jul 21, 2021 at 12:52:24PM +0530, Nitin Jadhav wrote:
> > 2021-07-20 18:47:32.046 IST [102230] LOG: listening on IPv4 address "127.0.0.1", port 5445
> > 2021-07-20 18:47:32.048 IST [102230] LOG: listening on Unix socket "/tmp/.s.PGSQL.5445"
> > 2021-07-20 18:47:32.051 IST [102234] LOG: database system was interrupted; last known up at 2021-07-20 18:46:27 IST
> > 2021-07-20 18:47:32.060 IST [102234] LOG: data directory sync (fsync) complete after 0.00 s
> > 2021-07-20 18:47:32.060 IST [102234] LOG: database system was not properly shut down; automatic recovery in progress
> > 2021-07-20 18:47:32.063 IST [102234] LOG: unlogged relations reset after 0.00 s
> > 2021-07-20 18:47:32.063 IST [102234] LOG: redo starts at 0/14EF418
> > 2021-07-20 18:47:33.063 IST [102234] LOG: redo in progress, elapsed time: 1.00 s, current LSN: 0/5C13D28
> > 2021-07-20 18:47:34.063 IST [102234] LOG: redo in progress, elapsed time: 2.00 s, current LSN: 0/A289160
> > 2021-07-20 18:47:35.063 IST [102234] LOG: redo in progress, elapsed time: 3.00 s, current LSN: 0/EE2DE10
> > 2021-07-20 18:47:35.563 IST [102234] LOG: invalid record length at 0/115C63E0: wanted 24, got 0
> > 2021-07-20 18:47:35.563 IST [102234] LOG: redo done at 0/115C63B8 system usage: CPU: user: 3.58 s, system: 0.14 s, elapsed: 3.50 s
> > 2021-07-20 18:47:35.564 IST [102234] LOG: unlogged relations reset after 0.00 s
> > 2021-07-20 18:47:35.706 IST [102230] LOG: database system is ready to accept connections
>
> --
> Justin

Attachment Content-Type Size
v6-0001-startup-process-progress.patch application/x-patch 14.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-07-23 10:58:03 Re: Avoiding data loss with synchronous replication
Previous Message Amit Kapila 2021-07-23 10:16:23 Re: row filtering for logical replication