Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Add TAP test for archive_cleanup_command and recovery_end_comman
Date: 2022-04-07 17:52:10
Message-ID: 20220407175210.q44nnrvkovprxo2a@alap3.anarazel.de
Lists: pgsql-committers pgsql-hackers

Hi,

On 2022-04-07 13:40:30 -0400, Tom Lane wrote:
> Michael Paquier <michael(at)paquier(dot)xyz> writes:
> > Add TAP test for archive_cleanup_command and recovery_end_command
>
> grassquit just showed a non-reproducible failure in this test [1]:

I was just staring at that as well.

> # Postmaster PID for node "standby" is 291160
> ok 1 - check content from archives
> not ok 2 - archive_cleanup_command executed on checkpoint
>
> # Failed test 'archive_cleanup_command executed on checkpoint'
> # at t/002_archiving.pl line 74.
>
> This test is sending a CHECKPOINT command to the standby and
> expecting it to run the archive_cleanup_command, but it looks
> like the standby did not actually run any checkpoint:
>
> 2022-04-07 16:11:33.060 UTC [291806][not initialized][:0] LOG: connection received: host=[local]
> 2022-04-07 16:11:33.078 UTC [291806][client backend][2/15:0] LOG: connection authorized: user=bf database=postgres application_name=002_archiving.pl
> 2022-04-07 16:11:33.084 UTC [291806][client backend][2/16:0] LOG: statement: CHECKPOINT
> 2022-04-07 16:11:33.092 UTC [291806][client backend][:0] LOG: disconnection: session time: 0:00:00.032 user=bf database=postgres host=[local]
>
> I am suspicious that the reason is that ProcessUtility does not
> ask for a forced checkpoint when in recovery:
>
> RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_WAIT |
> (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE));
>
> The trouble with this theory is that this test has been there for
> nearly six months and this is the first such failure (I scraped the
> buildfarm logs to be sure). Seems like failures should be a lot
> more common than that.
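
To make the quoted flag expression concrete, here is a minimal sketch of how the request flags differ between a primary and a standby. It is not PostgreSQL source: the SKETCH_* constants and both helper functions are hypothetical names invented for illustration, and the bit values are placeholders rather than the real CHECKPOINT_* definitions.

```c
#include <stdbool.h>

/*
 * Illustrative flag bits. The names echo PostgreSQL's CHECKPOINT_* request
 * flags, but the values are placeholders chosen for this sketch only.
 */
enum
{
    SKETCH_CHECKPOINT_IMMEDIATE = 1 << 0,
    SKETCH_CHECKPOINT_FORCE = 1 << 1,
    SKETCH_CHECKPOINT_WAIT = 1 << 2
};

/*
 * Hypothetical helper mirroring the quoted expression: the FORCE bit is
 * only added when the server is not in recovery.
 */
static int
checkpoint_request_flags(bool in_recovery)
{
    return SKETCH_CHECKPOINT_IMMEDIATE | SKETCH_CHECKPOINT_WAIT |
        (in_recovery ? 0 : SKETCH_CHECKPOINT_FORCE);
}

/* Hypothetical helper: reports whether a request carries the FORCE bit. */
static bool
request_is_forced(int flags)
{
    return (flags & SKETCH_CHECKPOINT_FORCE) != 0;
}
```

Under these assumptions, checkpoint_request_flags(true) yields a request without the FORCE bit, which is the property the theory above relies on: a CHECKPOINT issued on a standby is not a forced request.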

> I wondered if the recent pg_stats changes could have affected this, but I
> don't really see how.

I don't really see how either. It's a bit more conceivable that the recovery
prefetching changes could affect the timing sufficiently?

It's also possible that it requires an animal of a certain speed to happen -
we didn't have an -fsanitize=address animal until recently.

I guess we'll have to wait and see what the frequency of the problem is?

Greetings,

Andres Freund
