Re: Random pg_upgrade test failure on drongo

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "andrew(at)dunslane(dot)net" <andrew(at)dunslane(dot)net>
Subject: Re: Random pg_upgrade test failure on drongo
Date: 2024-01-09 09:00:00
Message-ID: 685bc1bd-fd46-2747-b45f-5c700e5a7c65@gmail.com
Lists: pgsql-hackers

Hello Kuroda-san,

09.01.2024 08:49, Hayato Kuroda (Fujitsu) wrote:
> Based on the suggestion by Amit, I have created a patch with the alternative
> approach. This just does GUC settings. The reported failure is only for
> 003_logical_slots, but the patch also includes changes for the recently added
> test, 004_subscription. IIUC, there is a possibility that 004 would fail as well.
>
> Per our understanding, this patch can stop random failures. Alexander, can you
> test for the confirmation?
>

Yes, the patch fixes the issue for me (without the patch I observe failures
on iterations 1-2, with 10 tests running in parallel, but with the patch
10 iterations succeeded).

But as far as I can see, 004_subscription is not affected by the issue,
because it doesn't enable streaming for nodes new_sub, new_sub1.
As I noted before, I could see the failure only with
shared_buffers = 1MB (which is set with allows_streaming => 'logical').
So I'm not sure whether we need to modify 004 (or any other test that
runs pg_upgrade).

As to checkpoint_timeout, personally I would not increase it, because it
seems unbelievable to me that pg_restore (with the cluster containing only
two empty databases) can run for longer than 5 minutes. I'd rather
investigate such a situation separately, in case we encounter it, but maybe
it's only me.
On the other hand, if a checkpoint could occur for some reason within a
shorter time span, then increasing the timeout would not matter, I suppose.
(I've also tested the bgwriter_lru_maxpages-only modification of your patch
and can confirm that it works as well.)

Best regards,
Alexander
