Re: Random pg_upgrade test failure on drongo

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "andrew(at)dunslane(dot)net" <andrew(at)dunslane(dot)net>
Subject:	Re: Random pg_upgrade test failure on drongo
Date:	2024-01-10 09:31:30
Message-ID:	CAA4eK1Lq75HXRxucGrKzWNk8540kdk9dj0B4-6DMcHAZ+CE5+Q@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 9, 2024 at 4:30 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
> 09.01.2024 13:08, Amit Kapila wrote:
> >
> >> As to checkpoint_timeout, personally I would not increase it, because it
> >> seems unbelievable to me that pg_restore (with the cluster containing only
> >> two empty databases) can run for longer than 5 minutes. I'd rather
> >> investigate such situation separately, in case we encounter it, but maybe
> >> it's only me.
> >>
> > I feel it is okay to set a higher value of checkpoint_timeout for
> > the same reason, though the probability is lower. I feel it is
> > important here to explain in the comments why we are using these
> > settings in the new test. I have thought of something like: "During the
> > upgrade, bgwriter or checkpointer could hold the file handle for some
> > removed file. Now, during restore when we try to create the file with
> > the same name, it errors out. This behavior is specific to only some
> > Windows versions and the probability of seeing this behavior
> > is higher in this test because we use wal_level as logical via
> > allows_streaming => 'logical' which in turn sets shared_buffers as
> > 1MB."
> >
> > Thoughts?
>
> I would describe that behavior as "During upgrade, when pg_restore performs
> CREATE DATABASE, bgwriter or checkpointer may flush buffers and hold a file
> handle for pg_largeobject, so a later TRUNCATE pg_largeobject command will
> fail if the OS (such as older Windows versions) doesn't remove an unlinked
> file completely while it's still open. ..."
>

I am slightly hesitant to name any particular system table in the
comments, as this can happen for any other system table as well, so I
have slightly adjusted the comments in the attached patch. However, I
think it is okay to mention the particular system table name in the
commit message. Let me know what you think.
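
For readers following along, the kind of settings under discussion could
look roughly like the postgresql.conf fragment below. This is only a
sketch and does not reproduce the attached patch: checkpoint_timeout is
the parameter named in this thread, while the 1h value and the use of
bgwriter_lru_maxpages = 0 to keep bgwriter from writing buffers are
assumptions made for illustration.

```
# Illustrative settings for the upgraded cluster in the test (not the patch).

# Assumed value: well above the 5min default, so that no timed
# checkpoint is expected to start while pg_restore is running.
checkpoint_timeout = 1h

# Assumed setting: 0 disables background writing, so bgwriter should not
# flush buffers and keep a handle open on a file that is later removed.
bgwriter_lru_maxpages = 0
```

Both parameters are ordinary server settings, so a test could append them
to the node's configuration before starting it.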

--
With Regards,
Amit Kapila.

Attachment	Content-Type	Size
v2-0001-Fix-an-intermetant-BF-failure-in-003_logical_slot.patch	application/octet-stream	2.3 KB

In response to

Re: Random pg_upgrade test failure on drongo at 2024-01-09 11:00:01 from Alexander Lakhin

Responses

Re: Random pg_upgrade test failure on drongo at 2024-01-10 10:00:01 from Alexander Lakhin
