Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: wrong fds used for refilenodes after pg_upgrade relfilenode changes
Date: 2022-03-03 18:28:29
Message-ID: 20220303182829.segxxtcedf2meecw@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-03-03 13:11:17 -0500, Robert Haas wrote:
> On Wed, Mar 2, 2022 at 3:00 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2022-03-02 14:52:01 -0500, Robert Haas wrote:
> > > - I am having some trouble understanding clearly what 0001 is doing.
> > > I'll try to study it further.
> >
It tests the various scenarios I could think of that could lead to FD
reuse, to state the obvious ;). Anything particularly unclear?
>
> As I understand it, the basic idea of this test is:
>
> 1. create a database called 'conflict_db' containing a table called 'large'
> 2. also create a table called 'replace_sb' in the postgres db.
> 3. have a long running transaction open in 'postgres' database that
> repeatedly accesses the 'replace_sb' table to evict the 'conflict_db'
> table, sometimes causing write-outs
> 4. try to get it to write out the data to a stale file descriptor by
> fiddling with various things, and then detect the corruption that
> results
> 5. use both a primary and a standby not because they are intrinsically
> necessary but because you want to test that both cases work

Right.

I wasn't proposing to commit the test as-is, to be clear. It was meant as a
demonstration of the types of problems I can see, and what's needed to fix
them...
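[Editorial aside: the underlying hazard the test targets — a cached file descriptor surviving the replacement of the relation file it points at — can be demonstrated at the OS level with a few lines of plain Python. This is illustrative only; PostgreSQL caches fds in its C-level smgr/md layer, and the filenames here are made up.]

```python
import os
import tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "relfile")
with open(path, "wb") as f:
    f.write(b"old contents")

# Long-lived descriptor, analogous to an fd cached by a backend
fd = os.open(path, os.O_WRONLY)

# Simulate the relfilenode being replaced: a new file takes over the path
newpath = os.path.join(d, "replacement")
with open(newpath, "wb") as f:
    f.write(b"new contents")
os.rename(newpath, path)

# A write through the stale fd lands on the old, now-unlinked inode,
# so it never reaches the file currently at 'path'
os.pwrite(fd, b"LOST", 0)
os.close(fd)

with open(path, "rb") as f:
    data = f.read()
print(data)  # b'new contents' -- the write through the stale fd was lost
```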

> I can't remember that verify() is the one that accesses conflict_db.large
> while cause_eviction() is the one that accesses postgres.replace_sb for more
> than like 15 seconds.

For more than 15 seconds? The whole test runs in a few seconds for me...

> As to (5), the identifiers are just primary and standby, without
> mentioning the database name, which adds to the confusion, and there
> are no comments explaining why we have two nodes.

I don't follow this one - physical replication doesn't do anything below the
database level?

> I think it would be helpful to find a way to divide the test case up
> into sections that are separated from each other visually, and explain
> the purpose of each test a little more in a comment. For example I'm
> struggling to understand the exact purpose of this bit:
>
> +$node_primary->safe_psql('conflict_db', "VACUUM FULL large;");
> +$node_primary->safe_psql('conflict_db', "UPDATE large SET datab = 3;");
> +
> +verify($node_primary, $node_standby, 3,
> + "restored contents as expected");
>
> I'm all for having test coverage of VACUUM FULL, but it isn't clear to
> me what this does that is related to FD reuse.

It's just trying to clean up prior damage, so that the later parts of the test
can still run. Definitely not needed once there's a fix. But it's useful while
trying to make the test actually detect the various corruptions.

> Similarly for the later ones. I generally grasp that you are trying to
> make sure that things are OK with respect to FD reuse in various
> scenarios, but I couldn't explain to you what we'd be losing in terms
> of test coverage if we removed this line:
>
> +$node_primary->safe_psql('conflict_db', "UPDATE large SET datab = 7;");
>
> I am not sure how much all of this potential cleanup matters because
> I'm pretty skeptical that this would be stable in the buildfarm.

Which "potential cleanup" are you referring to?

> It relies on a lot of assumptions about timing and row sizes and details of
> when invalidations are sent that feel like they might not be entirely
> predictable at the level you would need to have this consistently pass
> everywhere. Perhaps I am too pessimistic?

I don't think the test passing should depend on row size, invalidations
etc. to a significant degree (unless the tables are too small to reach
shared_buffers etc.)? The exact symptoms of failure are highly unstable, but
obviously we'd fix them in-tree before committing a test. But maybe I'm
missing a dependency?
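[Editorial aside: the size dependency can be made concrete. Eviction pressure depends only on the working set exceeding the buffer pool, not on individual row sizes. A toy LRU pool in Python sketches the relationship — PostgreSQL actually uses a clock-sweep strategy, not LRU, and `BufferPool` here is purely illustrative.]

```python
from collections import OrderedDict

class BufferPool:
    """Toy LRU buffer pool: evictions begin only once the number of
    distinct pages touched exceeds capacity, regardless of page size."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffers = OrderedDict()
        self.evictions = 0

    def read_page(self, page):
        if page in self.buffers:
            self.buffers.move_to_end(page)    # hit: just bump recency
            return
        if len(self.buffers) >= self.capacity:
            self.buffers.popitem(last=False)  # evict least-recently-used
            self.evictions += 1
        self.buffers[page] = True

pool = BufferPool(capacity=8)
for page in range(8):           # table fits in the pool: nothing evicted
    pool.read_page(page)
print(pool.evictions)           # 0
for page in range(16):          # table twice the pool size: evictions start
    pool.read_page(page)
print(pool.evictions)           # 8
```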

FWIW, the test did pass on freebsd, linux, macos and windows with the
fix. After a few iterations of improving the fix ;)

Greetings,

Andres Freund
