Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-03-31 02:37:58
Message-ID: CAH2-WzmRZEzeGvLv8yDW0AbFmSvJjTziORqjVUrf74mL4GL0Ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 30, 2022 at 7:00 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Something vaguely like EXPECT_EQ_U32 in regress.c. Maybe
> AssertCmp(type, a, op, b),
>
> Then the assertion could have been something like
> AssertCmp(int32, diff, >, 0)

I'd definitely use them if they were there.

> Does the line number in the failed run actually correspond to the xid, rather
> than the mxid case? I didn't check.

Yes, I verified -- definitely relfrozenxid.

> You can do something similar locally on linux with
> make -Otarget -C contrib/ -j48 -s USE_MODULE_DB=1 installcheck prove_installcheck=true
> (the prove_installcheck=true to prevent tap tests from running, we don't seem
> to have another way for that)
>
> I don't think windows uses USE_MODULE_DB=1, but it allows to cause a lot more
> load concurrently than running tests serially...

Can't get it to fail locally with that recipe.

> > Assert(vacrel->NewRelfrozenXid == OldestXmin ||
> > TransactionIdPrecedesOrEquals(vacrel->relfrozenxid,
> > vacrel->NewRelfrozenXid));
>
> The comment in your patch says "is either older or newer than FreezeLimit" - I
> assume that's some rephrasing damage?

Both the comment and the assertion are correct. I see what you mean, though.

> Perhaps it's worth commiting improved assertions on master? If this is indeed
> a pre-existing bug, and we're just missing due to slightly less stringent
> asserts, we could rectify that separately.

I don't think there's much chance of the assertion actually hitting
without the rest of the patch series. The new relfrozenxid value is
always going to be OldestXmin - vacuum_min_freeze_age on HEAD, while
with the patch it's sometimes close to OldestXmin. Especially when you
have lots of dead tuples that you churn through constantly (like
pgbench_tellers, or like these system catalogs on the CI test
machine).

> Hm. This triggers some vague memories. There's some oddities around shared
> relations being vacuumed separately in all the databases and thus having
> separate horizons.

That's what I was thinking of, obviously.

> After "remembering" that, I looked in the cirrus log for the failed run, and
> the worker was processing shared a shared relation last:
>
> 2022-03-30 03:48:30.238 GMT [5984][autovacuum worker] LOG: automatic analyze of table "contrib_regression.pg_catalog.pg_authid"

I noticed the same thing myself. Should have said sooner.

> Perhaps this ought to be an elog() instead of an Assert()? Something has gone
> pear shaped if we get here... It's a bit annoying though, because it'd have to
> be a PANIC to be visible on the bf / CI :(.

Yeah, a WARNING would be good here. I can write a new version of my
patch series with a separation patch for that this evening. Actually,
better make it a PANIC for now...

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-03-31 02:51:21 Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Previous Message Michael Paquier 2022-03-31 02:34:52 Re: Add LZ4 compression in pg_dump