From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net> |
Subject: | Re: recovering from "found xmin ... from before relfrozenxid ..." |
Date: | 2020-09-21 03:31:48 |
Message-ID: | CAA4eK1KsVpDQPMj5hB5tT9=Pu8-9OG2EOcLHb+aeEg8YSV1YjA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Sep 20, 2020 at 10:43 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > So this confirms the suspicion that the cause of the buildfarm
> > failures is a concurrently-open transaction, presumably from
> > autovacuum. I don't have time to poke further right now.
>
..
> 2. As Amit suspected, there's an inconsistency between pruneheap.c's
> rules for which tuples are removable and vacuum.c's rules for that.
> This seems like a massive bug in its own right: what's the point of
> pruneheap.c going to huge effort to decide whether it should keep a
> tuple if vacuum will then kill it anyway? I do not understand why
> whoever put in the GlobalVisState stuff only applied it in pruneheap.c
> and not VACUUM proper.
>
> These two points interact, in that we don't get to the "cannot freeze"
> failure except when pruneheap has decided not to remove something that
> is removable according to VACUUM's rules. (VACUUM doesn't actually
> remove it, because lazy_scan_heap won't try to remove HeapOnly tuples
> even when it thinks they're HEAPTUPLE_DEAD; but then it tries to freeze
> the tuple, and heap_prepare_freeze_tuple spits up.) However, if I revert
> a7212be8b then the pg_surgery test still fails in the presence of a
> concurrent transaction (both of the expected "skipping TID" notices
> disappear). So reverting that patch wouldn't get us out of trouble.
>
> I think to move forward, we need to figure out what the freezing
> behavior ought to be for temp tables. We could make it the same
> as it was before a7212be8b, which'd just require some more complexity
> in vacuum_set_xid_limits. However, that negates the idea that we'd
> like VACUUM's behavior on a temp table to be fully independent of
> whether concurrent transactions exist. I'd prefer to allow a7212be8b's
> behavior to stand, but then it seems we need to lobotomize the error
> check in heap_prepare_freeze_tuple to some extent.
>
> Independently of that, it seems like we need to fix things so that
> when pruneheap.c is called by vacuum, it makes EXACTLY the same
> dead-or-not-dead decisions that the main vacuum code makes. This
> business with applying some GlobalVisState rule or other instead
> seems just as unsafe as can be.
>
Yeah, on a quick look it seems before commit dc7420c2c9 the
pruneheap.c and the main Vacuum code use to make the same decision and
that is commit which has introduced GlobalVisState stuff.
> AFAICS, there is no chance of the existing pg_surgery regression test
> being fully stable if we don't fix both things.
>
What if ensure that it runs with autovacuum = off and there is no
parallel test running? I am not sure about the second part but if we
can do that then the test will be probably stable.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Andy Fan | 2020-09-21 03:41:09 | Re: Dynamic gathering the values for seq_page_cost/xxx_cost |
Previous Message | Ashutosh Sharma | 2020-09-21 02:39:20 | Re: recovering from "found xmin ... from before relfrozenxid ..." |