From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
> So this confirms the suspicion that the cause of the buildfarm
> failures is a concurrently-open transaction, presumably from
> autovacuum. I don't have time to poke further right now.
I spent some more time analyzing this, and there seem to be two distinct
problems:
1. My patch a7212be8b does indeed have a problem. It will allow
vacuum_set_xid_limits to compute freezeLimit = nextXid for a temp
table if freeze_min_age is zero (ie VACUUM FREEZE). If there are
any concurrent transactions, this falls foul of
heap_prepare_freeze_tuple's expectation that
* NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
* XID older than it could neither be running nor seen as running by any
* open transaction. This ensures that the replacement will not change
* anyone's idea of the tuple state.
The "cannot freeze committed xmax" error message appears to be banking on
the assumption that we'd not reach heap_prepare_freeze_tuple for any
committed-dead tuple unless its xmax is past the specified cutoff_xid.
2. As Amit suspected, there's an inconsistency between pruneheap.c's
rules for which tuples are removable and vacuum.c's rules for that.
This seems like a massive bug in its own right: what's the point of
pruneheap.c going to huge effort to decide whether it should keep a
tuple if vacuum will then kill it anyway? I do not understand why
whoever put in the GlobalVisState stuff only applied it in pruneheap.c
and not VACUUM proper.
These two points interact, in that we don't get to the "cannot freeze"
failure except when pruneheap has decided not to remove something that
is removable according to VACUUM's rules. (VACUUM doesn't actually
remove it, because lazy_scan_heap won't try to remove HeapOnly tuples
even when it thinks they're HEAPTUPLE_DEAD; but then it tries to freeze
the tuple, and heap_prepare_freeze_tuple spits up.) However, if I revert
a7212be8b then the pg_surgery test still fails in the presence of a
concurrent transaction (both of the expected "skipping TID" notices
disappear). So reverting that patch wouldn't get us out of trouble.
I think to move forward, we need to figure out what the freezing
behavior ought to be for temp tables. We could make it the same
as it was before a7212be8b, which'd just require some more complexity
in vacuum_set_xid_limits. However, that negates the idea that we'd
like VACUUM's behavior on a temp table to be fully independent of
whether concurrent transactions exist. I'd prefer to allow a7212be8b's
behavior to stand, but then it seems we need to lobotomize the error
check in heap_prepare_freeze_tuple to some extent.
Independently of that, it seems like we need to fix things so that
when pruneheap.c is called by vacuum, it makes EXACTLY the same
dead-or-not-dead decisions that the main vacuum code makes. This
business with applying some GlobalVisState rule or other instead
seems just as unsafe as can be.
AFAICS, there is no chance of the existing pg_surgery regression test
being fully stable if we don't fix both things.
BTW, attached is a quick-hack patch to allow automated testing
of this scenario, along the lines I sketched yesterday. This
test passes if you run the two scripts serially, but not when
you run them in parallel. I'm not proposing this for commit;
it's a hack and its timing behavior is probably not stable enough
for the buildfarm. But it's pretty useful for poking at these problems.
regards, tom lane