cannot freeze committed xmax

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: cannot freeze committed xmax
Date: 2020-10-28 13:44:12
Message-ID: f4aa20ba-7793-31b9-28ac-e34205362535@postgrespro.ru
Lists: pgsql-hackers

Hi hackers,

The following error was encountered by one of our customers.
They have a very large catalog (the pg_class relation is more than
30 GB), bloated by temporary relations.
When they try to vacuum it, the following error is reported:

vacuum full analyze pg_catalog.pg_class;
ERROR:  cannot freeze committed xmax 596099954

The following records are present in pg_class:

<Data> ------
 Item   1 -- Length:  229  Offset: 7936 (0x1f00)  Flags: NORMAL
  XMIN: 596098791  XMAX: 596099954  CID|XVAC: 1  OID: 930322390
  Block Id: 108700  linp Index: 17   Attributes: 33   Size: 32
  infomask: 0x290b (HASNULL|HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
  t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x7f
          [4]: 0x00

 Item   2 -- Length:  184  Offset: 7752 (0x1e48)  Flags: NORMAL
  XMIN: 596098791  XMAX: 596099954  CID|XVAC: 2  OID: 930322393
  Block Id: 108700  linp Index: 18   Attributes: 33   Size: 32
  infomask: 0x2909 (HASNULL|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
  t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x3f
          [4]: 0x00

 Item   3 -- Length:  184  Offset: 7568 (0x1d90)  Flags: NORMAL
  XMIN: 596098791  XMAX: 596099954  CID|XVAC: 3  OID: 930322395
  Block Id: 108700  linp Index: 19   Attributes: 33   Size: 32
  infomask: 0x2909 (HASNULL|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)
  t_bits: [0]: 0xff [1]: 0xff [2]: 0xff [3]: 0x3f
          [4]: 0x00
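
As a cross-check of the infomask values in the dump, the sketch below decodes a t_infomask using the PostgreSQL 11 bit values from htup_details.h. The names are shown without the HEAP_ prefix. decode_infomask is a hypothetical helper, and its spelling of rarely seen flags may not match pg_filedump's. Both 0x290b and 0x2909 include XMAX_INVALID (0x0800). Neither includes XMAX_LOCK_ONLY (0x0080) or XMAX_IS_MULTI (0x1000).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* t_infomask bits as defined in PostgreSQL 11's htup_details.h. */
static const struct
{
    uint16_t    bit;
    const char *name;
} infomask_flags[] = {
    {0x0001, "HASNULL"},
    {0x0002, "HASVARWIDTH"},
    {0x0004, "HASEXTERNAL"},
    {0x0008, "HASOID"},
    {0x0010, "XMAX_KEYSHR_LOCK"},
    {0x0020, "COMBOCID"},
    {0x0040, "XMAX_EXCL_LOCK"},
    {0x0080, "XMAX_LOCK_ONLY"},
    {0x0100, "XMIN_COMMITTED"},
    {0x0200, "XMIN_INVALID"},
    {0x0400, "XMAX_COMMITTED"},
    {0x0800, "XMAX_INVALID"},
    {0x1000, "XMAX_IS_MULTI"},
    {0x2000, "UPDATED"},
    {0x4000, "MOVED_OFF"},
    {0x8000, "MOVED_IN"},
};

/*
 * Hypothetical helper: writes the '|'-separated names of the bits set in
 * infomask into buf (lowest bit first), truncating if buflen is too small.
 * buflen must be at least 1.
 */
static void
decode_infomask(uint16_t infomask, char *buf, size_t buflen)
{
    size_t      i;

    buf[0] = '\0';
    for (i = 0; i < sizeof(infomask_flags) / sizeof(infomask_flags[0]); i++)
    {
        if (infomask & infomask_flags[i].bit)
        {
            if (buf[0] != '\0')
                strncat(buf, "|", buflen - strlen(buf) - 1);
            strncat(buf, infomask_flags[i].name, buflen - strlen(buf) - 1);
        }
    }
}
```

The combined HEAP_XMAX_SHR_LOCK value (EXCL_LOCK | KEYSHR_LOCK) is not given its own name here. It would print as both component bits.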

This error is reported in heap_prepare_freeze_tuple:

    /*
     * Process xmax.  To thoroughly examine the current Xmax value we need to
     * resolve a MultiXactId to its member Xids, in case some of them are
     * below the given cutoff for Xids.  In that case, those values might need
     * freezing, too.  Also, if a multi needs freezing, we cannot simply take
     * it out --- if there's a live updater Xid, it needs to be kept.
     *
     * Make sure to keep heap_tuple_needs_freeze in sync with this.
     */
    xid = HeapTupleGetRawXmax(htup);

    if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
    {
          ...

    }
    else if (TransactionIdIsNormal(xid))
    {

        ...

        if (TransactionIdPrecedes(xid, cutoff_xid))
        {
            /*
             * If we freeze xmax, make absolutely sure that it's not an XID
             * that is important.  (Note, a lock-only xmax can be removed
             * independent of committedness, since a committed lock holder has
             * released the lock).
             */
            if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
                TransactionIdDidCommit(xid))
                ereport(ERROR,
                        (errcode(ERRCODE_DATA_CORRUPTED),
                         errmsg_internal("cannot freeze committed xmax " XID_FMT,
                                         xid)));
            freeze_xmax = true;
        }
        else
            freeze_xmax = false;
       ...
    }
    else if ((tuple->t_infomask & HEAP_XMAX_INVALID) ||
             !TransactionIdIsValid(HeapTupleGetRawXmax(htup)))
    {
        freeze_xmax = false;
        xmax_already_frozen = true;
    }

So, as you can see, HEAP_XMAX_INVALID is set in all these records, but
xmax is a normal transaction ID.
This is why the error is raised before the check for HEAP_XMAX_INVALID
in the subsequent else-if branch is ever reached.
I do not know the value of cutoff_xid, because I do not have access to
a debugger at the customer site.
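
To make the branch order concrete, here is a simplified, standalone model of the non-multi xmax branches quoted above. It is a sketch, not PostgreSQL code. The names model_xmax_branch and XmaxOutcome are hypothetical. The boolean parameters precedes_cutoff and did_commit stand in for TransactionIdPrecedes(xid, cutoff_xid) and TransactionIdDidCommit(xid). The multixact path and the elided "..." checks are not modelled. The infomask bits and HEAP_XMAX_IS_LOCKED_ONLY are written out as in PostgreSQL 11's htup_details.h.

Once a normal xmax precedes the cutoff, the outcome depends only on the lock-only bits and the clog answer. HEAP_XMAX_INVALID is not consulted anywhere on that path in the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* t_infomask bits and lock-only test as in PostgreSQL 11's htup_details.h. */
#define HEAP_XMAX_KEYSHR_LOCK 0x0010
#define HEAP_XMAX_EXCL_LOCK   0x0040
#define HEAP_XMAX_LOCK_ONLY   0x0080
#define HEAP_XMAX_INVALID     0x0800
#define HEAP_XMAX_IS_MULTI    0x1000
#define HEAP_XMAX_SHR_LOCK    (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_LOCK_MASK \
    (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_XMAX_IS_LOCKED_ONLY(im) \
    (((im) & HEAP_XMAX_LOCK_ONLY) || \
     (((im) & (HEAP_XMAX_IS_MULTI | HEAP_LOCK_MASK)) == HEAP_XMAX_EXCL_LOCK))

#define InvalidTransactionId     ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)

typedef enum
{
    XMAX_NOT_MODELLED,          /* multixact path, or code not shown */
    XMAX_KEEP,                  /* freeze_xmax = false */
    XMAX_FREEZE,                /* freeze_xmax = true */
    XMAX_ALREADY_FROZEN,        /* xmax_already_frozen = true */
    XMAX_ERROR                  /* "cannot freeze committed xmax" */
} XmaxOutcome;

/* Hypothetical model of the branch order in the excerpt above. */
static XmaxOutcome
model_xmax_branch(uint16_t infomask, TransactionId xmax,
                  bool precedes_cutoff, bool did_commit)
{
    if (infomask & HEAP_XMAX_IS_MULTI)
        return XMAX_NOT_MODELLED;

    if (xmax >= FirstNormalTransactionId)   /* TransactionIdIsNormal */
    {
        /* HEAP_XMAX_INVALID is not looked at anywhere in this branch. */
        if (precedes_cutoff)
        {
            if (!HEAP_XMAX_IS_LOCKED_ONLY(infomask) && did_commit)
                return XMAX_ERROR;
            return XMAX_FREEZE;
        }
        return XMAX_KEEP;
    }

    /* Only reached when xmax is not a normal XID. */
    if ((infomask & HEAP_XMAX_INVALID) || xmax == InvalidTransactionId)
        return XMAX_ALREADY_FROZEN;

    return XMAX_NOT_MODELLED;
}
```

With infomask 0x290b and xmax 596099954, the model returns XMAX_ERROR for any cutoff_xid that xmax precedes, provided the clog reports the transaction as committed. If did_commit is false, it returns XMAX_FREEZE instead.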

I would be grateful for any help localizing the source of the problem.

It looks like nothing requires xmax to be reset to InvalidTransactionId
when the HEAP_XMAX_INVALID bit is set.
And I did not find any check preventing cutoff_xid from being greater
than the XID of some transaction that aborted a long time ago.
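
Whether cutoff_xid is "greater" than an old XID is decided by TransactionIdPrecedes(). In upstream PostgreSQL 11, TransactionId is 32 bits. Special XIDs (0, 1, 2) are compared numerically, while normal XIDs are compared modulo 2^32.

The sketch below re-implements that upstream logic under the hypothetical name xid_precedes. The excerpt above uses XID_FMT and HeapTupleGetRawXmax, which are not upstream PostgreSQL 11 names, so the XID representation in this build may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Sketch of upstream TransactionIdPrecedes() for 32-bit XIDs.  Special XIDs
 * compare numerically.  Normal XIDs compare modulo 2^32: id1 precedes id2
 * when id1 lies within the 2^31 XIDs before id2.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t     diff;

    if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
        return id1 < id2;

    diff = (int32_t) (id1 - id2);   /* wraps around, as upstream does */
    return diff < 0;
}
```

An XID that aborted long ago simply precedes the cutoff under this comparison, so the freeze code goes on to ask TransactionIdDidCommit() about it. The comparison by itself does not distinguish aborted from committed transactions.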

So is it a logic error that xmax is compared with cutoff_xid before the
HEAP_XMAX_INVALID bit is checked?
If not, where is this constraint most likely being violated?

The Postgres version is 11.7.
Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
