Re: HOT chain validation in verify_heapam()

From: Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2022-11-16 07:11:13
Message-ID: CAPF61jC05EgtWmjA4F-dKsvU9+a0zv5SWumgBpkOWf+G_3FVPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 10, 2022 at 3:38 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

>
> > + }
> > +
> > + /*
> > + * Loop over offset and populate predecessor array from
> all entries
> > + * that are present in successor array.
> > + */
> > + ctx.attnum = -1;
> > + for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
> > + ctx.offnum = OffsetNumberNext(ctx.offnum))
> > + {
> > + ItemId curr_lp;
> > + ItemId next_lp;
> > + HeapTupleHeader curr_htup;
> > + HeapTupleHeader next_htup;
> > + TransactionId curr_xmax;
> > + TransactionId next_xmin;
> > +
> > + OffsetNumber nextoffnum = successor[ctx.offnum];
> > +
> > + curr_lp = PageGetItemId(ctx.page, ctx.offnum);
>
> Why do we get the item when nextoffnum is 0?
>
> Yes, right, I will move this call to PageGetItemId, just after the next
"if" condition in the patch.

>
> > + if (nextoffnum == 0 || !lp_valid[ctx.offnum] ||
> !lp_valid[nextoffnum])
> > + {
> > + /*
> > + * This is either the last updated tuple
> in the chain or a
> > + * corruption raised for this tuple.
> > + */
>
> "or a corruption raised" isn't quite right grammatically.
>
> will change to "This is either the last updated tuple in the chain or
corruption has been raised for this tuple"

>
> > + continue;
> > + }
> > + if (ItemIdIsRedirected(curr_lp))
> > + {
> > + next_lp = PageGetItemId(ctx.page,
> nextoffnum);
> > + if (ItemIdIsRedirected(next_lp))
> > + {
> > + report_corruption(&ctx,
> > +
> psprintf("redirected line pointer pointing to another redirected line
> pointer at offset %u",
> > +
> (unsigned) nextoffnum));
> > + continue;
> > + }
> > + next_htup = (HeapTupleHeader) PageGetItem(
> ctx.page, next_lp);
> > + if (!HeapTupleHeaderIsHeapOnly(next_htup))
> > + {
> > + report_corruption(&ctx,
> > +
> psprintf("redirected tuple at line pointer offset %u is not heap only
> tuple",
> > +
> (unsigned) nextoffnum));
> > + }
> > + if ((next_htup->t_infomask & HEAP_UPDATED)
> == 0)
> > + {
> > + report_corruption(&ctx,
> > +
> psprintf("redirected tuple at line pointer offset %u is not heap updated
> tuple",
> > +
> (unsigned) nextoffnum));
> > + }
> > + continue;
> > + }
> > +
> > + /*
> > + * Add a line pointer offset to the predecessor
> array if xmax is
> > + * matching with xmin of next tuple (reaching via
> its t_ctid).
> > + * Prior to PostgreSQL 9.4, we actually changed
> the xmin to
> > + * FrozenTransactionId
>
> I'm doubtful it's a good idea to try to validate the 9.4 case. The
> likelihood
> of getting that right seems low and I don't see us gaining much by even
> trying.
>
>
> > so we must add offset to predecessor
> > + * array(irrespective of xmax-xmin matching) if
> updated tuple xmin
> > + * is frozen, so that we can later do validation
> related to frozen
> > + * xmin. Raise corruption if we have two tuples
> having the same
> > + * predecessor.
> > + * We add the offset to the predecessor array
> irrespective of the
> > + * transaction (t_xmin) status. We will do
> validation related to
> > + * the transaction status (and also all other
> validations) when we
> > + * loop over the predecessor array.
> > + */
> > + curr_htup = (HeapTupleHeader) PageGetItem(ctx.page,
> curr_lp);
> > + curr_xmax = HeapTupleHeaderGetUpdateXid(curr_htup);
> > + next_lp = PageGetItemId(ctx.page, nextoffnum);
> > + next_htup = (HeapTupleHeader) PageGetItem(ctx.page,
> next_lp);
> > + next_xmin = HeapTupleHeaderGetXmin(next_htup);
> > + if (TransactionIdIsValid(curr_xmax) &&
> > + (TransactionIdEquals(curr_xmax, next_xmin)
> ||
> > + next_xmin == FrozenTransactionId))
> > + {
> > + if (predecessor[nextoffnum] != 0)
> > + {
> > + report_corruption(&ctx,
> > +
> psprintf("updated version at offset %u is also the updated version of
> tuple at offset %u",
> > +
> (unsigned) nextoffnum, (unsigned) predecessor[nextoffnum]));
> > + continue;
>
> I doubt it is correct to enter this path with next_xmin ==
> FrozenTransactionId. This is following a ctid chain that we normally
> wouldn't
> follow, because it doesn't satisfy the t_self->xmax == t_ctid->xmin
> condition.
>
> I don't immediately see what prevents the frozen tuple being from an
> entirely
> different HOT chain than the two tuples pointing to it.
>
>
>
> Prior to 9.4 we can have xmin updated with FrozenTransactionId but with
9.4 (or later) we set XMIN_FROZEN bit in t_infomask. if updated tuple
is via prior of 9.4 then "TransactionIdEquals(curr_xmax, next_xmin)" will
be false for Frozen Tuple.
The Intention of adding "next_xmin == FrozenTransactionId" to the path is
because we wanted to do validation around Frozen Tuple when we loop over
predecessor array.

I need to look at the isolation test in details to understand how this can
provide false alarm and but if there is a valid case then we can remove
logic of raising corruption related with Frozen Tuple?

> > + }
> > +
> > + /* Loop over offsets and validate the data in the
> predecessor array. */
> > + for (OffsetNumber currentoffnum = FirstOffsetNumber;
> currentoffnum <= maxoff;
> > + currentoffnum = OffsetNumberNext(currentoffnum))
> > + {
> > + HeapTupleHeader pred_htup;
> > + HeapTupleHeader curr_htup;
> > + TransactionId pred_xmin;
> > + TransactionId curr_xmin;
> > + ItemId pred_lp;
> > + ItemId curr_lp;
> > +
> > + ctx.offnum = predecessor[currentoffnum];
> > + ctx.attnum = -1;
> > +
> > + if (ctx.offnum == 0)
> > + {
> > + /*
> > + * Either the root of the chain or an
> xmin-aborted tuple from
> > + * an abandoned portion of the HOT chain.
> > + */
>
> Hm - couldn't we check that the tuple could conceivably be at the root of a
> chain? I.e. isn't HEAP_HOT_UPDATED? Or alternatively has an aborted xmin?
>
>
I don't see a way to check if tuple is at the root of HOT chain because
predecessor array will always be having either xmin from non-abandoned
transaction or it will be zero. We can't differentiate root or tuple
inserted via abandoned transaction.

> > + continue;
> > + }
> > +
> > + curr_lp = PageGetItemId(ctx.page, currentoffnum);
> > + curr_htup = (HeapTupleHeader) PageGetItem(ctx.page,
> curr_lp);
> > + curr_xmin = HeapTupleHeaderGetXmin(curr_htup);
> > +
> > + ctx.itemid = pred_lp = PageGetItemId(ctx.page,
> ctx.offnum);
> > + pred_htup = (HeapTupleHeader) PageGetItem(ctx.page,
> pred_lp);
> > + pred_xmin = HeapTupleHeaderGetXmin(pred_htup);
> > +
> > + /*
> > + * If the predecessor's xmin is aborted or in
> progress, the
> > + * current tuples xmin should be aborted or in
> progress
> > + * respectively. Also both xmin's must be equal.
> > + */
> > + if (!TransactionIdEquals(pred_xmin, curr_xmin) &&
> > + !TransactionIdDidCommit(pred_xmin))
> > + {
> > + report_corruption(&ctx,
> > +
> psprintf("tuple with uncommitted xmin %u was updated to produce a tuple at
> offset %u with differing xmin %u",
> > +
> (unsigned) pred_xmin, (unsigned) currentoffnum, (unsigned)
> curr_xmin));
>
> Is this necessarily true? What about a tuple that was inserted in a
> subtransaction and then updated in another subtransaction of the same
> toplevel
> transaction?
>
>
not sure if I am getting? I have tried with below test and don't see any
issue,

‘postgres[14723]=#’drop table test2;
DROP TABLE
‘postgres[14723]=#’create table test2 (a int, b int primary key);
CREATE TABLE
‘postgres[14723]=#’insert into test2 values (1,1);
INSERT 0 1
‘postgres[14723]=#’BEGIN;
BEGIN
‘postgres[14723]=#*’update test2 set a =2 where a =1;
UPDATE 1
‘postgres[14723]=#*’savepoint s1;
SAVEPOINT
‘postgres[14723]=#*’update test2 set a =6;
UPDATE 1
‘postgres[14723]=#*’rollback to savepoint s1;
ROLLBACK
‘postgres[14723]=#*’update test2 set a =6;
UPDATE 1
‘postgres[14723]=#*’savepoint s2;
SAVEPOINT
‘postgres[14723]=#*’update test2 set a =7;
UPDATE 1
‘postgres[14723]=#*’end;
COMMIT
‘postgres[14723]=#’SELECT lp as tuple, t_xmin, t_xmax, t_field3 as t_cid,
t_ctid,tuple_data_split('test2'::regclass, t_data, t_infomask, t_infomask2,
t_bits), heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM
heap_page_items(get_raw_page('test2', 0));
tuple | t_xmin | t_xmax | t_cid | t_ctid | tuple_data_split |
heap_tuple_infomask_flags
-------+--------+--------+-------+--------+-------------------------------+---------------------------------------------------------------------------
1 | 1254 | 1255 | 0 | (0,2) | {"\\x01000000","\\x01000000"} |
("{HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED}",{})
2 | 1255 | 1257 | 1 | (0,4) | {"\\x02000000","\\x01000000"} |
("{HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
3 | 1256 | 0 | 1 | (0,3) | {"\\x06000000","\\x01000000"} |
("{HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})
4 | 1257 | 1258 | 2 | (0,5) | {"\\x06000000","\\x01000000"} |
("{HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE}",{})
5 | 1258 | 0 | 3 | (0,5) | {"\\x07000000","\\x01000000"} |
("{HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}",{})
(5 rows)

>
> > + }
> > +
> > + /*
> > + * If the predecessor's xmin is not frozen, then
> current tuple's
> > + * shouldn't be either.
> > + */
> > + if (pred_xmin != FrozenTransactionId && curr_xmin
> == FrozenTransactionId)
> > + {
> > + report_corruption(&ctx,
> > +
> psprintf("unfrozen tuple was updated to produce a tuple at offset %u which
> is frozen",
> > +
> (unsigned) currentoffnum));
> > + }
>
> Can't we have a an update chain that is e.g.
> xmin 10, xmax 5 -> xmin 5, xmax invalid
>
> and a vacuum cutoff of 7? That'd preent the first tuple from being removed,
> but would allow 5 to be frozen.
>
> I think there were recent patches proposing we don't freeze in that case,
> but
> we'll having done that in the past....
>
>
Not very sure about this, was trying with such case but found hard to
reproduce this.

--
Regards,
Himanshu Upadhyaya
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-11-16 07:23:41 Re: Avoid overhead open-close indexes (catalog updates)
Previous Message Michael Paquier 2022-11-16 07:06:54 Re: [PoC] Let libpq reject unexpected authentication requests