Re: Undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, dilip(dot)kumar(at)enterprisedb(dot)com, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Undo logs
Date: 2018-12-08 10:52:31
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Tue, Dec 4, 2018 at 3:00 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Sat, Dec 1, 2018 at 12:58 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >
> > 13.
> > PrepareUndoInsert()
> > {
> > ..
> > if (!UndoRecPtrIsValid(prepared_urec_ptr))
> > + urecptr = UndoRecordAllocate(urec, 1, upersistence, txid);
> > + else
> > + urecptr = prepared_urec_ptr;
> > +
> > + size = UndoRecordExpectedSize(urec);
> > ..
> >
> > I think we should make above code a bit more bulletproof. As it is
> > written, there is no guarantee the size we have allocated is same as
> > we are using in this function.
> I agree
> How about if we take 'size' as output
> > parameter from UndoRecordAllocate and then use it in this function?
> > Additionally, we can have an Assert that the size returned by
> > UndoRecordAllocate is same as UndoRecordExpectedSize.
> With this change we will be able to guarantee when we are allocating
> single undo record
> but multi prepare will still be a problem. I haven't fix this as of
> now. I will think on how
> to handle both the cases when we have to prepare one time or when we
> have to allocate
> once and prepare multiple time.

Yeah, this appears tricky. I think we can leave it as it is unless we
get some better idea.

+UndoRecordOnUndoLogChange(UndoPersistence persistence)
+ prev_txid[persistence] = InvalidTransactionId;

Are we using this API in this or any other patch or in zheap? I see
this can be useful API, but unless we have a plan to use it, I don't
think we should retain it.

+ * Handling multi log:
+ *
+ * It is possible that the undo record of a transaction can be spread across
+ * multiple undo log. And, we need some special handling while inserting the
+ * undo for discard and rollback to work sanely.
+ *
+ * If the undorecord goes to next log then we insert a transaction header for
+ * the first record in the new log and update the transaction header with this
+ * new log's location. This will allow us to connect transactions across logs
+ * when the same transaction span across log (for this we keep track of the
+ * previous logno in undo log meta) which is required to find the latest undo
+ * record pointer of the aborted transaction for executing the undo actions
+ * before discard. If the next log get processed first in that case we
+ * don't need to trace back the actual start pointer of the transaction,
+ * in such case we can only execute the undo actions from the current log
+ * because the undo pointer in the slot will be rewound and that
will be enough
+ * to avoid executing same actions. However, there is possibility that after
+ * executing the undo actions the undo pointer got discarded, now in later
+ * stage while processing the previous log it might try to fetch the undo
+ * record in the discarded log while chasing the transaction header chain.
+ * To avoid this situation we first check if the next_urec of the transaction
+ * is already discarded then no need to access that and start executing from
+ * the last undo record in the current log.

I think I see the problem in the discard mechanism when the log is
spread across multiple logs. Basically, if the second log contains
undo of some transaction prior to the transaction which has just
decided to spread it's undo in the chosen undo log, then we might
discard the undo log of some transaction(s) inadvertently. Am, I
missing something? If not, then I guess we need to ensure that we
don't immediately discard the undo in the second log when a single
transactions undo is spreaded across two logs

Before choosing a new undo log to span the undo for a transaction, do
we ensure that it is not already linked with some other undo log for a
similar reason?

One more thing in this regard:
+UndoRecordAllocate(UnpackedUndoRecord *undorecords, int nrecords,
+ TransactionId txid, UndoPersistence upersistence)
+ if (InRecovery)
+ urecptr = UndoLogAllocateInRecovery(txid, size, upersistence);
+ else
+ urecptr = UndoLogAllocate(size, upersistence);
+ log = UndoLogGet(UndoRecPtrGetLogNo(urecptr), false);
+ /*
+ * By now, we must be attached to some undo log unless we are in recovery.
+ */
+ Assert(AmAttachedToUndoLog(log) || InRecovery);
+ /*
+ * We can consider the log as switched if this is the first record of the
+ * log and not the first record of the transaction i.e. same transaction
+ * continued from the previous log.
+ */
+ if ((UndoRecPtrGetOffset(urecptr) == UndoLogBlockHeaderSize) &&
+ log->meta.prevlogno != InvalidUndoLogNumber)
+ log_switched = true;

Isn't there a hidden assumption in the above code that you will always
get a fresh undo log if the undo doesn't fit in the currently attached
log? What is the guarantee of same?

+ * Call PrepareUndoInsert to tell the undo subsystem about the undo record you
+ * intended to insert. Upon return, the necessary undo buffers are pinned and
+ * locked.
+ * This should be done before any critical section is established, since it
+ * can fail.
+ *
+ * If not in recovery, 'xid' should refer to the top transaction id because
+ * undo log only stores mapping for the top most transactions.
+ * If in recovery, 'xid' refers to the transaction id stored in WAL.
+ */
+extern UndoRecPtr PrepareUndoInsert(UnpackedUndoRecord *, TransactionId xid,
+ UndoPersistence);
+ * Insert a previously-prepared undo record. This will write the actual undo
+ * record into the buffers already pinned and locked in PreparedUndoInsert,
+ * and mark them dirty. For persistent undo, this step should be performed
+ * after entering a critical section; it should never fail.
+ */
+extern void InsertPreparedUndo(void);

This text is duplicate of what is mentioned in .c file, so I have
removed it in delta patch. Similarly, I have removed duplicate text
atop other functions exposed via undorecord.h

+ * Forget about any previously-prepared undo record. Error recovery calls
+ * this, but it can also be used by other code that changes its mind about
+ * inserting undo after having prepared a record for insertion.
+ */
+extern void CancelPreparedUndo(void);

This API is nowhere defined or used. What is the intention?

+typedef struct UndoRecordHeader
+ /*
+ * Transaction id that has modified the tuple present in this undo record.
+ * If this is older then RecentGlobalXmin, then we can consider the tuple
+ * in this undo record as visible.
+ */
+ TransactionId urec_prevxid;


I think we need to mention oldestXidWithEpochHavingUndo instead of

+ * If UREC_INFO_BLOCK is set, an UndoRecordBlock structure follows.
+ *
+ * If UREC_INFO_PAYLOAD is set, an UndoRecordPayload structure follows.
+ *
+ * When (as will often be the case) multiple structures are present, they
+ * appear in the same order in which the constants are defined here. That is,
+ * UndoRecordRelationDetails appears first.
+ */

I have expanded this comments in the attached delta patch. I think we
should remove the define UREC_INFO_PAYLOAD_CONTAINS_SLOT from the
patch as this is zheap specific and should be added later along with
the zheap code.

+ * Additional information about a relation to which this record pertains,
+ * namely the tablespace OID and fork number. If the tablespace OID is
+ * DEFAULTTABLESPACE_OID and the fork number is MAIN_FORKNUM, this structure
+ * can (and should) be omitted.
+ */
+typedef struct UndoRecordRelationDetails
+ ForkNumber urec_fork; /* fork number */
+} UndoRecordRelationDetails;

This comment seems to be out-dated, so modified in the attached delta patch.

+typedef struct UndoRecordTransaction
+ uint32 urec_progress; /* undo applying progress. */
+ uint32 urec_xidepoch; /* epoch of the current transaction */

Can you expand comments about how the progress is defined and used?
Also, write a few sentences about why epoch is captured and or used?

+#define urec_next_pos \
+ (SizeOfUndoRecordTransaction - SizeOfUrecNext)

What is its purpose?

+ * Set uur_info for an UnpackedUndoRecord appropriately based on which
+ * other fields are set.
+ */
+extern void UndoRecordSetInfo(UnpackedUndoRecord *uur);
+ * Compute the number of bytes of storage that will be required to insert
+ * an undo record. Sets uur->uur_info as a side effect.
+ */
+extern Size UndoRecordExpectedSize(UnpackedUndoRecord *uur);

Again, I see duplicate text in .h and .c files, so removed this and
similar comments from .h files. I have tried to move some part of
comments from .h to .c file, so that it is easier to read from one
place rather than referring at two places. See, if I have missed

Apart from the above, I have done few other cosmetic changes in the
attached delta patch, see, if you like those, kindly include it in
the main patch.

With Regards,
Amit Kapila.

Attachment Content-Type Size
0003-undo-interface-v10-delta-amit.patch application/octet-stream 18.4 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Cramer 2018-12-08 10:53:46 Re: extended query protcol violation?
Previous Message Tatsuo Ishii 2018-12-08 10:16:36 Re: extended query protcol violation?