Re: Undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: dilip(dot)kumar(at)enterprisedb(dot)com
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Undo logs
Date: 2018-12-12 09:33:46
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Wed, Dec 12, 2018 at 11:18 AM Dilip Kumar
<dilip(dot)kumar(at)enterprisedb(dot)com> wrote:
> On Sat, Dec 8, 2018 at 7:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> I think I see the problem in the discard mechanism when the log is
>> spread across multiple logs. Basically, if the second log contains
>> undo of some transaction prior to the transaction which has just
>> decided to spread it's undo in the chosen undo log, then we might
>> discard the undo log of some transaction(s) inadvertently. Am, I
>> missing something?
> Actually, I don't see exactly this problem here because we only process one undo log at a time, so we will not got to the next undo log and discard some transaction for which we supposed to retain the undo.

How will rollbacks work for such a case? I have more to say about
this, see below.

>> If not, then I guess we need to ensure that we
>> don't immediately discard the undo in the second log when a single
>> transactions undo is spreaded across two logs
>> Before choosing a new undo log to span the undo for a transaction, do
>> we ensure that it is not already linked with some other undo log for a
>> similar reason?

You seem to forget answering this.

>> One more thing in this regard:
>> +UndoRecordAllocate(UnpackedUndoRecord *undorecords, int nrecords,
>> + TransactionId txid, UndoPersistence upersistence)
>> Isn't there a hidden assumption in the above code that you will always
>> get a fresh undo log if the undo doesn't fit in the currently attached
>> log? What is the guarantee of same?
> Yeah it's a problem, we might get the undo which may not be empty. One way to avoid this could be that instead of relying on the check "UndoRecPtrGetOffset(urecptr) == UndoLogBlockHeaderSize". We can add some flag to the undo log meta data that whether it's the first record after attach or not and based on that we can decide. But, I want to think some better solution where we can identify without adding anything extra to undo meta.

I think what we need to determine here is whether we have switched the
log for some non-first record of the transaction. If so, can't we
detect by something like:

log = GetLatestUndoLogAmAttachedTo();
if (!need_xact_hdr)
currnet_log = GetLatestUndoLogAmAttachedTo();
if (currnet_log is not same as log)
Assert(currnet_log->meta.prevlogno == log->logno);
log_switched = true;

Won't the similar problem happens when we read undo records during
rollback? In code below:

+UndoGetPrevUndoRecptr(UndoRecPtr urp, uint16 prevlen)
+ UndoLogNumber logno = UndoRecPtrGetLogNo(urp);
+ UndoLogOffset offset = UndoRecPtrGetOffset(urp);
+ /*
+ * We have reached to the first undo record of this undo log, so fetch the
+ * previous undo record of the transaction from the previous log.
+ */
+ if (offset == UndoLogBlockHeaderSize)
+ {
+ UndoLogControl *prevlog,

We seem to be assuming here that the new log starts from the
beginning. IIUC, we can read the record of some other transaction if
the transaction's log spans across two logs and the second log
contains some other transaction's log in the beginning.

>> 8.
>> +typedef struct UndoRecordTransaction
>> +{
>> + uint32 urec_progress; /* undo applying progress. */
>> + uint32 urec_xidepoch; /* epoch of the current transaction */
>> Can you expand comments about how the progress is defined and used?
> Moved your comment from UnpackedUndoRecord to this structure and in UnpackedUndoRecord I have mentioned that we can refer detailed comment in this structure.
>> Also, write a few sentences about why epoch is captured and or used?
> urec_xidepoch is captured mainly for the zheap visibility purpose so isn't it good that we mention it there?

Okay, you can leave it here as it is. One small point about this structure:

+ uint64 urec_next; /* urec pointer of the next transaction */
+} UndoRecordTransaction;
+#define SizeOfUrecNext (sizeof(UndoRecPtr))
+#define SizeOfUndoRecordTransaction \
+ (offsetof(UndoRecordTransaction, urec_next) + SizeOfUrecNext)

Isn't it better to define urec_next as UndoRecPtr, even though it is
technically the same as per the current code?

With Regards,
Amit Kapila.

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2018-12-12 10:23:41 Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
Previous Message Simon Riggs 2018-12-12 08:50:16 Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name