Re: Undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, dilip(dot)kumar(at)enterprisedb(dot)com, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Undo logs
Date: 2018-11-10 03:57:43
Message-ID: CAA4eK1K+WwS-_K_2-eTw2G_y_XzpW9CuPDsza7qC4rmQHNq5Sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 5, 2018 at 5:10 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
[review for undo record layer (0003-undo-interface-v3)]

I might sound repeating myself, but just to be clear, I was involved
in the design of this patch as well and I have given a few high-level
inputs for this patch. I have used this interface in the zheap
development, but haven't done any sort of detailed review which I am
doing now. I encourage others also to review this patch.

1.
* NOTES:
+ * Handling multilog -
+ * It is possible that the undo record of a transaction can be spread across
+ * multiple undo log. And, we need some special handling while inserting the
+ * undo for discard and rollback to work sanely.

I think before describing how the undo record is spread across
multiple logs, you can explain how it is laid out when that is not the
case. You can also explain how undo record headers are linked. I am
not sure file header is the best place or it should be mentioned in
README, but I think for now we can use file header for this purpose
and later we can move it to README if required.

2.
+/*
+ * FIXME: Do we want to support undo tuple size which is more than the BLCKSZ
+ * if not than undo record can spread across 2 buffers at the max.
+ */

+#define MAX_BUFFER_PER_UNDO 2

I think here the right question is what is the possibility of undo
record to be greater than BLCKSZ? For zheap, as of today, we don'
have any such requirement as the largest undo record is written for
update or multi_insert and in both cases we don't exceed the limit of
BLCKSZ. I guess some user other than zheap could probably have such
requirement and I don't think it is impossible to enhance this if we
have any requirement.

If anybody else has an opinion here, please feel to share it.

3.
+/*
+ * FIXME: Do we want to support undo tuple size which is more than the BLCKSZ
+ * if not than undo record can spread across 2 buffers at the max.
+ */
+#define MAX_BUFFER_PER_UNDO 2
+
+/*
+ * Consider buffers needed for updating previous transaction's
+ * starting undo record. Hence increased by 1.
+ */
+#define MAX_UNDO_BUFFERS (MAX_PREPARED_UNDO + 1) * MAX_BUFFER_PER_UNDO
+
+/* Maximum number of undo record that can be prepared before calling insert. */
+#define MAX_PREPARED_UNDO 2

I think it is better to define MAX_PREPARED_UNDO before
MAX_UNDO_BUFFERS as the first one is used in the definition of a
second.

4.
+/*
+ * Call PrepareUndoInsert to tell the undo subsystem about the undo record you
+ * intended to insert. Upon return, the necessary undo buffers are pinned.
+ * This should be done before any critical section is established, since it
+ * can fail.
+ *
+ * If not in recovery, 'xid' should refer to the top transaction id because
+ * undo log only stores mapping for the top most transactions.
+ * If in recovery, 'xid' refers to the transaction id stored in WAL.
+ */
+UndoRecPtr
+PrepareUndoInsert(UnpackedUndoRecord *urec, UndoPersistence upersistence,
+ TransactionId xid)

This function locks the buffer as well which is right as we need to do
that before critical section, but the function header comments doesn't
indicate it. You can modify it as:
"Upon return, the necessary undo buffers are pinned and locked."

Note that similar modification is required in .h file as well.

5.
+/*
+ * Insert a previously-prepared undo record. This will lock the buffers
+ * pinned in the previous step, write the actual undo record into them,
+ * and mark them dirty. For persistent undo, this step should be performed
+ * after entering a critical section; it should never fail.
+ */
+void
+InsertPreparedUndo(void)

Here, the comments are wrong as buffers are already locked in the
previous step. A similar change is required in .h file as well.

6.
+InsertPreparedUndo(void)
{
..
/*
+ * Try to insert the record into the current page. If it doesn't
+ * succeed then recall the routine with the next page.
+ */
+ if (InsertUndoRecord(uur, page, starting_byte, &already_written, false))
+ {
+ undo_len += already_written;
+ MarkBufferDirty(buffer);
+ break;
+ }
+
+ MarkBufferDirty(buffer);
..
}

Here, you are writing into a shared buffer and marking it dirty, isn't
it a good idea to Assert for being in the critical section?

7.
+/* Maximum number of undo record that can be prepared before calling insert. */
+#define MAX_PREPARED_UNDO 2

/record/records

I think this definition doesn't define the maximum number of undo
records that can be prepared as the caller can use UndoSetPrepareSize
to change it. I think you can modify the comment as below or
something on those lines:
"This defines the number of undo records that can be prepared before
calling insert by default. If you need to prepare more than
MAX_PREPARED_UNDO undo records, then you must call UndoSetPrepareSize
first."

8.
+ * Caller should call this under log->discard_lock
+ */
+static bool
+IsPrevTxnUndoDiscarded(UndoLogControl *log, UndoRecPtr prev_xact_urp)
+{
+ if (log->oldest_data == InvalidUndoRecPtr)
..

Isn't it a good idea to have an Assert that we already have discard_lock?

9.
+ UnpackedUndoRecord uur; /* prev txn's first undo record.*/
+} PreviousTxnInfo;

Extra space at the of comment is required.

10.
+/*
+ * Check if previous transactions undo is already discarded.
+ *
+ * Caller should call this under log->discard_lock
+ */
+static bool
+IsPrevTxnUndoDiscarded(UndoLogControl *log, UndoRecPtr prev_xact_urp)
+{

The name suggests that this function is doing something special for
the previous transaction whereas this it just checks whether undo is
discarded corresponding to a particular undo location. Isn't it
better if we name it as UndoRecordExists or UndoRecordIsValid? Then
explain in comments when do you consider particular record exists.

Another point to note is that you are not releasing the lock in all
paths, so it is better to mention in comments when will it be released
and when not.

11.
+IsPrevTxnUndoDiscarded(UndoLogControl *log, UndoRecPtr prev_xact_urp)
+{
+ if (log->oldest_data == InvalidUndoRecPtr)
+ {
+ /*
+ * oldest_data is not yet initialized. We have to check
+ * UndoLogIsDiscarded and if it's already discarded then we have
+ * nothing to do.
+ */
+ LWLockRelease(&log->discard_lock);
+ if (UndoLogIsDiscarded(prev_xact_urp))
+ return true;

The comment in above code is just trying to write the code in words.
I think here you should tell why we need to call UndoLogIsDiscarded
when oldest_data is not initialized and or the scenario when
oldest_data will not be initialized.

12.
+ * The actual update will be done by UndoRecordUpdateTransInfo under the
+ * critical section.
+ */
+void
+PrepareUndoRecordUpdateTransInfo(UndoRecPtr urecptr, bool log_switched)
+{
..

I find this function name bit awkward. How about UndoRecordPrepareTransInfo?

13.
+PrepareUndoRecordUpdateTransInfo(UndoRecPtr urecptr, bool log_switched)
{
..
+ /*
+ * TODO: For now we don't know how to build a transaction chain for
+ * temporary undo logs. That's because this log might have been used by a
+ * different backend, and we can't access its buffers. What should happen
+ * is that the undo data should be automatically discarded when the other
+ * backend detaches, but that code doesn't exist yet and the undo worker
+ * can't do it either.
+ */
+ if (log->meta.persistence == UNDO_TEMP)
+ return;

Aren't we already dealing with this case in the other patch [1]?
Basically, I think we should discard it at commit time and or when the
backend is detached.

14.
+PrepareUndoRecordUpdateTransInfo(UndoRecPtr urecptr, bool log_switched)
{
..
/*
+ * If previous transaction's urp is not valid means this backend is
+ * preparing its first undo so fetch the information from the undo log
+ * if it's still invalid urp means this is the first undo record for this
+ * log and we have nothing to update.
+ */
+ if (!UndoRecPtrIsValid(prev_xact_urp))
+ return;
..

This comment is confusing. It appears to be saying same thing twice.
You can write it along something like:

"The absence of previous transaction's undo indicate that this backend
is preparing its first undo in which case we have nothing to update.".

15.
+PrepareUndoRecordUpdateTransInfo(UndoRecPtr urecptr, bool log_switched)
/*
+ * Acquire the discard lock before accessing the undo record so that
+ * discard worker doen't remove the record while we are in process of
+ * reading it.
+ */

Typo doen't/doesn't.

I think you can use 'can't' instead of doesn't.

This is by no means a complete review, rather just noticed a few
things while reading the patch.

[1] - https://www.postgresql.org/message-id/CAFiTN-t8fv-qYG9zynhS-1jRrvt_o5C-wCMRtzOsK8S%3DMXvKKw%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-11-10 05:01:27 Re: Skylake-S warning
Previous Message David Steele 2018-11-10 03:50:44 Re: pgsql: Make WAL segment size configurable at initdb time.