Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-31 13:20:24
Message-ID: CAA4eK1LybnO3FGBG4DgNHXOfwLV5Zp98oVQm_+78Ay6pJAsy6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 24, 2019 at 10:00 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Jul 22, 2019 at 3:51 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> Please find my review comments for
> 0013-Allow-foreground-transactions-to-perform-undo-action
>
> + /* initialize undo record locations for the transaction */
> + for (i = 0; i < UndoLogCategories; i++)
> + {
> + s->start_urec_ptr[i] = InvalidUndoRecPtr;
> + s->latest_urec_ptr[i] = InvalidUndoRecPtr;
> + s->undo_req_pushed[i] = false;
> + }
>
> Can't we just memset this memory?
>

Yeah, that sounds better, so changed.

>
> + * We can't postpone applying undo actions for subtransactions as the
> + * modifications made by aborted subtransaction must not be visible even if
> + * the main transaction commits.
> + */
> + if (IsSubTransaction())
> + return;
>
> I am not completely sure but is it possible that the outer function
> CommitTransactionCommand/AbortCurrentTransaction can avoid
> calling this function in the switch case based on the current state,
> so that under subtransaction this will never be called?
>

I have already explained as a separate response to this email why I
don't think this is a very good idea.

>
> + /*
> + * Prepare required undo request info so that it can be used in
> + * exception.
> + */
> + ResetUndoRequestInfo(&urinfo);
> + urinfo.dbid = dbid;
> + urinfo.full_xid = fxid;
> + urinfo.start_urec_ptr = start_urec_ptr[per_level];
> +
>
> I see that we are preparing urinfo before execute_undo_actions so that
> in case of an error in CATCH we can use that to
> insert into the queue, but can we just initialize urinfo right there
> before inserting into the queue, we have all the information
> Am I missing something?
>

IIRC, the only idea was that we can use the same variable
(urinfo.full_xid) in execute_undo_actions call and in the catch block,
but I think your suggestion sounds better as we can avoid declaring
urinfo as volatile in that case.

> +
> + /*
> + * We need the locations of the start and end undo record pointers when
> + * rollbacks are to be performed for prepared transactions using undo-based
> + * relations. We need to store this information in the file as the user
> + * might rollback the prepared transaction after recovery and for that we
> + * need it's start and end undo locations.
> + */
> + UndoRecPtr start_urec_ptr[UndoLogCategories];
> + UndoRecPtr end_urec_ptr[UndoLogCategories];
>
> it's -> its
>
>
..
>
> We must have some comments to explain how performUndoActions is used,
> where it's set. If it's explained somewhere else then we can
> give reference to that code.
>
> + for (i = 0; i < UndoLogCategories; i++)
> + {
> + if (s->latest_urec_ptr[i])
> + {
> + s->performUndoActions = true;
> + break;
> + }
> + }
>
> I think we should chek UndoRecPtrIsValid(s->latest_urec_ptr[i])
>

Changed as per suggestion.

> + PG_TRY();
> + {
> + /*
> + * Prepare required undo request info so that it can be used in
> + * exception.
> + */
> + ResetUndoRequestInfo(&urinfo);
> + urinfo.dbid = dbid;
> + urinfo.full_xid = fxid;
> + urinfo.start_urec_ptr = start_urec_ptr[per_level];
> +
> + /* for subtransactions, we do partial rollback. */
> + execute_undo_actions(urinfo.full_xid,
> + end_urec_ptr[per_level],
> + start_urec_ptr[per_level],
> + !isSubTrans);
> + }
> + PG_CATCH();
>
> Wouldn't it be good to explain in comments that we are not rethrowing
> the error in PG_CATCH but because we don't want the main
> transaction to get an error if there is an error while applying to
> undo action for the main transaction and we will abort the transaction
> in the caller of this function?
>

I have added a comment atop of the function containing this code.

> +tables are only accessible in the backend that has created them. We can't
> +postpone applying undo actions for subtransactions as the modifications
> +made by aborted subtransaction must not be visible even if the main transaction
> +commits.
>
> I think we need to give detail reasoning why subtransaction changes
> will be visible if we don't apply it's undo and the main
> the transaction commits by mentioning that we don't use separate
> transaction id for the subtransaction and that will make all the
> changes of the transaction id visible when it commits.
>

I have added a detailed explanation in execute_undo_actions() and
given a reference of same here.

The changes are present in the patch series just posted by me [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1KKAFBCJuPnFtgdc89djv4xO%3DZkAdXvKQinqN4hWiRbvA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-07-31 13:21:53 Re: POC: Cleaning up orphaned files using undo logs
Previous Message Jeremy Finzel 2019-07-31 13:18:56 Re: proposal - patch: psql - sort_by_size