Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-19 08:57:59
Message-ID: CAA4eK1L2Vqw90qE5_rpgX+uDpgfmf_KqPsYeZS65fFDzdQVpeA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 11, 2019 at 9:17 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Thu, Jul 11, 2019 at 12:38 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > I don't like the fact that undoaccess.c has a new global,
> > undo_compression_info. I haven't read the code thoroughly, but do we
> > really need that? I think it's never modified (so it could just be
> > declared const),
>
> Actually, this will get modified otherwise across undo record
> insertion how we will know what was the values of the common fields in
> the first record of the page. Another option could be that every time
> we insert the record, read the value from the first complete undo
> record on the page but that will be costly because for every new
> insertion we need to read the first undo record of the page.
>

This information won't be shared across transactions, so can't we keep
it in top transaction's state? It seems to me that will be better
than to maintain it as a global state.

Few more comments on this patch:
1.
PrepareUndoInsert()
{
..
+ if (logswitched)
+ {
..
+ }
+ else
+ {
..
+ resize = true;
..
+ }
+
..
+
+ do
+ {
+ bufidx = UndoGetBufferSlot(context, rnode, cur_blk, rbm);
..
+ rbm = RBM_ZERO;
+ cur_blk++;
+ } while (cur_size < size);
+
+ /*
+ * Set/overwrite compression info if required and also exclude the common
+ * fields from the undo record if possible.
+ */
+ if (UndoSetCommonInfo(compression_info, urec, urecptr,
+ context->prepared_undo_buffers[prepared_undo->undo_buffer_idx[0]].buf))
+ resize = true;
+
+ if (resize)
+ size = UndoRecordExpectedSize(urec);

I see that in some cases where resize is possible are checked before
buffer allocation and some are after. Isn't it better to do all these
checks before buffer allocation? Also, isn't it better to even
compute changed size before buffer allocation as that might sometimes
help in lesser buffer allocations?

Can you find a better way to write
:context->prepared_undo_buffers[prepared_undo->undo_buffer_idx[0]].buf)?
It makes the line too long and difficult to understand. Check for
similar instances in the patch and if possible, change them as well.

2.
+InsertPreparedUndo(UndoRecordInsertContext *context)
{
..
/*
+ * Try to insert the record into the current page. If it
+ * doesn't succeed then recall the routine with the next page.
+ */
+ InsertUndoData(&ucontext, page, starting_byte);
+ if (ucontext.stage == UNDO_PACK_STAGE_DONE)
+ {
+ MarkBufferDirty(buffer);
+ break;
+ }
+ MarkBufferDirty(buffer);
..
}

Can't we call MarkBufferDirty(buffer) just before 'if' check? That
will avoid calling it twice.

3.
+ * Later, during insert phase we will write actual records into thse buffers.
+ */
+struct PreparedUndoBuffer

/thse/these

4.
+ /*
+ * If we are writing first undo record for the page the we can set the
+ * compression so that subsequent records from the same transaction can
+ * avoid including common information in the undo records.
+ */
+ if (first_complete_undo)

/page the we/page then we

5.
PrepareUndoInsert()
{
..
After
+ * allocation We'll only advance by as many bytes as we turn out to need.
+ */
+ UndoRecordSetInfo(urec);

Change the beginning of comment as: "After allocation, we'll .."

6.
PrepareUndoInsert()
{
..
* TODO: instead of storing this in the transaction header we can
+ * have separate undo log switch header and store it there.
+ */
+ prevlogurp =
+ MakeUndoRecPtr(UndoRecPtrGetLogNo(prevlog_insert_urp),
+ (UndoRecPtrGetOffset(prevlog_insert_urp) - prevlen));
+

I don't think this TODO is valid anymore because now the patch has a
separate log-switch header.

7.
/*
+ * If undo log is switched then set the logswitch flag and also reset the
+ * compression info because we can use same compression info for the new
+ * undo log.
+ */
+ if (UndoRecPtrIsValid(prevlog_xact_start))

/can/can't

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2019-07-19 09:02:32 Re: SQL/JSON path issues/questions
Previous Message Amit Langote 2019-07-19 08:52:20 Re: partition routing layering in nodeModifyTable.c