Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-01-30 10:41:44
Message-ID: CAA4eK1KbvOAiUnQQxtXAXSQAAroU9h0msp6KXrysYOEVHb34vA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 10, 2020 at 10:14 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Jan 6, 2020 at 2:11 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
>
> >
> > Few more comments:
> > --------------------------------
> > v4-0007-Implement-streaming-mode-in-ReorderBuffer
> > 1.
> > +ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
> > {
> > ..
> > + /*
> > + * TOCHECK: We have to rebuild historic snapshot to be sure it includes all
> > + * information about
> > subtransactions, which could arrive after streaming start.
> > + */
> > + if (!txn->is_schema_sent)
> > + snapshot_now
> > = ReorderBufferCopySnap(rb, txn->base_snapshot,
> > + txn,
> > command_id);
> > ..
> > }
> >
> > Why are we using base snapshot here instead of the snapshot we saved
> > the first time streaming has happened? And as mentioned in comments,
> > won't we need to consider the snapshots for subtransactions that
> > arrived after the last time we have streamed the changes?
> Fixed
>

+static void
+ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
..
+ /*
+ * We can not use txn->snapshot_now directly because after we there
+ * might be some new sub-transaction which after the last streaming run
+ * so we need to add those sub-xip in the snapshot.
+ */
+ snapshot_now = ReorderBufferCopySnap(rb, txn->snapshot_now,
+ txn, command_id);

"because after we there", you seem to forget a word between 'we' and
'there'. So as we are copying it now, does this mean it will consider
the snapshots for subtransactions that arrived after the last time we
have streamed the changes? If so, have you tested it and can we add
the same in comments.

Also, if we need to copy the snapshot here, then do we need to again
copy it in ReorderBufferProcessTXN(in below code and in catch block in
the same function).

{
..
+ /*
+ * Remember the command ID and snapshot if transaction is streaming
+ * otherwise free the snapshot if we have copied it.
+ */
+ if (streaming)
+ {
+ txn->command_id = command_id;
+ txn->snapshot_now = ReorderBufferCopySnap(rb, snapshot_now,
+ txn, command_id);
+ }
+ else if (snapshot_now->copied)
+ ReorderBufferFreeSnap(rb, snapshot_now);
..
}

> >
> > 4. In ReorderBufferStreamTXN(), don't we need to set some of the txn
> > fields like origin_id, origin_lsn as we do in ReorderBufferCommit()
> > especially to cover the case when it gets called due to memory
> > overflow (aka via ReorderBufferCheckMemoryLimit).
> We get origin_lsn during commit time so I am not sure how can we do
> that. I have also noticed that currently, we are not using origin_lsn
> on the subscriber side. I think need more investigation that if we
> want this then do we need to log it early.
>

Have you done any investigation of this point? You might want to look
at pg_replication_origin* APIs. Today, again looking at this code, I
think with current coding, it won't be used even when we encounter
commit record. Because ReorderBufferCommit calls
ReorderBufferStreamCommit which will make sure that origin_id and
origin_lsn is never sent. I think at least that should be fixed, if
not, probably, we need a comment with reasoning why we think it is
okay not to do in this case.

+ /*
+ * If we are streaming the in-progress transaction then Discard the

/Discard/discard

> >
> > v4-0006-Gracefully-handle-concurrent-aborts-of-uncommitte
> > 1.
> > + /*
> > + * If CheckXidAlive is valid, then we check if it aborted. If it did, we
> > + * error out
> > + */
> > + if (TransactionIdIsValid(CheckXidAlive) &&
> > + !TransactionIdIsInProgress(CheckXidAlive) &&
> > + !TransactionIdDidCommit(CheckXidAlive))
> > + ereport(ERROR,
> > + (errcode(ERRCODE_TRANSACTION_ROLLBACK),
> > + errmsg("transaction aborted during system catalog scan")));
> >
> > Why here we can't use TransactionIdDidAbort? If we can't use it, then
> > can you add comments stating the reason of the same.
> Done

+ * If CheckXidAlive is valid, then we check if it aborted. If it did, we
+ * error out. Instead of directly checking the abort status we do check
+ * if it is not in progress transaction and no committed. Because if there
+ * were a system crash then status of the the transaction which were running
+ * at that time might not have marked. So we need to consider them as
+ * aborted. Refer detailed comments at snapmgr.c where the variable is
+ * declared.

How about replacing the above comment with below one:

If CheckXidAlive is valid, then we check if it aborted. If it did, we
error out. We can't directly use TransactionIdDidAbort as after crash
such transaction might not have been marked as aborted. See detailed
comments at snapmgr.c where the variable is declared.

I am not able to understand the change in
v8-0011-BUGFIX-set-final_lsn-for-subxacts-before-cleanup. Do you have
any explanation for the same?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergei Kornilov 2020-01-30 11:00:43 recovery_target_action=pause with confusing hint
Previous Message Maurizio Sambati 2020-01-30 10:13:23 Re: ERROR: subtransaction logged without previous top-level txn record