Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2019-12-30 09:40:47
Message-ID: CAFiTN-t8PmKA1X4jEqKmkvs0ggWpy0APWpPuaJwpx2YpfAf97w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 12, 2019 at 9:44 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
Yesterday, Tomas has posted the latest version of the patch set which
contain the fix for schema send part. Meanwhile, I was working on few
review comments/bugfixes and refactoring. I have tried to merge those
changes with the latest patch set except the refactoring related to
"0006-Implement-streaming-mode-in-ReorderBuffer" patch, because Tomas
has also made some changes in the same patch. I have created a
separate patch for the same so that we can review the changes and then
we can merge them to the main patch.

> On Wed, Dec 11, 2019 at 5:22 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 9, 2019 at 1:27 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > I have review the patch set and here are few comments/questions
> > >
> > > 1.
> > > +static void
> > > +pg_decode_stream_change(LogicalDecodingContext *ctx,
> > > + ReorderBufferTXN *txn,
> > > + Relation relation,
> > > + ReorderBufferChange *change)
> > > +{
> > > + OutputPluginPrepareWrite(ctx, true);
> > > + appendStringInfo(ctx->out, "streaming change for TXN %u", txn->xid);
> > > + OutputPluginWrite(ctx, true);
> > > +}
> > >
> > > Should we show the tuple in the streamed change like we do for the
> > > pg_decode_change?
> > >
> >
> > I think so. The patch shows the message in
> > pg_decode_stream_message(), so why to prohibit showing tuple here?

Yeah, we can do that. One option is that we can directly register
"pg_decode_change" function as stream_change_cb plugin and that will
show the tuple, another option is that we can write a similar function
as pg_decode_change and change the message which includes the text
"STREAM" so that the user can distinguish between tuple from committed
transaction and the in-progress transaction.

While analyzing this solution I have encountered one more issue, the
problem is that currently, during commit time in DecodeCommit we check
whether we need to skip the changes of the transaction or not by
calling SnapBuildXactNeedsSkip but since now we support streaming so
it's possible that before commit wal arrive we might have already sent
the changes to the output plugin even though we could have skipped
those changes. So my question is instead of checking at the commit
time can't we check before adding to ReorderBuffer itself or we can
truncate the changes if SnapBuildXactNeedsSkip is true whenever
logical_decoding_workmem limit is reached.

> > Few comments on this patch series:
> >
> > 0001-Immediately-WAL-log-assignments:
> > ------------------------------------------------------------
> >
> > The commit message still refers to the old design for this patch. I
> > think you need to modify the commit message as per the latest patch.
Done
> >
> > 0002-Issue-individual-invalidations-with-wal_level-log
> > ----------------------------------------------------------------------------
> > 1.
> > xact_desc_invalidations(StringInfo buf,
> > {
> > ..
> > + else if (msg->id == SHAREDINVALSNAPSHOT_ID)
> > + appendStringInfo(buf, " snapshot %u", msg->sn.relId);
> >
> > You have removed logging for the above cache but forgot to remove its
> > reference from one of the places. Also, I think you need to add a
> > comment somewhere in inval.c to say why you are writing for WAL for
> > some types of invalidations and not for others?
Done
> >
> > 0003-Extend-the-output-plugin-API-with-stream-methods
> > --------------------------------------------------------------------------------
> > 1.
> > + are required, while <function>stream_message_cb</function> and
> > + <function>stream_message_cb</function> are optional.
> >
> > stream_message_cb is mentioned twice. It seems the second one is for truncate.
Done
> >
> > 2.
> > size of the transaction size and network bandwidth, the transfer time
> > + may significantly increase the apply lag.
> >
> > /size of the transaction size/size of the transaction
> >
> > no need to mention size twice.
Done
> >
> > 3.
> > + Similarly to spill-to-disk behavior, streaming is triggered when the total
> > + amount of changes decoded from the WAL (for all in-progress
> > transactions)
> > + exceeds limit defined by <varname>logical_work_mem</varname> setting.
> >
> > The guc name used is wrong. /Similarly to/Similar to/
Done
> >
> > 4.
> > stream_start_cb_wrapper()
> > {
> > ..
> > + /* state.report_location = apply_lsn; */
> > ..
> > + /* FIXME ctx->write_location = apply_lsn; */
> > ..
> > }
> >
> > See, if we can fix these and similar in the callback for the stop. I
> > think we don't have final_lsn till we commit/abort. Can we compute
> > before calling these API's?
Done
> >
> >
> > 0005-Gracefully-handle-concurrent-aborts-of-uncommitte
> > ----------------------------------------------------------------------------------
> > 1.
> > @@ -1877,6 +1877,7 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
> > PG_CATCH();
> > {
> > /* TODO: Encapsulate cleanup
> > from the PG_TRY and PG_CATCH blocks */
> > +
> > if (iterstate)
> > ReorderBufferIterTXNFinish(rb, iterstate);
> >
> > Spurious line change.
> >
Done
> > 2. The commit message of this patch refers to Prepared transactions.
> > I think that needs to be changed.
> >
> > 0006-Implement-streaming-mode-in-ReorderBuffer
> > -------------------------------------------------------------------------
> > 1.
> > +
> > +/* iterator for streaming (only get data from memory) */
> > +static ReorderBufferStreamIterTXNState * ReorderBufferStreamIterTXNInit(
> > +
> > ReorderBuffer *rb,
> > +
> > ReorderBufferTXN
> > *txn);
> > +
> > +static ReorderBufferChange *ReorderBufferStreamIterTXNNext(
> > + ReorderBuffer *rb,
> > +
> > ReorderBufferStreamIterTXNState * state);
> > +
> > +static void ReorderBufferStreamIterTXNFinish(
> > +
> > ReorderBuffer *rb,
> > +
> > ReorderBufferStreamIterTXNState * state);
> >
> > Do we really need to introduce new APIs for iterating over changes
> > from streamed transactions? Why can't we reuse the same API's as we
> > use for committed xacts?
Done
> >
> > 2.
> > +static void
> > +ReorderBufferStreamCommit(ReorderBuffer *rb, ReorderBufferTXN *txn)
> >
> > Please write some comments atop ReorderBufferStreamCommit.
Done
> >
> > 3.
> > +ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
> > {
> > ..
> > ..
> > + if (txn->snapshot_now
> > == NULL)
> > + {
> > + dlist_iter subxact_i;
> > +
> > + /* make sure this transaction is streamed for the first time */
> > +
> > Assert(!rbtxn_is_streamed(txn));
> > +
> > + /* at the beginning we should have invalid command ID */
> > + Assert(txn->command_id ==
> > InvalidCommandId);
> > +
> > + dlist_foreach(subxact_i, &txn->subtxns)
> > + {
> > + ReorderBufferTXN *subtxn;
> > +
> > +
> > subtxn = dlist_container(ReorderBufferTXN, node, subxact_i.cur);
> > +
> > + if (subtxn->base_snapshot != NULL &&
> > +
> > (txn->base_snapshot == NULL ||
> > + txn->base_snapshot_lsn > subtxn->base_snapshot_lsn))
> > + {
> > +
> > txn->base_snapshot = subtxn->base_snapshot;
> >
> > The logic here seems to be correct, but I am not sure why it is not
> > considered to purge the base snapshot before assigning the subtxn's
> > snapshot and similarly, we have not purged snapshot for subtxn once we
> > are done with it. I think we can use
> > ReorderBufferTransferSnapToParent to replace part of the logic here.
> > Do you see any reason for doing things differently here?
Done
> >
> > 4. In ReorderBufferStreamTXN, why do you need to use
> > ReorderBufferCopySnap to assign txn->base_snapshot to snapshot_now.

IMHO, here instead of directly copying the base snapshot we are
modifying it by passing command id and thats the reason we are copying
it.
> >
> > 5. I see a lot of code similarity in ReorderBufferStreamTXN and
> > existing ReorderBufferCommit. I understand that there are some subtle
> > differences due to which we need to write this new function but can't
> > we encapsulate the specific parts of code in functions and then call
> > from both places. I am talking about code in different cases for
> > change->action.
Done
> >
> > 6. + * Note: We never stream and serialize a transaction at the same time (e
> > /(e/(we
Done

I have also found one bug in
"v3-0012-fixup-add-proper-schema-tracking.patch" due to which some of
the streaming test cases were failing, I have created a separate patch
to fix the same.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v4-0001-Immediately-WAL-log-assignments.patch application/octet-stream 10.3 KB
v4-0002-Issue-individual-invalidations-with-wal_level-log.patch application/octet-stream 15.7 KB
v4-0003-fixup-is_schema_sent-set-too-early.patch application/octet-stream 858 bytes
v4-0004-Extend-the-output-plugin-API-with-stream-methods.patch application/octet-stream 34.8 KB
v4-0005-Cleaning-up-of-flags-in-ReorderBufferTXN-structur.patch application/octet-stream 8.1 KB
v4-0006-Gracefully-handle-concurrent-aborts-of-uncommitte.patch application/octet-stream 11.9 KB
v4-0007-Implement-streaming-mode-in-ReorderBuffer.patch application/octet-stream 44.3 KB
v4-0008-fixup-add-is_schema_sent-back.patch application/octet-stream 854 bytes
v4-0009-fixup-get-rid-of-is_schema_sent-entirely.patch application/octet-stream 3.6 KB
v4-0010-Support-logical_decoding_work_mem-set-from-create.patch application/octet-stream 13.1 KB
v4-0011-Add-support-for-streaming-to-built-in-replication.patch application/octet-stream 89.6 KB
v4-0012-fixup-add-proper-schema-tracking.patch application/octet-stream 3.3 KB
v4-0013-Track-statistics-for-streaming.patch application/octet-stream 11.7 KB
v4-0014-Enable-streaming-for-all-subscription-TAP-tests.patch application/octet-stream 14.7 KB
v4-0015-BUGFIX-set-final_lsn-for-subxacts-before-cleanup.patch application/octet-stream 1013 bytes
v4-0016-Add-TAP-test-for-streaming-vs.-DDL.patch application/octet-stream 4.4 KB
v4-0017-Extend-handling-of-concurrent-aborts-for-streamin.patch application/octet-stream 3.7 KB
v4-0018-Review-comment-fix-and-refactoring.patch application/octet-stream 36.1 KB
v4-0019-Bugfix-in-schema-tracking.patch application/octet-stream 837 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-12-30 10:13:09 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message Peter Eisentraut 2019-12-30 09:08:47 Re: Windows v readline