Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-05-15 09:17:27
Message-ID: CAFiTN-vBmgbv0wRjupuYHZOh_4ubPi8FdTX=MDeMmn4U0ZZYGQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 12, 2020 at 4:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, May 7, 2020 at 6:17 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, May 5, 2020 at 7:13 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > I have fixed one more issue in the 0010 patch.  The issue was that once
> > a transaction was serialized due to an incomplete toast tuple, the
> > serialized store was not cleaned up after streaming, so the same tuple
> > was streamed multiple times.
> >
>
> I have reviewed a few patches (003, 004, and 005) and below are my comments.
>
> v20-0003-Extend-the-output-plugin-API-with-stream-methods
> ----------------------------------------------------------------------------------------
> 2.
> + <para>
> + Similar to spill-to-disk behavior, streaming is triggered when the total
> + amount of changes decoded from the WAL (for all in-progress transactions)
> + exceeds limit defined by
> <varname>logical_decoding_work_mem</varname> setting.
> + At that point the largest toplevel transaction (measured by
> amount of memory
> + currently used for decoded changes) is selected and streamed.
> + </para>
>
> I think we need to explain here the cases/exceptions where we need to
> spill even when streaming is enabled, and check whether this matches the
> latest implementation; otherwise, update it.

Done

> 3.
> + * To support streaming, we require change/commit/abort callbacks. The
> + * message callback is optional, similarly to regular output plugins.
>
> /similarly/similar

Done

> 4.
> +static void
> +stream_start_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
> +{
> + LogicalDecodingContext *ctx = cache->private_data;
> + LogicalErrorCallbackState state;
> + ErrorContextCallback errcallback;
> +
> + Assert(!ctx->fast_forward);
> +
> + /* We're only supposed to call this when streaming is supported. */
> + Assert(ctx->streaming);
> +
> + /* Push callback + info on the error context stack */
> + state.ctx = ctx;
> + state.callback_name = "stream_start";
> + /* state.report_location = apply_lsn; */
>
> Why can't we supply the report_location here? I think here we need to
> report txn->first_lsn if this is the very first stream and
> txn->final_lsn for any subsequent one.

Done
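
For reference, a minimal sketch of how the wrapper could pick the
reported location, assuming the rbtxn_is_streamed() flag from the 0005
patch is available to tell the first stream apart from later ones (only
the fields quoted above come from the patch; the rest is illustrative,
not the final code):

    /* inside stream_start_cb_wrapper, after setting state.callback_name */
    if (rbtxn_is_streamed(txn))
        state.report_location = txn->final_lsn;  /* streamed before */
    else
        state.report_location = txn->first_lsn;  /* very first stream */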

> 5.
> +static void
> +stream_stop_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
> +{
> + LogicalDecodingContext *ctx = cache->private_data;
> + LogicalErrorCallbackState state;
> + ErrorContextCallback errcallback;
> +
> + Assert(!ctx->fast_forward);
> +
> + /* We're only supposed to call this when streaming is supported. */
> + Assert(ctx->streaming);
> +
> + /* Push callback + info on the error context stack */
> + state.ctx = ctx;
> + state.callback_name = "stream_stop";
> + /* state.report_location = apply_lsn; */
>
> Can't we report txn->final_lsn here?

We are already setting this to txn->final_lsn in the 0006 patch, but I
have moved that into this patch now.
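
(For completeness, the corresponding line in stream_stop_cb_wrapper, as
suggested in the review, would simply be

    state.report_location = txn->final_lsn;  /* end of this streamed run */

with the surrounding code unchanged from what is quoted above.)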

> 6. I think it will be good if we can provide an example of streaming
> changes via test_decoding at
> https://www.postgresql.org/docs/devel/test-decoding.html. I think we
> can also explain there why the user is not expected to see the actual
> data in the stream.

I have a few open questions to resolve here.
- For streaming transactions as well, should we show the actual values,
or keep what the patch currently does
(appendStringInfo(ctx->out, "streaming change for TXN %u",
txn->xid); see the sketch after this list)?  I think we should show the
actual values instead of what we are doing now.
- In the documentation we cannot show a real example, because to get an
in-progress transaction to stream its changes we might have to insert a
lot of tuples.  I think we can show partial output?
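
For the record, a rough sketch of the test_decoding stream-change
callback as it currently behaves in the patch set, kept as a placeholder
because the streamed transaction may still abort (the function name and
signature follow the stream API from 0003; treat the details as
illustrative rather than final):

static void
pg_decode_stream_change(LogicalDecodingContext *ctx,
                        ReorderBufferTXN *txn,
                        Relation relation,
                        ReorderBufferChange *change)
{
    OutputPluginPrepareWrite(ctx, true);
    /* placeholder output; the in-progress transaction may still abort */
    appendStringInfo(ctx->out, "streaming change for TXN %u", txn->xid);
    OutputPluginWrite(ctx, true);
}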

> v20-0004-Gracefully-handle-concurrent-aborts-of-uncommitt
> ----------------------------------------------------------------------------------------
> 7.
> + /*
> + * We don't expect direct calls to table_tuple_get_latest_tid with valid
> + * CheckXidAlive for catalog or regular tables.
>
> There is an extra space between 'CheckXidAlive' and 'for'. I can see
> similar problems in other places as well where this comment is used,
> fix those as well.

Done

> 8.
> +/*
> + * CheckXidAlive is a xid value pointing to a possibly ongoing (sub)
> + * transaction. Currently, it is used in logical decoding. It's possible
> + * that such transactions can get aborted while the decoding is ongoing in
> + * which case we skip decoding that particular transaction. To ensure that we
> + * check whether the CheckXidAlive is aborted after fetching the tuple from
> + * system tables. We also ensure that during logical decoding we never
> + * directly access the tableam or heap APIs because we are checking for the
> + * concurrent aborts only in systable_* APIs.
> + */
>
> In this comment, there is an inconsistency in the space used after
> completing the sentence. In the part "transaction. To", single space
> is used whereas at other places two spaces are used after a full stop.

Done
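
As a side note, the check that the quoted comment refers to could look
roughly like this in the systable_* fetch paths (a sketch of the 0004
approach; the helper name, exact condition, and error wording are my
illustration, not the patch's code):

static inline void
check_concurrent_abort(void)
{
    /*
     * If logical decoding set CheckXidAlive and that (sub)transaction
     * has meanwhile aborted, error out so we stop decoding it against
     * stale catalog contents.
     */
    if (TransactionIdIsValid(CheckXidAlive) &&
        !TransactionIdIsInProgress(CheckXidAlive) &&
        !TransactionIdDidCommit(CheckXidAlive))
        ereport(ERROR,
                (errcode(ERRCODE_TRANSACTION_ROLLBACK),
                 errmsg("transaction aborted during system catalog scan")));
}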

> v20-0005-Implement-streaming-mode-in-ReorderBuffer
> -----------------------------------------------------------------------------
> 9.
> Implement streaming mode in ReorderBuffer
>
> Instead of serializing the transaction to disk after reaching the
> maximum number of changes in memory (4096 changes), we consume the
> changes we have in memory and invoke new stream API methods. This
> happens in ReorderBufferStreamTXN() using about the same logic as
> in ReorderBufferCommit() logic.
>
> I think the above part of the commit message needs to be updated.

Done

> 10.
> Theoretically, we could get rid of the k-way merge, and append the
> changes to the toplevel xact directly (and remember the position
> in the list in case the subxact gets aborted later).
>
> I don't think this part of the commit message is correct as we
> sometimes need to spill even during streaming. Please check the
> entire commit message and update it according to the latest
> implementation.

Done

> 11.
> - * HeapTupleSatisfiesHistoricMVCC.
> + * tqual.c's HeapTupleSatisfiesHistoricMVCC.
> + *
> + * We do build the hash table even if there are no CIDs. That's
> + * because when streaming in-progress transactions we may run into
> + * tuples with the CID before actually decoding them. Think e.g. about
> + * INSERT followed by TRUNCATE, where the TRUNCATE may not be decoded
> + * yet when applying the INSERT. So we build a hash table so that
> + * ResolveCminCmaxDuringDecoding does not segfault in this case.
> + *
> + * XXX We might limit this behavior to streaming mode, and just bail
> + * out when decoding transaction at commit time (at which point it's
> + * guaranteed to see all CIDs).
> */
> static void
> ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
> @@ -1350,9 +1498,6 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer
> *rb, ReorderBufferTXN *txn)
> dlist_iter iter;
> HASHCTL hash_ctl;
>
> - if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids))
> - return;
> -
>
> I don't understand this change. Why would "INSERT followed by
> TRUNCATE" lead to a tuple that comes up for decoding before its
> CID? The patch has made changes based on this assumption in
> HeapTupleSatisfiesHistoricMVCC, which appears to be very risky as the
> behavior could depend on whether we are streaming the changes for an
> in-progress xact or at the commit of a transaction. We might want
> to add a test to validate this behavior once.
>
> Also, the comment refers to tqual.c which is wrong as this API is now
> in heapam_visibility.c.

Done.

> 12.
> + * setup CheckXidAlive if it's not committed yet. We don't check if the xid
> + * aborted. That will happen during catalog access. Also reset the
> + * sysbegin_called flag.
> */
> - if (txn->base_snapshot == NULL)
> + if (!TransactionIdDidCommit(xid))
> {
> - Assert(txn->ninvalidations == 0);
> - ReorderBufferCleanupTXN(rb, txn);
> - return;
> + CheckXidAlive = xid;
> + bsysscan = false;
> }
>
> In the comment, the flag name 'sysbegin_called' should be bsysscan.

Done

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v21-0003-Extend-the-output-plugin-API-with-stream-methods.patch application/octet-stream 35.2 KB
v21-0004-Gracefully-handle-concurrent-aborts-of-uncommitt.patch application/octet-stream 13.3 KB
v21-0001-Immediately-WAL-log-assignments.patch application/octet-stream 10.6 KB
v21-0002-Issue-individual-invalidations-with-wal_level-lo.patch application/octet-stream 16.7 KB
v21-0005-Implement-streaming-mode-in-ReorderBuffer.patch application/octet-stream 37.1 KB
v21-0006-Add-support-for-streaming-to-built-in-replicatio.patch application/octet-stream 90.1 KB
v21-0008-Enable-streaming-for-all-subscription-TAP-tests.patch application/octet-stream 14.8 KB
v21-0009-Add-TAP-test-for-streaming-vs.-DDL.patch application/octet-stream 4.4 KB
v21-0010-Bugfix-handling-of-incomplete-toast-tuple.patch application/octet-stream 16.0 KB
v21-0007-Track-statistics-for-streaming.patch application/octet-stream 12.0 KB
v21-0012-Add-streaming-option-in-pg_dump.patch application/octet-stream 2.7 KB
v21-0011-Provide-new-api-to-get-the-streaming-changes.patch application/octet-stream 5.0 KB
