Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-05-13 06:04:42
Message-ID: CAFiTN-v6rzMAmtyh=JuaM9eT8QmPeRn1=x30s9x-i_p=z7rPeQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 12, 2020 at 4:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, May 7, 2020 at 6:17 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, May 5, 2020 at 7:13 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > I have fixed one more issue in the 0010 patch. The issue was that
> > once the transaction was serialized due to an incomplete toast
> > change, the serialized store was not cleaned up after streaming, so
> > the same tuples were being streamed multiple times.
> >
>
> I have reviewed a few patches (003, 004, and 005) and below are my comments.

Thanks for the review. I am replying to the comments I have questions
about; the others are fine.

>
> v20-0003-Extend-the-output-plugin-API-with-stream-methods
> ----------------------------------------------------------------------------------------
> 1.
> +static void
> +pg_decode_stream_change(LogicalDecodingContext *ctx,
> + ReorderBufferTXN *txn,
> + Relation relation,
> + ReorderBufferChange *change)
> +{
> + OutputPluginPrepareWrite(ctx, true);
> + appendStringInfo(ctx->out, "streaming change for TXN %u", txn->xid);
> + OutputPluginWrite(ctx, true);
> +}
> +
> +static void
> +pg_decode_stream_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> + int nrelations, Relation relations[],
> + ReorderBufferChange *change)
> +{
> + OutputPluginPrepareWrite(ctx, true);
> + appendStringInfo(ctx->out, "streaming truncate for TXN %u", txn->xid);
> + OutputPluginWrite(ctx, true);
> +}
>
> In the above and similar APIs, there are parameters like relation
> which are not used. I think you should add some comments atop these
> APIs to explain why that is so. I guess it is because we want to keep
> them similar to the non-stream versions of the APIs, and we can't
> display the relation or other information as the transaction is still
> in progress.

I think the interfaces are designed that way because other decoding
plugins might need those parameters, e.g. in pgoutput we need the
change and relation, but we don't need them here. We have other
similar examples as well, e.g. pg_decode_message takes the txn
parameter but does not use it. Do you think we still need to add
comments?
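
If we do add them, something like the comment below atop the stub
callbacks is what I have in mind (just a sketch of possible wording on
top of the code quoted above, not final patch text):

/*
 * Callback for streaming individual changes of an in-progress
 * transaction.  The relation and change arguments are part of the
 * generic stream-change API (other plugins, e.g. pgoutput, need
 * them), but test_decoding only prints the xid here because the
 * transaction is still in progress and we keep the output minimal.
 */
static void
pg_decode_stream_change(LogicalDecodingContext *ctx,
						ReorderBufferTXN *txn,
						Relation relation,
						ReorderBufferChange *change)
{
	OutputPluginPrepareWrite(ctx, true);
	appendStringInfo(ctx->out, "streaming change for TXN %u", txn->xid);
	OutputPluginWrite(ctx, true);
}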

> 4.
> +static void
> +stream_start_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
> +{
> + LogicalDecodingContext *ctx = cache->private_data;
> + LogicalErrorCallbackState state;
> + ErrorContextCallback errcallback;
> +
> + Assert(!ctx->fast_forward);
> +
> + /* We're only supposed to call this when streaming is supported. */
> + Assert(ctx->streaming);
> +
> + /* Push callback + info on the error context stack */
> + state.ctx = ctx;
> + state.callback_name = "stream_start";
> + /* state.report_location = apply_lsn; */
>
> Why can't we supply the report_location here? I think here we need to
> report txn->first_lsn if this is the very first stream and
> txn->final_lsn if it is any consecutive one.

I am not sure about this, because for the very first stream we would
report the location of the first LSN of the stream, and for a
consecutive stream we would report the last LSN in the stream.
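
For reference, if we go with that suggestion it would amount to
something like the sketch below (assuming a flag or macro such as
rbtxn_is_streamed() exists to tell whether the transaction has already
been streamed once; the exact name is whatever the patch set ends up
using):

	/*
	 * Sketch only: report the first LSN of the transaction for its
	 * initial stream, and the final LSN for any subsequent stream.
	 */
	if (rbtxn_is_streamed(txn))
		state.report_location = txn->final_lsn;
	else
		state.report_location = txn->first_lsn;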

>
> 11.
> - * HeapTupleSatisfiesHistoricMVCC.
> + * tqual.c's HeapTupleSatisfiesHistoricMVCC.
> + *
> + * We do build the hash table even if there are no CIDs. That's
> + * because when streaming in-progress transactions we may run into
> + * tuples with the CID before actually decoding them. Think e.g. about
> + * INSERT followed by TRUNCATE, where the TRUNCATE may not be decoded
> + * yet when applying the INSERT. So we build a hash table so that
> + * ResolveCminCmaxDuringDecoding does not segfault in this case.
> + *
> + * XXX We might limit this behavior to streaming mode, and just bail
> + * out when decoding transaction at commit time (at which point it's
> + * guaranteed to see all CIDs).
> */
> static void
> ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
> @@ -1350,9 +1498,6 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer
> *rb, ReorderBufferTXN *txn)
> dlist_iter iter;
> HASHCTL hash_ctl;
>
> - if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids))
> - return;
> -
>
> I don't understand this change. Why could "INSERT followed by
> TRUNCATE" lead to a tuple that comes up for decoding before its
> CID?

Actually, even if we haven't decoded the DDL operation yet, the tuple
might already have been deleted from the system table by the next
operation. For example, while we are streaming the INSERT it is
possible that the TRUNCATE has already deleted that catalog tuple and
set its cmax. Before the streaming patch, we streamed the INSERT only
at commit time, so by then we had decoded all the operations that did
DDL and we would already have built the tuple CID hash.
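
In other words, when streaming an in-progress transaction the
visibility code has to tolerate a cmin/cmax that has not been decoded
yet. Paraphrased (not the exact patch hunk), the handling in
HeapTupleSatisfiesHistoricMVCC amounts to something like:

	/*
	 * If the combo CID could not be resolved to cmin/cmax, we have not
	 * decoded it yet, which means the cmin is in the future and we are
	 * not supposed to see the tuple yet.  Treat it as invisible rather
	 * than assuming the hash entry must exist.
	 */
	if (!ResolveCminCmaxDuringDecoding(HistoricSnapshotGetTupleCids(),
									   snapshot, htup, buffer,
									   &cmin, &cmax))
		return false;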

> The patch has made changes based on this assumption in
> HeapTupleSatisfiesHistoricMVCC which appears to be very risky as the
> behavior could be dependent on whether we are streaming the changes
> for an in-progress xact or at the commit of a transaction. We might
> want to add a test to validate this behavior.

We have already added a test case for this: 011_stream_ddl.pl in
test/subscription.

> Also, the comment refers to tqual.c, which is wrong as this API is
> now in heapam_visibility.c.

Ok, will fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
