Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-05-13 15:46:26
Message-ID: CAFiTN-vNqFv=PFReoQK0+dZ6irEmYK_hRVe3t7m=jS4C1NiH8g@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 13, 2020 at 4:50 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, May 13, 2020 at 11:35 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, May 12, 2020 at 4:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > >
> > > v20-0003-Extend-the-output-plugin-API-with-stream-methods
> > > ----------------------------------------------------------------------------------------
> > > 1.
> > > +static void
> > > +pg_decode_stream_change(LogicalDecodingContext *ctx,
> > > + ReorderBufferTXN *txn,
> > > + Relation relation,
> > > + ReorderBufferChange *change)
> > > +{
> > > + OutputPluginPrepareWrite(ctx, true);
> > > + appendStringInfo(ctx->out, "streaming change for TXN %u", txn->xid);
> > > + OutputPluginWrite(ctx, true);
> > > +}
> > > +
> > > +static void
> > > +pg_decode_stream_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> > > + int nrelations, Relation relations[],
> > > + ReorderBufferChange *change)
> > > +{
> > > + OutputPluginPrepareWrite(ctx, true);
> > > + appendStringInfo(ctx->out, "streaming truncate for TXN %u", txn->xid);
> > > + OutputPluginWrite(ctx, true);
> > > +}
> > >
> > > In the above and similar APIs, there are parameters like relation
> > > which are not used. I think you should add some comments atop these
> > > APIs to explain why that is so. I guess it is because we want to keep
> > > them similar to the non-stream versions of the APIs, and we can't
> > > display the relation or other information as the transaction is still
> > > in progress.
> >
> > I think the interfaces are designed that way because other decoding
> > plugins might need those parameters; e.g. in pgoutput we need change
> > and relation, but not here. We have other similar examples as well,
> > e.g. pg_decode_message has the parameter txn but does not use it. Do
> > you think we still need to add comments?
> >
>
> In that case, we can leave it as is, but let's ensure that we are not
> exposing any parameter which is not used, and if there is one for some
> reason, we should document it. I will also look into this.
>
> > > 4.
> > > +static void
> > > +stream_start_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn)
> > > +{
> > > + LogicalDecodingContext *ctx = cache->private_data;
> > > + LogicalErrorCallbackState state;
> > > + ErrorContextCallback errcallback;
> > > +
> > > + Assert(!ctx->fast_forward);
> > > +
> > > + /* We're only supposed to call this when streaming is supported. */
> > > + Assert(ctx->streaming);
> > > +
> > > + /* Push callback + info on the error context stack */
> > > + state.ctx = ctx;
> > > + state.callback_name = "stream_start";
> > > + /* state.report_location = apply_lsn; */
> > >
> > > Why can't we supply the report_location here? I think here we need to
> > > report txn->first_lsn if this is the very first stream and
> > > txn->final_lsn if it is any consecutive one.
> >
> > I am not sure about this, because for the very first stream we would
> > report the first lsn of the stream, whereas for a consecutive stream
> > we would report the last lsn in the stream.
> >
>
> Yeah, that doesn't seem to be consistent. How about if we get it as an
> additional parameter? The caller can pass the lsn of the very first
> change it is trying to decode in this stream.

Hmm, to know the first LSN of the stream we would need to call
ReorderBufferIterTXNInit and ReorderBufferIterTXNNext to fetch the
first change of the stream, and only after that call stream_start. I
will see how to do that without it looking awkward. Basically, as of
now, our code has this layout:

1. stream_start;
2. ReorderBufferIterTXNInit(rb, txn, &iterstate);
   while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL)
   {
       stream changes;
   }
3. stream_stop;

So if we want to know the first LSN of this stream, we would do
something like this:

1. ReorderBufferIterTXNInit(rb, txn, &iterstate);
   while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL)
   {
2.     if (first_change)
           stream_start;

       stream changes;
   }
3. stream_stop;

> > >
> > > 11.
> > > - * HeapTupleSatisfiesHistoricMVCC.
> > > + * tqual.c's HeapTupleSatisfiesHistoricMVCC.
> > > + *
> > > + * We do build the hash table even if there are no CIDs. That's
> > > + * because when streaming in-progress transactions we may run into
> > > + * tuples with the CID before actually decoding them. Think e.g. about
> > > + * INSERT followed by TRUNCATE, where the TRUNCATE may not be decoded
> > > + * yet when applying the INSERT. So we build a hash table so that
> > > + * ResolveCminCmaxDuringDecoding does not segfault in this case.
> > > + *
> > > + * XXX We might limit this behavior to streaming mode, and just bail
> > > + * out when decoding transaction at commit time (at which point it's
> > > + * guaranteed to see all CIDs).
> > > */
> > > static void
> > > ReorderBufferBuildTupleCidHash(ReorderBuffer *rb, ReorderBufferTXN *txn)
> > > @@ -1350,9 +1498,6 @@ ReorderBufferBuildTupleCidHash(ReorderBuffer
> > > *rb, ReorderBufferTXN *txn)
> > > dlist_iter iter;
> > > HASHCTL hash_ctl;
> > >
> > > - if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids))
> > > - return;
> > > -
> > >
> > > I don't understand this change. Why could "INSERT followed by
> > > TRUNCATE" lead to a tuple which comes up for decoding before its
> > > CID?
> >
> > Actually, even if we haven't decoded the DDL operation yet, the tuple
> > might already have been deleted from the actual system table by the
> > next operation. E.g. while we are streaming the INSERT it is possible
> > that the TRUNCATE has already deleted that tuple and set its cmax.
> > Before the streaming patch, we streamed the INSERT only on commit, so
> > by that time we had seen all the operations that did DDL and would
> > have already built the tuple CID hash.
> >
>
> Okay, but for that case, how good is it that we always allow the CID
> hash table to be built even if there are no catalog changes in the TXN
> (see changes in ReorderBufferBuildTupleCidHash)? Can't we detect that
> while resolving the cmin/cmax?

Maybe in ResolveCminCmaxDuringDecoding we can check whether
tuplecid_data is NULL and, if so, return the cmin/cmax as unresolved;
the caller can then decide what to do based on that.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
