Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2019-12-29 08:04:07
Message-ID: CAFiTN-v9ShPzhQ3RnsZkgidL5Mwc7XwaNzNLfAHgMHjfwFv4eQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 28, 2019 at 9:33 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Tue, Dec 10, 2019 at 10:23:19AM +0530, Dilip Kumar wrote:
> >On Tue, Dec 10, 2019 at 9:52 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>
> >> On Mon, Dec 2, 2019 at 2:02 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >> >
> >> > On Sun, Dec 1, 2019 at 7:58 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >> > >
> >> > > On Fri, Nov 22, 2019 at 01:18:11PM +0530, Dilip Kumar wrote:
> >> > > > I have rebased the patch on the latest head and also fixed the
> >> > > > issue of "concurrent abort handling of the (sub)transaction",
> >> > > > attached as
> >> > > > (v1-0013-Extend-handling-of-concurrent-aborts-for-streamin) along with
> >> > > > the complete patch set. I have added the version number so that we
> >> > > > can track the changes.
> >> > >
> >> > > The patch has rotten a bit and does not apply anymore. Could you
> >> > > please send a rebased version? I have moved it to next CF, waiting on
> >> > > author.
> >> >
> >> > I have rebased the patch set on the latest head.
> >> >
> >> > Apart from this, there is one issue reported by my colleague Vignesh.
> >> > The issue is that if we use more than two relations in a transaction
> >> > then there is an error on the standby (no relation map entry for
> >> > remote relation ID 16390). After analyzing it I found that for a
> >> > streaming transaction an "is_schema_sent" flag is kept in
> >> > ReorderBufferTXN. I think that is done so that we can send the
> >> > schema for each transaction stream, so that if any subtransaction
> >> > gets aborted we don't lose the logical WAL for that schema. But this
> >> > solution has introduced a basic issue: if a transaction operates on
> >> > more than one relation, then after sending the schema for the first
> >> > relation it marks the flag true, and the schema for the subsequent
> >> > relations is never sent.
> >> >
> >>
> >> How about keeping a list of top-level xids in each RelationSyncEntry?
> >> Basically, whenever we send the schema for any transaction, we note
> >> that in the RelationSyncEntry, and at abort time we can remove the
> >> xid from the list. Now, whenever we check whether to send the schema
> >> for any operation in a transaction, we check if our xid is present in
> >> that list for the particular RelationSyncEntry and take an action
> >> based on that (if the xid is present, we won't send the schema;
> >> otherwise, we send it).
> >The idea makes sense to me. I will try to write a patch for this and test it.
> >
>
> Yeah, the "is_schema_sent" flag in ReorderBufferTXN does not work - it
> needs to be in the RelationSyncEntry. In fact, I already have code for
> that in my private repository - I thought the patches I sent here do
> include this, but apparently I forgot to include this bit :-(
>
> Attached is a rebased patch series, fixing this. It's essentially v2
> with a couple of patches (0003, 0008, 0009 and 0012) replacing the
> is_schema_sent with correct handling.
>
>
> 0003 - removes an is_schema_sent reference added prematurely (the flag
> is only added by a later patch, so the early reference causes a compile
> failure)
>
> 0008 - adds the is_schema_sent back (essentially reverting 0003)
>
> 0009 - removes is_schema_sent entirely
>
> 0012 - adds the correct handling of schema flags in pgoutput
>
>
> I don't know what other changes you've made since v2, so this way it
> should be possible to just take 0003, 0008, 0009 and 0012 and slip them
> in with minimal hassle.
>
> FWIW thanks to everyone (and Amit and Dilip in particular) working on
> this patch series. There's been a lot of great reviews and improvements
> since I abandoned this thread for a while. I expect to be able to spend
> more time working on this in January.
>
+static void
+set_schema_sent_in_streamed_txn(RelationSyncEntry *entry, TransactionId xid)
+{
+ MemoryContext oldctx;
+
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+
+ entry->streamed_txns = lappend_int(entry->streamed_txns, xid);
+
+ MemoryContextSwitchTo(oldctx);
+}
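(For context, I assume the lookup side of this is something along the
lines of the sketch below; the function name is my guess and may not
match the actual patch.)

static bool
get_schema_sent_in_streamed_txn(RelationSyncEntry *entry, TransactionId xid)
{
	ListCell   *lc;

	/* Have we already sent the schema for this relation in this stream? */
	foreach(lc, entry->streamed_txns)
	{
		if (xid == (TransactionId) lfirst_int(lc))
			return true;
	}

	return false;
}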
I was looking into the schema-tracking solution and I have one
question: shouldn't we remove the top-level xid from the list if the
(sub)transaction is aborted? Because once it is aborted we need to
resend the schema. I think we can remove the xid from the list in the
cleanup_rel_sync_cache function, something like the sketch below.
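A rough sketch of what I mean (the signature is my assumption; I am
assuming cleanup_rel_sync_cache is called with the top-level xid and
whether the transaction committed):

static void
cleanup_rel_sync_cache(TransactionId xid, bool is_commit)
{
	HASH_SEQ_STATUS hash_seq;
	RelationSyncEntry *entry;

	hash_seq_init(&hash_seq, RelationSyncCache);
	while ((entry = (RelationSyncEntry *) hash_seq_search(&hash_seq)) != NULL)
	{
		/*
		 * On abort, forget that we sent the schema in this streamed
		 * transaction so that the next stream resends it.
		 */
		if (!is_commit)
			entry->streamed_txns = list_delete_int(entry->streamed_txns, xid);
	}
}

On commit we would presumably also want to drop the xid from the list
(and perhaps mark the schema as sent for good), but the abort case is
the one that forces a resend.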

I have observed some more issues

1. Currently, in ReorderBufferCommit, it is always expected that
whenever we get REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM, we must
have already got REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT, and on
SPEC_CONFIRM we send the tuple we got in SPEC_INSERT. But now those
two messages can be in different streams, so we need to find a way to
handle this. Maybe once we get SPEC_INSERT we can remember the tuple,
and then if we get the SPEC_CONFIRM in the next stream we can send
that tuple? Something like the fragment below.
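A fragment of what I have in mind (the pending_spec_insert field in
ReorderBufferTXN is hypothetical; today the speculative insert is kept
in a local variable that does not survive across streams):

case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT:
	/* Stash the change instead of sending it right away. */
	txn->pending_spec_insert = change;
	break;

case REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM:
	/*
	 * The matching SPEC_INSERT may have arrived in a previous
	 * stream, so replay it from the copy stashed in ReorderBufferTXN
	 * rather than from a local variable of the current stream.
	 */
	Assert(txn->pending_spec_insert != NULL);
	rb->apply_change(rb, txn, relation, txn->pending_spec_insert);
	txn->pending_spec_insert = NULL;
	break;

We would also need to make sure the stashed change survives the memory
cleanup done at the end of a stream.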

2. At commit time, in DecodeCommit, we check whether we need to skip
the changes of the transaction by calling SnapBuildXactNeedsSkip. But
since we now support streaming, it's possible that before we decode
the commit WAL we have already sent the changes to the output plugin,
even though we could have skipped those changes. So my question is:
instead of checking at commit time, can't we check before adding to
the ReorderBuffer itself, or truncate the changes if
SnapBuildXactNeedsSkip is true whenever the logical_decoding_work_mem
limit is reached (see the sketch below)? Am I missing something here?
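A sketch of the second option, reusing the patch set's
ReorderBufferTruncateTXN; how the snapshot builder is reached from the
streaming path is hand-waved here:

static void
ReorderBufferStreamTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
{
	/*
	 * If the transaction's changes would be skipped at commit anyway
	 * (e.g. they are from before the point we are allowed to decode),
	 * discard them instead of streaming them to the output plugin.
	 */
	if (SnapBuildXactNeedsSkip(builder, txn->first_lsn))
	{
		ReorderBufferTruncateTXN(rb, txn);
		return;
	}

	/* ... otherwise stream the accumulated changes as usual ... */
}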

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
