Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-05-29 15:01:09
Message-ID: CAFiTN-tAaPawKeMmnYh7L6-1bFxfrCB8VS=uceD+Qp-1n-HWYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 28, 2020 at 2:41 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, May 28, 2020 at 12:46 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, May 26, 2020 at 12:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > Isn't this problem only for subxact file as we anyway create changes
> > > file as part of start stream message which should have come after
> > > abort? If so, can't we detect whether subxact file exists probably by
> > > using nsubxacts or something like that? Can you please once try to
> > > reproduce this scenario to ensure that we are not missing anything?
> >
> > I have tested this, as of now, by default we create both changes and
> > subxact files irrespective of whether we get any subtransactions or
> > not. Maybe this could be optimized that only if we have any subxact
> > then only create that file otherwise not? What's your opinion on the
> > same.
> >
>
> Yeah, that makes sense.
>
> > > > > > > 8.
> > > > > > > @@ -2295,6 +2677,13 @@ ReorderBufferXidSetCatalogChanges(ReorderBuffer
> > > > > > > *rb, TransactionId xid,
> > > > > > > txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
> > > > > > >
> > > > > > > txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES;
> > > > > > > +
> > > > > > > + /*
> > > > > > > + * TOCHECK: Mark toplevel transaction as having catalog changes too
> > > > > > > + * if one of its children has.
> > > > > > > + */
> > > > > > > + if (txn->toptxn != NULL)
> > > > > > > + txn->toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES;
> > > > > > > }
> > > > > > >
> > > > > > > Why are we marking top transaction here?
> > > > > >
> > > > > > We need to mark top transaction to decide whether to build tuplecid
> > > > > > hash or not. In non-streaming mode, we are only sending during the
> > > > > > commit time, and during commit time we know whether the top
> > > > > > transaction has any catalog changes or not based on the invalidation
> > > > > > message so we are marking the top transaction there in DecodeCommit.
> > > > > > Since here we are not waiting till commit so we need to mark the top
> > > > > > transaction as soon as we mark any of its child transactions.
> > > > > >
> > > > >
> > > > > But how does it help? We use this flag (via
> > > > > ReorderBufferXidHasCatalogChanges) in SnapBuildCommitTxn which is
> > > > > anyway done in DecodeCommit and that too after setting this flag for
> > > > > the top transaction if required. So, how will it help in setting it
> > > > > while processing for subxid. Also, even if we have to do it won't it
> > > > > add the xid needlessly in builder->committed.xip array?
> > > >
> > > > In ReorderBufferBuildTupleCidHash, we use this flag to decide whether
> > > > to build the tuplecid hash or not based on whether it has catalog
> > > > changes or not.
> > > >
> > >
> > > Okay, but you haven't answered the second part of the question: "won't
> > > it add the xid of top transaction needlessly in builder->committed.xip
> > > array, see function SnapBuildCommitTxn?" IIUC, this can happen
> > > without patch as well because DecodeCommit also sets the flags just
> > > based on invalidation messages irrespective of whether the messages
> > > are generated by top transaction or not, is that right?
> >
> > Yes, with or without the patch it always adds the topxid. I think
> > purpose for doing this with/without patch is not for the snapshot
> > instead we are marking the top itself that some of its subtxn has the
> > catalog changes so that while building the tuplecid has we can know
> > whether to build the hash or not. But, having said that I feel in
> > ReorderBufferBuildTupleCidHash why do we need these two checks
> > if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids))
> > return;
> >
> > I mean it should be enough to just have the check, because if we have
> > added something to the tuplecids then catalog changes must be there
> > because that time we are setting the catalog changes to true.
> >
> > if (dlist_is_empty(&txn->tuplecids))
> > return;
> >
> > I think in the base code there are multiple things going on
> > 1. If we get new CID we always set the catalog change in that
> > transaction but add the tuplecids in the top transaction. So
> > basically, top transaction is so far not marked with catalog changes
> > but it has tuplecids.
> > 2. Now, in DecodeCommit the top xid will be marked that it has catalog
> > changes based on the invalidation messages.
> >
>
> I don't think it is advisable to remove that check from base code
> unless we have a strong reason for doing so. I think here you can
> write better comments about why you are marking the flag for top
> transaction and remove TOCHECK from the comment.

Done.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-05-29 15:01:16 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Previous Message Dilip Kumar 2020-05-29 15:00:42 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions