Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-02-04 05:29:51
Message-ID: CAFiTN-uJTHgBogP-C8fXpuqSpWvmrgYRLFH=V0FH-hwQ75eXGQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 28, 2020 at 11:43 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jan 28, 2020 at 11:34 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, Jan 28, 2020 at 11:28 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Jan 22, 2020 at 10:30 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, Jan 14, 2020 at 10:44 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > >
> > > > > Hmm, I think this can turn out to be inefficient because we can easily
> > > > > end up spilling the data even when we don't need to do so. Consider
> > > > > cases where part of the streamed changes are for toast and the rest
> > > > > are changes we would have streamed and hence could be removed. In
> > > > > such cases, we could have easily consumed the remaining changes for
> > > > > toast without spilling. Also, I am not sure if spilling changes from
> > > > > the hash table is a good idea, as they are no longer in the same order
> > > > > as they were in the ReorderBuffer, which means the order in which we
> > > > > would normally serialize the changes would change; that might have
> > > > > some impact, so we would need some more study if we want to pursue
> > > > > this idea.
> > > > I have fixed this bug and attached it as a separate patch. I will
> > > > merge it into the main patch once we agree on the idea and after some
> > > > more testing.
> > > >
> > > > The idea is that whenever we get a toasted chunk, instead of directly
> > > > inserting it into the toast hash, I insert it into a local list, so
> > > > that if we don't get the change for the main table we can put these
> > > > changes back into txn->changes. Then, once we get the change for the
> > > > main table, I prepare the hash table to merge the chunks.
> > > >
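For illustration only, here is a tiny standalone C model of that local-list
idea. None of these names (pending_toast, give_back_pending, and so on) exist
in the patch or in ReorderBuffer; the sketch only mimics the control flow of
buffering toast chunks aside and handing them back to the transaction's change
list when the main-table change never shows up:

    /* Standalone model of the "local list for toast chunks" idea;
     * all names are made up and this is not ReorderBuffer code. */
    #include <stdio.h>

    #define MAX_PENDING 8

    typedef struct Change { const char *desc; } Change;

    static Change pending_toast[MAX_PENDING];    /* local list of toast chunks */
    static int    npending = 0;

    static Change txn_changes[MAX_PENDING * 2];  /* stand-in for txn->changes */
    static int    nchanges = 0;

    /* A toast chunk arrives: keep it aside instead of hashing it right away. */
    static void toast_chunk_arrives(Change c) { pending_toast[npending++] = c; }

    /* The main-table change arrives: now it is safe to build the toast hash. */
    static void main_change_arrives(Change c)
    {
        for (int i = 0; i < npending; i++)
            printf("merge into toast hash: %s\n", pending_toast[i].desc);
        npending = 0;
        printf("apply main change: %s\n", c.desc);
    }

    /* No main-table change came (e.g. we must stream now): return the
     * buffered chunks to the transaction's change list unmodified. */
    static void give_back_pending(void)
    {
        for (int i = 0; i < npending; i++)
            txn_changes[nchanges++] = pending_toast[i];
        npending = 0;
    }

    int main(void)
    {
        toast_chunk_arrives((Change){"toast chunk 1"});
        toast_chunk_arrives((Change){"toast chunk 2"});
        main_change_arrives((Change){"INSERT into main table"});
        toast_chunk_arrives((Change){"toast chunk 3"});
        give_back_pending();        /* main-table change never showed up */
        printf("changes back in txn list: %d\n", nchanges);
        return 0;
    }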
> > >
> > >
> > > I think this idea will work but appears to be quite costly, because (a)
> > > you might need to serialize/deserialize the changes multiple times and
> > > might attempt streaming multiple times even though you can't do so, and
> > > (b) you need to remove/add the same set of changes to/from the main
> > > list multiple times.
> > I agree with this.
> > >
> > > It seems to me that we need to add all of this new handling because,
> > > at the time we decide whether to stream or not, we don't know whether
> > > the txn has changes that can't be streamed. One idea to make it work
> > > is to identify this while decoding the WAL. I think we need to set a
> > > bit in the insert/delete WAL record to identify if the tuple belongs
> > > to a toast relation. This won't add any additional overhead to the
> > > WAL, will reduce a lot of complexity in the logical decoding, and will
> > > also make decoding efficient. If this is feasible, then we can do the
> > > same for speculative insertions.
> > The idea looks good to me. I will work on this.
> >
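As a rough standalone model of that WAL-flag idea: the decoder only needs one
extra bit in the insert/delete record to know, without any catalog access,
whether a transaction contains toast changes and is therefore not yet
streamable. The names below (TOAST_FLAG_BIT, WalInsertRecord, log_insert) are
invented for illustration; in PostgreSQL itself this would instead mean adding
a flag bit to the existing xl_heap_insert/xl_heap_delete flags field:

    /* Standalone model, not PostgreSQL code; all names are hypothetical. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define TOAST_FLAG_BIT  (1u << 0)   /* hypothetical flag in the record */

    typedef struct WalInsertRecord
    {
        uint8_t     flags;
        const char *relname;
    } WalInsertRecord;

    /* "WAL logging" side: set the bit when the target is a toast relation. */
    static WalInsertRecord
    log_insert(const char *relname, bool relation_is_toast)
    {
        WalInsertRecord rec = { .flags = 0, .relname = relname };

        if (relation_is_toast)
            rec.flags |= TOAST_FLAG_BIT;
        return rec;
    }

    /* "Decoding" side: no catalog access needed to spot toast changes. */
    static bool
    record_is_toast_change(const WalInsertRecord *rec)
    {
        return (rec->flags & TOAST_FLAG_BIT) != 0;
    }

    int main(void)
    {
        WalInsertRecord a = log_insert("pg_toast_16384", true);
        WalInsertRecord b = log_insert("my_table", false);

        printf("%s toast? %d\n", a.relname, record_is_toast_change(&a));
        printf("%s toast? %d\n", b.relname, record_is_toast_change(&b));
        return 0;
    }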
>
> One more thing we can do is to identify whether the tuple belongs to a
> toast relation while decoding it. However, I think to do that we need
> to have access to the relcache at that time, and that might add some
> overhead as we would need to do it for each tuple. Can we investigate
> what it will take to do that and whether it is better than setting a
> bit during WAL logging?
>
I have done some more analysis on this, and it appears that there are a
few problems with doing it. Basically, once we get the confirmed flush
location, we advance the replication_slot_catalog_xmin so that vacuum
can garbage-collect the old catalog tuples. So the problem is that,
while we are collecting the changes in the ReorderBuffer, our catalog
version might already have been removed, and we might not find any
relation entry for that relfilenode (because the relation was dropped
or altered later).
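To make the concern concrete, a rough sketch of what such a decode-time check
might look like is below. The helper change_is_for_toast_relation() is
hypothetical and not part of PostgreSQL, and it leaves aside where in the
decoding path such a lookup could safely be made (a suitable catalog snapshot
would have to be available, which is itself part of the difficulty); the
lookup functions it calls (RelidByRelfilenode, IsToastRelation, etc.) do
exist, and the failing lookup in the middle is exactly where the catalog_xmin
problem described above would bite:

    #include "postgres.h"

    #include "catalog/catalog.h"
    #include "utils/rel.h"
    #include "utils/relcache.h"
    #include "utils/relfilenodemap.h"

    /*
     * Hypothetical helper, not part of PostgreSQL: try to decide during
     * decoding whether a change belongs to a toast relation by going
     * through the relcache.
     */
    static bool
    change_is_for_toast_relation(Oid reltablespace, Oid relfilenode)
    {
        Oid         reloid;
        Relation    rel;
        bool        is_toast;

        /* Map the physical relfilenode back to a relation OID. */
        reloid = RelidByRelfilenode(reltablespace, relfilenode);

        /*
         * This is where the problem shows up: once
         * replication_slot_catalog_xmin has advanced and vacuum has
         * removed the old catalog rows (relation dropped or rewritten),
         * the lookup can fail even though the change itself is valid.
         */
        if (!OidIsValid(reloid))
            return false;       /* or error out -- unclear what is right */

        rel = RelationIdGetRelation(reloid);
        if (!RelationIsValid(rel))
            return false;

        is_toast = IsToastRelation(rel);
        RelationClose(rel);
        return is_toast;
    }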

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
