Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-09-14 11:19:53
Message-ID: CAA4eK1LcpjoGFd=6VCUVDroJVSVD2v0Lxr6ofcfSGY=gXnSmpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 14, 2020 at 1:23 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Sep 14, 2020 at 8:48 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >
> > Yeah, this is right, and here is some initial analysis. It seems to be
> > failing in below code:
> > rel_sync_cache_relation_cb(){ ...list_free(entry->streamed_txns);..}
> >
> > This list can have elements only in 'streaming' mode (need to enable
> > 'streaming' with Create Subscription command) whereas none of the
> > tests in 010_truncate.pl is using 'streaming', so this list should be
> > empty (NULL). The two different assertion failures shown in BF reports
> > in list_free code are as below:
> > Assert(list->length > 0);
> > Assert(list->length <= list->max_length);
> >
> > It seems to me that this list is not initialized properly when it is
> > not used or maybe that is true in some special circumstances because
> > we initialize it in get_rel_sync_entry(). I am not sure if CCI build
> > is impacting this in some way.
>
>
> Even I have analyzed this but did not find any reason why the
> streamed_txns list should be anything other than NULL. The only thing
> is we are initializing the entry->streamed_txns to NULL and the list
> free is checking "if (list == NIL)" then return. However IMHO, that
> should not be an issue becase NIL is defined as (List*) NULL.
>

Yeah, that is not the issue but it is better to initialize it with NIL
for the sake of consistency. The basic issue here was we were trying
to open/lock the relation(s) before initializing this list. Now, when
we process the invalidations during open relation, we try to access
this list in rel_sync_cache_relation_cb and that leads to assertion
failure. I have reproduced the exact scenario of 010_truncate.pl via
debugger. Basically, the backend on publisher has sent the
invalidation after truncating the relation 'tab1' and while processing
the truncate message if WALSender receives that message exactly after
creating the RelSyncEntry for 'tab1', the Assertion shown in BF can be
reproduced.

The attached patch will fix the issue. What do you think?

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
v1-0001-Fix-initialization-of-RelationSyncEntry-for-strea.patch application/octet-stream 2.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2020-09-14 11:34:53 Re: Division in dynahash.c due to HASH_FFACTOR
Previous Message Ashutosh Sharma 2020-09-14 10:26:07 Re: recovering from "found xmin ... from before relfrozenxid ..."