Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-09-14 07:53:23
Message-ID: CAFiTN-sHYjixVBxbVtxf-Q1BtuRwfYeyoksT=Z2eZDpW3jSTfw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 14, 2020 at 8:48 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Sep 14, 2020 at 3:08 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > > Pushed.
> >
> > Observe the following reports:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2020-09-10%2009%3A08%3A03
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2020-09-05%2020%3A22%3A02
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-04%2001%3A52%3A03
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-03%2020%3A54%3A04
> >
> > These are all on HEAD, and all within the last ten days, and I see
> > nothing comparable in any branch before that. So it's hard to avoid
> > the conclusion that somebody broke something about ten days ago.
> >
> > None of these animals provided gdb backtraces; but we do have a built-in
> > trace from several, and they all look like pgoutput.so is trying to
> > list_free() garbage, somewhere inside a relcache invalidation/rebuild
> > scenario:
> >
>
> Yeah, this is right, and here is some initial analysis. It seems to be
> failing in below code:
> rel_sync_cache_relation_cb(){ ...list_free(entry->streamed_txns);..}
>
> This list can have elements only in 'streaming' mode (need to enable
> 'streaming' with Create Subscription command) whereas none of the
> tests in 010_truncate.pl is using 'streaming', so this list should be
> empty (NULL). The two different assertion failures shown in BF reports
> in list_free code are as below:
> Assert(list->length > 0);
> Assert(list->length <= list->max_length);
>
> It seems to me that this list is not initialized properly when it is
> not used or maybe that is true in some special circumstances because
> we initialize it in get_rel_sync_entry(). I am not sure if CCI build
> is impacting this in some way.

I have also analyzed this but did not find any reason why the
streamed_txns list should be anything other than NULL. The only thing
I noticed is that we initialize entry->streamed_txns to NULL, whereas
list_free() checks "if (list == NIL)" and returns early. However, IMHO
that should not be an issue because NIL is defined as ((List *) NULL).
I am doing further testing and investigation.
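
To illustrate the reasoning above, here is a minimal standalone sketch. It
does not use the PostgreSQL headers: the List struct is a simplified
stand-in, and RelSyncEntrySketch, list_free_sketch() and
free_streamed_txns_sketch() are hypothetical names made up for this
example. It only shows that a pointer initialized to NULL compares equal to
NIL, so the early return is taken and the two assertions reported by the
buildfarm are never reached for an empty list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for PostgreSQL's List (assumption: the real
 * struct has more fields; only the ones named in the asserts are kept). */
typedef struct List
{
    int length;      /* number of elements currently in use */
    int max_length;  /* allocated capacity */
} List;

/* Same definition as discussed in the mail: NIL is a NULL List pointer. */
#define NIL ((List *) NULL)

/* Hypothetical stand-in for the cache entry holding streamed_txns. */
typedef struct RelSyncEntrySketch
{
    List *streamed_txns;
} RelSyncEntrySketch;

/* Sketch of the list_free() behaviour described in the thread.
 * Returns 0 when the early return for NIL is taken, 1 when memory
 * was actually released. */
static int
list_free_sketch(List *list)
{
    if (list == NIL)
        return 0;               /* empty list: nothing to free */

    /* The two assertions quoted from the buildfarm reports. */
    assert(list->length > 0);
    assert(list->length <= list->max_length);

    free(list);
    return 1;
}

/* Free the entry's list and reset the pointer so that a later call
 * cannot see a dangling value. */
static int
free_streamed_txns_sketch(RelSyncEntrySketch *entry)
{
    int freed = list_free_sketch(entry->streamed_txns);

    entry->streamed_txns = NIL;
    return freed;
}
```

With an entry whose streamed_txns was set to NULL,
free_streamed_txns_sketch() takes the early return. So if the assertions
fire in the real code, the pointer must hold something other than NULL at
that point, for example an uninitialized or stale value.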

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
