Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-09-14 03:18:32
Message-ID: CAA4eK1LqpFqcmQhDEdTKtHCAC1vZ9eGr2JWJ6Fgvc64f+twDGQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 14, 2020 at 3:08 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > Pushed.
>
> Observe the following reports:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2020-09-10%2009%3A08%3A03
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2020-09-05%2020%3A22%3A02
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-04%2001%3A52%3A03
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-03%2020%3A54%3A04
>
> These are all on HEAD, and all within the last ten days, and I see
> nothing comparable in any branch before that. So it's hard to avoid
> the conclusion that somebody broke something about ten days ago.
>
> None of these animals provided gdb backtraces; but we do have a built-in
> trace from several, and they all look like pgoutput.so is trying to
> list_free() garbage, somewhere inside a relcache invalidation/rebuild
> scenario:
>

Yeah, this is right. Here is some initial analysis. The failure seems to
be in the following code:
rel_sync_cache_relation_cb() { ... list_free(entry->streamed_txns); ... }

This list can have elements only in 'streaming' mode (which has to be
enabled with the CREATE SUBSCRIPTION command), whereas none of the
tests in 010_truncate.pl uses 'streaming', so this list should be
empty (NULL). The two different assertion failures in list_free() shown
in the buildfarm (BF) reports are:
Assert(list->length > 0);
Assert(list->length <= list->max_length);

It seems to me that this list is not initialized properly when it is
not used, or perhaps only in some special circumstances, because we do
initialize it in get_rel_sync_entry(). I am not sure whether the CCI
build affects this in some way.
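
To make the suspicion concrete, here is a minimal, self-contained sketch of one way an "empty" list could still trip those assertions. It is not the pgoutput code and not a confirmed diagnosis. It assumes, purely for illustration, that the invalidation callback can run after a cache entry has been created but before get_rel_sync_entry() has set streamed_txns. ToyList, ToyRelSyncEntry, toy_garbage and all toy_* functions are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for PostgreSQL's List; NOT the real pg_list.h definition. */
typedef struct ToyList
{
	int			length;
	int			max_length;
} ToyList;

/* Reduced, hypothetical version of the per-relation cache entry. */
typedef struct ToyRelSyncEntry
{
	bool		replicate_valid;
	ToyList    *streamed_txns;	/* NULL plays the role of an empty list */
} ToyRelSyncEntry;

/* Simulates leftover memory contents seen through an unset pointer. */
static ToyList toy_garbage = {0, -1};

/*
 * The two checks quoted above, returning false instead of aborting so
 * that the outcome can be inspected.
 */
static bool
toy_list_free_would_pass(const ToyList *list)
{
	if (list == NULL)
		return true;			/* freeing an empty list is a no-op */
	return list->length > 0 && list->length <= list->max_length;
}

/* Stand-in for the invalidation callback: it "frees" the list. */
static bool
toy_invalidation_cb(ToyRelSyncEntry *entry)
{
	bool		ok = toy_list_free_would_pass(entry->streamed_txns);

	entry->replicate_valid = false;
	entry->streamed_txns = NULL;
	return ok;
}

/*
 * Stand-in for the entry lookup.  Returns whether the callback saw a
 * list that list_free() would have accepted.
 */
static bool
toy_get_rel_sync_entry(bool init_at_creation)
{
	ToyRelSyncEntry entry;
	bool		cb_ok;

	/* A newly created entry starts out with arbitrary contents. */
	entry.replicate_valid = false;
	entry.streamed_txns = &toy_garbage;

	/* Safe variant: set the list as soon as the entry exists. */
	if (init_at_creation)
		entry.streamed_txns = NULL;

	/* ASSUMPTION: some lookup at this point triggers an invalidation. */
	cb_ok = toy_invalidation_cb(&entry);

	/* Late initialization; too late if the callback has already run. */
	entry.streamed_txns = NULL;
	entry.replicate_valid = true;

	return cb_ok;
}
```

In this model toy_get_rel_sync_entry(true) reports that the callback saw a sane (empty) list, while toy_get_rel_sync_entry(false) reports a failure of the first check. If the real problem has this shape, setting the field at the point where the entry is created would avoid it, but that remains to be verified against the actual code.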

--
With Regards,
Amit Kapila.
