Re: logical decoding bug: segfault in ReorderBufferToastReplace()

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Jeremy Schneider <schnjere(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical decoding bug: segfault in ReorderBufferToastReplace()
Date: 2021-06-05 06:42:49
Message-ID: CAA4eK1+LG8rpwTt0F8M+JPEWMET0mP31YiGaMuEC_VCm1=8RrQ@mail.gmail.com
Lists: pgsql-bugs pgsql-committers pgsql-hackers

On Sat, Jun 5, 2021 at 5:05 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2021-Jun-04, Jeremy Schneider wrote:
>
> > ERROR: XX000: could not open relation with OID 0
> > LOCATION: ReorderBufferToastReplace, reorderbuffer.c:305
>
> Hah.
>
> It seems to me that this code should silently return if
> rd_rel->reltoastrelid == 0, just like in the case of
> txn->toast_hash == NULL. It evidently means that no datum can be
> toasted, and therefore no toast replacement is needed.
>
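To make the proposed guard concrete, here is a simplified, hypothetical model in C. All names (`RelInfo`, `TxnState`, `pending_toast_chunks`, `toast_replace`, the `guard_no_toast` flag) are invented for illustration and are not the real reorderbuffer.c API; `pending_toast_chunks > 0` stands in for `txn->toast_hash != NULL`. It shows the existing early exit, the error path that produces "could not open relation with OID 0", and the suggested silent return when the relation has no TOAST table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins, NOT PostgreSQL's real structs. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef struct
{
    Oid relid;
    Oid reltoastrelid;          /* InvalidOid when the relation has no TOAST table */
} RelInfo;

typedef struct
{
    int pending_toast_chunks;   /* models txn->toast_hash != NULL */
} TxnState;

/*
 * Returns 0 when no replacement is done, 1 when buffered chunks are
 * consumed, and -1 for the error the original code raises.
 * guard_no_toast enables the proposed "silently return" check.
 */
static int toast_replace(TxnState *txn, const RelInfo *rel, bool guard_no_toast)
{
    assert(txn != NULL && rel != NULL);

    /* Existing early exit: nothing was toasted in this transaction. */
    if (txn->pending_toast_chunks == 0)
        return 0;

    if (rel->reltoastrelid == InvalidOid)
    {
        if (guard_no_toast)
            return 0;           /* proposed: no TOAST table, nothing to replace */

        /* Original behaviour: try to open the TOAST relation anyway. */
        fprintf(stderr, "could not open relation with OID %u\n",
                rel->reltoastrelid);
        return -1;
    }

    /* Chunks would be reassembled into the tuple here; model as consumed. */
    txn->pending_toast_chunks = 0;
    return 1;
}
```

With two chunks pending and a target relation whose `reltoastrelid` is 0, the call returns -1 without the guard and 0 with it. Note that the guarded path leaves the buffered chunks in place, so it suppresses the error without explaining where those chunks came from.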

Even if this fixes the issue, I think it is better to find out why
this happens. I think the reason the code raises an error is that,
after toast insertions, we always expect an insert on the main table
that owns the toast table. If, after a toast insertion, we instead get
an insert on some other table, you will see this error. I think this
can happen with speculative insertions: the insertion leads to a toast
table insert, then we get a speculative abort record, and then an
insertion on some other table. The key point is that the decoding code
currently ignores speculative aborts, which is why such a problem can
occur. There could be other cases where this problem can happen, but
if my theory is correct, the patch we are discussing in the thread [1]
should solve it.
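The sequence described above can be sketched as a tiny, hypothetical event-stream model in C. The event kinds, `apply_change`, `decode_stream`, and the `handle_spec_abort` flag are invented for illustration and do not correspond to reorderbuffer.c. The model assumes, following the explanation above, that buffered toast chunks are consumed only when the main-table change is applied (for a speculative insert, at confirm time), and that an ignored speculative abort leaves them buffered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model, NOT PostgreSQL's decoding code. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef enum
{
    EV_TOAST_CHUNK,     /* insert into a TOAST table */
    EV_SPEC_INSERT,     /* speculative insert on a main table */
    EV_SPEC_CONFIRM,
    EV_SPEC_ABORT,
    EV_INSERT           /* regular insert */
} EventKind;

typedef struct
{
    EventKind kind;
    Oid relid;
    Oid reltoastrelid;  /* InvalidOid when the relation has no TOAST table */
} Event;

/* Apply a main-table change, consuming any buffered toast chunks. */
static int apply_change(int *pending, const Event *change)
{
    if (*pending == 0)
        return 0;
    if (change->reltoastrelid == InvalidOid)
    {
        fprintf(stderr, "could not open relation with OID %u\n",
                change->reltoastrelid);
        return -1;
    }
    *pending = 0;
    return 0;
}

/* Returns 0 if the whole stream decodes, -1 on the first error. */
static int decode_stream(const Event *ev, int n, bool handle_spec_abort)
{
    int pending = 0;            /* buffered toast chunks */
    const Event *spec = NULL;   /* held speculative insert */

    assert(ev != NULL || n == 0);
    for (int i = 0; i < n; i++)
    {
        switch (ev[i].kind)
        {
            case EV_TOAST_CHUNK:
                pending++;
                break;
            case EV_SPEC_INSERT:
                spec = &ev[i];  /* held until confirm or abort */
                break;
            case EV_SPEC_CONFIRM:
                if (spec == NULL || apply_change(&pending, spec) < 0)
                    return -1;
                spec = NULL;
                break;
            case EV_SPEC_ABORT:
                /* Ignored unless handled: chunks stay buffered. */
                if (handle_spec_abort)
                {
                    spec = NULL;
                    pending = 0;
                }
                break;
            case EV_INSERT:
                if (apply_change(&pending, &ev[i]) < 0)
                    return -1;
                break;
        }
    }
    return 0;
}
```

For a stream of two toast chunks, a speculative insert, a speculative abort, and then an insert on a table without a TOAST table, `decode_stream` fails when the abort is ignored and succeeds when the abort discards the buffered chunks. Discarding the chunks on abort addresses the cause, whereas the guard on `reltoastrelid == 0` would only hide the leftover chunks.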

Jeremy, is this problem reproducible? Can we get a test case or
pg_waldump output of the preceding WAL records?

[1] - https://www.postgresql.org/message-id/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P%2BXgsNAViug15Fm99jA%40mail.gmail.com

--
With Regards,
Amit Kapila.
