Re: logical decoding bug: segfault in ReorderBufferToastReplace()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeremy Schneider <schnjere(at)amazon(dot)com>
Cc: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical decoding bug: segfault in ReorderBufferToastReplace()
Date: 2019-12-14 00:25:13
Message-ID: 20191214002513.p5rw6vqenzvrud5y@alap3.anarazel.de
Lists: pgsql-bugs pgsql-committers pgsql-hackers

Hi,

On 2019-12-13 16:13:35 -0800, Jeremy Schneider wrote:
> On 12/11/19 08:35, Andres Freund wrote:
> > I think we need to see pg_waldump output for the preceding records. That
> > might allow us to see why there's a toast record that's being associated
> > with this table, despite there not being a toast table.
> Unfortunately the WAL logs are no longer available at this time.  :(
>
> I did a little poking around in the core file and searching source code
> but didn't find anything yet.  Is there any memory structure that would
> have the preceding/following records cached in memory?  If so then I
> might be able to extract this from the core dumps.

Well, not the records directly, but the changes could be, depending on
the size of the changes. That'd already help. It depends a bit on
whether there are subtransactions or not (txn->nsubtxns will tell
you). Within one transaction, the changes currently loaded in memory
(i.e. excluding changes that were spilled to disk and haven't been
restored yet; see txn->serialized) are in ReorderBufferTXN->changes.
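To make that layout concrete, here is a minimal C sketch of walking the in-memory change list of a transaction and its subtransactions. The ToyTXN/ToyChange types and toy_* functions are hypothetical, heavily simplified stand-ins for ReorderBufferTXN and ReorderBufferChange (the real structs have many more fields and keep subtransactions on a list rather than in an array). Only the ideas of a circular doubly-linked "changes" list, nsubtxns and the "serialized" flag come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for ReorderBufferChange. */
typedef struct ToyChange
{
    unsigned long lsn;           /* WAL position the change came from */
    struct ToyChange *prev;
    struct ToyChange *next;
} ToyChange;

/* Hypothetical, simplified stand-in for ReorderBufferTXN. */
typedef struct ToyTXN
{
    ToyChange changes;           /* sentinel of a circular doubly-linked list,
                                  * in the spirit of PostgreSQL's dlist_head */
    bool serialized;             /* some changes were spilled to disk */
    int nsubtxns;
    struct ToyTXN **subtxns;
} ToyTXN;

static void
toy_txn_init(ToyTXN *txn)
{
    txn->changes.prev = txn->changes.next = &txn->changes;
    txn->serialized = false;
    txn->nsubtxns = 0;
    txn->subtxns = NULL;
}

/* Append a change at the tail of the transaction's in-memory list. */
static void
toy_txn_append(ToyTXN *txn, ToyChange *change)
{
    change->prev = txn->changes.prev;
    change->next = &txn->changes;
    txn->changes.prev->next = change;
    txn->changes.prev = change;
}

/*
 * Count the changes currently held in memory for txn and its
 * subtransactions.  *maybe_incomplete is set to true if any (sub)transaction
 * was serialized, since spilled changes are not on the in-memory list.
 * The caller initializes *maybe_incomplete to false.
 */
static int
toy_count_loaded_changes(const ToyTXN *txn, bool *maybe_incomplete)
{
    int n = 0;

    assert(maybe_incomplete != NULL);

    for (const ToyChange *c = txn->changes.next; c != &txn->changes; c = c->next)
        n++;
    if (txn->serialized)
        *maybe_incomplete = true;
    for (int i = 0; i < txn->nsubtxns; i++)
        n += toy_count_loaded_changes(txn->subtxns[i], maybe_incomplete);
    return n;
}
```

In a real core file the list links are dlist_node members embedded in each ReorderBufferChange, so the change itself has to be recovered from the node address by subtracting the member offset, which is what PostgreSQL's dlist_container() macro does. Once serialized is set, anything found this way is only part of the transaction.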

> > Seems like we clearly should add an elog(ERROR) here, so we error out,
> > rather than crash.

> done - in the commit that I replied to when I started this thread :)
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=69f883fef14a3fc5849126799278abcc43f40f56

Ah, I was actually thinking this was the thread about a similar-sounding
bug, where ReorderBufferToastReplace would crash because there isn't
actually a new tuple: somehow toast changes exist for a delete.
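The "error out rather than crash" idea quoted above amounts to checking the precondition before dereferencing. Below is a minimal C sketch, assuming a hypothetical simplified change type in which a delete carries no new tuple. toy_toast_replace() and its error-buffer convention are illustrative only: this is not the code from the commit linked above, and the returned error merely stands in for PostgreSQL's elog(ERROR).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's real types. */
typedef enum { TOY_INSERT, TOY_UPDATE, TOY_DELETE } ToyAction;

typedef struct
{
    int natts;                   /* number of attributes in the tuple */
} ToyTuple;

typedef struct
{
    ToyAction action;
    ToyTuple *newtuple;          /* NULL for a delete: there is no new tuple */
} ToyTupleChange;

/*
 * Reassemble collected toast chunks into change->newtuple.  Instead of
 * dereferencing a NULL newtuple (the crash scenario described above),
 * report an error: returns 0 on success, -1 with a message in errbuf.
 */
static int
toy_toast_replace(const ToyTupleChange *change, int ntoastchunks,
                  char *errbuf, size_t errlen)
{
    assert(change != NULL && errbuf != NULL && errlen > 0);

    if (ntoastchunks == 0)
        return 0;                /* no toast data collected: nothing to do */

    if (change->newtuple == NULL)
    {
        snprintf(errbuf, errlen,
                 "toast changes found for change of action %d without a new tuple",
                 (int) change->action);
        return -1;
    }

    /* The real code would rebuild the tuple here; the sketch stops at the check. */
    return 0;
}
```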

> > Is this version of postgres effectively unmodified in any potentially
> > relevant region (snapshot computations, generation of WAL records, ...)?
> It's not changed from community code in any relevant regions.  (Also,
> FYI, this is not Aurora.)

Well, I've heard mutterings that plain RDS postgres had some efficiency
improvements around snapshots (in the GetSnapshotData() sense) - and
that's an area where slightly wrong changes could quite plausibly
cause a bug like this.

Greetings,

Andres Freund
