Re: logical decoding bug: segfault in ReorderBufferToastReplace()

From: Jeremy Schneider <schnjere(at)amazon(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical decoding bug: segfault in ReorderBufferToastReplace()
Date: 2019-12-20 23:21:30
Message-ID: 187dfed1-7d97-4a8c-2932-b8a3d4dce697@amazon.com
Lists: pgsql-bugs pgsql-committers pgsql-hackers

On 12/13/19 16:25, Andres Freund wrote:
> On 2019-12-13 16:13:35 -0800, Jeremy Schneider wrote:
>> On 12/11/19 08:35, Andres Freund wrote:
>>> I think we need to see pg_waldump output for the preceding records. That
>>> might allow us to see why there's a toast record that's being associated
>>> with this table, despite there not being a toast table.
>> Unfortunately the WAL logs are no longer available at this time.  :(
>>
>> I did a little poking around in the core file and searching source code
>> but didn't find anything yet.  Is there any memory structure that would
>> have the preceding/following records cached in memory?  If so then I
>> might be able to extract this from the core dumps.
>
> Well, not the records directly, but the changes could be, depending on
> the size of the changes. That'd already help. It depends a bit on
> whether there are subtransactions or not (txn->nsubtxns will tell
> you). Within one transaction, the currently loaded (i.e. not changes
> that are spilled to disk, and haven't currently been restored - see
> txn->serialized) changes are in ReorderBufferTXN->changes.

I did include the txn in the original post to this thread; there are 357
changes in the transaction and they are all in memory (none spilled to
disk a.k.a. serialized). No subtransactions. However I do see that
"txn.has_catalog_changes = true" which makes me wonder if that's related
to the bug.

So... now I know... walking a dlist in gdb and dumping all the changes
is not exactly a walk in the park! It needs some Python magic like Tomas
Vondra's script that decodes Nodes. I did not manage to figure out how
to do this today... so the changes are there in the core dump but I
can't get them yet. :)

I also dug around the ReorderBufferIterTXNState a little bit but there's
nothing that isn't already in the original post.

If anyone has a trick for walking a dlist in gdb that would be awesome...
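
For reference, the dlist walk described above can be sketched in plain Python. This is only an illustration under stated assumptions, not a tested gdb script: it assumes a 64-bit build, that dlist_node is laid out as { prev, next }, and that the list is circular through the head node (with a NULL next meaning a never-initialized, empty list). The names walk_dlist, read_ptr and node_offset are hypothetical; read_ptr stands for whatever reads one pointer from the core file.

```python
from typing import Callable, Iterator

PTR_SIZE = 8  # assumption: 64-bit core dump


def walk_dlist(head_addr: int,
               read_ptr: Callable[[int], int],
               node_offset: int,
               max_nodes: int = 100000) -> Iterator[int]:
    """Yield the address of each struct that embeds a dlist_node.

    head_addr   -- address of the dlist_head (its first member is the head node)
    read_ptr    -- hypothetical callback: read one pointer at a given address
    node_offset -- byte offset of the embedded dlist_node inside the struct
    max_nodes   -- safety limit in case the list in the dump is corrupted
    """
    next_off = PTR_SIZE  # assumption: dlist_node is { prev, next }
    cur = read_ptr(head_addr + next_off)
    seen = 0
    # Stop on NULL (uninitialized list) or when we are back at the head node.
    while cur != 0 and cur != head_addr:
        if seen >= max_nodes:
            raise RuntimeError("dlist walk exceeded max_nodes; list may be corrupt")
        # Same idea as container_of(): step back from the node to its struct.
        yield cur - node_offset
        cur = read_ptr(cur + next_off)
        seen += 1
```

Inside gdb, read_ptr could presumably be built on gdb.parse_and_eval(), and node_offset taken from the bitpos of the "node" field reported by gdb.lookup_type("ReorderBufferChange"). Each yielded address could then be cast to a ReorderBufferChange pointer and printed. I have not verified any of this against the core dump in this thread.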

I'm off for holidays and won't be working on this for a couple weeks;
not sure whether it'll be possible to get to the bottom of it. But I
hope there's enough info in this thread to at least get a head start if
someone hits it again in the future.

> Well, I've heard mutterings that plain RDS postgres had some efficiency
> improvements around snapshots (in the GetSnapshotData() sense) - and
> that's an area where slightly wrong changes could quite plausibly
> cause a bug like this.

Definitely no changes around snapshots. I've never even heard anyone
talk about making changes like that in RDS PostgreSQL - feels to me like
people at AWS want it to be as close as possible to postgresql.org code.

Aurora is different; it feels to me like the engineering org has more
license to make changes. For example they re-wrote the subtransaction
subsystem. No changes to GetSnapshotData though.

-Jeremy

--
Jeremy Schneider
Database Engineer
Amazon Web Services
