From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | andrey(dot)salnikov(at)dataegret(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication |
Date: | 2019-11-18 22:24:16 |
Message-ID: | 20191118222416.dkn5cdmbxmtcemaf@alap3.anarazel.de |
Lists: | pgsql-bugs |
Hi,
On 2019-11-18 21:58:16 +0100, Tomas Vondra wrote:
> and the ReorderBufferToastReplace does this:
>
> newtup = change->data.tp.newtuple;
>
> heap_deform_tuple(&newtup->tuple, desc, attrs, isnull);
>
> but that fails, because the tuple pointer happens to be 0x8, which is
> clearly bogus. Not sure where that comes from, I don't recognize that as
> a typical pattern.
It indicates that change->data.tp.newtuple is NULL,
afaict. &newtup->tuple boils down to
((char *) newtup) + offsetof(ReorderBufferTupleBuf, tuple)
and offsetof(ReorderBufferTupleBuf, tuple) is 0x8.
> Can you create a core dump (see [1]), and print 'change' and 'txn' in
> frame #2? I wonder if some of the other fields are bogus too (but it can't
> be entirely true ...), and if the transaction got serialized.
Please print both change and *change.
I suspect what's happening is that a change that shouldn't have toast
changes - e.g. a DELETE - somehow has them. That then triggers a failure
in ReorderBufferToastReplace(), which expects newtuple to be valid.
It's probably worthwhile to add an elog(ERROR) check for this, even if
this does not turn out to be the case.
> > This behaviour does not depend on the data in the tables, because we see it
> > in different databases with different sets of tables in publications.
>
> I'm not sure I really believe that. Surely there has to be something
> special about your schema, or possibly access pattern that triggers this
> bug in your environment and not elsewhere.
Yea. Are there any C triggers present? Any unusual extensions? Users of
the transaction hook, for example?
> > Looks like a real issue in logical replication.
> > I will be happy to provide additional information about this issue, but I
> > need to know what else to collect to help solve this
> > problem.
> >
>
> Well, if you can create a reproducer, that'd be the best option, because
> then we can investigate locally instead of the ping-pong here.
>
> But if that's not possible, let's start with the schema and the
> additional information from the core file.
>
> I'd also like to see the contents of the WAL, particularly for the XID
> triggering this issue. Please run pg_waldump and see how much data is
> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
> where it starts. It may have subtransactions, so don't do just grep.
Yea, that'd be helpful.
Greetings,
Andres Freund