Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Ondřej Jirman <ienieghapheoghaiwida(at)xff(dot)cz>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Date: 2019-11-21 21:41:15
Message-ID: 20191121214115.3tgefmvkmrzaim74@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Nov 21, 2019 at 07:58:44PM +0100, Ondřej Jirman wrote:
>On Thu, Nov 21, 2019 at 06:35:55PM +0100, Ondřej Jirman wrote:
>> One missing piece is what exactly is the contents of the outstanding output from
>> pgoutput plugin, that the replica crashes on and doesn't apply. Are there any
>> tools for inspecting the binary output from the pgoutput plugin? Maybe that can
>> provide a clue.
>
>So I've looked at it manually, and the segfaulting transaction doesn't make
>much sense to me. On primary a row with 37880 byte cover_image was inserted,
>but pgoutput plugin sends these records:
>
>B
>R - public.videos title cover_image metadata,...
>U - new row where cover_image tuple doesn't have any data, just a flag ('u')
>C
>
>Which means 'unchanged toast column' according to logical/proto.c
>
>Yet cover_image is defined as not null.
>
>I guess it's some special handling for VARATT_IS_EXTERNAL_ONDISK.

Actually, no. The second message is 'U' so update, not insert. And we
don't send TOAST values (oversized values, stored aside) that were not
updated. Which is this case, because the cover_image was not updated (as
implied by the 'u' flag).

For the record, the decoded data looks like this:

1) relation structure (8 columns)

R, 19048, public, videos, d, 8,
(true, id, 23 => int4, 0xffffffff)
(false, title, 25 => text, 0xffffffff)
(false, cover_image, 17 => bytea, 0xffffffff)
(false, metadata, 3802 => jsonb, 0xffffffff)
(false, category, 23 => int4, 0xffffffff)
(false, published, 1082 => date, 0xffffffff)
(false, added, 1114 => timestamp, 0xffffffff)
(false, played, 16 => uuid, 0xffffffff)

2) update, with the TOAST value undefined/unchanged

U, 19048, N, 8,
(4, 8393),
(86, Pazderka: O média se povede boj. Kdyby nás politici milovali, bylo by něco špatně),
u, /* unchanged bytea */
(313, {.. json ..}),
(3, 242),
(10, 2019-11-18),
(26, 2019-11-18 12:50:01.816376),
(1, t)

So it's perfectly well formatted, I think.

>
>To me this looks like this transaction is assuming data for bytea column
>cover_image are already replicated from earlier?
>
>Maybe it is not though? It's certainly breaking some assumption on the
>replica, because this is the precise point where segfault happens.
>

I think this more and more seems like some sort of bug in the apply
code, not resetting a NULL flag and interpreting the bogus value as
a valid pointer. Wouldn't the be the first such bug.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tomas Vondra 2019-11-21 22:08:54 Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Previous Message PG Bug reporting form 2019-11-21 19:01:07 BUG #16131: pg_upgrade 9 -> 11.6 and a database is missing