Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker

From: Ondřej Jirman <ienieghapheoghaiwida(at)xff(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Date: 2019-11-22 17:32:04
Message-ID: 20191122173204.hevpm4hgpk2tiyl6@core.my.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Nov 22, 2019 at 11:50:03AM -0500, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> > On Fri, Nov 22, 2019 at 01:54:01AM +0100, Ondřej Jirman wrote:
> >> On Thu, Nov 21, 2019 at 07:45:34PM -0500, Tom Lane wrote:
> >>> =?utf-8?Q?Ond=C5=99ej?= Jirman <ienieghapheoghaiwida(at)xff(dot)cz> writes:
> >>>> newtup.changed
> >>>> {true, true, false, true, true, true, true, true, false <repeats 1656 times>}
>
> >>> So column 3 is not getting replaced.
>
> > Can you show us the attribute list as defined in the system, including
> > e.g. dropped columns? That is, something like
> > SELECT attnum, attname, atttypid FROM pg_attribute
> > WHERE attrelid = 'public.videos'::regclass;
> > both from the published and subscriber.
>
> After digging around a bit more in the logrep code, it seems like we'd
> only have a situation with newtup.changed[i] = false if the publisher
> had hit this bit in logicalrep_write_tuple:
>
> else if (att->attlen == -1 && VARATT_IS_EXTERNAL_ONDISK(values[i]))
> {
> pq_sendbyte(out, 'u'); /* unchanged toast column */
> continue;
> }
>
> So this seems to indicate that Ondřej's case involves an update on a row
> that contained a toasted-out-of-line column that did not get updated.
> That's different from the trigger conditions that I postulated, but the
> end result is the same: slot_modify_cstrings would have a case where
> it's supposed to retain the old data for a pass-by-ref column, and it'd
> fail to do that correctly.

Yes, it was UPDATE videos SET played = TRUE; on a row that had 38kB BYTEA
value in cover_image.

Thank you both, Tom and Tomas, for your help.

regards,
o.

> I also discovered that before v12, the calling code looked like this:
>
> ExecStoreTuple(localslot->tts_tuple, remoteslot, InvalidBuffer, false);
> slot_modify_cstrings(remoteslot, rel, newtup.values, newtup.changed);
>
> so that at that time, "remoteslot" indeed did not have its own local
> copy of the tuple, and thus the code failed to fail. Perhaps this was
> intentional on the part of whoever wrote this, but it's still an
> unacceptable abuse of the API if you ask me.
>
> I added some tests involving dropped columns to what I had last night
> and pushed it. However, looking at this now, I see that we really
> still don't have enough coverage --- the code path I showed above
> is not hit by the regression tests, according to the code coverage
> bot. I don't think it's really acceptable for such a corner case
> bit of the logrep protocol to not get tested :-(
>
> regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Ondřej Jirman 2019-11-22 17:40:03 Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Previous Message Tom Lane 2019-11-22 16:50:03 Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker