Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Ondřej Jirman <ienieghapheoghaiwida(at)xff(dot)cz>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Date: 2019-11-22 16:50:03
Message-ID: 26015.1574441403@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On Fri, Nov 22, 2019 at 01:54:01AM +0100, Ondřej Jirman wrote:
>> On Thu, Nov 21, 2019 at 07:45:34PM -0500, Tom Lane wrote:
>>> =?utf-8?Q?Ond=C5=99ej?= Jirman <ienieghapheoghaiwida(at)xff(dot)cz> writes:
>>>> newtup.changed
>>>> {true, true, false, true, true, true, true, true, false <repeats 1656 times>}

>>> So column 3 is not getting replaced.

> Can you show us the attribute list as defined in the system, including
> e.g. dropped columns? That is, something like
> SELECT attnum, attname, atttypid FROM pg_attribute
> WHERE attrelid = 'public.videos'::regclass;
> both from the published and subscriber.

After digging around a bit more in the logrep code, it seems like we'd
only have a situation with newtup.changed[i] = false if the publisher
had hit this bit in logicalrep_write_tuple:

else if (att->attlen == -1 && VARATT_IS_EXTERNAL_ONDISK(values[i]))
{
pq_sendbyte(out, 'u'); /* unchanged toast column */
continue;
}

So this seems to indicate that Ondřej's case involves an update on a row
that contained a toasted-out-of-line column that did not get updated.
That's different from the trigger conditions that I postulated, but the
end result is the same: slot_modify_cstrings would have a case where
it's supposed to retain the old data for a pass-by-ref column, and it'd
fail to do that correctly.

I also discovered that before v12, the calling code looked like this:

ExecStoreTuple(localslot->tts_tuple, remoteslot, InvalidBuffer, false);
slot_modify_cstrings(remoteslot, rel, newtup.values, newtup.changed);

so that at that time, "remoteslot" indeed did not have its own local
copy of the tuple, and thus the code failed to fail. Perhaps this was
intentional on the part of whoever wrote this, but it's still an
unacceptable abuse of the API if you ask me.

I added some tests involving dropped columns to what I had last night
and pushed it. However, looking at this now, I see that we really
still don't have enough coverage --- the code path I showed above
is not hit by the regression tests, according to the code coverage
bot. I don't think it's really acceptable for such a corner case
bit of the logrep protocol to not get tested :-(

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Ondřej Jirman 2019-11-22 17:32:04 Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Previous Message Tomas Vondra 2019-11-22 16:12:48 Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker