From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Ondřej Jirman <ienieghapheoghaiwida(at)xff(dot)cz> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker |
Date: | 2019-11-21 16:57:52 |
Message-ID: | 20191121165752.dffge6bh756xlfdg@development |
Lists: | pgsql-bugs |
On Thu, Nov 21, 2019 at 05:15:02PM +0100, Ondřej Jirman wrote:
>On Thu, Nov 21, 2019 at 04:57:07PM +0100, Ondřej Jirman wrote:
>>
>> Maybe it has something to do with my upgrade method. I
>> dumped/restored the replica with pg_dumpall, and then just proceeded
>> to enable the subscriptions and refresh the publications with
>> (copy_data=false) for all my subscriptions.
>
>OTOH, it may not. There are 2 more databases replicated the same way
>from the same database cluster, and they don't crash the replica
>server and continue replicating. One of the other databases also
>has bytea columns in some of its tables.
>
>It really just seems related to the (regular) machine restart
>that I did on the primary: minutes later the replica crashed, and it
>has kept crashing ever since whenever it connects to the primary for
>the hometv database.
>
Hmmm. A restart of the primary certainly should not cause any such
damage; that would be a bug too. And it would be a bit strange for the
primary to send the data correctly and still crash the replica. How
exactly did you restart the primary? In which mode: smart, fast or immediate?

>So maybe something's wrong with the replica database (maybe because the
>connection got killed by the walsender at unfortunate time), rather
>than the original database, because I can replicate the original DB
>afresh into a new copy just fine and other databases continue
>replicating just fine if I disable the crashing subscription.
>
Possibly, but what would be the damaged bit? The only thing I can think
of is the replication slot info (i.e. the snapshot), and I know there
were some timing issues in the snapshot serialization.

How far is the failing change from the slot's restart point (restart_lsn,
visible in pg_replication_slots)? If there are many changes between the
two, a corrupted snapshot is unlikely.

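To put a number on "how far": a pg_lsn value is printed as two hexadecimal halves, X/Y, holding the upper and lower 32 bits of a 64-bit WAL position, so the byte distance between the slot's restart_lsn and the LSN of the crashing change is a plain subtraction. On the server, pg_wal_lsn_diff() computes the same thing. The Python helpers below are a hypothetical sketch, and the sample LSNs are made up for illustration.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a pg_lsn text value such as '16/B374D848' into an absolute byte position.

    The part before the slash is the upper 32 bits, the part after it the lower 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lsn_distance(restart_lsn: str, change_lsn: str) -> int:
    """Bytes of WAL between the slot's restart_lsn and the LSN of the failing change.

    Hypothetical helper: a large positive value means plenty of WAL was decoded
    after the restart point before the crash.
    """
    return parse_lsn(change_lsn) - parse_lsn(restart_lsn)


if __name__ == "__main__":
    # Made-up sample values, not taken from the bug report.
    print(lsn_distance("0/16B3748", "0/16B5000"))  # 6328
```

Only the byte distance is meaningful here; how many decoded transactions that covers still has to be checked on the primary.
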
There are a lot of moving parts in this: you're replicating between major
versions, and from ARM to x86. All of that should work, of course, but
maybe there's a bug somewhere. So it might take time to investigate and
fix. Thanks for your patience ;-)

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services