BUG #16226: background worker "logical replication worker" (PID <pid>) was terminated by signal 11: Segmentation

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: vadim(at)postgrespro(dot)co(dot)il
Subject: BUG #16226: background worker "logical replication worker" (PID <pid>) was terminated by signal 11: Segmentation
Date: 2020-01-22 07:43:23
Message-ID: 16226-690712ac4467269a@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16226
Logged by: Vadim Yatsenko
Email address: vadim(at)postgrespro(dot)co(dot)il
PostgreSQL version: 12.1
Operating system: CentOS Linux 7 (Core) Kernel: Linux 3.10.0-1062.9.
Description:

Hi!
We have two PostgreSQL servers with logical replication between Postgres 11.6
(Primary) and 12.1 (Logical). Some time ago, we changed the column type in two
big tables from integer to text (statements of the form sketched below, after
the log excerpts):
    oid     | nspname |     table      |   rows    |  pages  | size  | index_size | total_size
------------+---------+----------------+-----------+---------+-------+------------+------------
  573803236 | public  | acc_rep        | 283359648 | 9276153 | 71 GB | 50 GB      | 121 GB
  586884963 | public  | acc_rep_actual | 283280448 | 8855209 | 68 GB | 27 GB      | 95 GB
This of course led to a full rewrite of both tables. We repeated this
operation on both servers, and after that we started getting errors like
"background worker "logical replication worker" (PID <pid>) was terminated
by signal 11: Segmentation fault", and the server goes into recovery mode. The
error repeated several times, every 10-15 minutes. When we connected to the
logical replication worker process using gdb, we received the following:

Program received signal SIGSEGV, Segmentation fault.
0x00007f04e5060426 in __memcpy_ssse3_back () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f04e5060426 in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x000000000061e1cb in memcpy (__len=224588909, __src=<optimized out>, __dest=0x101b870) at /usr/include/bits/string3.h:51
#2  tts_virtual_materialize (slot=0x101b0b0) at execTuples.c:235
#3  0x000000000061f594 in ExecFetchSlotHeapTuple (slot=0x101b0b0, materialize=<optimized out>, shouldFree=0x7ffc331ae187) at execTuples.c:1624
#4  0x00000000004c868c in heapam_tuple_update (relation=0x7f04e77d8470, otid=0x101b240, slot=0x101b0b0, cid=19778, snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7ffc331ae210, lockmode=0x7ffc331ae20c, update_indexes=0x7ffc331ae277) at heapam_handler.c:325
#5  0x00000000004f5524 in table_tuple_update (update_indexes=0x7ffc331ae277, lockmode=0x7ffc331ae20c, tmfd=0x7ffc331ae210, wait=true, crosscheck=0x0, snapshot=0x0, cid=<optimized out>, slot=0x101b0b0, otid=0x101b240, rel=0x7f04e77d8470) at ../../../../src/include/access/tableam.h:1261
#6  simple_table_tuple_update (rel=0x7f04e77d8470, otid=otid@entry=0x101b240, slot=slot@entry=0x101b0b0, snapshot=0x0, update_indexes=update_indexes@entry=0x7ffc331ae277) at tableam.c:325
#7  0x000000000061bbc0 in ExecSimpleRelationUpdate (estate=estate@entry=0x1031660, epqstate=epqstate@entry=0x7ffc331ae330, searchslot=searchslot@entry=0x101b210, slot=slot@entry=0x101b0b0) at execReplication.c:493
#8  0x000000000071100d in apply_handle_update (s=s@entry=0x7ffc331b5910) at worker.c:750
#9  0x0000000000711ca7 in apply_dispatch (s=0x7ffc331b5910) at worker.c:968
#10 LogicalRepApplyLoop (last_received=31760851165536) at worker.c:1176
#11 ApplyWorkerMain (main_arg=<optimized out>) at worker.c:1734
#12 0x00000000006e2d4a in StartBackgroundWorker () at bgworker.c:834
#13 0x00000000006ef1d3 in do_start_bgworker (rw=0xf71970) at postmaster.c:5770
#14 maybe_start_bgworkers () at postmaster.c:5996
#15 0x00000000006efde5 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5167
#16 <signal handler called>
#17 0x00007f04e4fff933 in __select_nocancel () from /lib64/libc.so.6
#18 0x00000000004833d4 in ServerLoop () at postmaster.c:1668
#19 0x00000000006f106f in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0xf45300) at postmaster.c:1377
#20 0x0000000000484cd3 in main (argc=3, argv=0xf45300) at main.c:228
(gdb)

Error from /var/log/messages:

Jan 21 02:01:59 dbmr01 kernel: postmaster[50093]: segfault at e76f13d ip
00007fa8f701ffa0 sp 00007ffc8ae23498 error 4 in
libc-2.17.so[7fa8f6ed4000+1c2000]
Jan 21 02:01:59 dbmr01 abrt-hook-ccpp: Process 50093 (postgres) of user 26
killed by SIGSEGV - dumping core
Jan 21 02:02:15 dbmr01 abrt-server: Package 'postgresql12-server' isn't
signed with proper key
Jan 21 02:02:15 dbmr01 abrt-server: 'post-create' on
'/var/spool/abrt/ccpp-2020-01-21-02:01:59-50093' exited with 1
Jan 21 02:02:15 dbmr01 abrt-server: Deleting problem directory
'/var/spool/abrt/ccpp-2020-01-21-02:01:59-50093'
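For reference, the type change we ran on both servers was of this general form
(the column name acc_id is only illustrative, the real column differs):

-- int -> text cannot be converted in place, so PostgreSQL rewrites the whole table
ALTER TABLE public.acc_rep
    ALTER COLUMN acc_id TYPE text USING acc_id::text;
ALTER TABLE public.acc_rep_actual
    ALTER COLUMN acc_id TYPE text USING acc_id::text;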

The issue was solved by removing these tables from the publication on the
Primary. Can you advise anything? Is this a bug, or did I fail to take
something into account when working with logical replication?
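In case it helps, the workaround on the Primary was roughly the following
statement (the publication name pub_accounts is only illustrative):

-- run on the Primary; stops these tables from being replicated
ALTER PUBLICATION pub_accounts
    DROP TABLE public.acc_rep, public.acc_rep_actual;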
Thank you!
