Logical replication: stuck spinlock at ReplicationSlotRelease

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Logical replication: stuck spinlock at ReplicationSlotRelease
Date: 2017-06-21 13:02:16
Message-ID: A737B7A37273E048B164557ADEF4A58B53A5C8E8@ntex2010i.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While playing with HEAD as of d14c85ed,
I ran into the following:

CREATE DATABASE source;
CREATE DATABASE recipient;

\c source
CREATE TABLE repli(id integer PRIMARY KEY, val text NOT NULL);
INSERT INTO repli VALUES (1, 'one');
CREATE PUBLICATION repsend FOR TABLE repli;
SELECT pg_create_logical_replication_slot('reprec', 'pgoutput');

\c recipient
CREATE TABLE repli(id integer PRIMARY KEY, val text NOT NULL);
CREATE SUBSCRIPTION reprec CONNECTION 'port=5432 dbname=source user=postgres'
PUBLICATION repsend WITH (create_slot='false', slot_name='reprec');
SELECT * FROM repli;
┌────┬─────┐
│ id │ val │
├────┼─────┤
│ 1 │ one │
└────┴─────┘
(1 row)

Looks good.
Now I try to produce a replication conflict:

INSERT INTO repli VALUES (2, 'block');

\c source
INSERT INTO repli VALUES (2, 'two');

After a while the server crashes, and the log looks like this:

2017-06-21 14:55:07.002 CEST [23096] ERROR: duplicate key value violates unique constraint "repli_pkey"
2017-06-21 14:55:07.002 CEST [23096] DETAIL: Key (id)=(2) already exists.
2017-06-21 14:55:07.003 CEST [23070] LOG: worker process: logical replication worker for subscription 28681 (PID 23096) exited with exit code 1
2017-06-21 14:55:07.006 CEST [23117] LOG: logical replication apply worker for subscription "reprec" has started
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION SLOT "reprec" LOGICAL 0/0 (proto_version '1', publication_names '"repsend"')
2017-06-21 14:55:07.008 CEST [23118] LOG: starting logical decoding for slot "reprec"
2017-06-21 14:55:07.008 CEST [23118] DETAIL: streaming transactions committing after 0/51A67B8, reading WAL from 0/51A66C8
LOG: starting logical decoding for slot "reprec"
DETAIL: streaming transactions committing after 0/51A67B8, reading WAL from 0/51A66C8
2017-06-21 14:55:07.008 CEST [23118] LOG: logical decoding found consistent point at 0/51A66C8
2017-06-21 14:55:07.008 CEST [23118] DETAIL: There are no running transactions.
DEBUG: sending replication keepalive
LOG: logical decoding found consistent point at 0/51A66C8
DETAIL: There are no running transactions.
2017-06-21 14:55:07.009 CEST [23117] ERROR: duplicate key value violates unique constraint "repli_pkey"
2017-06-21 14:55:07.009 CEST [23117] DETAIL: Key (id)=(2) already exists.
2017-06-21 14:55:07.010 CEST [23070] LOG: worker process: logical replication worker for subscription 28681 (PID 23117) exited with exit code 1
2017-06-21 14:55:12.019 CEST [23122] LOG: logical replication apply worker for subscription "reprec" has started
DEBUG: received replication command: IDENTIFY_SYSTEM
2017-06-21 14:55:12.021 CEST [23124] LOG: starting logical decoding for slot "reprec"
2017-06-21 14:55:12.021 CEST [23124] DETAIL: streaming transactions committing after 0/51A67B8, reading WAL from 0/51A66C8
DEBUG: received replication command: START_REPLICATION SLOT "reprec" LOGICAL 0/0 (proto_version '1', publication_names '"repsend"')
2017-06-21 14:55:12.022 CEST [23124] LOG: logical decoding found consistent point at 0/51A66C8
2017-06-21 14:55:12.022 CEST [23124] DETAIL: There are no running transactions.
LOG: starting logical decoding for slot "reprec"
DETAIL: streaming transactions committing after 0/51A67B8, reading WAL from 0/51A66C8
DEBUG: sending replication keepalive
LOG: logical decoding found consistent point at 0/51A66C8
DETAIL: There are no running transactions.
2017-06-21 14:55:12.023 CEST [23122] ERROR: duplicate key value violates unique constraint "repli_pkey"
2017-06-21 14:55:12.023 CEST [23122] DETAIL: Key (id)=(2) already exists.
2017-06-21 14:55:12.024 CEST [23070] LOG: worker process: logical replication worker for subscription 28681 (PID 23122) exited with exit code 1
2017-06-21 14:55:12.033 CEST [23124] LOG: could not send data to client: Broken pipe
2017-06-21 14:55:12.033 CEST [23124] FATAL: connection to client lost
2017-06-21 14:55:17.032 CEST [23133] LOG: logical replication apply worker for subscription "reprec" has started
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION SLOT "reprec" LOGICAL 0/0 (proto_version '1', publication_names '"repsend"')
2017-06-21 14:57:24.552 CEST [23124] PANIC: stuck spinlock detected at ReplicationSlotRelease, slot.c:394
2017-06-21 14:57:24.885 CEST [23070] LOG: server process (PID 23124) was terminated by signal 6: Aborted
2017-06-21 14:57:24.885 CEST [23070] LOG: terminating any other active server processes
2017-06-21 14:57:24.887 CEST [23134] LOG: could not send data to client: Broken pipe
2017-06-21 14:57:24.890 CEST [23070] LOG: all server processes terminated; reinitializing

Yours,
Laurenz Albe

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kuntal Ghosh 2017-06-21 13:20:58 Re: Autovacuum launcher occurs error when cancelled by SIGINT
Previous Message Dilip Kumar 2017-06-21 12:57:33 Re: Default Partition for Range