Re: BUG #14758: Segfault with logical replication on a function index

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: scott(at)deltaex(dot)com
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14758: Segfault with logical replication on a function index
Date: 2017-07-28 10:07:59
Message-ID: CAD21AoCHutPoUdtZg2RHDneXEFnbL3zp+FcQkMrZ8Lrfa3kZLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Mon, Jul 24, 2017 at 4:22 PM, <scott(at)deltaex(dot)com> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 14758
> Logged by: Scott Milliken
> Email address: scott(at)deltaex(dot)com
> PostgreSQL version: 10beta2
> Operating system: Linux 4.10.0-27-generic #30~16.04.2-Ubuntu S
> Description:
>
> I'm testing logical replication on 10beta2, and found a segfault that I can
> reliably reproduce with an index on a not-actually immutable function.
>
> Here's the function in question:
>
> ```
> CREATE OR REPLACE FUNCTION public.immutable_random(integer)
> RETURNS double precision
> LANGUAGE sql
> IMMUTABLE
> AS $function$SELECT random();
> $function$;
> ```
>
> It's not actually immutable since it's calling random (a hack to get an
> efficient random sort on a table).
>
> (Aside: I'd understand if it errored on creation of the index, but would
> really prefer to keep using this instead of tablesample because it's fast,
> deterministic, and doesn't have sampling biases like the SYSTEM sampling.)
>
>
> Here's full reproduction instructions:
>
>
> Primary:
> ```
> mkdir -p /tmp/test-seg0
> PGPORT=5301 initdb -D /tmp/test-seg0
> echo "wal_level = logical" >> /tmp/test-seg0/postgresql.conf
> PGPORT=5301 pg_ctl -D /tmp/test-seg0 start
> for (( ; ; )); do if pg_isready -d postgres -p 5301; then break; fi; sleep
> 1; done
> psql -p 5301 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER
> CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
> createdb -p 5301 -E utf8 test
>
> psql -p 5301 -U test test -c "CREATE TABLE testtbl (id int, name text);"
> psql -p 5301 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
> testtbl_pkey PRIMARY KEY (id);"
> psql -p 5301 -U test test -c "CREATE PUBLICATION testpub FOR TABLE
> testtbl;"
> psql -p 5301 -U test test -c "INSERT INTO testtbl (id, name) VALUES (1,
> 'a');"
> ```
>
> Secondary:
> ```
> mkdir -p /tmp/test-seg1
> PGPORT=5302 initdb -D /tmp/test-seg1
> PGPORT=5302 pg_ctl -D /tmp/test-seg1 start
> for (( ; ; )); do if pg_isready -d postgres -p 5302; then break; fi; sleep
> 1; done
> psql -p 5302 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER
> CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
> createdb -p 5302 -E utf8 test
>
> psql -p 5302 -U test test -c "CREATE TABLE testtbl (id int, name text);"
> psql -p 5302 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
> testtbl_pkey PRIMARY KEY (id);"
> psql -p 5302 -U test test -c 'CREATE FUNCTION
> public.immutable_random(integer) RETURNS double precision LANGUAGE sql
> IMMUTABLE AS $function$ SELECT random(); $function$'
> psql -p 5302 -U test test -c "CREATE INDEX ix_testtbl_random ON testtbl
> USING btree (immutable_random(id));"
> psql -p 5302 -U test test -c "CREATE SUBSCRIPTION test0_testpub CONNECTION
> 'port=5301 user=test dbname=test' PUBLICATION testpub;"
> ```
>
> The secondary crashes with a segfault:
>
> ```
> 2017-07-23 23:55:37.961 PDT [4823] LOG: logical replication table
> synchronization worker for subscription "test0_testpub", table "testtbl"
> has started
> 2017-07-23 23:55:38.244 PDT [4758] LOG: worker process: logical replication
> worker for subscription 16396 sync 16386 (PID 4823) was terminated by signal
> 11: Segmentation fault
> 2017-07-23 23:55:38.244 PDT [4758] LOG: terminating any other active server
> processes
> 2017-07-23 23:55:38.245 PDT [4763] WARNING: terminating connection because
> of crash of another server process
> 2017-07-23 23:55:38.245 PDT [4763] DETAIL: The postmaster has commanded
> this server process to roll back the current transaction and exit, because
> another server process exited
> abnormally and possibly corrupted shared memory.
> 2017-07-23 23:55:38.245 PDT [4763] HINT: In a moment you should be able to
> reconnect to the database and repeat your command.
> 2017-07-23 23:55:38.247 PDT [4758] LOG: all server processes terminated;
> reinitializing
> 2017-07-23 23:55:38.256 PDT [4826] LOG: database system was interrupted;
> last known up at 2017-07-23 23:55:36 PDT
> 2017-07-23 23:55:38.809 PDT [4826] LOG: database system was not properly
> shut down; automatic recovery in progress
> 2017-07-23 23:55:38.812 PDT [4826] LOG: redo starts at 0/173AEA0
> 2017-07-23 23:55:38.815 PDT [4826] LOG: invalid record length at 0/17B50B0:
> wanted 24, got 0
> 2017-07-23 23:55:38.815 PDT [4826] LOG: redo done at 0/17B5070
> 2017-07-23 23:55:38.815 PDT [4826] LOG: last completed transaction was at
> log time 2017-07-23 23:55:37.962957-07
> ```
>

Thank you for the reporting and precise reproducing steps!
I could reproduced this issue and it seems to me that the cause of
this is that the table sync worker didn't get a snapshot before
starting table copy. Attached patch fixes this problem.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
fix_table_sync.patch application/octet-stream 595 bytes

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Etsuro Fujita 2017-07-28 11:55:30 Re: [HACKERS] BUG #14759: insert into foreign data partitions fail
Previous Message Rick Otten 2017-07-28 00:54:31 Re: signal 11 segfaults with parallel workers

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2017-07-28 11:19:13 Re: GSoC 2017: Foreign Key Arrays
Previous Message Ashutosh Sharma 2017-07-28 09:00:17 Re: pl/perl extension fails on Windows