Re: BUG #14758: Segfault with logical replication on a function index

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Scott Milliken <scott(at)deltaex(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #14758: Segfault with logical replication on a function index
Date: 2017-07-31 00:40:34
Message-ID: CAD21AoDfToXRXsc-pg86QUou07AsVrE0yFtCehG_S98-dNVrMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Moved to -hackers.

On Sat, Jul 29, 2017 at 4:35 AM, Scott Milliken <scott(at)deltaex(dot)com> wrote:
> Thank you Masahiko! I've tested and confirmed that this patch fixes the
> problem.
>

Thank you for the testing. This issue should be added to the open item
since this cause of the server crash. I'll add it.

> On Fri, Jul 28, 2017 at 3:07 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
>>
>> On Mon, Jul 24, 2017 at 4:22 PM, <scott(at)deltaex(dot)com> wrote:
>> > The following bug has been logged on the website:
>> >
>> > Bug reference: 14758
>> > Logged by: Scott Milliken
>> > Email address: scott(at)deltaex(dot)com
>> > PostgreSQL version: 10beta2
>> > Operating system: Linux 4.10.0-27-generic #30~16.04.2-Ubuntu S
>> > Description:
>> >
>> > I'm testing logical replication on 10beta2, and found a segfault that I
>> > can
>> > reliably reproduce with an index on a not-actually immutable function.
>> >
>> > Here's the function in question:
>> >
>> > ```
>> > CREATE OR REPLACE FUNCTION public.immutable_random(integer)
>> > RETURNS double precision
>> > LANGUAGE sql
>> > IMMUTABLE
>> > AS $function$SELECT random();
>> > $function$;
>> > ```
>> >
>> > It's not actually immutable since it's calling random (a hack to get an
>> > efficient random sort on a table).
>> >
>> > (Aside: I'd understand if it errored on creation of the index, but would
>> > really prefer to keep using this instead of tablesample because it's
>> > fast,
>> > deterministic, and doesn't have sampling biases like the SYSTEM
>> > sampling.)
>> >
>> >
>> > Here's full reproduction instructions:
>> >
>> >
>> > Primary:
>> > ```
>> > mkdir -p /tmp/test-seg0
>> > PGPORT=5301 initdb -D /tmp/test-seg0
>> > echo "wal_level = logical" >> /tmp/test-seg0/postgresql.conf
>> > PGPORT=5301 pg_ctl -D /tmp/test-seg0 start
>> > for (( ; ; )); do if pg_isready -d postgres -p 5301; then break; fi;
>> > sleep
>> > 1; done
>> > psql -p 5301 postgres -c "CREATE USER test WITH PASSWORD 'test'
>> > SUPERUSER
>> > CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
>> > createdb -p 5301 -E utf8 test
>> >
>> > psql -p 5301 -U test test -c "CREATE TABLE testtbl (id int, name text);"
>> > psql -p 5301 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
>> > testtbl_pkey PRIMARY KEY (id);"
>> > psql -p 5301 -U test test -c "CREATE PUBLICATION testpub FOR TABLE
>> > testtbl;"
>> > psql -p 5301 -U test test -c "INSERT INTO testtbl (id, name) VALUES (1,
>> > 'a');"
>> > ```
>> >
>> > Secondary:
>> > ```
>> > mkdir -p /tmp/test-seg1
>> > PGPORT=5302 initdb -D /tmp/test-seg1
>> > PGPORT=5302 pg_ctl -D /tmp/test-seg1 start
>> > for (( ; ; )); do if pg_isready -d postgres -p 5302; then break; fi;
>> > sleep
>> > 1; done
>> > psql -p 5302 postgres -c "CREATE USER test WITH PASSWORD 'test'
>> > SUPERUSER
>> > CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
>> > createdb -p 5302 -E utf8 test
>> >
>> > psql -p 5302 -U test test -c "CREATE TABLE testtbl (id int, name text);"
>> > psql -p 5302 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
>> > testtbl_pkey PRIMARY KEY (id);"
>> > psql -p 5302 -U test test -c 'CREATE FUNCTION
>> > public.immutable_random(integer) RETURNS double precision LANGUAGE sql
>> > IMMUTABLE AS $function$ SELECT random(); $function$'
>> > psql -p 5302 -U test test -c "CREATE INDEX ix_testtbl_random ON testtbl
>> > USING btree (immutable_random(id));"
>> > psql -p 5302 -U test test -c "CREATE SUBSCRIPTION test0_testpub
>> > CONNECTION
>> > 'port=5301 user=test dbname=test' PUBLICATION testpub;"
>> > ```
>> >
>> > The secondary crashes with a segfault:
>> >
>> > ```
>> > 2017-07-23 23:55:37.961 PDT [4823] LOG: logical replication table
>> > synchronization worker for subscription "test0_testpub", table "testtbl"
>> > has started
>> > 2017-07-23 23:55:38.244 PDT [4758] LOG: worker process: logical
>> > replication
>> > worker for subscription 16396 sync 16386 (PID 4823) was terminated by
>> > signal
>> > 11: Segmentation fault
>> > 2017-07-23 23:55:38.244 PDT [4758] LOG: terminating any other active
>> > server
>> > processes
>> > 2017-07-23 23:55:38.245 PDT [4763] WARNING: terminating connection
>> > because
>> > of crash of another server process
>> > 2017-07-23 23:55:38.245 PDT [4763] DETAIL: The postmaster has commanded
>> > this server process to roll back the current transaction and exit,
>> > because
>> > another server process exited
>> > abnormally and possibly corrupted shared memory.
>> > 2017-07-23 23:55:38.245 PDT [4763] HINT: In a moment you should be able
>> > to
>> > reconnect to the database and repeat your command.
>> > 2017-07-23 23:55:38.247 PDT [4758] LOG: all server processes
>> > terminated;
>> > reinitializing
>> > 2017-07-23 23:55:38.256 PDT [4826] LOG: database system was
>> > interrupted;
>> > last known up at 2017-07-23 23:55:36 PDT
>> > 2017-07-23 23:55:38.809 PDT [4826] LOG: database system was not
>> > properly
>> > shut down; automatic recovery in progress
>> > 2017-07-23 23:55:38.812 PDT [4826] LOG: redo starts at 0/173AEA0
>> > 2017-07-23 23:55:38.815 PDT [4826] LOG: invalid record length at
>> > 0/17B50B0:
>> > wanted 24, got 0
>> > 2017-07-23 23:55:38.815 PDT [4826] LOG: redo done at 0/17B5070
>> > 2017-07-23 23:55:38.815 PDT [4826] LOG: last completed transaction was
>> > at
>> > log time 2017-07-23 23:55:37.962957-07
>> > ```
>> >
>>
>> Thank you for the reporting and precise reproducing steps!
>> I could reproduced this issue and it seems to me that the cause of
>> this is that the table sync worker didn't get a snapshot before
>> starting table copy. Attached patch fixes this problem.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> NTT Open Source Software Center
>
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Langote 2017-07-31 00:58:04 Re: [HACKERS] BUG #14759: insert into foreign data partitions fail
Previous Message Rick Otten 2017-07-30 21:25:41 Re: signal 11 segfaults with parallel workers

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-07-31 00:42:45 Re: Refreshing subscription relation state inside a transaction block
Previous Message Julien Rouhaud 2017-07-31 00:09:34 Re: segfault in HEAD when too many nested functions call