From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | scott(at)deltaex(dot)com |
Cc: | pgsql-bugs <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #14758: Segfault with logical replication on a function index |
Date: | 2017-07-28 10:07:59 |
Message-ID: | CAD21AoCHutPoUdtZg2RHDneXEFnbL3zp+FcQkMrZ8Lrfa3kZLQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
On Mon, Jul 24, 2017 at 4:22 PM, <scott(at)deltaex(dot)com> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 14758
> Logged by: Scott Milliken
> Email address: scott(at)deltaex(dot)com
> PostgreSQL version: 10beta2
> Operating system: Linux 4.10.0-27-generic #30~16.04.2-Ubuntu S
> Description:
>
> I'm testing logical replication on 10beta2, and found a segfault that I can
> reliably reproduce with an index on a not-actually immutable function.
>
> Here's the function in question:
>
> ```
> CREATE OR REPLACE FUNCTION public.immutable_random(integer)
> RETURNS double precision
> LANGUAGE sql
> IMMUTABLE
> AS $function$SELECT random();
> $function$;
> ```
>
> It's not actually immutable since it's calling random (a hack to get an
> efficient random sort on a table).
>
> (Aside: I'd understand if it errored on creation of the index, but would
> really prefer to keep using this instead of tablesample because it's fast,
> deterministic, and doesn't have sampling biases like the SYSTEM sampling.)
>
>
> Here's full reproduction instructions:
>
>
> Primary:
> ```
> mkdir -p /tmp/test-seg0
> PGPORT=5301 initdb -D /tmp/test-seg0
> echo "wal_level = logical" >> /tmp/test-seg0/postgresql.conf
> PGPORT=5301 pg_ctl -D /tmp/test-seg0 start
> for (( ; ; )); do if pg_isready -d postgres -p 5301; then break; fi; sleep
> 1; done
> psql -p 5301 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER
> CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
> createdb -p 5301 -E utf8 test
>
> psql -p 5301 -U test test -c "CREATE TABLE testtbl (id int, name text);"
> psql -p 5301 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
> testtbl_pkey PRIMARY KEY (id);"
> psql -p 5301 -U test test -c "CREATE PUBLICATION testpub FOR TABLE
> testtbl;"
> psql -p 5301 -U test test -c "INSERT INTO testtbl (id, name) VALUES (1,
> 'a');"
> ```
>
> Secondary:
> ```
> mkdir -p /tmp/test-seg1
> PGPORT=5302 initdb -D /tmp/test-seg1
> PGPORT=5302 pg_ctl -D /tmp/test-seg1 start
> for (( ; ; )); do if pg_isready -d postgres -p 5302; then break; fi; sleep
> 1; done
> psql -p 5302 postgres -c "CREATE USER test WITH PASSWORD 'test' SUPERUSER
> CREATEDB CREATEROLE LOGIN REPLICATION BYPASSRLS;"
> createdb -p 5302 -E utf8 test
>
> psql -p 5302 -U test test -c "CREATE TABLE testtbl (id int, name text);"
> psql -p 5302 -U test test -c "ALTER TABLE testtbl ADD CONSTRAINT
> testtbl_pkey PRIMARY KEY (id);"
> psql -p 5302 -U test test -c 'CREATE FUNCTION
> public.immutable_random(integer) RETURNS double precision LANGUAGE sql
> IMMUTABLE AS $function$ SELECT random(); $function$'
> psql -p 5302 -U test test -c "CREATE INDEX ix_testtbl_random ON testtbl
> USING btree (immutable_random(id));"
> psql -p 5302 -U test test -c "CREATE SUBSCRIPTION test0_testpub CONNECTION
> 'port=5301 user=test dbname=test' PUBLICATION testpub;"
> ```
>
> The secondary crashes with a segfault:
>
> ```
> 2017-07-23 23:55:37.961 PDT [4823] LOG: logical replication table
> synchronization worker for subscription "test0_testpub", table "testtbl"
> has started
> 2017-07-23 23:55:38.244 PDT [4758] LOG: worker process: logical replication
> worker for subscription 16396 sync 16386 (PID 4823) was terminated by signal
> 11: Segmentation fault
> 2017-07-23 23:55:38.244 PDT [4758] LOG: terminating any other active server
> processes
> 2017-07-23 23:55:38.245 PDT [4763] WARNING: terminating connection because
> of crash of another server process
> 2017-07-23 23:55:38.245 PDT [4763] DETAIL: The postmaster has commanded
> this server process to roll back the current transaction and exit, because
> another server process exited
> abnormally and possibly corrupted shared memory.
> 2017-07-23 23:55:38.245 PDT [4763] HINT: In a moment you should be able to
> reconnect to the database and repeat your command.
> 2017-07-23 23:55:38.247 PDT [4758] LOG: all server processes terminated;
> reinitializing
> 2017-07-23 23:55:38.256 PDT [4826] LOG: database system was interrupted;
> last known up at 2017-07-23 23:55:36 PDT
> 2017-07-23 23:55:38.809 PDT [4826] LOG: database system was not properly
> shut down; automatic recovery in progress
> 2017-07-23 23:55:38.812 PDT [4826] LOG: redo starts at 0/173AEA0
> 2017-07-23 23:55:38.815 PDT [4826] LOG: invalid record length at 0/17B50B0:
> wanted 24, got 0
> 2017-07-23 23:55:38.815 PDT [4826] LOG: redo done at 0/17B5070
> 2017-07-23 23:55:38.815 PDT [4826] LOG: last completed transaction was at
> log time 2017-07-23 23:55:37.962957-07
> ```
>
Thank you for the reporting and precise reproducing steps!
I could reproduced this issue and it seems to me that the cause of
this is that the table sync worker didn't get a snapshot before
starting table copy. Attached patch fixes this problem.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
fix_table_sync.patch | application/octet-stream | 595 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Etsuro Fujita | 2017-07-28 11:55:30 | Re: [HACKERS] BUG #14759: insert into foreign data partitions fail |
Previous Message | Rick Otten | 2017-07-28 00:54:31 | Re: signal 11 segfaults with parallel workers |
From | Date | Subject | |
---|---|---|---|
Next Message | Erik Rijkers | 2017-07-28 11:19:13 | Re: GSoC 2017: Foreign Key Arrays |
Previous Message | Ashutosh Sharma | 2017-07-28 09:00:17 | Re: pl/perl extension fails on Windows |