Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker

From: Ondřej Jirman <ienieghapheoghaiwida(at)xff(dot)cz>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Date: 2019-11-21 11:53:26
Message-ID: 20191121115326.arierrbeok6we5sv@core.my.home
Lists: pgsql-bugs

Hello,

On Thu, Nov 21, 2019 at 11:39:40AM +0100, Tomas Vondra wrote:
> On Thu, Nov 21, 2019 at 01:14:18AM +0000, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 16129
> > Logged by: Ondrej Jirman
> > Email address: ienieghapheoghaiwida(at)xff(dot)cz
> > PostgreSQL version: 12.1
> > Operating system: Arch Linux
> > Description:
> >
> > Hello,
> >
> > I've upgraded my main PostgreSQL cluster from 11.5 to 12.1 via pg_dumpall
> > method and after a while I started getting segfault in logical replication
> > worker.
> >
> > My setup is fairly vanilla, non-default options:
> >
> > shared_buffers = 256MB
> > work_mem = 512MB
> > temp_buffers = 64MB
> > maintenance_work_mem = 4GB
> > effective_cache_size = 16GB
> > max_logical_replication_workers = 30
> > max_replication_slots = 30
> > max_worker_processes = 30
> > wal_level = logical
> >
> > I have several databases that I subscribe to from this database cluster
> > using logical replication.
> >
> > Replication of one of my databases (running on ARMv7 machine) started
> > segfaulting on the subscriber side (x86_64) like this:
> >
> > #0 0x00007fc259739917 in __memmove_sse2_unaligned_erms () from
> > /usr/lib/libc.so.6
> > #1 0x000055d033e93d44 in memcpy (__len=620701425, __src=<optimized out>,
> > __dest=0x55d0356da804) at /usr/include/bits/string_fortified.h:34
> > #2 tts_virtual_materialize (slot=0x55d0356da3b8) at execTuples.c:235
> > #3 0x000055d033e94d32 in ExecFetchSlotHeapTuple
> > (slot=slot(at)entry=0x55d0356da3b8, materialize=materialize(at)entry=true,
> > shouldFree=shouldFree(at)entry=0x7fff0e7cf387) at execTuples.c:1624

I forgot to mention that the publisher is still running PostgreSQL 11.5.

> Hmmm, so it's failing on this memcpy() in tts_virtual_materialize:
>
> else
> {
> Size data_length = 0;
>
> data = (char *) att_align_nominal(data, att->attalign);
> data_length = att_addlength_datum(data_length, att->attlen, val);
>
> memcpy(data, DatumGetPointer(val), data_length);

This is a bytea column in one of the tables on the publisher.

So it looks like it segfaults while trying to copy far too many bytes
(data_length is computed as 620,701,425 bytes):

#1 0x000055d033e93d44 in memcpy (__len=620701425, __src=<optimized out>, __dest=0x55d0356da804) at /usr/include/bits/string_fortified.h:34

But the maximum length of any bytea value in the publisher database is < 200 kB.

>
> slot->tts_values[natt] = PointerGetDatum(data);
> data += data_length;
> }
>
> The question is, which of the pointers is bogus. You seem to already
> have a core file, so can you inspect the variables in frame #2? I think
> especially
>
> p *slot
> p natt
> p val
> p *att
>
> would be interesting to see.

(gdb) p *slot
$1 = {type = T_TupleTableSlot, tts_flags = 20, tts_nvalid = 8, tts_ops = 0x558149ff4da0 <TTSOpsVirtual>, tts_tupleDescriptor = 0x7fcca2ea7548, tts_values = 0x55814adfbc10,
tts_isnull = 0x55814adfbc50, tts_mcxt = 0x55814adfb6e0, tts_tid = {ip_blkid = {bi_hi = 65535, bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
(gdb) p natt
$2 = 2
(gdb) p val
$3 = <optimized out>
(gdb) p slot->tts_values[nat]
No symbol "nat" in current context.
(gdb) p slot->tts_values[natt]
$4 = 94013795319824
(gdb) p *slot->tts_values[natt]
$5 = -1812161596
(gdb) p *att
$6 = {attrelid = 55240, attname = {data = "cover_image", '\000' <repeats 52 times>}, atttypid = 17, attstattarget = -1, attlen = -1, attnum = 3, attndims = 0, attcacheoff = -1, atttypmod = -1,
attbyval = false, attstorage = 120 'x', attalign = 105 'i', attnotnull = true, atthasdef = false, atthasmissing = false, attidentity = 0 '\000', attgenerated = 0 '\000', attisdropped = false,
attislocal = true, attinhcount = 0, attcollation = 0}

> Also, how does the replicated schema look like? Can we see the table
> definitions?

Subscriber:

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;
SET default_tablespace = '';
SET default_table_access_method = heap;
CREATE TABLE public.categories (
id integer NOT NULL,
name text NOT NULL,
description text,
metadata jsonb NOT NULL,
provider integer NOT NULL,
subscribed boolean DEFAULT false NOT NULL,
cover_image bytea
);
CREATE SEQUENCE public.categories_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER SEQUENCE public.categories_id_seq OWNED BY public.categories.id;
CREATE TABLE public.providers (
id integer NOT NULL,
system_name text NOT NULL,
name text NOT NULL
);
CREATE SEQUENCE public.providers_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER SEQUENCE public.providers_id_seq OWNED BY public.providers.id;
CREATE TABLE public.videos (
id integer NOT NULL,
title text NOT NULL,
cover_image bytea NOT NULL,
metadata jsonb NOT NULL,
category integer NOT NULL,
published date NOT NULL,
added timestamp without time zone DEFAULT now() NOT NULL,
played boolean DEFAULT false NOT NULL
);
CREATE SEQUENCE public.videos_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER SEQUENCE public.videos_id_seq OWNED BY public.videos.id;
ALTER TABLE ONLY public.categories ALTER COLUMN id SET DEFAULT nextval('public.categories_id_seq'::regclass);
ALTER TABLE ONLY public.providers ALTER COLUMN id SET DEFAULT nextval('public.providers_id_seq'::regclass);
ALTER TABLE ONLY public.videos ALTER COLUMN id SET DEFAULT nextval('public.videos_id_seq'::regclass);
ALTER TABLE ONLY public.categories
ADD CONSTRAINT categories_pkey PRIMARY KEY (id);
ALTER TABLE ONLY public.providers
ADD CONSTRAINT providers_pkey PRIMARY KEY (id);
ALTER TABLE ONLY public.providers
ADD CONSTRAINT providers_system_name_key UNIQUE (system_name);
ALTER TABLE ONLY public.videos
ADD CONSTRAINT videos_pkey PRIMARY KEY (id);
ALTER TABLE ONLY public.categories
ADD CONSTRAINT categories_provider_fkey FOREIGN KEY (provider) REFERENCES public.providers(id);
ALTER TABLE ONLY public.videos
ADD CONSTRAINT videos_category_fkey FOREIGN KEY (category) REFERENCES public.categories(id);
CREATE SUBSCRIPTION l5_hometv CONNECTION 'host=redacted port=5432 user=redacted password=redacted dbname=hometv' PUBLICATION pub WITH (connect = false, slot_name = 'l5_hometv');

Publisher:

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;
SET default_tablespace = '';
SET default_with_oids = false;
CREATE TABLE public.categories (
id integer NOT NULL,
name text NOT NULL,
description text,
metadata jsonb NOT NULL,
provider integer NOT NULL,
subscribed boolean DEFAULT false NOT NULL,
cover_image bytea
);
CREATE SEQUENCE public.categories_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER SEQUENCE public.categories_id_seq OWNED BY public.categories.id;
CREATE TABLE public.providers (
id integer NOT NULL,
system_name text NOT NULL,
name text NOT NULL
);
CREATE SEQUENCE public.providers_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER SEQUENCE public.providers_id_seq OWNED BY public.providers.id;
CREATE TABLE public.videos (
id integer NOT NULL,
title text NOT NULL,
cover_image bytea NOT NULL,
metadata jsonb NOT NULL,
category integer NOT NULL,
published date NOT NULL,
added timestamp without time zone DEFAULT now() NOT NULL,
played boolean DEFAULT false NOT NULL
);
CREATE SEQUENCE public.videos_id_seq
START WITH 1
INCREMENT BY 1
NO MINVALUE
NO MAXVALUE
CACHE 1;
ALTER SEQUENCE public.videos_id_seq OWNED BY public.videos.id;
ALTER TABLE ONLY public.categories ALTER COLUMN id SET DEFAULT nextval('public.categories_id_seq'::regclass);
ALTER TABLE ONLY public.providers ALTER COLUMN id SET DEFAULT nextval('public.providers_id_seq'::regclass);
ALTER TABLE ONLY public.videos ALTER COLUMN id SET DEFAULT nextval('public.videos_id_seq'::regclass);
ALTER TABLE ONLY public.categories
ADD CONSTRAINT categories_pkey PRIMARY KEY (id);
ALTER TABLE ONLY public.providers
ADD CONSTRAINT providers_pkey PRIMARY KEY (id);
ALTER TABLE ONLY public.providers
ADD CONSTRAINT providers_system_name_key UNIQUE (system_name);
ALTER TABLE ONLY public.videos
ADD CONSTRAINT videos_pkey PRIMARY KEY (id);
ALTER TABLE ONLY public.categories
ADD CONSTRAINT categories_provider_fkey FOREIGN KEY (provider) REFERENCES public.providers(id);
ALTER TABLE ONLY public.videos
ADD CONSTRAINT videos_category_fkey FOREIGN KEY (category) REFERENCES public.categories(id);
CREATE PUBLICATION pub FOR ALL TABLES WITH (publish = 'insert, update, delete, truncate');

thank you and regards,
Ondrej

> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
