Re: logical decoding bug: segfault in ReorderBufferToastReplace()

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Schneider (AWS), Jeremy" <schnjere(at)amazon(dot)com>
Subject: Re: logical decoding bug: segfault in ReorderBufferToastReplace()
Date: 2019-12-11 08:17:01
Message-ID: EEB686D3-F8A7-4371-9A96-5DF3B72A7734@amazon.com
Lists: pgsql-bugs pgsql-committers pgsql-hackers

On 12/9/19, 10:10 AM, "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>On Wed, Dec 04, 2019 at 05:36:16PM -0800, Jeremy Schneider wrote:
>>On 9/8/19 14:01, Tom Lane wrote:
>>> Fix RelationIdGetRelation calls that weren't bothering with error checks.
>>>
>>> ...
>>>
>>> Details
>>> -------
>>> https://git.postgresql.org/pg/commitdiff/69f883fef14a3fc5849126799278abcc43f40f56
>>
>>We had two different databases this week (with the same schema) both
>>independently hit the condition of this recent commit from Tom. It's on
>>11.5 so we're actually segfaulting and restarting rather than just
>>causing the walsender process to ERROR, but regardless there's still
>>some underlying bug here.
>>
>>We have core files and we're still working to see if we can figure out
>>what's going on, but I thought I'd report now in case anyone has extra
>>ideas or suggestions. The segfault is on line 3034 of reorderbuffer.c.
>>
>>https://github.com/postgres/postgres/blob/REL_11_5/src/backend/replication/logical/reorderbuffer.c#L3034
>>
>>3033 toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid);
>>3034 toast_desc = RelationGetDescr(toast_rel);
>>
>>We'll keep looking; let me know any feedback! Would love to track down
>>whatever bug is in the logical decoding code, if that's what it is.
>>
>>==========
>>
>>backtrace showing the call stack...
>>
>>Core was generated by `postgres: walsender <NAME-REDACTED>
>><DNS-REDACTED>(31712)'.
>>Program terminated with signal 11, Segmentation fault.
>>#0 ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
>>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
>> at reorderbuffer.c:3034
>>3034 reorderbuffer.c: No such file or directory.
>>...
>>(gdb) #0 ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
>>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
>> at reorderbuffer.c:3034
>>#1 ReorderBufferCommit (rb=0x3086af0, xid=xid@entry=1358809,
>>commit_lsn=9430473346032, end_lsn=<optimized out>,
>> commit_time=commit_time@entry=628712466364268,
>>origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at
>>reorderbuffer.c:1584
>>#2 0x0000000000716248 in DecodeCommit (xid=1358809,
>>parsed=0x7ffc4ce123f0, buf=0x7ffc4ce125b0, ctx=0x3068f70) at decode.c:637
>>#3 DecodeXactOp (ctx=0x3068f70, buf=buf@entry=0x7ffc4ce125b0) at
>>decode.c:245
>>#4 0x000000000071655a in LogicalDecodingProcessRecord (ctx=0x3068f70,
>>record=0x3069208) at decode.c:117
>>#5 0x0000000000727150 in XLogSendLogical () at walsender.c:2886
>>#6 0x0000000000729192 in WalSndLoop (send_data=send_data@entry=0x7270f0
>><XLogSendLogical>) at walsender.c:2249
>>#7 0x0000000000729f91 in StartLogicalReplication (cmd=0x30485a0) at
>>walsender.c:1111
>>#8 exec_replication_command (
>> cmd_string=cmd_string@entry=0x2f968b0 "START_REPLICATION SLOT
>>\"<NAME-REDACTED>\" LOGICAL 893/38002B98 (proto_version '1',
>>publication_names '\"<NAME-REDACTED>\"')") at walsender.c:1628
>>#9 0x000000000076e939 in PostgresMain (argc=<optimized out>,
>>argv=argv@entry=0x2fea168, dbname=0x2fea020 "<NAME-REDACTED>",
>> username=<optimized out>) at postgres.c:4182
>>#10 0x00000000004bdcb5 in BackendRun (port=0x2fdec50) at postmaster.c:4410
>>#11 BackendStartup (port=0x2fdec50) at postmaster.c:4082
>>#12 ServerLoop () at postmaster.c:1759
>>#13 0x00000000007062f9 in PostmasterMain (argc=argc@entry=7,
>>argv=argv@entry=0x2f92540) at postmaster.c:1432
>>#14 0x00000000004be73b in main (argc=7, argv=0x2f92540) at main.c:228
>>
>>==========
>>
>>Some additional context...
>>
>># select * from pg_publication_rel;
>> prpubid | prrelid
>>---------+---------
>> 71417 | 16453
>> 71417 | 54949
>>(2 rows)
>>
>>(gdb) print toast_rel
>>$4 = (struct RelationData *) 0x0
>>
>>(gdb) print *relation->rd_rel
>>$11 = {relname = {data = "<NAME-REDACTED>", '\000' <repeats 44 times>},
>>relnamespace = 16402, reltype = 16430, reloftype = 0,
>>relowner = 16393, relam = 0, relfilenode = 16428, reltablespace = 0,
>>relpages = 0, reltuples = 0, relallvisible = 0, reltoastrelid = 0,

>Hmmm, so reltoastrelid = 0, i.e. the relation does not have a TOAST
>relation. Yet we're calling ReorderBufferToastReplace on the decoded
>record ... interesting.
>
>Can you share the structure of the relation causing the issue?

Here it is:

\d+ rel_having_issue
                                                Table "public.rel_having_issue"
 Column |           Type           | Collation | Nullable |                   Default                    | Storage  | Stats target | Description
--------+--------------------------+-----------+----------+----------------------------------------------+----------+--------------+-------------
 id     | integer                  |           | not null | nextval('rel_having_issue_id_seq'::regclass) | plain    |              |
 field1 | character varying(255)   |           |          |                                              | extended |              |
 field2 | integer                  |           |          |                                              | plain    |              |
 field3 | timestamp with time zone |           |          |                                              | plain    |              |
Indexes:
    "rel_having_issue_pkey" PRIMARY KEY, btree (id)

select relname,relfilenode,reltoastrelid from pg_class where relname='rel_having_issue';
     relname      | relfilenode | reltoastrelid
------------------+-------------+---------------
 rel_having_issue |       16428 |             0
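
To make the failure mode concrete, here is a minimal standalone sketch of what the core file suggests: with reltoastrelid = 0 the relation lookup returns NULL, and reading the descriptor from that NULL pointer is where 11.5 segfaults. Everything in it is a hypothetical stand-in (MockRelation, MockTupleDesc, mock_relation_id_get_relation, mock_toast_replace, and the OID 16431), not the real PostgreSQL definitions, and the guard only approximates the kind of error check the quoted commit describes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the PostgreSQL types; NOT the real definitions. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef struct MockTupleDesc
{
    int         natts;
} MockTupleDesc;

typedef struct MockRelation
{
    Oid            rd_id;          /* OID of this relation */
    Oid            reltoastrelid;  /* 0 when there is no TOAST relation */
    MockTupleDesc *rd_att;         /* tuple descriptor */
} MockRelation;

/* One made-up TOAST relation (OID 16431 is invented for this sketch). */
static MockTupleDesc mock_toast_desc = {3};
static MockRelation mock_toast_rel = {16431, InvalidOid, &mock_toast_desc};

/*
 * Assumed behaviour modelled on the backtrace: an OID that cannot be
 * opened (including 0) yields NULL rather than an error.
 */
MockRelation *
mock_relation_id_get_relation(Oid relid)
{
    if (relid != InvalidOid && relid == mock_toast_rel.rd_id)
        return &mock_toast_rel;
    return NULL;
}

/*
 * Guarded version of the two lines at reorderbuffer.c:3033-3034.
 * Returns 0 and sets *toast_desc on success, or -1 if the TOAST
 * relation could not be opened. Without the NULL check, the
 * toast_rel->rd_att read below is the NULL dereference.
 */
int
mock_toast_replace(const MockRelation *relation, MockTupleDesc **toast_desc)
{
    MockRelation *toast_rel;

    toast_rel = mock_relation_id_get_relation(relation->reltoastrelid);
    if (toast_rel == NULL)
    {
        fprintf(stderr, "could not open relation with OID %u\n",
                relation->reltoastrelid);
        return -1;
    }

    *toast_desc = toast_rel->rd_att;
    return 0;
}
```

Calling mock_toast_replace() on a relation whose reltoastrelid is 0 takes the error path instead of crashing. That only turns the segfault into an error; it does not explain why a relation without a TOAST table reaches this code path at all.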

Bertrand
