Re: Issues with logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: Issues with logical replication
Date: 2017-10-04 00:12:06
Message-ID: CAD21AoDv7kZ8dQMqiNTx4wey39GLUCv+Af-cj1xAz-RRN_SMzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 4, 2017 at 8:35 AM, Petr Jelinek
<petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
> On 02/10/17 18:59, Petr Jelinek wrote:
>>>
>>> Now fix the trigger function:
>>> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
>>> BEGIN
>>> RETURN NEW;
>>> END $$ LANGUAGE plpgsql;
>>>
>>> And manually perform at master two updates inside one transaction:
>>>
>>> postgres=# begin;
>>> BEGIN
>>> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
>>> UPDATE 1
>>> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
>>> UPDATE 1
>>> postgres=# commit;
>>> <hangs>
>>>
>>> and in replica log we see:
>>> 2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply
>>> worker for subscription "sub" has started
>>> 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible
>>> tuple
>>> 2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical
>>> replication worker for subscription 16403 (PID 2954) exited with exit
>>> code 1
>>>
>>> Error happens in trigger.c:
>>>
>>> #3 0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>>> lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
>>> trigger.c:3103
>>> #4 0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>>> fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
>>> #5 0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>>> at execReplication.c:461
>>> #6 0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
>>> worker.c:736
>>
>> We have locked the same tuple in RelationFindReplTupleByIndex() just
>> before this gets called and didn't get the same error. I guess we do
>> something wrong with snapshots. Will need to investigate more.
>>
>
> Okay, so it's not snapshot. It's the fact that we don't set the
> es_output_cid in replication worker which GetTupleForTrigger is using
> when locking the tuple. Attached one-liner fixes it.
>

Thank you for the patch.
This bug can happen even without the trigger and I confirmed tgat the
bug is fixed by the patch. I think the patch fixed it properly.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-10-04 00:15:44 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Andrew Dunstan 2017-10-03 23:53:41 Re: [HACKERS] BUG #14825: enum type: unsafe use?