Re: [HACKERS] Issues with logical replication

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Issues with logical replication
Date: 2017-11-22 04:19:54
Message-ID: CAMsr+YE4CDohD_QB5yUpGk0SYOxkR_hsruNk+=FYv5Fp8A3U8g@mail.gmail.com
Lists: pgsql-hackers

On 4 October 2017 at 07:35, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:

> On 02/10/17 18:59, Petr Jelinek wrote:
> >>
> >> Now fix the trigger function:
> >> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
> >> BEGIN
> >> RETURN NEW;
> >> END $$ LANGUAGE plpgsql;
> >>
> >> And manually perform at master two updates inside one transaction:
> >>
> >> postgres=# begin;
> >> BEGIN
> >> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
> >> UPDATE 1
> >> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
> >> UPDATE 1
> >> postgres=# commit;
> >> <hangs>
> >>
> >> and in replica log we see:
> >> 2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply
> >> worker for subscription "sub" has started
> >> 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible
> >> tuple
> >> 2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical
> >> replication worker for subscription 16403 (PID 2954) exited with exit
> >> code 1
> >>
> >> Error happens in trigger.c:
> >>
> >> #3 0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50,
> >> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
> >> lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
> >> trigger.c:3103
> >> #4 0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
> >> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
> >> fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
> >> #5 0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
> >> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
> >> at execReplication.c:461
> >> #6 0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at
> >> worker.c:736
> >
> > We have locked the same tuple in RelationFindReplTupleByIndex() just
> > before this gets called and didn't get the same error. I guess we do
> > something wrong with snapshots. Will need to investigate more.
> >
>
> Okay, so it's not snapshot. It's the fact that we don't set the
> es_output_cid in replication worker which GetTupleForTrigger is using
> when locking the tuple. Attached one-liner fixes it.
>

This seems like a clear-cut bug with a simple fix.

Let's get this committed so we don't lose it. The rest of the thread is
going off into the weeds a bit, on issues unrelated to the original problem.
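
For anyone skimming the thread, here is a rough toy model of why an unset
es_output_cid would produce "attempted to lock invisible tuple" on the second
update in a transaction. It is not PostgreSQL code: ToyTuple,
toy_visible_for_lock() and toy_apply_two_updates() are hypothetical names, and
the visibility rule is reduced to a single comparison, on the assumption that
a tuple version written by our own transaction can only be locked by a command
with a strictly later command id.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t CommandId;

/* Toy stand-in for a tuple version written by the current transaction. */
typedef struct ToyTuple
{
    CommandId cmin;             /* command id that created this version */
} ToyTuple;

/*
 * Simplified visibility rule (an assumption of this sketch): a version
 * created by our own transaction is lockable only by a strictly later
 * command.
 */
static bool
toy_visible_for_lock(const ToyTuple *tup, CommandId curcid)
{
    return tup->cmin < curcid;
}

/*
 * Replay two updates of the same row inside one transaction. The second
 * update has to lock the version written by the first.
 *
 * set_output_cid = false models an executor state whose output command id
 * was left at zero; true models copying the current command id into it.
 * Returns true if the second update can lock the row.
 */
static bool
toy_apply_two_updates(bool set_output_cid)
{
    CommandId current_cid = 0;

    /* First update: the new version is stamped with the current command id. */
    ToyTuple after_first = {.cmin = current_cid};

    /* The command counter advances between replicated changes. */
    current_cid++;

    /* Second update: lock the row using the executor state's output cid. */
    CommandId es_output_cid = set_output_cid ? current_cid : 0;

    return toy_visible_for_lock(&after_first, es_output_cid);
}
```

In this model toy_apply_two_updates(false) fails to lock the row, which
mirrors the reported error, while toy_apply_two_updates(true) succeeds. The
real visibility checks have more cases than this.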

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
