Re: Logical decoding restart problems

From: Petr Jelinek <petr@2ndquadrant.com>
To: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>, Pg Hackers <pgsql-hackers@postgresql.org>
Subject: Re: Logical decoding restart problems
Date: 2016-08-20 01:32:51
Message-ID: 623b869e-8c16-7f02-6195-8012d29f275a@2ndquadrant.com
Lists: pgsql-hackers

On 19/08/16 18:36, Konstantin Knizhnik wrote:
>
>
> On 19.08.2016 19:06, Petr Jelinek wrote:
>>
>> I don't think this will work correctly; there will be a gap between when
>> the new slot starts to decode and the drop of the old one, as the new
>> slot first needs to build a snapshot.
>>
>> Do I understand correctly that you are not using replication origins?
>
> No, we are using replication origins to prevent recursive replication.
> The gap is not the biggest problem because in multimaster we commit only
> those transactions which are acknowledged by all live nodes.
> So if some transaction falls in the gap... it is unlucky and will be
> aborted.
>

Ah okay, forgot you have sync MM.
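
To spell out the gap I meant: the replacement slot only starts streaming
from its own consistent point, which it can reach only after it has built
a snapshot, so that point normally lies ahead of where the old slot
stopped. A rough SQL-level sketch (using the SQL slot functions rather
than the walsender protocol; the new slot name here is just an example):

-- where the subscriber last confirmed the old slot
SELECT slot_name, confirmed_flush_lsn
  FROM pg_replication_slots
 WHERE slot_name = 'mtm_slot_1';

-- creating the replacement slot waits for a new consistent point;
-- the LSN it returns is typically ahead of the old slot's position
SELECT * FROM pg_create_logical_replication_slot('mtm_slot_1_new', 'multimaster');

SELECT pg_drop_replication_slot('mtm_slot_1');

-- any transaction that committed between the old confirmed position and
-- the new slot's consistent point is streamed by neither slot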

>>
>>> The problem is that for some reason the consistent point is not so
>>> consistent and we get partly decoded transactions.
>>> I.e. the transaction body consists of two UPDATEs, but reorder_buffer
>>> extracts only one (the last) update and sends this truncated transaction
>>> to the destination, causing a consistency violation at the replica. I
>>> started investigating the logical decoding code and found several things
>>> which I do not understand.
>>
>> I've never seen this happen. Do you have more details about what exactly
>> is happening?
>>
>
> This is transaction at primary node:
>
> root@knizhnik:/home/knizhnik/postgres_cluster/contrib/mmts# docker exec
> -ti node1 bash
> postgres@9a04a0c9f246:/pg$ pg_xlogdump
> data/pg_xlog/000000010000000000000001 | fgrep "tx: 6899"
> rmgr: Heap len (rec/tot): 7/ 53, tx: 6899, lsn:
> 0/019EFBD8, prev 0/019EFB80, desc: LOCK off 12: xid 6899 LOCK_ONLY
> EXCL_LOCK , blkref #0: rel 1663/12407/16421 blk 60
> rmgr: Heap len (rec/tot): 14/ 74, tx: 6899, lsn:
> 0/019EFC10, prev 0/019EFBD8, desc: HOT_UPDATE off 12 xmax 6899 ; new off
> 224 xmax 6899, blkref #0: rel 1663/12407/16421 blk 60
> rmgr: Heap len (rec/tot): 7/ 53, tx: 6899, lsn:
> 0/019EFC60, prev 0/019EFC10, desc: LOCK off 153: xid 6899 LOCK_ONLY
> EXCL_LOCK , blkref #0: rel 1663/12407/16421 blk 49
> rmgr: Heap len (rec/tot): 14/ 82, tx: 6899, lsn:
> 0/019EFC98, prev 0/019EFC60, desc: UPDATE off 153 xmax 6899 ; new off 55
> xmax 6899, blkref #0: rel 1663/12407/16421 blk 62, blkref #1: rel
> 1663/12407/16421 blk 49
> rmgr: Btree len (rec/tot): 2/ 64, tx: 6899, lsn:
> 0/019EFCF0, prev 0/019EFC98, desc: INSERT_LEAF off 294, blkref #0: rel
> 1663/12407/16424 blk 23
> rmgr: Transaction len (rec/tot): 236/ 265, tx: 6899, lsn:
> 0/019EFD30, prev 0/019EFCF0, desc: PREPARE
> pg_xlogdump: FATAL: error in WAL record at 0/1AC0E70: invalid record
> length at 0/1AC0EA8: wanted 24, got 0
>
>
> This is the replicated transaction at another node (it is not clear from
> the trace, but believe me, it is the same transaction):
>
>
> root@knizhnik:/home/knizhnik/postgres_cluster/contrib/mmts# docker exec
> -ti node2 bash
> postgres@e5b16d82ce06:/pg$ pg_xlogdump
> data/pg_xlog/000000010000000000000001 | fgrep "tx: 6882"
> rmgr: Heap len (rec/tot): 14/ 74, tx: 6882, lsn:
> 0/019F3240, prev 0/019F31F0, desc: HOT_UPDATE off 113 xmax 6882 ; new
> off 219 xmax 0, blkref #0: rel 1663/12407/16421 blk 53
> rmgr: Heap len (rec/tot): 14/ 82, tx: 6882, lsn:
> 0/019F5CB8, prev 0/019F5C60, desc: UPDATE off 163 xmax 6882 ; new off 4
> xmax 0, blkref #0: rel 1663/12407/16421 blk 62, blkref #1: rel
> 1663/12407/16421 blk 51
> rmgr: Btree len (rec/tot): 2/ 64, tx: 6882, lsn:
> 0/019F5D10, prev 0/019F5CB8, desc: INSERT_LEAF off 284, blkref #0: rel
> 1663/12407/16424 blk 23
> rmgr: Transaction len (rec/tot): 248/ 274, tx: 6882, lsn:
> 0/019F61F8, prev 0/019F60E8, desc: PREPARE
> pg_xlogdump: FATAL: error in WAL record at 0/1AD3AD8: invalid record
> length at 0/1AD3B10: wanted 24, got 0
>
>
> And "shorten" version of the same transaction at the third (recovered)
> node:
>
> root@knizhnik:/home/knizhnik/postgres_cluster/contrib/mmts# docker exec
> -ti node3 bash
> postgres@b4066d4211bc:/pg$ pg_xlogdump
> data/pg_xlog/000000010000000000000001 | fgrep "tx: 6893"
> rmgr: Heap len (rec/tot): 7/ 53, tx: 6893, lsn:
> 0/01A29828, prev 0/01A297E0, desc: LOCK off 172: xid 6893 LOCK_ONLY
> EXCL_LOCK , blkref #0: rel 1663/12407/16421 blk 50
> rmgr: Heap len (rec/tot): 14/ 82, tx: 6893, lsn:
> 0/01A29860, prev 0/01A29828, desc: UPDATE+INIT off 172 xmax 6893 ; new
> off 1 xmax 6893, blkref #0: rel 1663/12407/16421 blk 64, blkref #1: rel
> 1663/12407/16421 blk 50
> rmgr: Btree len (rec/tot): 2/ 64, tx: 6893, lsn:
> 0/01A298B8, prev 0/01A29860, desc: INSERT_LEAF off 314, blkref #0: rel
> 1663/12407/16424 blk 23
> rmgr: Transaction len (rec/tot): 236/ 265, tx: 6893, lsn:
> 0/01A298F8, prev 0/01A298B8, desc: PREPARE
> pg_xlogdump: FATAL: error in WAL record at 0/1ACBBF8: invalid record
> length at 0/1ACBC30: wanted 24, got 0
>
> You can see one update instead of two.
>
> Sorry, I have not saved the trace with the output of the logical decoder.
> But it really decodes just one update!
> What I have done:
>
> DROP_REPLICATION_SLOT "mtm_slot_1";
> CREATE_REPLICATION_SLOT "mtm_slot_1" LOGICAL "multimaster";
> START_REPLICATION SLOT "mtm_slot_1" LOGICAL 0/0 ("startup_params_format"
> '1', "max_proto_version" '1', "min_proto_version" '1',
> "forward_changesets" '1', "mtm_replication_mode" 'recovered');
>
> I have also tried to calculate the last origin LSN for this node and
> explicitly specify it in START_REPLICATION.
> But it doesn't help: the same problem persists.
>

Well okay, that says there was a different number of UPDATEs on the
subscriber, but that does not mean it is the fault of the decoding. The
difference in the WAL on the downstream can be, for example, because it
did not find a matching key for the update due to some other desync issue.
You'll need to check what the output plugin outputs and go from there. If
you have a reproducible use case or script, I can help debug it.
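
One way to see what the decoder itself produces, independently of the
apply side, is to peek at the changes over SQL. The multimaster plugin
speaks a binary protocol, so as a readable cross-check you could create a
throw-away slot with the contrib test_decoding plugin (the slot name below
is made up) and see whether both UPDATEs of such a transaction show up:

SELECT * FROM pg_create_logical_replication_slot('debug_slot', 'test_decoding');

-- run the two-UPDATE transaction on this node, then peek without consuming:

SELECT lsn, xid, data
  FROM pg_logical_slot_peek_changes('debug_slot', NULL, NULL);

SELECT pg_drop_replication_slot('debug_slot');

If the scratch slot shows both updates but your plugin's stream does not,
the problem is somewhere in the plugin or the apply code rather than in
the reorder buffer. And when you restart from a computed origin LSN, it is
worth comparing it against what pg_replication_origin_progress() reports
on the subscriber, so you don't accidentally skip past changes.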

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
