Re: Logical Replication WIP

From: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical Replication WIP
Date: 2016-08-15 13:51:09
Message-ID: 555F33AF-583A-4A2A-BBE6-6409DC3C0AD4@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 11 Aug 2016, at 17:43, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
>>
>> * Also I wasn’t able actually to run replication itself =) While regression tests passes, TAP
>> tests and manual run stuck. pg_subscription_rel.substate never becomes ‘r’. I’ll investigate
>> that more and write again.
>
> Interesting, please keep me posted. It's possible for tables to stay in 's' state for some time if there is nothing happening on the server, but that should not mean anything is stuck.

Slightly played around, it seems that apply worker waits forever for substate change.

(lldb) bt
* thread #1: tid = 0x183e00, 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x00007fff88c7f2a2 libsystem_kernel.dylib`poll + 10
frame #1: 0x00000001017ca8a3 postgres`WaitEventSetWaitBlock(set=0x00007fd2dc816b30, cur_timeout=10000, occurred_events=0x00007fff5e7f67d8, nevents=1) + 51 at latch.c:1108
frame #2: 0x00000001017ca438 postgres`WaitEventSetWait(set=0x00007fd2dc816b30, timeout=10000, occurred_events=0x00007fff5e7f67d8, nevents=1) + 248 at latch.c:941
frame #3: 0x00000001017c9fde postgres`WaitLatchOrSocket(latch=0x000000010ab208a4, wakeEvents=25, sock=-1, timeout=10000) + 254 at latch.c:347
frame #4: 0x00000001017c9eda postgres`WaitLatch(latch=0x000000010ab208a4, wakeEvents=25, timeout=10000) + 42 at latch.c:302
* frame #5: 0x0000000101793352 postgres`wait_for_sync_status_change(tstate=0x0000000101e409b0) + 178 at tablesync.c:228
frame #6: 0x0000000101792bbe postgres`process_syncing_tables_apply(slotname="subbi", end_lsn=140734778796592) + 430 at tablesync.c:436
frame #7: 0x00000001017928c1 postgres`process_syncing_tables(slotname="subbi", end_lsn=140734778796592) + 81 at tablesync.c:518
frame #8: 0x000000010177b620 postgres`LogicalRepApplyLoop(last_received=140734778796592) + 704 at apply.c:1122
frame #9: 0x000000010177bef4 postgres`ApplyWorkerMain(main_arg=0) + 1044 at apply.c:1353
frame #10: 0x000000010174cb5a postgres`StartBackgroundWorker + 826 at bgworker.c:729
frame #11: 0x0000000101762227 postgres`do_start_bgworker(rw=0x00007fd2db700000) + 343 at postmaster.c:5553
frame #12: 0x000000010175d42b postgres`maybe_start_bgworker + 427 at postmaster.c:5761
frame #13: 0x000000010175bccf postgres`sigusr1_handler(postgres_signal_arg=30) + 383 at postmaster.c:4979
frame #14: 0x00007fff9ab2352a libsystem_platform.dylib`_sigtramp + 26
frame #15: 0x00007fff88c7e07b libsystem_kernel.dylib`__select + 11
frame #16: 0x000000010175d5ac postgres`ServerLoop + 252 at postmaster.c:1665
frame #17: 0x000000010175b2e0 postgres`PostmasterMain(argc=3, argv=0x00007fd2db403840) + 5968 at postmaster.c:1309
frame #18: 0x000000010169507f postgres`main(argc=3, argv=0x00007fd2db403840) + 751 at main.c:228
frame #19: 0x00007fff8d45c5ad libdyld.dylib`start + 1
(lldb) p state
(char) $1 = 'c'
(lldb) p tstate->state
(char) $2 = ‘c’

Also I’ve noted that some lsn position looks wrong on publisher:

postgres=# select restart_lsn, confirmed_flush_lsn from pg_replication_slots;
restart_lsn | confirmed_flush_lsn
-------------+---------------------
0/1530EF8 | 7FFF/5E7F6A30
(1 row)

postgres=# select sent_location, write_location, flush_location, replay_location from pg_stat_replication;
sent_location | write_location | flush_location | replay_location
---------------+----------------+----------------+-----------------
0/1530F30 | 7FFF/5E7F6A30 | 7FFF/5E7F6A30 | 7FFF/5E7F6A30
(1 row)

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-08-15 13:55:19 Re: WIP: Barriers
Previous Message Petr Jelinek 2016-08-15 13:33:56 Re: Logical Replication WIP