From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Subject: Re: Logical decoding restart problems
On 20 August 2016 at 14:56, konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> Thank you for answers.
> No, you don't need to recreate them. Just advance your replication
> identifier downstream and request a replay position in the future. Let the
> existing slot skip over unwanted data and resume where you want to start.
> You can advance the replication origins on the peers as you replay
> forwarded xacts from your master.
> Have a look at how the BDR code does this during "catchup mode" replay.
> So while your problem discussed below seems concerning, you don't have to
> drop and recreate slots like you are currently doing.
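The origin-advance approach described above can be sketched roughly as follows. This is a hedged illustration, not the actual BDR code; the node name and LSN values are invented:

```python
# Sketch: each downstream remembers, per upstream origin, the commit LSN of
# the last transaction it applied; anything at or below that is skipped.

def parse_lsn(lsn: str) -> int:
    """Turn a PostgreSQL LSN string like '0/16B3748' into a 64-bit int."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# Per-origin replay progress (conceptually what advancing a replication
# origin records; 'node_a' and the LSN are made up for this example).
progress = {"node_a": parse_lsn("0/16B3748")}

def should_apply(origin: str, commit_lsn: str) -> bool:
    """Apply a forwarded transaction only if it is past the recorded progress."""
    return parse_lsn(commit_lsn) > progress.get(origin, 0)

print(should_apply("node_a", "0/16B3700"))  # already replayed -> False
print(should_apply("node_a", "0/16B3800"))  # new transaction  -> True
```

Advancing the stored progress to the desired horizon makes the existing slot's already-seen transactions no-ops on replay, which is why the slot never needs to be dropped.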
> The only reason for recreation of the slot is that I want to move it to the
> current "horizon" and skip all pending transactions without explicit
> specification of the restart position.
Why not just specify the restart position as the upstream server's xlog
insert position?
Anyway, you _should_ specify the restart position. Otherwise, if there's
concurrent write activity, you might have a gap between when you stop
replaying from your forwarding slot on the recovery node and start
replaying from the other nodes.
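The gap being warned about here can be checked mechanically: convert both LSNs to integers and require that replay from the peers starts no later than where replay from the forwarding slot stopped. A minimal sketch, with invented LSN values:

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN like '0/16B3748' to a comparable 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def switchover_is_gap_free(stopped_at: str, resume_from: str) -> bool:
    """True if no WAL can be missed between the two replay streams."""
    return parse_lsn(resume_from) <= parse_lsn(stopped_at)

print(switchover_is_gap_free("0/16B3748", "0/16B3700"))  # True: overlap, safe
print(switchover_is_gap_free("0/16B3748", "0/16B4000"))  # False: gap
```

Overlap is safe because replication origins make re-delivered transactions idempotent; a gap, by contrast, silently loses writes.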
Again, really, go read the BDR catchup mode code. Really.
> If I do not drop the slot and just restart replication specifying position
> 0/0 (invalid LSN), then replication will be continued from the current slot
> position in WAL, won't it?
The "current slot position" isn't in WAL. It's stored in the replication
slot in pg_replslot/ . But yes, if you pass 0/0 it'll use the stored
confirmed_flush_lsn from the replication slot.
> So there is no way to specify something like "start replication from the end
> of WAL", as with lseek(0, SEEK_END)?
Correct, but you can fetch the server's xlog insert position separately and
use that as your start point.
I guess I can see it being a little bit useful to be able to say "start
decoding at the first commit after this command". Send a patch, see if
people find it useful.
I still think your whole approach is wrong and you need to use replication
origins or similar to co-ordinate a consistent switchover.
> Slot is created by peer node using standard libpq connection with
> database=replication connection string.
So walsender interface then.
>> The problem is that for some reasons consistent point is not so
>> consistent and we get partly decoded transactions.
>> I.e. the transaction body consists of two UPDATEs, but reorder_buffer extracts
>> only one (the last) update and sends this truncated transaction to the
>> destination, causing a consistency violation at the replica. I started
>> investigating the logical decoding code and found several things which I do
>> not understand.
> Yeah, that sounds concerning and shouldn't happen.
> I looked at replication code more precisely and understand that my first
> concerns were wrong.
> Confirming flush position should not prevent replaying transactions with
> smaller LSNs.
Strictly, confirming the flush position does not prevent replay of transactions
*with changes* at lower LSNs. It does prevent replay of transactions that
*commit* with lower LSNs.
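In other words, the filter applies to the commit record's LSN, not to the LSNs of the individual changes. A toy illustration of that rule (invented data structures, not the reorderbuffer API):

```python
# A transaction whose changes began *below* confirmed_flush_lsn is still
# replayed in full, as long as its commit record lies above it.

def parse_lsn(lsn: str) -> int:
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

confirmed_flush = parse_lsn("0/2000")  # made-up value

xact = {
    "change_lsns": [parse_lsn("0/1F00"), parse_lsn("0/1F80")],  # below flush
    "commit_lsn": parse_lsn("0/2100"),                          # above flush
}

sent = xact["commit_lsn"] > confirmed_flush
print(sent)  # True: the whole transaction, both updates included, is sent
```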
> But unfortunately the problem is really present. Maybe it is caused by
> race conditions (although most logical decoder data is local to the backend).
> This is why I will try to create a reproducing scenario without multimaster.
> Yeah, but unfortunately it happens. Need to understand why...
Yes. I think we need a simple standalone test case. I've never yet seen a
partially decoded transaction like this.
> It's all already there. See logical decoding's use of xl_running_xacts.
> But how is this information persisted?
restart_lsn points to a xl_running_xacts record in WAL. Which is of course
persistent. The restart_lsn is persistent in the replication slot, as is
catalog_xmin and confirmed_flush_lsn.
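That persistent slot state can be pictured as a small record, roughly mirroring the fields the pg_replication_slots view exposes (the values here are invented):

```python
from dataclasses import dataclass

@dataclass
class SlotState:
    """Sketch of what survives a walsender restart, per replication slot."""
    slot_name: str
    restart_lsn: str          # points at an xl_running_xacts record in WAL
    catalog_xmin: int         # oldest catalog rows decoding still needs
    confirmed_flush_lsn: str  # last commit the client has acknowledged

# Hypothetical slot contents, persisted on disk under pg_replslot/
slot = SlotState("my_slot", "0/16B3748", 612, "0/16B3800")
print(slot.confirmed_flush_lsn)
```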
> What will happen if wal_sender is restarted?
That's why the restart_lsn exists. Decoding restarts from the restart_lsn
when you START_REPLICATION on the new walsender. It continues without
sending data to the client until it decodes the first commit >
confirmed_flush_lsn or some greater-than-that LSN that you requested by
passing it to the START_REPLICATION command.
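Putting the two LSNs together, the restart behaviour can be sketched like this (toy data, not walsender code): decoding re-reads WAL from restart_lsn, but nothing is shipped until a commit above confirmed_flush_lsn, or above a higher LSN the client requested, is reached.

```python
def resume_stream(xacts, confirmed_flush_lsn, requested_lsn=0):
    """Yield only the transactions the client should see after a restart.

    `xacts` is an ordered list of (commit_lsn, payload) pairs, as the
    decoder would produce them while re-reading WAL from restart_lsn.
    """
    threshold = max(confirmed_flush_lsn, requested_lsn)
    for commit_lsn, payload in xacts:
        if commit_lsn <= threshold:
            continue  # decoded again, but suppressed: client already has it
        yield payload

wal = [(100, "t1"), (200, "t2"), (300, "t3")]
print(list(resume_stream(wal, confirmed_flush_lsn=200)))                     # ['t3']
print(list(resume_stream(wal, confirmed_flush_lsn=100, requested_lsn=250)))  # ['t3']
```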
The snapshot builder is also involved; see snapbuild.c and the comments there.
I'll wait for a test case or some more detail.
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services