Re: Logical decoding restart problems

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical decoding restart problems
Date: 2016-08-19 16:06:31
Message-ID: 9b686524-60e5-3dcb-cda1-af01d1ed8145@2ndquadrant.com
Lists: pgsql-hackers

On 19/08/16 09:34, konstantin knizhnik wrote:
>
> We are using logical decoding in multimaster and have run into the
> problem that inconsistent transactions are sent to the replica.
> Briefly, multimaster uses logical decoding in this way:
> 1. Each multimaster node is connected to every other node by a logical
> decoding channel, so each pair of nodes
> has its own replication slot.
> 2. In the normal scenario each replication channel is used to replicate only
> those transactions which originated at the source node.
> We use the origin mechanism to skip "foreign" transactions.
> When an offline cluster node returns to the multimaster, we need
> to recover this node to the current cluster state.
> Recovery is performed from one of the cluster's nodes, so we use
> only one replication channel to receive all (self and foreign) transactions.
> Only in this case can we guarantee a consistent order of applying
> transactions at the recovered node.
> After the end of recovery we need to recreate the replication slots with all
> other cluster nodes (because we have already replayed transactions from
> these nodes).
> To restart logical decoding we first drop the existing slot, then create a new
> one, and then start logical replication from WAL position 0/0
> (invalid LSN).
> In this case recovery should start from the last consistent point.
>

I don't think this will work correctly. There will be a gap between
dropping the old slot and the point where the new slot starts to decode,
because the new slot first needs to build a snapshot.

Do I understand correctly that you are not using replication origins?

> The problem is that for some reason the consistent point is not so
> consistent and we get partly decoded transactions.
> I.e. the transaction body consists of two UPDATEs, but reorder_buffer extracts
> only one (the last) update and sends this truncated transaction to the
> destination, causing a consistency violation at the replica. I started
> investigating the logical decoding code and found several things which I
> do not understand.

Never seen this happen. Do you have more details about what exactly is
happening?

>
> Assume that we have transactions T1={start_lsn=100, end_lsn=400} and
> T2={start_lsn=200, end_lsn=300}.
> Transaction T2 is sent to the replica and the replica confirms
> flush_lsn=300.
> If we now want to restart logical decoding, we cannot start from a
> position less than 300, because CreateDecodingContext doesn't allow it:
>
>  * start_lsn
>  *     The LSN at which to start decoding. If InvalidXLogRecPtr, restart
>  *     from the slot's confirmed_flush; otherwise, start from the specified
>  *     location (but move it forwards to confirmed_flush if it's older than
>  *     that, see below).
>  *
>
> else if (start_lsn < slot->data.confirmed_flush)
> {
>     /*
>      * It might seem like we should error out in this case, but it's
>      * pretty common for a client to acknowledge a LSN it doesn't have to
>      * do anything for, and thus didn't store persistently, because the
>      * xlog records didn't result in anything relevant for logical
>      * decoding. Clients have to be able to do that to support synchronous
>      * replication.
>      */
>
> So does it mean that we have no chance to restore T1?
> What is worse, if there are valid T2 transaction records with lsn >=
> 300, then we can partly decode T1 and send this T1' to the replica.
> Am I missing something here?

Decoding starts from the slot's restart_lsn; start_lsn is only used to
decide which transactions to skip.
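
To make the distinction concrete, here is a minimal Python model of the
T1/T2 example from the quoted text. It is only a sketch, not the server
code: the names Txn, effective_start and decode are made up for
illustration, LSNs are plain integers, and it assumes the slot's
restart_lsn was held back at 100 because T1 was still in progress. The
skip rule (drop a transaction whose end_lsn is at or before the
effective start point) is also a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List

INVALID_LSN = 0  # stand-in for InvalidXLogRecPtr (hypothetical constant)


@dataclass
class Txn:
    name: str
    start_lsn: int  # LSN of the transaction's first record
    end_lsn: int    # LSN at which the transaction ends (commit)


def effective_start(start_lsn: int, confirmed_flush: int) -> int:
    """Mimic the clamping described in the quoted comment: an invalid or
    too-old start_lsn is moved forward to confirmed_flush."""
    if start_lsn == INVALID_LSN or start_lsn < confirmed_flush:
        return confirmed_flush
    return start_lsn


def decode(txns: List[Txn], restart_lsn: int, start_lsn: int,
           confirmed_flush: int) -> List[str]:
    """Return the names of transactions that would be sent downstream."""
    skip_up_to = effective_start(start_lsn, confirmed_flush)
    sent = []
    for txn in sorted(txns, key=lambda t: t.end_lsn):
        # WAL is read from restart_lsn, so in this model a transaction can
        # only be assembled if its first record is at or after that point.
        if txn.start_lsn < restart_lsn:
            continue
        # start_lsn filters whole transactions by where they end; it never
        # cuts a transaction in half.
        if txn.end_lsn <= skip_up_to:
            continue
        sent.append(txn.name)
    return sent


txns = [Txn("T1", 100, 400), Txn("T2", 200, 300)]
print(decode(txns, restart_lsn=100, start_lsn=INVALID_LSN, confirmed_flush=300))
```

In this model T2 is skipped because it ends at the confirmed position,
while T1 is read from its first record at 100 and sent as a whole.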

> Is there any alternative way to "seek" a slot to the proper position
> without actually fetching data from it or recreating the slot?

You can seek forward just fine: specify the start position in the
START_REPLICATION command.
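
As a rough illustration, the Python sketch below builds such a command
string. The helper names format_lsn and start_replication_command and
the slot name are hypothetical; the sketch only formats text and leaves
out the optional output plugin options and the actual replication
connection.

```python
import re


def format_lsn(lsn: int) -> str:
    """Render a 64-bit WAL position in the textual X/X form
    (high 32 bits, slash, low 32 bits, both in hex)."""
    if not 0 <= lsn < 2**64:
        raise ValueError("LSN must fit in 64 bits")
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def start_replication_command(slot_name: str, start_lsn: int) -> str:
    """Build a START_REPLICATION command for a logical slot."""
    # Slot names may only contain lower-case letters, digits and underscores.
    if not re.fullmatch(r"[a-z0-9_]+", slot_name):
        raise ValueError(f"invalid slot name: {slot_name!r}")
    return f"START_REPLICATION SLOT {slot_name} LOGICAL {format_lsn(start_lsn)}"


# "node2_slot" is a made-up slot name for the example.
print(start_replication_command("node2_slot", 0x16B374D848))
```

Passing 0 here yields the 0/0 position mentioned in the quoted text, in
which case the server falls back to the slot's confirmed_flush.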

> Is there any mechanism in xlog which can enforce consistent decoding of
> a transaction (so that no transaction records are missed)?
> Maybe I missed something, but I didn't find any "record_number" or
> anything else which can identify the first record of a transaction.

As I mentioned above, what you probably want to do is use replication
origins. When you use those you get origin info when decoding the
transaction, which you can then send downstream so that the downstream
can update its idea of where it is for that origin. This is especially
useful for the transaction forwarding you are doing (see BDR and/or
pglogical code for an example of that).
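
A minimal Python sketch of the bookkeeping this implies on the
downstream side follows. It is an in-memory model only, not the BDR or
pglogical implementation and not the server's replication origin API:
the class OriginProgress and the node names are invented for the
example, and a real system would record progress durably together with
the applied changes.

```python
from typing import Dict


class OriginProgress:
    """Track, per origin node, the last origin LSN that was applied."""

    def __init__(self) -> None:
        self._progress: Dict[str, int] = {}

    def progress(self, origin: str) -> int:
        # 0 means nothing has been applied from this origin yet.
        return self._progress.get(origin, 0)

    def apply(self, origin: str, origin_lsn: int) -> bool:
        """Apply a forwarded transaction unless it was already applied.

        Returns True if the transaction was applied, False if skipped.
        """
        if origin_lsn <= self.progress(origin):
            return False  # already seen, e.g. received earlier via recovery
        # A real downstream would replay the changes here and store the new
        # progress in the same local transaction.
        self._progress[origin] = origin_lsn
        return True


tracker = OriginProgress()
print(tracker.apply("node1", 300))  # first delivery of a node1 transaction
print(tracker.apply("node1", 300))  # same transaction arriving again
```

Because progress is kept per origin rather than per slot, a node that
received foreign transactions through a single recovery channel can
still tell each of its other peers where to resume.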

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
