RE: CDC/ETL system on top of logical replication with pgoutput, custom client

From: José Neves <rafaneves3(at)msn(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: CDC/ETL system on top of logical replication with pgoutput, custom client
Date: 2023-08-01 09:13:47
Message-ID: PR3P193MB0491EE7B04B8C9B3F515DAB3890AA@PR3P193MB0491.EURP193.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres.

Owh, I see the error of my way... :(

By ignoring commits, and committing individual operation LSNs, I was effectively rolling back the subscription. In the previous example, if I committed the LSN of the first insert of the second transaction (LSN1-500), I was basically telling Postgres to send everything again, including the already processed T1.

> what you mean with the "different from the numeric order"
I'm probably lacking terminology. I mean that LSN4-5500 > LSN4-4000 > LSN3-3000 > LSN3-2500...

But, if I'm understanding correctly, I can only rely on the incremental sequence to be true for the commit events. Which explains my pain.
The world makes sense again.

Thank you very much. Will try to implement this new logic, and hopefully not bug again with this issue.
Regards,
José Neves
________________________________
De: Andres Freund <andres(at)anarazel(dot)de>
Enviado: 1 de agosto de 2023 00:21
Para: José Neves <rafaneves3(at)msn(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>; pgsql-hackers(at)postgresql(dot)org <pgsql-hackers(at)postgresql(dot)org>
Assunto: Re: CDC/ETL system on top of logical replication with pgoutput, custom client

Hi,

On 2023-07-31 21:25:06 +0000, José Neves wrote:
> Ok, if I understood you correctly, I start to see where my logic is faulty. Just to make sure that I got it right, taking the following example again:
>
> T-1
> INSERT LSN1-1000
> UPDATE LSN2-2000
> UPDATE LSN3-3000
> COMMIT LSN4-4000
>
> T-2
> INSERT LSN1-500
> UPDATE LSN2-1500
> UPDATE LSN3-2500
> COMMIT LSN4-5500
>
> Where data will arrive in this order:
>
> INSERT LSN1-500
> INSERT LSN1-1000
> UPDATE LSN2-1500
> UPDATE LSN2-2000
> UPDATE LSN3-2500
> UPDATE LSN3-3000
> COMMIT LSN4-4000
> COMMIT LSN4-5500

No, they won't arrive in that order. They will arive as

BEGIN
INSERT LSN1-1000
UPDATE LSN2-2000
UPDATE LSN3-3000
COMMIT LSN4-4000
BEGIN
INSERT LSN1-500
UPDATE LSN2-1500
UPDATE LSN3-2500
COMMIT LSN4-5500

Because T1 committed before T2. Changes are only streamed out at commit /
prepare transaction (*). Within a transaction, they however *will* be ordered
by LSN.

(*) Unless you use streaming mode, in which case it'll all be more
complicated, as you'll also receive changes for transactions that might still
abort.

> You are saying that the LSN3-3000 will never be missing, either the entire
> connection will fail at that point, or all should be received in the
> expected order (which is different from the "numeric order" of LSNs).

I'm not quite sure what you mean with the "different from the numeric order"
bit...

> If the connection is down, upon restart, I will receive the entire T-1
> transaction again (well, all example data again).

Yes, unless you already acknowledged receipt up to LSN4-4000 and/or are only
asking for newer transactions when reconnecting.

> In addition to that, if I commit LSN4-4000, even tho that LSN has a "bigger
> numeric value" than the ones representing INSERT and UPDATE events on T-2, I
> will be receiving the entire T-2 transaction again, as the LSN4-5500 is
> still uncommitted.

I don't quite know what you mean with "commit LSN4-4000" here.

> This makes sense to me, but just to be extra clear, I will never receive a
> transaction commit before receiving all other events for that transaction.

Correct.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-08-01 09:39:01 Re: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Richard Guo 2023-08-01 08:44:27 Oversight in reparameterize_path_by_child leading to executor crash