RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Önder Kalacı <onderkalaci(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-04-19 09:30:46
Message-ID: TYAPR01MB5866568A5C1E71338328B20CF5629@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear hackers,

I have rebased and updated the PoC. Please see the attached patch.

In [1], I wrote:

> ### Restore from files
>
> To check the elapsed time from the commit, all commit_time of delayed transactions
> must be stored in the memory. Basically it can store when the worker handle COMMIT
> message, but it must do special treatment for restarting.
>
> When an apply worker receives COMMIT/PREPARE/COMMIT PREPARED message, it writes
> the message, flush them, and cache the commit_time. When worker restarts, it open
> files, check the final message (this is done by seeking some bytes from end of
> the file), and then cache the written commit_time.

However, I have come to think that this design is terrible. Therefore, I have
implemented a new approach that uses the filename itself to restore the state of
committed transactions. The following is a summary.

When a worker receives a BEGIN message, it creates a new file and writes its
changes to it. The filename contains the following dash-separated components:

1. Subscription OID
2. XID of the delayed transaction on the publisher
3. Status of the delaying transaction
4. Upper 32 bits of the commit_lsn
5. Lower 32 bits of the commit_lsn
6. Upper 32 bits of the end_lsn
7. Lower 32 bits of the end_lsn
8. Commit time

Initially, components 4-8 of the new file's name are 0 because the worker does
not yet know their values. When it receives a COMMIT message, the changes are
written to the permanent file, and the file is renamed so that its name carries
the actual values.
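
To illustrate the naming scheme, here is a minimal sketch in Python (it is not the
patch's C code). The decimal formatting, the single-letter status codes "I" and "C",
the integer commit time, and all function names are assumptions made for this
example only; the patch may encode these fields differently.

```python
import os


def delayed_filename(subid, xid, status, commit_lsn=0, end_lsn=0, commit_time=0):
    """Build the dash-separated name. Each 64-bit LSN is split into its
    upper and lower 32 bits. Components 4-8 default to 0 (not yet known)."""
    return "-".join(str(v) for v in (
        subid, xid, status,
        commit_lsn >> 32, commit_lsn & 0xFFFFFFFF,
        end_lsn >> 32, end_lsn & 0xFFFFFFFF,
        commit_time,
    ))


def begin_transaction(directory, subid, xid):
    """On BEGIN: create the file with components 4-8 set to 0.
    "I" is a hypothetical status code for an in-progress transaction."""
    path = os.path.join(directory, delayed_filename(subid, xid, "I"))
    open(path, "wb").close()
    return path


def commit_transaction(path, subid, xid, commit_lsn, end_lsn, commit_time):
    """On COMMIT: flush the changes, then rename the file so that its
    name carries the real values. "C" is a hypothetical status code."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd)  # make the written changes durable before renaming
    finally:
        os.close(fd)
    new_path = os.path.join(
        os.path.dirname(path),
        delayed_filename(subid, xid, "C", commit_lsn, end_lsn, commit_time),
    )
    os.rename(path, new_path)
    return new_path
```

With these assumptions, a transaction with XID 750 on subscription 16394 starts as
"16394-750-I-0-0-0-0-0" and is renamed at commit time.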

While restarting, the worker reads the directory containing the files and caches
their commit time into memory from the filenames. Files do not need to be opened
at this point. Therefore, PREPARE/COMMIT PREPARED messages are no longer written
into the file. The status of transactions can be distinguished from the filename.
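
The restart path can be sketched in the same spirit. This is an illustration in
Python, not the patch's code. It assumes decimal, dash-separated fields in the order
listed above, and it assumes that files whose commit time is still 0 belong to
transactions that never reached COMMIT and can be skipped; this mail does not say
how the patch treats such files. All names below are hypothetical.

```python
import os
from typing import NamedTuple


class DelayedTxn(NamedTuple):
    subid: int
    xid: int
    status: str
    commit_lsn: int
    end_lsn: int
    commit_time: int


def parse_delayed_filename(name):
    """Split a filename into its eight components; return None if the
    name does not have exactly eight dash-separated parts."""
    parts = name.split("-")
    if len(parts) != 8:
        return None
    subid, xid, status = int(parts[0]), int(parts[1]), parts[2]
    c_hi, c_lo, e_hi, e_lo, commit_time = (int(p) for p in parts[3:])
    # Recombine the upper and lower 32 bits into 64-bit LSNs.
    return DelayedTxn(subid, xid, status,
                      (c_hi << 32) | c_lo, (e_hi << 32) | e_lo, commit_time)


def restore_commit_times(directory, subid):
    """Rebuild the in-memory commit-time cache from filenames alone.
    No file is opened; only the directory listing is read."""
    cache = {}
    for name in os.listdir(directory):
        try:
            txn = parse_delayed_filename(name)
        except ValueError:
            continue  # unrelated file with non-numeric fields
        if txn is None or txn.subid != subid:
            continue
        if txn.commit_time == 0:
            continue  # assumption: not committed yet, nothing to cache
        cache[txn.xid] = txn.commit_time
    return cache
```

The point of the design is visible here: the cache is rebuilt from a single
directory listing, with no seeking inside the files.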

Another notable change is the addition of a replication option. If the
min_apply_delay is greater than 0, a new parameter called "require_schema" is
passed via the START_REPLICATION command. When "require_schema" is enabled, the publisher
sends its schema (RELATION and TYPE messages) every time it sends decoded DMLs.
This is necessary because delayed transactions may be applied after the subscriber
is restarted, and the LogicalRepRelMap hash is destroyed at that time. If the
RELATION message is not written into the delayed file, and the worker restarts
just before applying the transaction, it will fail to open the local relation
and display an error message: "ERROR: no relation map entry".
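
The failure mode can be shown with a toy model in Python. This is only a sketch: the
tuple-based message format and the function name are invented for illustration and
do not reflect the real protocol messages or the apply worker's code. The dictionary
stands in for LogicalRepRelMap, which is empty after a restart.

```python
class ApplyError(Exception):
    pass


def apply_delayed_file(messages, rel_map):
    """Replay the messages stored in one delayed file.
    rel_map plays the role of the in-memory relation map."""
    applied = []
    for msg in messages:
        kind = msg[0]
        if kind == "RELATION":
            # Schema message: (re)register the remote relation.
            _, remote_id, name = msg
            rel_map[remote_id] = name
        elif kind == "INSERT":
            _, remote_id, row = msg
            if remote_id not in rel_map:
                raise ApplyError("no relation map entry")
            applied.append((rel_map[remote_id], row))
    return applied
```

Replaying a file that starts with a RELATION message succeeds even with an empty
map, whereas a file holding only the INSERT fails after a restart. That is why the
schema has to accompany the DML in each delayed file.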

Some small bugs were also fixed.

[1]: https://www.postgresql.org/message-id/TYAPR01MB5866D871F60DDFD8FAA2CDE4F5BD9@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v4-0001-WIP-Time-delayed-logical-replication-by-serializi.patch application/octet-stream 108.9 KB