RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Önder Kalacı <onderkalaci(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-03-17 13:11:58
Message-ID: TYAPR01MB5866D871F60DDFD8FAA2CDE4F5BD9@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Hi hackers,

Starting from the v30 patch, I have made a rough prototype that serializes
changes to a permanent file and applies them after the delay has elapsed. I think
the 2PC and restore mechanisms need more analysis, but I can share the code for
discussion. What do you think?

## Interfaces

Unchanged from the previous versions. The subscription parameter "min_apply_delay"
is used to enable time-delayed logical replication.

## Advantages

Two big problems are solved.

* The apply worker can respond to the walsender's keepalive messages while delaying
application. This is because the process does not sleep.
* The publisher can recycle WAL even if a transaction related to that WAL has not
been applied yet. This is because the apply worker flushes all the changes to a
file and replies that the WAL has been flushed.
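
To illustrate the second point, here is a toy Python model of the positions the worker could report. It is only a sketch, not code from the patch: `WorkerState`, `on_commit_spooled()`, `on_applied()` and `status_update()` are hypothetical names, and it assumes the reply carries separate write, flush and apply positions as in the standby status update message.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    received_lsn: int = 0  # last WAL position received from the publisher
    spooled_lsn: int = 0   # last position flushed to the delay file
    applied_lsn: int = 0   # last position actually applied locally


def on_commit_spooled(state, commit_end_lsn):
    # The transaction is durable in the file, so it is reported as flushed
    # even though it has not been applied yet.
    state.received_lsn = max(state.received_lsn, commit_end_lsn)
    state.spooled_lsn = max(state.spooled_lsn, commit_end_lsn)


def on_applied(state, commit_end_lsn):
    # Called later, once min_apply_delay has elapsed and the file was applied.
    state.applied_lsn = max(state.applied_lsn, commit_end_lsn)


def status_update(state):
    # Reply to a keepalive. It can be built at any time because the worker
    # does not sleep for the delay itself.
    return {
        "write": state.received_lsn,
        "flush": state.spooled_lsn,
        "apply": state.applied_lsn,
    }
```

In this model the flush position moves forward as soon as the commit is spooled, while the apply position lags behind by the configured delay.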

## Disadvantages

Code complexity.

## Basic design

The basic idea is quite simple: create a new file when the apply worker receives
a BEGIN message, write the received changes into it, and flush them when the COMMIT
message arrives. On every iteration of the main loop, the commit time of each
delayed transaction is checked, and the transaction is applied once the elapsed
time exceeds min_apply_delay.

Files are handled with APIs that use plain kernel FDs. This approach is similar to
the physical walreceiver process. Unlike the physical one, the worker does not
flush on every message; flushing is done at the end of the transaction.
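
As a rough illustration of this flow, here is a toy Python model. The prototype itself is C code inside the apply worker, so everything here is an assumption made for the example: the `DelayedSpool` class, the one-file-per-xid naming, and the one-line-per-change format are hypothetical.

```python
import os


class DelayedSpool:
    """Toy model: one file per transaction, applied once the delay has passed."""

    def __init__(self, spool_dir, min_apply_delay):
        self.spool_dir = spool_dir
        self.min_apply_delay = min_apply_delay  # seconds
        self.open_files = {}    # xid -> file currently being written
        self.commit_times = {}  # xid -> commit time of a flushed transaction

    def _path(self, xid):
        return os.path.join(self.spool_dir, f"{xid}.changes")

    def begin(self, xid):
        # BEGIN: create a new file for this transaction.
        self.open_files[xid] = open(self._path(xid), "w")

    def change(self, xid, change):
        # Changes are only written here; there is no flush per message.
        self.open_files[xid].write(change + "\n")

    def commit(self, xid, commit_time):
        # COMMIT: flush once at the end of the transaction, then remember
        # the commit time so that the main loop can check it later.
        f = self.open_files.pop(xid)
        f.flush()
        os.fsync(f.fileno())
        f.close()
        self.commit_times[xid] = commit_time

    def apply_due(self, now):
        """Called on every main-loop iteration; returns the applied changes."""
        applied = []
        pending = sorted(self.commit_times.items(), key=lambda item: item[1])
        for xid, commit_time in pending:
            if now - commit_time < self.min_apply_delay:
                break  # later commits cannot be due either
            with open(self._path(xid)) as f:
                applied.append((xid, f.read().splitlines()))
            os.remove(self._path(xid))
            del self.commit_times[xid]
        return applied
```

Calling `apply_due()` with the current time from the main loop is what replaces sleeping: the call returns immediately when nothing is due.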

### For 2PC

The delay starts when COMMIT PREPARED arrives. But to avoid holding locks for a
long time, the prepared transaction is only written to the file without being
applied.

When BEGIN PREPARE is received, the worker creates a file and starts writing
changes, the same as for normal transactions. When the PREPARE message is reached,
the worker just writes the message into the file, flushes it, and closes it. This
means that no transactions are prepared on the subscriber. When COMMIT PREPARED is
received, the worker opens the file again and writes the message. After that, the
transaction is treated the same as a normally committed transaction.
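
The life cycle of such a file can be sketched as below. This is a toy Python model under stated assumptions, not the prototype's code: the function names, the text records, and the use of the GID as the file name are all hypothetical.

```python
import os


def _path(spool_dir, gid):
    return os.path.join(spool_dir, f"{gid}.changes")


def begin_prepare(spool_dir, gid):
    # Same as a normal BEGIN: create the file and start writing changes.
    return open(_path(spool_dir, gid), "w")


def write_change(f, change):
    f.write(f"CHANGE {change}\n")


def prepare(f):
    # PREPARE: record the message, flush, and close. Nothing is prepared on
    # the subscriber, so no locks are held while the file waits.
    f.write("PREPARE\n")
    f.flush()
    os.fsync(f.fileno())
    f.close()


def commit_prepared(spool_dir, gid, commit_time):
    # COMMIT PREPARED: reopen the file and append the commit record.
    # The delay is counted from this commit_time.
    with open(_path(spool_dir, gid), "a") as f:
        f.write(f"COMMIT PREPARED {commit_time}\n")
        f.flush()
        os.fsync(f.fileno())


def is_ready(spool_dir, gid, now, min_apply_delay):
    # A file that ends with PREPARE is never applied. A committed one is
    # applied once min_apply_delay has elapsed since COMMIT PREPARED.
    with open(_path(spool_dir, gid)) as f:
        last = f.read().splitlines()[-1]
    if not last.startswith("COMMIT PREPARED "):
        return False
    commit_time = float(last.split()[-1])
    return now - commit_time >= min_apply_delay
```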

### For streamed transaction

Nothing special is done while a streamed transaction is arriving. When it is
committed or prepared, all the changes are read and written into the permanent
file. apply_spooled_changes() is used to read and write the changes, which means
the basic workflow is not changed.
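
In toy form, the copy step at commit or prepare time might look like the following Python sketch. The function `spool_streamed_transaction()` and the line-based file format are hypothetical and only stand in for what the replay routine does in the prototype.

```python
import os


def spool_streamed_transaction(stream_path, permanent_path, final_record):
    """Copy already-streamed changes into the permanent delay file.

    Returns the number of changes copied.
    """
    copied = 0
    with open(stream_path) as src, open(permanent_path, "w") as dst:
        for change in src:      # replay the streamed changes one by one,
            dst.write(change)   # but write them out instead of applying them
            copied += 1
        dst.write(final_record + "\n")  # COMMIT or PREPARE record
        dst.flush()
        os.fsync(dst.fileno())
    return copied
```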

### Restore from files

To check the time elapsed since the commit, the commit_time of every delayed
transaction must be kept in memory. Normally it can be stored when the worker
handles the COMMIT message, but restarts need special treatment.

When an apply worker receives a COMMIT/PREPARE/COMMIT PREPARED message, it writes
the message, flushes the file, and caches the commit_time. When the worker
restarts, it opens the files, checks the final message (this is done by seeking
some bytes back from the end of the file), and then caches the commit_time written
there.
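
The seek-from-the-end step can be sketched as follows. This is a minimal Python illustration that assumes a fixed-size final record; the trailer layout, the single-byte tags, and the function names are made up for the example and are not the on-disk format of the patch.

```python
import os
import struct

# Hypothetical trailer: 1-byte message tag + 8-byte commit time in
# microseconds, little-endian, no padding (9 bytes in total).
TRAILER = struct.Struct("<cq")
COMMIT, PREPARE, COMMIT_PREPARED = b"C", b"P", b"K"


def append_final_message(path, tag, commit_time_us):
    # Write the final message and flush, as done for COMMIT, PREPARE and
    # COMMIT PREPARED.
    with open(path, "ab") as f:
        f.write(TRAILER.pack(tag, commit_time_us))
        f.flush()
        os.fsync(f.fileno())


def read_final_message(path):
    """Return (tag, commit_time_us) read from the last bytes of the file."""
    with open(path, "rb") as f:
        f.seek(-TRAILER.size, os.SEEK_END)
        return TRAILER.unpack(f.read(TRAILER.size))


def rebuild_commit_times(spool_dir):
    """After a restart, re-cache the commit time of each committed file."""
    cache = {}
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if os.path.getsize(path) < TRAILER.size:
            continue  # too short to contain a final message
        tag, commit_time_us = read_final_message(path)
        # A file ending with PREPARE still waits for COMMIT PREPARED.
        if tag in (COMMIT, COMMIT_PREPARED):
            cache[name] = commit_time_us
    return cache
```

The sketch relies on every file ending with one of these final records. A real implementation would also need a way to recognize a file that was cut off before its final message was written.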

## Restrictions

* The combination with ALTER SUBSCRIPTION .. SKIP LSN has not been considered yet.

Thanks to Osumi-san for helping with the implementation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
0001-WIP-Time-delayed-logical-replication-by-serializing-.patch application/octet-stream 105.4 KB
