RE: Time delayed LR (WAS Re: logical replication restrictions)

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Önder Kalacı <onderkalaci(at)gmail(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "shveta(dot)malik(at)gmail(dot)com" <shveta(dot)malik(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "dilipbalaut(at)gmail(dot)com" <dilipbalaut(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "euler(at)eulerto(dot)com" <euler(at)eulerto(dot)com>, "m(dot)melihmutlu(at)gmail(dot)com" <m(dot)melihmutlu(at)gmail(dot)com>, "marcos(at)f10(dot)com(dot)br" <marcos(at)f10(dot)com(dot)br>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: RE: Time delayed LR (WAS Re: logical replication restrictions)
Date: 2023-05-22 11:49:12
Message-ID: TYAPR01MB5866C5153713CFF4FE1DBA69F5439@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Amit,

Thank you for giving suggestions.

> > Dear hackers,
> >
> > I rebased and refined my PoC. Followings are the changes:
> >
>
> 1. Is my understanding correct that this patch creates the delay files
> for each transaction? If so, did you consider other approaches such as
> using one file to avoid creating many files?

I have been analyzing the approach which uses only one file per subscription, per
your suggestion. Currently I'm not sure whether it is good approach or not, so could
you please give me any feedbacks?

TL;DR: rotating segment files like WALs may be used, but there are several issues.

# Assumption

* Streamed txns are also serialized to the same permanent file, in the received order.
* No additional sorting is considered.

# Considerations

As a premise, applied txns must be removed from files, otherwise the disk becomes
full in some day and it leads PANIC.

## Naive approach - serialize all the changes to one large file

If workers continue to write received changes from the head naively, it may be
difficult to purge applied txns because there seems not to have a good way to
truncate the first part of the file. I could not find related functions in fd.h.

## Alternative approach - separate the file into segments

Alternative approach I came up with is that the file is divided into some segments
- like WAL - and remove it if all written txns are applied. It may work well in
non-streaming, 1pc case, but may not in other cases.

### Regarding the PREPARE transactions

At that time it is more likely to occur that the segment which contains the
actual txn is differ from the segment where COMMIT PREPARED. Hence the worker
must check all the remained segments to find the actual messages from them. Isn't
it inefficient? There is another approach that workers apply the PREPARE
immediately and spill to file only COMMIT PREPARED, but in this case the worker
have been acquiring the lock and never released it till delay is done.

### Regarding the streamed transactions

As for streaming case, chunks of txns are separated into several segments.
Hence the worker must check all the remained segments to find chunks messages
from them, same as above. Isn't it inefficient too?

Additionally, segments which have prepared or streamed transactions cannot be
removed, so even if the case many files may be generated and remained.

Anyway, it may be difficult to accept to stream in-progress transactions while
delaying the application. IIUC the motivation of steaming is to reduce the lag
between nodes, and it is opposite of this feature. So it might be okay, not sure.

### Regarding the publisher - timing to send schema may be fuzzy

Another issue is that the timing when publisher sends the schema information
cannot be determined on publisher itself. As discussed on hackers, publisher
must send schema information once per segment file, but it is controlled on
subscriber side.
I'm thinking that the walsender cannot recognize the changing of segments and
understand the timing to send them.

That's it. I'm very happy to get idea.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-05-22 11:49:47 Re: pgbench: using prepared BEGIN statement in a pipeline could cause an error
Previous Message Hayato Kuroda (Fujitsu) 2023-05-22 10:20:31 RE: [PoC] pg_upgrade: allow to upgrade publisher node