Re: Movement of restart_lsn position movement of logical replication slots is very slow

From: Jammie <shailesh(dot)jamloki(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Movement of restart_lsn position movement of logical replication slots is very slow
Date: 2020-12-15 05:30:11
Message-ID: CAFt1pcq2821FLDe0EffyTTJQaXMsh_d0Ai4Hyi+a0=1NKzYWLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Amit for the response

We are using pgJDBC sample program here
https://jdbc.postgresql.org/documentation/head/replication.html

the setFlushLSN is coming from the pgJDBC only.

git hub for APIs of pgJDBC methods available.

https://github.com/pgjdbc/pgjdbc

The second slot refers to "private" slot.

So ""we are not doing reading from the stream' ==> It means that we are
having readPending call only from the shared slot then we get the
lastReceivedLSN() from stream and
send it back to stream as confirmed_flush_lsn for both private and shared
slot. We dont do readPending call to private slot. we will use private slot
only when we dont have choice. It is kind of reserver slot for us.

We are also doing forceUpdateStatus for both the slots().

Questions :
1) The confirmed_flush_lsn is updated when we get confirmation
from the downstream node about the flush_lsn but restart_lsn is only
incremented based on the LSN required by the oldest in-progress
transaction. ==> As explained above we are updating (setFlshLSN an API to
update confirmed_flush_lsn) both the slots with same LSN. So dont
understand why one leaves behind.

2) What are the other factors that might cause delay in updating
restart_lsn of the slot ?
3) In PG -13 does this behaviour change ?

Regards
Shailesh

the s

On Mon, Dec 14, 2020 at 4:51 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Mon, Dec 14, 2020 at 9:30 AM Jammie <shailesh(dot)jamloki(at)gmail(dot)com> wrote:
> >
> > Hello,
> >
> > We have two logical replication slots in our postgresql database
> (version-11) instance and we are using pgJDBC to stream data from these two
> slots.
> >
>
> IIUC, you are using some out-of-core outputplugin to stream the data?
> Are you using in walsender mechanism to decode the changes from slots
> or via SQL APIs?
>
> > We are ensuring that when we regularly send feedback and update the
> confirmed_flush_lsn (every 10 minutes) for both the slots to the same
> position. However From our data we have seen that the restart_lsn movement
> of the two are not in sync and most of the time one of them lags too far
> behind to hold the WAL files unnecessarily. Here are some data points to
> indicate the problem
> >
> > Thu Dec 10 05:37:13 CET 2020
> > slot_name | restart_lsn |
> confirmed_flush_lsn
> >
> --------------------------------------+---------------+---------------------
> > db_dsn_metadata_src_private | 48FB/F3000208 | 48FB/F3000208
> > db_dsn_metadata_src_shared | 48FB/F3000208 | 48FB/F3000208
> > (2 rows)
> >
> >
> >
> > Thu Dec 10 13:53:46 CET 2020
> > slot_name | restart_lsn |
> confirmed_flush_lsn
> >
> -------------------------------------+---------------+---------------------
> > db_dsn_metadata_src_private | 48FC/2309B150 | 48FC/233AA1D0
> > db_dsn_metadata_src_shared | 48FC/233AA1D0 | 48FC/233AA1D0
> > (2 rows)
> >
> >
> > Thu Dec 10 17:13:51 CET 2020
> > slot_name | restart_lsn |
> confirmed_flush_lsn
> >
> -------------------------------------+---------------+---------------------
> > db_dsn_metadata_src_private | 4900/B4C3AE8 | 4900/94FDF908
> > db_dsn_metadata_src_shared | 48FD/D2F66F10 | 4900/94FDF908
> > (2 rows)
> >
> > Though we are using setFlushLsn() and forceStatusUpdate for both the
> slot's stream regularly still the slot with name private is far behind the
> confirmed_flush_lsn and slot with name shared is also behind with
> confirmed_flush_lsn but not too far. Since the restart_lsn is not moving
> fast enough, causing lot of issues with WAL log file management and not
> allowing to delete them to free up disk space
> >
>
> What is this setFlushLsn? I am not able to find in the PG-code. If it
> is some outside code reference then please provide the link to code.
> In general, the restart_lsn and confirmed_flush_lsn are advanced in
> different ways so you might see some difference but it should not be
> this much. The confirmed_flush_lsn is updated when we get confirmation
> from the downstream node about the flush_lsn but restart_lsn is only
> incremented based on the LSN required by the oldest in-progress
> transaction.
>
> >
> > Please note that for the second slot we are not doing reading from the
> stream rather just sending the feedback.
> >
>
> Here does the second slot refers to 'shared' or 'private'? It is not
> very clear what you mean by "we are not doing reading from the
> stream', do you mean to say that decoding happens in the slot but the
> output plugin just throws away the streamed data and in the end just
> send the feedback?
>
> --
> With Regards,
> Amit Kapila.
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2020-12-15 05:55:00 Re: Misleading comment in prologue of ReorderBufferQueueMessage
Previous Message Michael Paquier 2020-12-15 05:20:33 Re: Proposed patch for key managment