Movement of restart_lsn position movement of logical replication slots is very slow

From: Jammie <shailesh(dot)jamloki(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Movement of restart_lsn position movement of logical replication slots is very slow
Date: 2020-12-14 03:59:54
Message-ID: CAFt1pcpGaT=_nbrY-dW5iryQUco9CinW8UfD-Y70kh=a70eXGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

We have two logical replication slots in our postgresql database
(version-11) instance and we are using pgJDBC to stream data from these two
slots. We are ensuring that when we regularly send feedback and update the
confirmed_flush_lsn (every 10 minutes) for both the slots to the same
position. However From our data we have seen that the restart_lsn movement
of the two are not in sync and most of the time one of them lags too far
behind to hold the WAL files unnecessarily. Here are some data points to
indicate the problem

Thu Dec 10 05:37:13 CET 2020
slot_name | restart_lsn |
confirmed_flush_lsn
--------------------------------------+---------------+---------------------
db_dsn_metadata_src_private | 48FB/F3000208 | 48FB/F3000208
db_dsn_metadata_src_shared | 48FB/F3000208 | 48FB/F3000208
(2 rows)

Thu Dec 10 13:53:46 CET 2020
slot_name | restart_lsn |
confirmed_flush_lsn
-------------------------------------+---------------+---------------------
db_dsn_metadata_src_private | 48FC/2309B150 | 48FC/233AA1D0
db_dsn_metadata_src_shared | 48FC/233AA1D0 | 48FC/233AA1D0
(2 rows)

Thu Dec 10 17:13:51 CET 2020
slot_name | restart_lsn |
confirmed_flush_lsn
-------------------------------------+---------------+---------------------
db_dsn_metadata_src_private | 4900/B4C3AE8 | 4900/94FDF908
db_dsn_metadata_src_shared | 48FD/D2F66F10 | 4900/94FDF908
(2 rows)

Though we are using setFlushLsn() and forceStatusUpdate for both the slot's
stream regularly still the slot with name private is far behind the
confirmed_flush_lsn and slot with name shared is also behind with
confirmed_flush_lsn but not too far. Since the restart_lsn is not moving
fast enough, causing lot of issues with WAL log file management and not
allowing to delete them to free up disk space

Please note that for the second slot we are not doing reading from the
stream rather just sending the feedback.

How can this problem be solved? Are there any general guidelines to
overcome this issue ?

Regards

Shailesh

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-12-14 04:08:48 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Previous Message Michael Paquier 2020-12-14 03:52:35 Re: please update ps display for recovery checkpoint