Re: BUG #19041: Logical replication locks wal processing

From: Sergey Belyashov <sergey(dot)belyashov(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19041: Logical replication locks wal processing
Date: 2025-09-03 12:10:32
Message-ID: CAOe0RDw-Su=p=L6AzaFCQ85GMmgagioTJsqZpP+7pRWz_Fpy7Q@mail.gmail.com
Lists: pgsql-bugs

Thank you for your explanation. That is exactly what I meant.
Is it possible to optimize this so that the whole WAL does not have to
be decoded when it contains no changes to published tables? For
example, just skip the WAL blocks that belong to unpublished tables.
And/or do the WAL decoding in one separate process per publication that
serves all active subscriptions to it; if a subscription is not active
(server down or too busy), it falls back to legacy self-decoding.

Best regards,
Sergey Belyashov

On Wed, Sep 3, 2025 at 14:56, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Wed, Sep 3, 2025 at 4:55 PM PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference: 19041
> > Logged by: Sergey Belyashov
> > Email address: sergey(dot)belyashov(at)gmail(dot)com
> > PostgreSQL version: 17.6
> > Operating system: Debian bookworm x86_64
> > Description:
> >
> > I have several PostgreSQL servers: A, B, C, D... Each server has tables users,
> > settings, and t with partitions t1, t2, t3... Server A publishes tables users
> > and settings, and the other servers subscribe to them using one
> > subscription (logical replication is used). The other servers publish table t
> > and server A subscribes to it (this does not matter, I think). When I create
> > table s (without indexes) on server A and copy a huge number of rows (30M+) into
> > it, all workers that replicate the users and settings tables load CPU cores at
> > 35-80% for a long time (5+ hours; I did not wait longer). WAL grows during this
> > time (37GB+). When I drop the subscriptions to these tables on servers B, C...,
> > WAL is processed very fast. When I keep only one subscription,
> > WAL is processed (reduced to 2-3 GB) within an hour and only one process uses
> > 35-50% of a CPU core until the WAL has been processed. Tables users and
> > settings are not changed or used during this test (table s has no foreign
> > keys), so the impact on their publications/subscriptions
> > is unexpected.
>
> Just to give some context: a separate walsender process is
> created for each subscription, and every walsender decodes (processes)
> all of the WAL. That is why, when you have multiple
> subscribers, there are multiple walsender processes
> decoding the WAL, each consuming CPU. OTOH, if
> you keep only one subscription, there is only one walsender
> decoding the WAL. Also, WAL cannot be removed until all the data
> has been streamed to all the subscribers; if you drop all the
> subscribers, WAL can be removed by the checkpoint, provided it is not
> required for any other purpose such as a hot standby or the WAL archiver.
>
> --
> Regards,
> Dilip Kumar
> Google
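
The explanation quoted above can be sketched as a toy cost model. This is
only an illustration under simplifying assumptions, not PostgreSQL code:
Slot, total_decode_work, and retained_wal are hypothetical names, WAL
positions are treated as plain integer offsets, and anything else that can
hold WAL back (archiving, standbys) is ignored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Slot:
    """Hypothetical stand-in for one subscription's replication slot."""
    name: str
    confirmed_pos: int  # WAL offset this subscriber has confirmed (assumed unit: GB)


def total_decode_work(wal_size: int, slots: Sequence[Slot]) -> int:
    """One walsender per subscription, and each one decodes all of the WAL,
    whether or not the changes touch a published table."""
    return wal_size * len(slots)


def retained_wal(current_pos: int, slots: Sequence[Slot]) -> int:
    """WAL has to be kept back to the slowest subscriber's position.
    With no slots left, replication no longer holds any WAL."""
    if not slots:
        return 0
    return current_pos - min(slot.confirmed_pos for slot in slots)
```

With 37 units of WAL and three subscribers, total_decode_work(37, slots)
is three times the work of a single subscriber, and retained_wal stays at
the full 37 for as long as one slot has confirmed nothing. Dropping all
slots brings the retained amount to zero, which matches the behaviour
described in the report.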
