Re: Logical replication timeout problem

From: Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
To: wangw(dot)fnst(at)fujitsu(dot)com
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, kuroda(dot)hayato(at)fujitsu(dot)com, Peter Smith <smithpb2250(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Tang, Haiying/唐 海英 <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2022-10-20 05:46:50
Message-ID: CAA5-nLDF=fVzN4opizMpUCGJbAiB4mPMaPYZNe+SmuLjCQuwfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Yes the refresh of MV is on the Publisher Side.
Thanks for your draft patch, I'll try it
I'll back to you as soonas possible

One question: why the refresh of the MV is a DDL not a DML?

Regards

Fabrice

On Wed, 19 Oct 2022, 10:15 wangw(dot)fnst(at)fujitsu(dot)com <wangw(dot)fnst(at)fujitsu(dot)com>
wrote:

> On Tue, Oct 18, 2022 at 22:35 PM Fabrice Chapuis <fabrice636861(at)gmail(dot)com>
> wrote:
> > Hello Amit,
> >
> > In version 14.4 the timeout problem for logical replication happens
> again despite
> > the patch provided for this issue in this version. When bulky
> materialized views
> > are reloaded it broke logical replication. It is possible to solve this
> problem by
> > using your new "streaming" option.
> > Have you ever had this issue reported to you?
> >
> > Regards
> >
> > Fabrice
> >
> > 2022-10-10 17:19:02 CEST [538424]: [17-1]
> > user=postgres,db=dbxxxa00,client=[local] CONTEXT: SQL statement "REFRESH
> > MATERIALIZED VIEW sxxxa00.table_base"
> > PL/pgSQL function refresh_materialized_view(text) line 5 at
> EXECUTE
> > 2022-10-10 17:19:02 CEST [538424]: [18-1]
> > user=postgres,db=dbxxxa00,client=[local] STATEMENT: select
> > refresh_materialized_view('sxxxa00.table_base');
> > 2022-10-10 17:19:02 CEST [538424]: [19-1]
> > user=postgres,db=dbxxxa00,client=[local] LOG: duration: 264815.652
> > ms statement: select refresh_materialized_view('sxxxa00.table_base');
> > 2022-10-10 17:19:27 CEST [559156]: [1-1] user=,db=,client= LOG:
> automatic
> > vacuum of table "dbxxxa00.sxxxa00.table_base": index scans: 0
> > pages: 0 removed, 296589 remain, 0 skipped due to pins, 0
> skipped frozen
> > tuples: 0 removed, 48472622 remain, 0 are dead but not yet
> removable,
> > oldest xmin: 1501528
> > index scan not needed: 0 pages from table (0.00% of total) had 0
> dead item
> > identifiers removed
> > I/O timings: read: 1.494 ms, write: 0.000 ms
> > avg read rate: 0.028 MB/s, avg write rate: 107.952 MB/s
> > buffer usage: 593301 hits, 77 misses, 294605 dirtied
> > WAL usage: 296644 records, 46119 full page images, 173652718
> bytes
> > system usage: CPU: user: 17.26 s, system: 0.29 s, elapsed: 21.32
> s
> > 2022-10-10 17:19:28 CEST [559156]: [2-1] user=,db=,client= LOG:
> automatic
> > analyze of table "dbxxxa00.sxxxa00.table_base"
> > I/O timings: read: 0.043 ms, write: 0.000 ms
> > avg read rate: 0.026 MB/s, avg write rate: 0.026 MB/s
> > buffer usage: 30308 hits, 2 misses, 2 dirtied
> > system usage: CPU: user: 0.54 s, system: 0.00 s, elapsed: 0.59 s
> > 2022-10-10 17:19:34 CEST [3898111]: [6840-1] user=,db=,client= LOG:
> checkpoint
> > complete: wrote 1194 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0
> recycled;
> > write=269.551 s, sync=0.002 s, total=269.560 s; sync files=251,
> longest=0.00
> > 1 s, average=0.001 s; distance=583790 kB, estimate=583790 kB
> > 2022-10-10 17:20:02 CEST [716163]: [2-1] user=,db=,client= ERROR:
> terminating
> > logical replication worker due to timeout
> > 2022-10-10 17:20:02 CEST [3897921]: [13-1] user=,db=,client= LOG:
> background
> > worker "logical replication worker" (PID 716163) exited with exit code 1
> > 2022-10-10 17:20:02 CEST [561346]: [1-1] user=,db=,client= LOG: logical
> > replication apply worker for subscription "subxxx_sxxxa00" has started
>
> Thanks for reporting!
>
> There is one thing I want to confirm:
> Is the statement `select refresh_materialized_view('sxxxa00.table_base');`
> executed on the publisher-side?
>
> If so, I think the reason for this timeout problem could be that during DDL
> (`REFRESH MATERIALIZED VIEW`), lots of temporary data is generated due to
> rewrite. Since these temporary data will not be processed by the pgoutput
> plugin, our previous fix for DML had no impact on this case.
> I think setting "streaming" option to "on" could work around this problem.
>
> I tried to write a draft patch (see attachment) on REL_14_4 to fix this.
> I tried it locally and it seems to work.
> Could you please confirm whether this problem is fixed after applying this
> draft patch?
>
> If this draft patch works, I will improve it and try to fix this problem.
>
> Regards,
> Wang wei
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2022-10-20 06:33:34 Re: Documentation refinement for Parallel Scans
Previous Message Amit Kapila 2022-10-20 05:37:24 Re: Logical WAL sender unresponsive during decoding commit