From: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, wangw(dot)fnst(at)fujitsu(dot)com |
Subject: | Re: Timeout when changes are filtered out by the core during logical replication |
Date: | 2023-01-09 14:50:47 |
Message-ID: | CAGEoWWTY6_0PfAz=HR3KrmykgMWGFdVs8A-GfCAAEEV1XFZsbw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Dec 23, 2022 at 2:45 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Dec 22, 2022 at 6:58 PM Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> >
> > Hi All,
> > A customer ran a script dropping a few dozens of users in a transaction.
> Before dropping a user they change the ownership of the tables owned by
> that user to another user and revoking all the accesses from that user in
> the same transaction. There were a few thousand tables whose privileges and
> ownership was changed by this transaction. Since all of these changes were
> in catalog table, those changes were filtered out in
> ReorderBufferProcessTXN()
> > by the following code
> > if (!RelationIsLogicallyLogged(relation))
> > goto change_done;
> >
> > I tried to reproduce a similar situation through the attached TAP test.
> For 500 users and 1000 tables, we see that the transaction takes
> significant time but logical decoding does not take much time. So with the
> default 1 min WAL sender and receiver timeout I could not reproduce the
> timeout. Beyond that our TAp test itself times out.
> >
> > But I think there's a possibility that the logical receiver will time
> out this way when decoding a sufficiently large transaction which takes
> more than the timeout amount of time to decode. So I think we need to call
> OutputPluginUpdateProgress() after a regular interval (in terms of time or
> number of changes) to consume any feedback from the subscriber or send a
> keep-alive message.
> >
>
> I don't think it will be a good idea to directly call
> OutputPluginUpdateProgress() from reorderbuffer.c. There is already a
> patch to discuss this problem [1].
>
Yeah. I don't mean to use OutputPluginUpdateProgress() directly. The patch
just showed that it helps calling it there in some way. Thanks for pointing
the other thread. I have reviewed the patch on that thread and continue the
discussion there.
>
> > Following commit
> > ```
> > commit 87c1dd246af8ace926645900f02886905b889718
> > Author: Amit Kapila <akapila(at)postgresql(dot)org>
> > Date: Wed May 11 10:12:23 2022 +0530
> >
> > Fix the logical replication timeout during large transactions.
> >
> > ```
> > fixed a similar problem when the changes were filtered by an output
> plugin, but in this case the changes are not being handed over to the
> output plugin as well. If we fix it in the core we may not need to handle
> it in the output plugin as that commit does. The commit does not have a
> test case which I could run to reproduce the timeout.
> >
>
> It is not evident how to write a stable test for this because
> estimating how many changes are enough for the configured
> wal_receiver_timeout to
> pass on all the buildfarm machines is tricky. If you have good ideas
> then feel free to propose a test patch.
>
Will continue this on the other thread.
--
Best Wishes,
Ashutosh
From | Date | Subject | |
---|---|---|---|
Next Message | Maxim Orlov | 2023-01-09 15:01:37 | Re: Add SHELL_EXIT_CODE to psql |
Previous Message | Bharath Rupireddy | 2023-01-09 14:30:00 | Lazy allocation of pages required for verifying FPI consistency |