From: | "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | Peter Smith <smithpb2250(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com> |
Subject: | RE: Logical replication timeout problem |
Date: | 2023-01-20 07:19:28 |
Message-ID: | OS3PR01MB6275CFAEDCCE48884C94072F9EC59@OS3PR01MB6275.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jan 20, 2023 at 10:10 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> Here are some review comments for patch v3-0001.
Thanks for your comments.
> ======
> Commit message
>
> 1.
> The problem is when there is a DDL in a transaction that generates lots of
> temporary data due to rewrite rules, these temporary data will not be
> processed
> by the pgoutput - plugin. Therefore, the previous fix (f95d53e) for DML had no
> impact on this case.
>
> ~
>
> 1a.
> IMO this comment needs to give a bit of background about the original
> problem here, rather than just starting with "The problem is" which is
> describing the flaws of the previous fix.
Added some related message.
> ~
>
> 1b.
> "pgoutput - plugin" -> "pgoutput plugin" ??
Changed.
> ~~~
>
> 2.
>
> To fix this, we introduced a new ReorderBuffer callback -
> 'ReorderBufferUpdateProgressCB'. This callback is called to try to update the
> process after each change has been processed during sending data of a
> transaction (and its subtransactions) to the output plugin.
>
> IIUC it's not really "after each change" - shouldn't this comment
> mention something about the CHANGES_THRESHOLD 100?
Changed.
> ~~~
>
> 4. update_progress_cb_wrapper
>
> +/*
> + * Update progress callback
> + *
> + * Try to update progress and send a keepalive message if too many changes
> were
> + * processed when processing txn.
> + *
> + * For a large transaction, if we don't send any change to the downstream for a
> + * long time (exceeds the wal_receiver_timeout of standby) then it can
> timeout.
> + * This can happen when all or most of the changes are either not published or
> + * got filtered out.
> + */
>
> SUGGESTION (instead of the "Try to update" sentence)
> Send a keepalive message whenever more than <CHANGES_THRESHOLD>
> changes are encountered while processing a transaction.
Since it's possible that keep-alive messages won't be sent even if the
threshold is reached (see function WalSndKeepaliveIfNecessary), I thought it
might be better to use "try to".
And rewrote the comments here because the threshold logic is moved to the
function ReorderBufferProcessTXN.
> ~~~
>
> 5.
>
> +static void
> +update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
> + ReorderBufferChange *change)
> +{
> + LogicalDecodingContext *ctx = cache->private_data;
> + LogicalErrorCallbackState state;
> + ErrorContextCallback errcallback;
> + static int changes_count = 0; /* Static variable used to accumulate
> + * the number of changes while
> + * processing txn. */
> +
>
> IMO this may be more readable if the static 'changes_count' local var
> was declared first and separated from the other vars by a blank line.
Changed.
> ~~~
>
> 6.
>
> + /*
> + * We don't want to try sending a keepalive message after processing each
> + * change as that can have overhead. Tests revealed that there is no
> + * noticeable overhead in doing it after continuously processing 100 or so
> + * changes.
> + */
> +#define CHANGES_THRESHOLD 100
>
> 6a.
> I think it might be better to define this right at the top of the
> function adjacent to the 'changes_count' variable (e.g. a bit like the
> original HEAD code looked)
Changed.
> ~
>
> 6b.
> SUGGESTION (for the comment)
> Sending keepalive messages after every change has some overhead, but
> testing showed there is no noticeable overhead if keepalive is only
> sent after every ~100 changes.
Changed.
> ~~~
>
> 7.
>
> +
> + /*
> + * After continuously processing CHANGES_THRESHOLD changes, we
> + * try to send a keepalive message if required.
> + */
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> + changes_count = 0;
> + }
> +
>
> 7a.
> SUGGESTION (for comment)
> Send a keepalive message after every CHANGES_THRESHOLD changes.
Changed.
Regards,
Wang Wei
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2023-01-20 07:38:08 | Re: Time delayed LR (WAS Re: logical replication restrictions) |
Previous Message | Amit Langote | 2023-01-20 07:19:08 | Re: generic plans and "initial" pruning |