Re: Initial COPY of Logical Replication is too slow

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Jan Wieck <jan(at)wi3ck(dot)info>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Initial COPY of Logical Replication is too slow
Date: 2026-03-31 19:29:37
Message-ID: CAD21AoC1wHPESMoNvef+3_+G8assoi966Adm0AiYcz+YW9RAmQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 31, 2026 at 4:39 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Sawada-san,
>
> Thanks for updating the patch! Few comments.

Thank you for the comments!

>
> 01.
> ```
> +/*
> + * Similar to is_publishable_calss() but checks whether the given OID
> + * is a publishable "table" or not.
> + */
> +static bool
> +is_publishable_table(Oid tableoid)
> ```
>
> s/is_publishable_calss/is_publishable_class/.
>
> 02.
> ```
> + ReleaseSysCache(tuple);
> + return true;
> ```
>
> Is it correct? I expected to return false here.
>
> 03.
> ```
> + /*
> + * Preliminary check if the specified table can be published in the
> + * first place. If not, we can return early without checking the given
> + * publications and the table.
> + */
> + if (filter_by_relid && !is_publishable_table(target_relid))
> + SRF_RETURN_DONE(funcctx);
> ```
>
> I think we must switch to the old context.
>
> 04.
> ```
> +CREATE PUBLICATION pub_all_except FOR ALL TABLES EXCEPT TABLE (tbl_parent, gpt_test_sch.tbl_sch) WITH (publish_via_partition_root = false);
> +CREATE PUBLICATION pub_all_except_no_viaroot FOR ALL TABLES EXCEPT TABLE (tbl_parent, gpt_test_sch.tbl_sch) WITH (publish_via_partition_root = true);
> ```
>
> It needs to be rebased due to 5984ea86.

Agreed with the all above points. I'll fix them in the next version patch.

>
> 05.
> ```
> CREATE VIEW pg_publication_tables AS
> SELECT
> P.pubname AS pubname,
> N.nspname AS schemaname,
> C.relname AS tablename,
> ( SELECT array_agg(a.attname ORDER BY a.attnum)
> FROM pg_attribute a
> WHERE a.attrelid = GPT.relid AND
> a.attnum = ANY(GPT.attrs)
> ) AS attnames,
> pg_get_expr(GPT.qual, GPT.relid) AS rowfilter
> FROM pg_publication P,
> LATERAL pg_get_publication_tables(P.pubname) GPT,
> pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
> WHERE C.oid = GPT.relid;
> ```
>
> Can we use the new API of pg_get_publication_tables() here? Below change can pass
> tests on my env.
>
> ```
> - LATERAL pg_get_publication_tables(P.pubname) GPT,
> - pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
> - WHERE C.oid = GPT.relid;
> + pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace),
> + LATERAL pg_get_publication_tables(ARRAY[P.pubname], C.oid) GPT;
> ```

I'm not sure the new API of pg_get_publication_tables() is better here
since this view is going to get the publication information of all
published tables, in which case the existing one might be faster.
Also, if a few tables among a huge number of tables (or whatever
relations) are published, checking all relations with the new API of
pg_get_publication_tables() would be quite slow.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2026-03-31 19:31:39 Re: AIO / read stream heuristics adjustments for index prefetching
Previous Message Melanie Plageman 2026-03-31 19:07:32 Re: Don't synchronously wait for already-in-progress IO in read stream