Re: Initial COPY of Logical Replication is too slow

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Jan Wieck <jan(at)wi3ck(dot)info>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Initial COPY of Logical Replication is too slow
Date: 2026-03-25 05:06:54
Message-ID: CAD21AoBJcxRcaWQot302diaxoDcsnezRhnZa7p8UrPh5AGNeHQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 24, 2026 at 11:57 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Mar 24, 2026 at 3:47 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 19, 2026 at 4:59 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Mar 18, 2026 at 3:31 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > >
> > > I've attached the patch to implement this idea. The patch still
> > > introduces a new function but it overloads
> > > pg_get_publication_tables(). We might be able to handle different
> > > input (array or text) in pg_get_publication_tables() better, but it's
> > > enough for discussion at least.
> > >
> >
> > *
> > + /*
> > + * We can pass relid to pg_get_publication_table_info() since
> > + * version 19.
> > + */
> > + appendStringInfo(&cmd,
> > + "SELECT DISTINCT"
> > + " (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)"
> > + " THEN NULL ELSE gpt.attrs END)"
> > + " FROM pg_publication p,"
> > + " LATERAL pg_get_publication_tables(p.pubname, %u) gpt,"
> > + " pg_class c"
> > + " WHERE c.oid = gpt.relid"
> > + " AND p.pubname IN ( %s )",
> > + lrel->remoteid,
> > + pub_names->data);
> >
> > Why in the above query we need a join with pg_publication? Can't we
> > directly pass 'pub_names' and 'relid' to pg_get_publication_tables()
> > to get the required information?
>
> Since the 'pub_names' is the list of publication names we cannot
> directly pass it to the pg_get_publication_tables(). But if we make
> pg_get_publication_tables() take {pubname text[], target_relid oid}
> instead of {pubname text, target_relid oid}, yes. And it seems to help
> somewhat simplify the patch. If having both
> pg_get_publication_tables(VARIADIC text[]) and
> pg_get_publication_tables(text[], oid) is not odd, it would be worth
> trying it.
>

I figured out that the join with pg_publication works as a filter;
non-existence publication names are not passed to the function. If we
pass the list of publication names to the new function signature,
while we can simplify the patch and avoid a join, we would change the
existing function behavior so that it ignores non-existence
publications.

I've attached the updated patch. The 0001 patch just incorporated the
review comments so far, and the 0002 patch is a draft change for the
above idea. Since pg_get_publication_tables(VARIADIC text) is not a
documented function, I think we can accept small behavior changes. So
I'm going to go with this direction. Feedback is very welcome.

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v4-0002-POC-pass-the-list-of-publications-to-pg_get_publi.patch text/x-patch 21.1 KB
v4-0001-Avoid-full-table-scans-when-getting-publication-t.patch text/x-patch 26.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2026-03-25 05:11:00 Re: [PATCH v2] Re: Cancel problems of query to pg_stat_statements
Previous Message Michael Paquier 2026-03-25 05:02:27 Re: access numeric data in module