| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Initial COPY of Logical Replication is too slow |
| Date: | 2026-02-25 19:03:08 |
| Message-ID: | CAD21AoDQM62GOtaTzD_CVMSsFhv6o9c0Au1dSM1QuxeKFkWAKw@mail.gmail.com |
| Lists: | pgsql-hackers |
On Mon, Jan 26, 2026 at 12:30 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jan 19, 2026 at 9:44 AM Marcos Pegoraro <marcos(at)f10(dot)com(dot)br> wrote:
> >
> > Em sex., 19 de dez. de 2025 às 22:59, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> escreveu:
> >>
> >> Yeah, if we pass a publication that a lot of tables belong to to
> >> pg_get_publication_tables(), it could take a long time to return as it
> >> needs to construct many entries.
> >
> >
> > Well, I don't know how to help but I'm sure it's working badly.
> > Today I added some fields on my server, then seeing logs I could see how slow this process is.
> >
> > duration: 2213.872 ms statement: SELECT DISTINCT (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts) THEN NULL ELSE gpt.attrs END) FROM pg_publication p, LATERAL pg_get_publication_tables(p.pubname) gpt, pg_class c WHERE gpt.relid = 274376788 AND c.oid = gpt.relid AND p.pubname IN ( 'mypub' )
> >
> > 2 seconds to get the list of fields of a table is really too slow.
> > How can we solve this ?
>
> After more investigation of slowness, it seems that the
> list_concat_unique_oid() called below is quite slow when the database
> has a lot of tables to publish:
>
>     relids = GetPublicationRelations(pub_elem->oid,
>                                      pub_elem->pubviaroot ?
>                                      PUBLICATION_PART_ROOT :
>                                      PUBLICATION_PART_LEAF);
>     schemarelids = GetAllSchemaPublicationRelations(pub_elem->oid,
>                                                     pub_elem->pubviaroot ?
>                                                     PUBLICATION_PART_ROOT :
>                                                     PUBLICATION_PART_LEAF);
>     pub_elem_tables = list_concat_unique_oid(relids, schemarelids);
>
> This is simply because it's O(n^2), where n is the number of oids in
> schemarelids in the test case. A simple change would be to do sort &
> dedup instead. With the attached experimental patch, the
> pg_get_publication_tables() execution time gets halved in my
> environment (796ms -> 430ms with 50k tables). If the number of tables
> is not large, this method might be slower than today but it's not a
> huge regression.
>
> In the initial tablesync cases, it could be optimized further in a way
> that we introduce a new SQL function that gets the column list and
> expr of the specific table. This way, we can filter the result by
> relid at an early stage instead of getting all information and
> filtering by relid as the tablesync worker does today, avoiding
> overheads of gathering system catalog scan results.
I've drafted this idea, and it looks like a better approach. The
patch introduces a new SQL function, pg_get_publication_table_info(),
which returns the column list and row filter expression like
pg_get_publication_tables() does, but only for the specified table.
In my environment, the tablesync worker's query in question drops
from 288 ms to 0.6 ms with 50k tables in one publication. Feedback is
very welcome.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Add-pg_get_publication_table_info-to-optimize-logica.patch | text/x-patch | 9.7 KB |