Improve pg_dump dumping publication tables

From: "Hsu, John" <hsuchen(at)amazon(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Improve pg_dump dumping publication tables
Date: 2020-09-04 18:03:39
Message-ID: EA6F1B1A-8081-4200-A83F-FB9DAC6B49D0@amazon.com

Hi hackers,

I was wondering if there's a good reason, other than memory concerns, why
pg_dump's getPublicationTables() iterates through all tables one by one,
issuing a query for each to see whether it belongs to a publication.
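For context, the per-table query looks roughly like this (paraphrased from
memory; see getPublicationTables() in pg_dump.c for the exact text):

    -- issued once per table, substituting that table's OID
    SELECT pr.tableoid, pr.oid, p.pubname
    FROM pg_publication_rel pr, pg_publication p
    WHERE pr.prrelid = '16384'   -- the current table's OID
      AND p.oid = pr.prpubid;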

Attached is a patch that instead fetches all publication table memberships
in a single query and then looks up the corresponding TableInfo entries
through tblinfoindex.
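Roughly, the patched version issues one query of this shape (again a sketch,
not the literal patch text):

    -- one query covering every publication membership; pg_dump then maps
    -- each prrelid back to its TableInfo via tblinfoindex
    SELECT pr.tableoid, pr.oid, pr.prrelid, p.pubname
    FROM pg_publication_rel pr
    JOIN pg_publication p ON p.oid = pr.prpubid;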

Users will typically have far more tables than publication memberships, so
this should be a lot faster, especially when there are millions of tables:
we go from one round trip per table to a single query whose result size is
bounded by the number of publication tables.

Setup:
time pg_dump --schema-only -d postgres -f dump.sql -v 2> dump.log
10k tables, each with its own publication.
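For reference, the tables and publications can be generated with something
like this (an illustrative script, not necessarily the exact one I ran):

    DO $$
    BEGIN
        FOR i IN 1..10000 LOOP
            -- create each table and a dedicated publication for it
            EXECUTE format('CREATE TABLE t%s (id int)', i);
            EXECUTE format('CREATE PUBLICATION p%s FOR TABLE t%s', i, i);
        END LOOP;
    END $$;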

Before patch:
real 0m6.409s
user 0m0.234s
sys 0m0.584s

With patch:
real 0m5.483s
user 0m0.399s
sys 0m0.443s

10k tables, no publications.

Before patch:
real 0m6.193s
user 0m0.276s
sys 0m0.531s

With patch:
real 0m5.261s
user 0m0.254s
sys 0m0.377s

Thanks,

John H

Attachment: 0001-pg_dump-Iterate-through-all-publication-tables-in-pg.patch (application/octet-stream, 5.8 KB)
