Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Can we get rid of repeated queries from pg_dump?
Date: 2021-08-30 07:44:43
Message-ID: 20210830074442.GB15241@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Fri, Aug 27, 2021 at 05:23:23PM -0400, Tom Lane wrote:
> I experimented with the attached, very quick-n-dirty patch to collect
> format_type results during the initial scan of pg_type, instead. On the
> regression database in HEAD, it reduces the number of queries pg_dump
> issues from 3260 to 2905; but I'm having a hard time detecting any net
> performance change.

Hi,
So, I applied it to brand new HEAD from git, Result:

From total of 9173 queries it went down to 4178.
Originally 5000 type queries, now 19!
This is actually strange given that previously it was asking querying
about 83 separate type oids. But, as far as I was able to check with
"pg_restore -l" (from -Fc dump), results are the same.

Dump time down from 17m 22s to 8m 12s.

Then, I applied the patch from
https://www.postgresql.org/message-id/1082810.1630189581%40sss.pgh.pa.us

without removing first one, as you said they are quite independent.

With both patches applied I got 3884 queries total, and dump from
original db in 7m 35s.

So this clearly helps. A LOT.

But since we're looking at it, and with both patches applied, I looked
at the next most common query. Which is:

#v+
SELECT
proretset,
prosrc,
probin,
provolatile,
proisstrict,
prosecdef,
lanname,
proconfig,
procost,
prorows,
pg_catalog.pg_get_function_arguments(p.oid) AS funcargs,
pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
pg_catalog.pg_get_function_result(p.oid) AS funcresult,
proleakproof,
array_to_string(protrftypes, ' ') AS protrftypes,
proparallel,
prokind,
prosupport,
NULL AS prosqlbody
FROM pg_catalog.pg_proc p, pg_catalog.pg_language l
WHERE p.oid = '25491'::pg_catalog.oid AND l.oid = p.prolang
#v-

From the 3884 in the current pg_dump (with both patches applied) - these
queries were called 1804 times. All of these calls where with different oids,
so it's possible that there is nothing to be done about it, but figured I'll
let you know.

The thing is - even though it was called 1804 times, dump contains data only
about 107 functions (pg_restore -l schema.dump | grep -c FUNCTION), so it kinda
seems that 94% of these calls is not needed.

Anyway, even if we can't get any help for function queries, improvement of over
50% is great.

Best regards,

depesz

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Boyapalli, Kousal 2021-08-30 08:21:32 RE: user creation time for audit
Previous Message Mladen Gogala 2021-08-29 20:35:47 Re: database design with temporary tables

Browse pgsql-hackers by date

  From Date Subject
Next Message Rajkumar Raghuwanshi 2021-08-30 07:51:21 Re: Multi-Column List Partitioning
Previous Message Bharath Rupireddy 2021-08-30 07:32:05 improve pg_receivewal code