Re: Can we get rid of repeated queries from pg_dump?

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Can we get rid of repeated queries from pg_dump?
Date: 2021-08-26 16:06:44
Message-ID: 20210826160644.GA3120@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Thu, Aug 26, 2021 at 10:20:29AM -0400, Tom Lane wrote:
> Well, you could move it forward by doing the legwork to identify which
> queries are worth merging. Is it really sane to do a global "select
> format_type() from pg_type" query and save all the results on the client
> side? I wonder whether there are cases where that'd be a net loss.
> You could do the experimentation to figure that out without necessarily
> having the C skills to make pg_dump actually do it.

So, I got some info.

First, some stats. The DB contains:

- 14 extensions
- 1 aggregate
- 107 functions
- 5 schemas
- 5 sequences
- 188 logged tables
- 1 unlogged table
- 206 "normal" indexes
- 30 unique indexes
- 15 materialized views
- 16 triggers
- 87 types
- 26 views

pg_dump -s of it is ~ 670kB.

Interestingly, while dumping (pg_dump -s -v), we can see progress going on, and then, after:

====
...
pg_dump: reading publications
pg_dump: reading publication membership
pg_dump: reading subscriptions
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on
pg_dump: saving search_path =
====

It stops (progress visible in console). And then, in pg logs I see queries like:

#v+
SELECT
proretset,
prosrc,
probin,
provolatile,
proisstrict,
prosecdef,
lanname,
proconfig,
procost,
prorows,
pg_catalog.pg_get_function_arguments(p.oid) AS funcargs,
pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
pg_catalog.pg_get_function_result(p.oid) AS funcresult,
proleakproof,
array_to_string(protrftypes, ' ') AS protrftypes,
proparallel,
prokind,
prosupport,
NULL AS prosqlbody
FROM pg_catalog.pg_proc p, pg_catalog.pg_language l
WHERE p.oid = '43875'::pg_catalog.oid AND l.oid = p.prolang
#v-

Now for query stats.

To dump it all, pg_dump needed 9173 queries (logged by
log_min_duration_statement = 0 for this user).

I extracted all queries to separate files, and made stats. In total there were
only 4257 unique queries.

Then I checked for repeated queries. Top 10 most repeated offenders were:

615 times : SELECT pg_catalog.format_type('25'::pg_catalog.oid, NULL)
599 times : SELECT pg_catalog.format_type('23'::pg_catalog.oid, NULL)
579 times : SELECT pg_catalog.format_type('2281'::pg_catalog.oid, NULL)
578 times : SELECT pg_catalog.format_type('41946'::pg_catalog.oid, NULL)
523 times : SELECT pg_catalog.format_type('701'::pg_catalog.oid, NULL)
459 times : SELECT pg_catalog.format_type('42923'::pg_catalog.oid, NULL)
258 times : SELECT pg_catalog.format_type('16'::pg_catalog.oid, NULL)
176 times : SELECT pg_catalog.format_type('19'::pg_catalog.oid, NULL)
110 times : SELECT pg_catalog.format_type('21'::pg_catalog.oid, NULL)
106 times : SELECT pg_catalog.format_type('42604'::pg_catalog.oid, NULL)

In total, there were 5000 queries:
SELECT pg_catalog.format_type('[0-9]+'::pg_catalog.oid, NULL)

But there were only 83 separate oids that were scanned.

The only other repeated command was:
SELECT pg_catalog.set_config('search_path', '', false);
and it was called only twice.

Based on my reading of queries in order it seems to follow the pattern of:

One call for:

SELECT proretset, prosrc, probin, provolatile, proisstrict, prosecdef, lanname, proconfig, procost, prorows, pg_catalog.pg_get_function_arguments(p.oid) AS funcargs, pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs, pg_catalog.pg_get_function_re
sult(p.oid) AS funcresult, proleakproof, array_to_string(protrftypes, ' ') AS protrftypes, proparallel, prokind, prosupport, NULL AS prosqlbody FROM pg_catalog.pg_proc p, pg_catalog.pg_language l WHERE p.oid = 'SOME_NUMBER'::pg_catalog.oid AND l.oid = p.prolang

and then one or more:

SELECT pg_catalog.format_type('SOME_NUMBER'::pg_catalog.oid, NULL)

In one case, after proc query, there were 94 concecutive
pg_catalog.format_type queries.

I hope it helps.

Best regards,

depesz

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2021-08-26 19:43:10 Re: lower() and unaccent() not leakproof
Previous Message Daniel Gustafsson 2021-08-26 15:46:14 Re: lower() and unaccent() not leakproof

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Sitnikov 2021-08-26 16:08:54 Re: speed up verifying UTF-8
Previous Message Robert Haas 2021-08-26 15:52:00 Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)