Re: Assorted improvements in pg_dump

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hans Buschmann <buschmann(at)nidsa(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assorted improvements in pg_dump
Date: 2021-10-25 19:30:24
Message-ID: 20211025193024.ssesntsq25muk7py@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-10-24 17:10:55 -0400, Tom Lane wrote:
> 0004 is not committable as-is, because it assumes that the source
> server has single-array unnest(), which is not true before 8.4.
> We could fix that by using "oid = ANY(array-constant)" conditions
> instead, but I'm unsure about the performance properties of that
> for large OID arrays on those old server versions.

It doesn't seem bad at all. 8.3, assert-enabled build:

CREATE TABLE foo(oid oid primary key);
INSERT INTO foo SELECT generate_series(1, 1000000);
postgres[1164129][1]=# explain ANALYZE SELECT count(*) FROM foo WHERE oid = ANY(ARRAY(SELECT generate_series(1100000, 1, -1)));
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=81.54..81.55 rows=1 width=0) (actual time=2433.656..2433.656 rows=1 loops=1) │
│   InitPlan │
│     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.004..149.425 rows=1100000 loops=1) │
│   ->  Bitmap Heap Scan on foo  (cost=42.70..81.50 rows=10 width=0) (actual time=2275.137..2369.478 rows=1000000 loops=1) │
│         Recheck Cond: (oid = ANY (($0)::oid[])) │
│         ->  Bitmap Index Scan on foo_pkey  (cost=0.00..42.69 rows=10 width=0) (actual time=2274.077..2274.077 rows=1000000 loops=1) │
│               Index Cond: (oid = ANY (($0)::oid[])) │
│ Total runtime: 2436.201 ms │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

Time: 2437.568 ms (00:02.438)
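
For comparison, the two query shapes under discussion look roughly like this
against the test table above (the OID list is made up, and pg_dump's real
queries are against the system catalogs):

-- 8.4 and later: batch the per-object lookups via single-array unnest()
SELECT count(*) FROM foo WHERE oid IN (SELECT unnest('{1,2,3}'::oid[]));

-- pre-8.4 fallback: the "oid = ANY(array-constant)" form benchmarked above
SELECT count(*) FROM foo WHERE oid = ANY('{1,2,3}'::oid[]);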

> Lastly, 0006 implements the other idea we'd discussed in the other
> thread: for queries that are issued repetitively but not within a
> single pg_dump function invocation, use PREPARE/EXECUTE to cut down
> the overhead. This gets only diminishing returns after 0004, but
> it still brings "pg_dump -s regression" down from 0.39s to 0.33s,
> so maybe it's worth doing.

I think it's worth doing. There are things that the batch approach won't help
with, and even if it doesn't help a lot with the regression test database, I'd
expect it to help plenty with other cases.

A test database I had around with lots of functions got drastically faster to
dump (7.4s down to 2.5s), even though the number of queries didn't change
significantly. According to pg_stat_statements, plan_time for the dumpFunc
query went from 2352ms to 0.4ms - interestingly, execution time nearly halved
as well.
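
For reference, the conversion in 0006 is essentially prepare-once,
execute-many; a minimal sketch of that pattern for the domain-constraint query
quoted below (the statement name and the OID are made up):

PREPARE getDomainConstraints(pg_catalog.oid) AS
  SELECT tableoid, oid, conname,
         pg_catalog.pg_get_constraintdef(oid) AS consrc, convalidated
  FROM pg_catalog.pg_constraint
  WHERE contypid = $1
  ORDER BY conname;

-- planned once above; each per-domain call only pays execution cost
EXECUTE getDomainConstraints('16384');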

> I stopped after caching the plans for
> functions/aggregates/operators/types, though. The remaining sorts
> of objects aren't likely to appear in typical databases enough times
> to be worth worrying over. (This patch will be a net loss if there
> are more than zero but less than perhaps 10 instances of an object type,
> so there's definitely reasons beyond laziness for not doing more.)

Seems reasonable.

> @@ -7340,25 +7340,37 @@ getDomainConstraints(Archive *fout, TypeInfo *tyinfo)
> i_consrc;
> int ntups;
>
> - query = createPQExpBuffer();
> + static bool query_prepared = false;
>
> - if (fout->remoteVersion >= 90100)
> - appendPQExpBuffer(query, "SELECT tableoid, oid, conname, "
> - "pg_catalog.pg_get_constraintdef(oid) AS consrc, "
> - "convalidated "
> - "FROM pg_catalog.pg_constraint "
> - "WHERE contypid = '%u'::pg_catalog.oid "
> - "ORDER BY conname",
> - tyinfo->dobj.catId.oid);
> + if (!query_prepared)
> + {

I wonder if it'd be better to store this state in Archive or some such. The
approach with static variables might run into problems with parallel pg_dump at
some point. These objects aren't dumped in parallel yet, but still...

Greetings,

Andres Freund
