Quick Links

Re: [PATCH] pg_dump: lock tables in batches

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: [PATCH] pg_dump: lock tables in batches
Date:	2022-12-07 23:45:05
Message-ID:	20221207234505.3bj7d65vx57o7xij@awork3.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2022-12-07 17:53:05 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > With an artificial delay of 100ms, the perf difference between the batching
> > patch and not using the batching patch is huge. Huge enough that I don't have
> > the patience to wait for the non-batched case to complete.
>
> Clearly, if you insert a sufficiently large artificial round-trip delay,
> even squeezing a single command out of a pg_dump run will appear
> worthwhile. What I'm unsure about is whether it's worthwhile at
> realistic round-trip delays (where "realistic" means that the dump
> performance would otherwise be acceptable). I think the reason I didn't
> pursue this last year is that experimentation convinced me the answer
> was "no".

It seems to be a win even without any artificial delay. Not a *huge* win, but
a noticable win. And even just a few ms make it quite painful.

> > With batching pg_dump -s -h localhost t10000 took 0:16.23 elapsed, without I
> > cancelled after 603 tables had been locked, which took 2:06.43.
>
> Is "-s" mode actually a relevant criterion here? With per-table COPY
> commands added into the mix you could not possibly get better than 2x
> improvement, and likely a good deal less.

Well, -s isn't something used all that rarely, so it'd not be insane to
optimize it in isolation. But more importantly, I think the potential win
without -s is far bigger than 2x, because the COPYs can be done in parallel,
whereas the locking happens in the non-parallel stage.

With just a 5ms delay, very well within normal network latency range, I get:

pg_dump.master -h localhost -j10 -f /tmp/pg_dump_backup -Fd t10000
2m7.830s

pg_dump.pipeline -h localhost -j10 -f /tmp/pg_dump_backup -Fd t10000
0m24.183s

pg_dump.batch -h localhost -j10 -f /tmp/pg_dump_backup -Fd t10000
0m24.321s

Greetings,

Andres Freund

In response to

Re: [PATCH] pg_dump: lock tables in batches at 2022-12-07 22:53:05 from Tom Lane

Responses

Re: [PATCH] pg_dump: lock tables in batches at 2022-12-08 00:03:00 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	samay sharma	2022-12-07 23:49:19	Re: PGDOCS - Logical replication GUCs - added some xrefs
Previous Message	Andres Freund	2022-12-07 23:35:18	Re: Error-safe user functions