Re: [PATCH] pg_dump: lock tables in batches

From: Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] pg_dump: lock tables in batches
Date: 2022-12-07 21:14:01
Message-ID: CAFcNs+rwNf7oUYg6T76_in8a8Ct5oHP7MspG3JO83fJHQ314fA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 7, 2022 at 2:28 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2022-12-07 10:44:33 -0500, Tom Lane wrote:
> >> I have a strong sense of deja vu here. I'm pretty sure I experimented
> >> with this idea last year and gave up on it. I don't recall exactly
> >> why, but either it didn't show any meaningful performance improvement
> >> for me or there was some actual downside (that I'm not remembering
> >> right now).
>
> > IIRC the case we were looking at around 989596152 were CPU bound
> > workloads, rather than latency bound workloads. It'd not be surprising
> > to have cases where batching LOCKs helps latency, but not CPU bound.
>
> Yeah, perhaps. Anyway my main point is that I don't want to just assume
> this is a win; I want to see some actual performance tests.
>

Here are some numbers for Aleksander's patch:

1) Setup script

CREATE DATABASE t1000;
CREATE DATABASE t10000;
CREATE DATABASE t100000;

\c t1000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3 TIMESTAMPTZ);', i)
FROM generate_series(1, 1000) AS i \gexec

\c t10000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3 TIMESTAMPTZ);', i)
FROM generate_series(1, 10000) AS i \gexec

\c t100000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3 TIMESTAMPTZ);', i)
FROM generate_series(1, 100000) AS i \gexec

2) Execution script

time pg_dump -s t1000 > /dev/null
time pg_dump -s t10000 > /dev/null
time pg_dump -s t100000 > /dev/null

3) HEAD execution

$ time pg_dump -s t1000 > /dev/null
0.02user 0.01system 0:00.36elapsed 8%CPU (0avgtext+0avgdata 11680maxresident)k
0inputs+0outputs (0major+1883minor)pagefaults 0swaps

$ time pg_dump -s t10000 > /dev/null
0.30user 0.10system 0:05.04elapsed 8%CPU (0avgtext+0avgdata 57772maxresident)k
0inputs+0outputs (0major+14042minor)pagefaults 0swaps

$ time pg_dump -s t100000 > /dev/null
3.42user 2.13system 7:50.09elapsed 1%CPU (0avgtext+0avgdata 517900maxresident)k
0inputs+0outputs (0major+134636minor)pagefaults 0swaps

4) PATCH execution

$ time pg_dump -s t1000 > /dev/null
0.02user 0.00system 0:00.28elapsed 9%CPU (0avgtext+0avgdata 11700maxresident)k
0inputs+0outputs (0major+1886minor)pagefaults 0swaps

$ time pg_dump -s t10000 > /dev/null
0.18user 0.03system 0:02.17elapsed 10%CPU (0avgtext+0avgdata 57592maxresident)k
0inputs+0outputs (0major+14072minor)pagefaults 0swaps

$ time pg_dump -s t100000 > /dev/null
1.97user 0.32system 0:21.39elapsed 10%CPU (0avgtext+0avgdata 517932maxresident)k
0inputs+0outputs (0major+134892minor)pagefaults 0swaps

5) Summary

              HEAD      patch
1k tables     0:00.36   0:00.28
10k tables    0:05.04   0:02.17
100k tables   7:50.09   0:21.39
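For readability, the elapsed times above can be turned into speedup ratios with a small awk helper (times in the m:ss.cc format that GNU time prints):

```shell
# Speedup of the patched pg_dump over HEAD, from the elapsed times above.
speedup() {  # usage: speedup HEAD_ELAPSED PATCH_ELAPSED  (m:ss.cc)
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, ":"); split(b, y, ":")          # minutes, seconds
    printf "%.1fx\n", (x[1]*60 + x[2]) / (y[1]*60 + y[2])
  }'
}

speedup 0:00.36 0:00.28   # 1k tables   -> 1.3x
speedup 0:05.04 0:02.17   # 10k tables  -> 2.3x
speedup 7:50.09 0:21.39   # 100k tables -> 22.0x
```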

Seems we get a very good performance gain with Aleksander's patch. I used
"-s" to avoid wasting time issuing COPY for each relation (even though they
are all empty) and to make the difference due to the round trip per LOCK
TABLE visible. This patch should also improve pg_upgrade execution on
databases with thousands of relations.
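To make explicit what is being measured: as I understand the patch, the gain
comes from issuing one LOCK TABLE statement for many tables instead of one
statement (and one network round trip) per table, roughly:

-- HEAD: one round trip per relation
LOCK TABLE public.t1 IN ACCESS SHARE MODE;
LOCK TABLE public.t2 IN ACCESS SHARE MODE;
-- ... 100k statements for the t100000 database

-- patch: many relations per statement, few round trips
LOCK TABLE public.t1, public.t2, public.t3 IN ACCESS SHARE MODE;

With 100k relations the per-statement latency dominates the schema-only dump,
which matches the 7:50 -> 0:21 result above.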

Regards,

--
Fabrízio de Royes Mello
