From: | Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [PATCH] pg_dump: lock tables in batches |
Date: | 2022-12-07 21:14:01 |
Message-ID: | CAFcNs+rwNf7oUYg6T76_in8a8Ct5oHP7MspG3JO83fJHQ314fA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Dec 7, 2022 at 2:28 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2022-12-07 10:44:33 -0500, Tom Lane wrote:
> >> I have a strong sense of deja vu here. I'm pretty sure I experimented
> >> with this idea last year and gave up on it. I don't recall exactly
> >> why, but either it didn't show any meaningful performance improvement
> >> for me or there was some actual downside (that I'm not remembering
> >> right now).
>
> > IIRC the case we were looking at around 989596152 were CPU bound
> > workloads, rather than latency bound workloads. It'd not be surprising
> > to have cases where batching LOCKs helps latency, but not CPU bound.
>
> Yeah, perhaps. Anyway my main point is that I don't want to just assume
> this is a win; I want to see some actual performance tests.
>
Here are some numbers for Aleksander's patch:
1) Setup script
CREATE DATABASE t1000;
CREATE DATABASE t10000;
CREATE DATABASE t100000;
\c t1000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3
TIMESTAMPTZ);', i) FROM generate_series(1, 1000) AS i \gexec
\c t10000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3
TIMESTAMPTZ);', i) FROM generate_series(1, 10000) AS i \gexec
\c t100000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3
TIMESTAMPTZ);', i) FROM generate_series(1, 100000) AS i \gexec
2) Execution script
time pg_dump -s t1000 > /dev/null
time pg_dump -s t10000 > /dev/null
time pg_dump -s t100000 > /dev/null
3) HEAD execution
$ time pg_dump -s t1000 > /dev/null
0.02user 0.01system 0:00.36elapsed 8%CPU (0avgtext+0avgdata
11680maxresident)k
0inputs+0outputs (0major+1883minor)pagefaults 0swaps
$ time pg_dump -s t10000 > /dev/null
0.30user 0.10system 0:05.04elapsed 8%CPU (0avgtext+0avgdata
57772maxresident)k
0inputs+0outputs (0major+14042minor)pagefaults 0swaps
$ time pg_dump -s t100000 > /dev/null
3.42user 2.13system 7:50.09elapsed 1%CPU (0avgtext+0avgdata
517900maxresident)k
0inputs+0outputs (0major+134636minor)pagefaults 0swaps
4) PATCH execution
$ time pg_dump -s t1000 > /dev/null
0.02user 0.00system 0:00.28elapsed 9%CPU (0avgtext+0avgdata
11700maxresident)k
0inputs+0outputs (0major+1886minor)pagefaults 0swaps
$ time pg_dump -s t10000 > /dev/null
0.18user 0.03system 0:02.17elapsed 10%CPU (0avgtext+0avgdata
57592maxresident)k
0inputs+0outputs (0major+14072minor)pagefaults 0swaps
$ time pg_dump -s t100000 > /dev/null
1.97user 0.32system 0:21.39elapsed 10%CPU (0avgtext+0avgdata
517932maxresident)k
0inputs+0outputs (0major+134892minor)pagefaults 0swaps
5) Summary
              HEAD      patch
1k tables     0:00.36   0:00.28
10k tables    0:05.04   0:02.17
100k tables   7:50.09   0:21.39
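For reference, the speedups implied by the summary can be computed like this (the helper and dict names below are mine, just a convenience for crunching the transcribed elapsed times):

```python
# Compute HEAD/patch speedup ratios from the "m:ss.cc" elapsed times above.
def to_seconds(mm_ss):
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + float(seconds)

head  = {"1k": "0:00.36", "10k": "0:05.04", "100k": "7:50.09"}
patch = {"1k": "0:00.28", "10k": "0:02.17", "100k": "0:21.39"}

speedup = {k: round(to_seconds(head[k]) / to_seconds(patch[k]), 1)
           for k in head}
print(speedup)  # roughly 1.3x, 2.3x, and 22x
```

So the gain grows with the table count, which is what you'd expect if per-relation round trips dominate.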
It seems we get a very good performance gain with Aleksander's patch. I
used "-s" (schema-only) so no time is wasted issuing COPY for each
relation (they are all empty anyway), which makes the round-trip cost of
each LOCK TABLE stand out. This patch should also improve pg_upgrade
execution on databases with thousands of relations.
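Just to make the batching idea concrete (the actual patch is C code in pg_dump; this is only an illustration, and build_lock_batches/batch_size are names I made up): instead of one round trip per LOCK TABLE, group the statements and send each group as a single query.

```python
# Sketch of batching LOCK TABLE statements: one multi-statement query
# per batch instead of one round trip per relation.
def build_lock_batches(tables, batch_size=100):
    """Return a list of query strings, each locking up to batch_size tables."""
    batches = []
    for i in range(0, len(tables), batch_size):
        chunk = tables[i:i + batch_size]
        batches.append("".join(
            f"LOCK TABLE {t} IN ACCESS SHARE MODE;\n" for t in chunk
        ))
    return batches

tables = [f"t{i}" for i in range(1, 251)]
batches = build_lock_batches(tables)
print(len(batches))  # 3 round trips instead of 250
```

With 100k tables that turns 100k round trips into a few hundred, which matches the elapsed-time drop we see above on a latency-bound connection.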
Regards,
--
Fabrízio de Royes Mello