| From: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | COPY TO BLACKHOLE / pg_dump -j -Fb |
| Date: | 2026-05-20 07:16:28 |
| Message-ID: | CAKZiRmzorOYZH-CvyN98+Kz3aGdJW5Kj-MirtM0tmjYO3PtASQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi -hackers,
From time to time we hit some corruption issue and usually we end up checking
for corruption with COPY to /dev/null and/or with verify_heapam(). Both seem
to detect different kind of corruption types, so I'm assuming they are somehow
complementary (e.g. seems that COPY is slower in cached case, but exercises
TOAST way harder than the amcheck routing even with check_toast=>true). Also
there's usecase that we often ask people to 'just run pg_dump -j', but that
requires -Fd <dir> which then requires plenty of disk space if you want to
verify whole DB in parallel, you need plenty of space (which is unrealistic on
big installations)
I was thining if we could add COPY <t> TO BLACKHOLE, so we could get rid of
those two limitations / inefficencies. When starting I was hoping for more than
11-15% runtime optimization (see below), but at least it visible and bigger
benefit seems to be coming from being able to do something like:
`pg_dump -j <N> -Fp -f /dev/null`
which today is impossible today due to:
pg_dump: error: parallel backup only supported by the directory format
0002 allows to do: `pg_dump -j <N> -Fb` and generates no output(if no errors)
and takes no space.
-- hot:
postgres=# COPY pgbench_accounts to '/dev/null';
COPY 10000000
Time: 1576.752 ms (00:01.577)
postgres=# COPY pgbench_accounts to '/dev/null';
COPY 10000000
Time: 1539.565 ms (00:01.540)
postgres=# COPY pgbench_accounts to '/dev/null';
COPY 10000000
Time: 1587.900 ms (00:01.588)
postgres=# COPY pgbench_accounts to blackhole;
COPY 10000000
Time: 1365.206 ms (00:01.365)
postgres=# COPY pgbench_accounts to blackhole;
COPY 10000000
Time: 1370.007 ms (00:01.370)
postgres=# COPY pgbench_accounts to blackhole;
COPY 10000000
Time: 1367.661 ms (00:01.368)
postgres=#
so ~1.14x
-- cold (after 3 to drop_caches sysctl + buffercache_evict_all):
postgres=# select * from verify_heapam('pgbench_accounts',
check_toast => true);
[..]
Time: 1747.927 ms (00:01.748)
-- cold (after 3 to drop_caches sysctl + buffercache_evict_all):
postgres=# COPY pgbench_accounts to blackhole;
COPY 10000000
Time: 1429.400 ms (00:01.429)
-- cold (after 3 to drop_caches sysctl + buffercache_evict_all):
postgres=# COPY pgbench_accounts to '/dev/null';
COPY 10000000
Time: 1600.803 ms (00:01.601)
yields ~1.11x
Patch attached, no docs yet there, as I'm not sure community finds it useful.
-J.
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Add-COPY-TO-BLACKHOLE.patch | text/x-patch | 10.9 KB |
| v1-0002-Add-BLACKHOLE-destination-format-in-pg_dump-Fb.patch | text/x-patch | 11.1 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-05-20 07:21:08 | Set notice receiver before libpq connection startup |
| Previous Message | Nisha Moond | 2026-05-20 07:09:46 | Re: PSQL - prevent describe listing tables that are already in listed schemas |