COPY TO BLACKHOLE / pg_dump -j -Fb

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: COPY TO BLACKHOLE / pg_dump -j -Fb
Date: 2026-05-20 07:16:28
Message-ID: CAKZiRmzorOYZH-CvyN98+Kz3aGdJW5Kj-MirtM0tmjYO3PtASQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi -hackers,

From time to time we hit some corruption issue and usually we end up checking
for corruption with COPY to /dev/null and/or with verify_heapam(). Both seem
to detect different kind of corruption types, so I'm assuming they are somehow
complementary (e.g. seems that COPY is slower in cached case, but exercises
TOAST way harder than the amcheck routing even with check_toast=>true). Also
there's usecase that we often ask people to 'just run pg_dump -j', but that
requires -Fd <dir> which then requires plenty of disk space if you want to
verify whole DB in parallel, you need plenty of space (which is unrealistic on
big installations)

I was thining if we could add COPY <t> TO BLACKHOLE, so we could get rid of
those two limitations / inefficencies. When starting I was hoping for more than
11-15% runtime optimization (see below), but at least it visible and bigger
benefit seems to be coming from being able to do something like:
`pg_dump -j <N> -Fp -f /dev/null`
which today is impossible today due to:
pg_dump: error: parallel backup only supported by the directory format

0002 allows to do: `pg_dump -j <N> -Fb` and generates no output(if no errors)
and takes no space.

-- hot:
postgres=# COPY pgbench_accounts to '/dev/null';
COPY 10000000
Time: 1576.752 ms (00:01.577)
postgres=# COPY pgbench_accounts to '/dev/null';
COPY 10000000
Time: 1539.565 ms (00:01.540)
postgres=# COPY pgbench_accounts to '/dev/null';
COPY 10000000
Time: 1587.900 ms (00:01.588)

postgres=# COPY pgbench_accounts to blackhole;
COPY 10000000
Time: 1365.206 ms (00:01.365)
postgres=# COPY pgbench_accounts to blackhole;
COPY 10000000
Time: 1370.007 ms (00:01.370)
postgres=# COPY pgbench_accounts to blackhole;
COPY 10000000
Time: 1367.661 ms (00:01.368)
postgres=#

so ~1.14x

-- cold (after 3 to drop_caches sysctl + buffercache_evict_all):
postgres=# select * from verify_heapam('pgbench_accounts',
check_toast => true);
[..]
Time: 1747.927 ms (00:01.748)

-- cold (after 3 to drop_caches sysctl + buffercache_evict_all):
postgres=# COPY pgbench_accounts to blackhole;
COPY 10000000
Time: 1429.400 ms (00:01.429)

-- cold (after 3 to drop_caches sysctl + buffercache_evict_all):
postgres=# COPY pgbench_accounts to '/dev/null';
COPY 10000000
Time: 1600.803 ms (00:01.601)

yields ~1.11x

Patch attached, no docs yet there, as I'm not sure community finds it useful.

-J.

Attachment Content-Type Size
v1-0001-Add-COPY-TO-BLACKHOLE.patch text/x-patch 10.9 KB
v1-0002-Add-BLACKHOLE-destination-format-in-pg_dump-Fb.patch text/x-patch 11.1 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-05-20 07:21:08 Set notice receiver before libpq connection startup
Previous Message Nisha Moond 2026-05-20 07:09:46 Re: PSQL - prevent describe listing tables that are already in listed schemas