[WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL

From: Dimitrios Apostolou <jimis(at)gmx(dot)net>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: [WIP PATCH v2] Implement "pg_restore --data-only --clean" as a way to skip WAL
Date: 2025-06-10 00:26:28
Message-ID: c61263f2-7472-5dd8-703d-01e683421f61@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 14 Apr 2025, Dimitrios Apostolou wrote:

> Hello list,
>
> I implemented --clean support for --data-only, in order to avoid logging to
> the WAL while populating the database. The attached patch issues a TRUNCATE
> before COPY on each worker process, and provides a significant speed
> advantage if the cluster is configure with wal_level=minimal.
>
> It also provides a safer way to load the database, as avoiding WAL logging
> also avoids potential and painful ENOSPACE on the WAL partition as I
> experienced in [1]. In other words it makes things much better for my use
> case.
>
> [1] https://www.postgresql.org/message-id/flat/076464ad-3d70-dd25-9e8f-e84f27decfba%40gmx.net
>

Rebased and attached v2 of the patch.

It needed some adjustments for the new flags --with-schema and
--with-data.

I have used this patch several times to pg_restore terabytes of tables
without logging through the WAL, and it performs great.

> But it has some rough edges. I would appreciate guidance and feedback.

The rough edges remain: TRUNCATE fails if there are foreign keys. So if
you try pg_restore --data-only --clean to a table referenced via foreign
keys, the patch will not work, as mentioned below.

>
> * When the table-to-be-TRUNCATEd is referenced as foreign key from other
> table, the whole transaction fails with:
>
> ERROR: cannot truncate a table referenced in a foreign key constraint
>
> 1. As a first step, when TRUNCATE fails I want to try a DELETE FROM
> instead, which has more chances of succeeding, and continuing with
> the COPY. How to detect the failure of ahprintf("TRUNCATE") and do
> the alternative without failing the whole transaction?
>
> 2. Why doesn't --disable-triggers help?
> To test this, I have manually issued
>
> ALTER TABLE x DISABLE TRIGGER ALL
>
> to every table and issued manual TRUNCATE still fails. Shouldn't
> postgres skip the referential integrity checks?
>
> 3. In my tests, all my tables start empty since I have just created the
> schema. Then pg_restore --data-only --clean first populates
> the /referencing/ tables, which is allowed because of disabled
> triggers, and then it tries to load the /referenced/ table.
>
> At this point the referential integrity is already broken. Getting an
> error when TRUNCATing the empty /referenced/ table doesn't make
> sense.

So is there a way to turn off the referential checks for a TRUNCATE?
Do you have any other feedback for this patch?

Thanks,
Dimitris

Attachment Content-Type Size
v2-0001-pg_restore-clean-data-only.patch text/x-patch 7.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shinya Kato 2025-06-10 00:43:10 Re: Extend COPY FROM with HEADER <integer> to skip multiple lines
Previous Message Sami Imseih 2025-06-10 00:14:28 Re: add function for creating/attaching hash table in DSM registry