Re: pg_upgrade --copy-file-range

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade --copy-file-range
Date: 2024-03-22 17:22:29
Message-ID: f54da04c-a64b-45aa-a30d-7a95d7632b8c@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/22/24 17:42, Robert Haas wrote:
> On Fri, Mar 22, 2024 at 10:40 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>> There's one question, though. As it stands, there's a bit of asymmetry
>> between handling CopyFile() on WIN32 and the clone/copy_file_range on
>> other platforms). On WIN32, we simply automatically switch to CopyFile
>> automatically, if we don't need to calculate checksum. But with the
>> other methods, error out if the user requests those and we need to
>> calculate the checksum.
>
> That seems completely broken. copy_file() needs to have the ability to
> calculate a checksum if one is required; when one isn't required, it
> can do whatever it likes. So we should always fall back to the
> block-by-block method if we need a checksum. Whatever option the user
> specified should only be applied when we don't need a checksum.
>
> Consider, for example:
>
> pg_basebackup -D sunday -c fast --manifest-checksums=CRC32C
> pg_basebackup -D monday -c fast --manifest-checksums=SHA224
> --incremental sunday/backup_manifest
> pg_combinebackup sunday monday -o tuesday --manifest-checksums=CRC32C --clone
>
> Any files that are copied in their entirety from Sunday's backup can
> be cloned, if we have support for cloning. But any files copied from
> Monday's backup will need to be re-checksummed, since the checksum
> algorithms don't match. With what you're describing, it sounds like
> pg_combinebackup would just fail in this case; I don't think that's
> the behavior that the user is going to want.
>

Right, this will happen:

pg_combinebackup: error: unable to use accelerated copy when manifest
checksums are to be calculated. Use --no-manifest

Are you saying we should just silently override the copy method and do
the copy block by block? I'm not strongly opposed to that, but it feels
wrong to just ignore that the user explicitly requested cloning, and I'm
not sure why should this be different from any other case when the user
requests incompatible combination of options and/or options that are not
supported on the current configuration.

Why not just to tell the user to use the correct parameters, i.e. either
remove --clone or add --no-manifest?

FWIW I now realize it actually fails a bit earlier than I thought - when
parsing the options, not in copy_file. But then some checks (if a given
copy method is supported) happen in the copy functions. I wonder if it'd
be better/possible to do all of that in one place, not sure.

Also, the message only suggests to use --no-manifest. It probably should
suggest removing --clone too.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Japin Li 2024-03-22 17:22:56 Re: Cannot find a working 64-bit integer type on Illumos
Previous Message Robert Haas 2024-03-22 17:17:10 Re: psql not responding to SIGINT upon db reconnection