Re: Add system identifier to backup manifest

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add system identifier to backup manifest
Date: 2024-02-14 06:59:07
Message-ID: CAAJ_b95tfUo9CyCX-=0V1L=VNzzvcf4dGLT6ae7=kz2ELy2Edg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 2, 2024 at 12:03 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Feb 1, 2024 at 2:18 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > I intended to minimize the out param of parse_manifest_file(), which
> currently
> > returns manifest_files_hash and manifest_wal_range, and I need two more
> --
> > manifest versions and the system identifier.
>
> Sure, but you could do context.ht = manifest_data->files instead of
> context.manifest = manifest_data. The question isn't whether you
> should return the whole manifest_data from parse_manifest_file -- I
> agree with that decision -- but rather whether you should feed the
> whole thing through into the context, or just the file hash.
>
> > Yeah, we can do that, but I think it is a bit inefficient to have
> strcmp()
> > check for the pg_control file on each verify_backup_file() call,
> despite, we
> > know that path. Also, I think, we need additional handling to ensure
> that the
> > system identifier has been verified in verify_backup_file(), what if the
> > pg_control file itself missing from the backup -- might be a rare case,
> but
> > possible.
> >
> > For now, we can do the system identifier validation after
> > verify_backup_directory().
>
> Yes, that's another option, but I don't think it's as good.
>
> Suppose you do it that way. Then what will happen when the file is
> altogether missing or inaccessible? I think verify_backup_file() will
> complain, and then you'll have to do something ugly to avoid having
> verify_system_identifier() emit the same complaint all over again.
> Remember, unless you find some complicated way of passing data around,
> it won't know whether verify_backup_file() emitted a warning or not --
> it can redo the stat() and see what happens, but it's not absolutely
> guaranteed to be the same as what happened before. Making sure that
> you always emit any given complaint once rather than twice or zero
> times is going to be tricky.
>
> It seems more natural to me to just accept the (small) cost of a
> strcmp() in verify_backup_file(). If the initial stat() fails, it
> emits whatever complaint is appropriate and returns and the logic to
> check the system identifier is never reached. If it succeeds, you can
> proceed to try to open the file and do what you need to do.
>

Ok, I did that way in the attached version, I have passed the control file's
full path as a second argument to verify_system_identifier() what we gets in
verify_backup_file(), but that is not doing any useful things with it,
since we
were using get_controlfile() to open the control file, which takes the
directory as an input and computes the full path on its own.

Regards,
Amul

Attachment Content-Type Size
v6-0002-Add-system-identifier-to-backup-manifest.patch application/octet-stream 29.9 KB
v6-0001-pg_verifybackup-code-refactor.patch application/octet-stream 8.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-02-14 07:10:14 Re: index prefetching
Previous Message Michael Paquier 2024-02-14 06:52:36 Re: Make COPY format extendable: Extract COPY TO format implementations